Sample records for parallel group cluster

  1. Access and visualization using clusters and other parallel computers

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.; Bergou, Attila; Berriman, Bruce; Block, Gary; Collier, Jim; Curkendall, Dave; Good, John; Husman, Laura; Jacob, Joe; Laity, Anastasia

    2003-01-01

    JPL's Parallel Applications Technologies Group has been exploring the issues of data access and visualization of very large data sets over the past 10 or so years. This work has used a number of types of parallel computers, and today includes the use of commodity clusters. This talk will highlight some of the applications and tools we have developed, including how they use parallel computing resources, and specifically how we are using modern clusters. Our applications focus on NASA's needs; thus our data sets are usually related to Earth and Space Science, including data delivered from instruments in space, and data produced by telescopes on the ground.

  2. Accessing and visualizing scientific spatiotemporal data

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.; Bergou, Attila; Berriman, G. Bruce; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia

    2004-01-01

    This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids.

  3. A fast parallel clustering algorithm for molecular simulation trajectories.

    PubMed

    Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui

    2013-01-15

    We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilized the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc.
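
    For readers who want the mechanics, the greedy k-centers procedure with triangle-inequality pruning can be stated compactly. Below is a minimal serial NumPy sketch under our own naming, not the paper's GPU code; the per-point distance updates in the loop are exactly the part a GPU implementation would run in parallel.

```python
import numpy as np

def k_centers(X, k, rng=None):
    """Greedy k-centers (Gonzalez) with triangle-inequality pruning (sketch)."""
    rng = np.random.default_rng(rng)
    n = len(X)
    centers = [rng.integers(n)]                        # index of the first center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)   # distance to nearest center
    assign = np.zeros(n, dtype=int)                    # index into `centers`
    for _ in range(1, k):
        c_new = int(np.argmax(dist))                   # farthest point becomes next center
        centers.append(c_new)
        d_cc = np.linalg.norm(X[centers] - X[c_new], axis=1)  # center-to-center distances
        # Triangle inequality: if d(assigned_center, new_center) >= 2 * dist[i],
        # the new center cannot be closer, so the distance need not be computed.
        cand = d_cc[assign] < 2.0 * dist
        d_new = np.linalg.norm(X[cand] - X[c_new], axis=1)
        closer = d_new < dist[cand]
        idx = np.flatnonzero(cand)[closer]
        dist[idx] = d_new[closer]
        assign[idx] = len(centers) - 1
    return np.array(centers), assign, dist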

  4. Research on retailer data clustering algorithm based on Spark

    NASA Astrophysics Data System (ADS)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is currently a hot topic in the IT field. Spark is a highly reliable, high-performance distributed parallel computing framework for big data sets, and the k-means algorithm is one of the classical partitioning methods in cluster analysis. In this paper, we study the k-means clustering algorithm on Spark. First, the principle of the algorithm is analyzed; clustering is then carried out experimentally on supermarket customers to identify distinct shopping patterns. The paper also proposes a parallelization of the k-means algorithm on Spark's distributed computing framework, and gives a concrete design and implementation scheme. Two years of sales data from a supermarket are used to validate the proposed clustering algorithm and to segment customers; the clustering results are then analyzed to help enterprises adopt different marketing strategies for different customer groups and improve sales performance.
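
    As a rough, hypothetical illustration of the approach (not the authors' code), clustering customers with k-means on Spark takes only a few lines of PySpark; the input file and the RFM-style feature columns below are stand-ins we chose for the supermarket sales data.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("retail-kmeans").getOrCreate()

# Hypothetical customer table; column names are illustrative only.
df = spark.read.csv("supermarket_sales.csv", header=True, inferSchema=True)
features = VectorAssembler(
    inputCols=["recency", "frequency", "monetary"],  # assumed RFM-style features
    outputCol="features").transform(df)

# Spark parallelizes the assignment step across partitions of the data.
model = KMeans(k=5, seed=42, featuresCol="features").fit(features)
clustered = model.transform(features)   # adds a "prediction" column per customer
clustered.groupBy("prediction").count().show()
spark.stop()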

  5. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design

    PubMed Central

    Li, Fan; Gallis, John A.; Prague, Melanie; Murray, David M.

    2017-01-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design, with a companion article focusing on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis. PMID:28426295

  6. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1-Design.

    PubMed

    Turner, Elizabeth L; Li, Fan; Gallis, John A; Prague, Melanie; Murray, David M

    2017-06-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design, with a companion article focusing on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.

  7. West Virginia US Department of Energy experimental program to stimulate competitive research. Section 2: Human resource development; Section 3: Carbon-based structural materials research cluster; Section 3: Data parallel algorithms for scientific computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1994-02-02

    This report consists of three separate but related reports: (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Resource Development plan, the plan includes K-12 enrichment activities, undergraduate research opportunities for students at the state's two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women, and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of this cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites, and other carbons in order to understand the properties that provide desirable structural characteristics, including resistance to oxidation, levels of anisotropy, and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, research by four WVU faculty and three state liberal arts college faculty covers: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three-dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.

  8. Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

    NASA Astrophysics Data System (ADS)

    Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key routine jobs in molecular biology, in particular in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), into what we call hybrid clustering. This hybrid of the parallel K-Means algorithm and the DIANA algorithm is used to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts by collecting HPV DNA sequences from NCBI (National Centre for Biotechnology Information) and then extracting characteristics from the sequences. The extraction results are stored in matrix form; this matrix is normalized using Min-Max normalization, and genetic distances are calculated using the Euclidean distance. The hybrid clustering is then applied using an implementation of the parallel K-Means and DIANA algorithms. The aim of hybrid clustering is to obtain better clusters, and the Davies-Bouldin Index (DBI) is used to validate the resulting clusters and to find the optimum number of clusters. In this study, parallel K-Means alone clustered the data into 5 clusters with a minimum DBI value of 0.8741, while hybrid clustering produced 13 sub-clusters with minimum DBI values of 0.8216, 0.6845, 0.3331, 0.1994, and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of parallel K-Means performed only at the first stage, meaning that hybrid clustering gives better results for clustering HPV DNA sequences than parallel K-Means alone.
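
    The validation step is easy to reproduce in outline. Below is a minimal, hypothetical sketch (not the authors' implementation) using scikit-learn: Min-Max normalization, K-Means for several candidate cluster counts, and the Davies-Bouldin index (lower is better) to pick the optimum; the random matrix stands in for real HPV sequence features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Stand-in for a characteristics matrix extracted from HPV DNA sequences.
X = np.random.rand(200, 12)
X = MinMaxScaler().fit_transform(X)      # Min-Max normalization, as in the paper

# Score candidate cluster counts with the Davies-Bouldin index.
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, davies_bouldin_score(X, labels))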

  9. Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials.

    PubMed

    Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B

    2017-04-01

    Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo, and it is logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models: (1) the population-average parameters have an important interpretation for public health applications, and (2) it avoids untestable assumptions on latent variable distributions and parametric assumptions about error distributions, thereby providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials, with parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
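
    To make the modeling setup concrete, here is a minimal simulated sketch of a GEE marginal mean model for a stepped wedge layout using statsmodels. This is not the authors' code: the data, column names, and effect sizes are invented, and the bias-reduced (Mancl-DeRouen-type) covariance requested below is one correction of the general kind the paper evaluates.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate a stepped wedge: each cluster crosses to treatment at its own period.
rng = np.random.default_rng(0)
rows = []
for c in range(10):                      # 10 clusters
    cross = 1 + c % 4                    # hypothetical crossover step
    for t in range(5):                   # 5 periods
        treated = int(t >= cross)
        for _ in range(25):              # 25 individuals per cluster-period
            rows.append((c, t, treated, rng.binomial(1, 0.3 + 0.1 * treated)))
df = pd.DataFrame(rows, columns=["cluster", "period", "treated", "y"])

# Marginal mean model with an exchangeable working correlation.
model = sm.GEE.from_formula("y ~ treated + C(period)", groups="cluster", data=df,
                            family=sm.families.Binomial(),
                            cov_struct=sm.cov_struct.Exchangeable())
# "bias_reduced" requests a small-sample-corrected covariance estimate.
print(model.fit(cov_type="bias_reduced").summary())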

  10. Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

    NASA Astrophysics Data System (ADS)

    Pankratius, V.; Gowanlock, M.; Blair, D. M.

    2015-12-01

    Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold and (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electron content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments: We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
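
    Since the abstract spells out DBSCAN's two inputs, a minimal sketch is easy to give. The scikit-learn call below is illustrative only (the paper targets a custom manycore implementation), and the random points stand in for gridded TEC detections; real work would also use proper geodesic distances rather than raw degrees.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical stand-in for geospatial TEC detections: (lon, lat) points.
points = np.random.rand(1000, 2) * [360.0, 180.0] - [180.0, 90.0]

# DBSCAN needs only the two parameters the abstract names:
#   eps         - distance threshold
#   min_samples - minimum number of points within that threshold
labels = DBSCAN(eps=5.0, min_samples=10).fit_predict(points)
# labels == -1 marks noise; other values are ids of arbitrarily shaped clusters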

  11. An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems

    PubMed Central

    Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.

    2014-01-01

    This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on a local search and a parallelization process, inspired from an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235

  12. How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

    NASA Astrophysics Data System (ADS)

    Decyk, V. K.; Dauger, D. E.

    We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  13. On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.

    PubMed

    Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei

    2017-01-01

    Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, incremental clustering algorithms face a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. These contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified the theoretical conclusions.

  14. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    NASA Astrophysics Data System (ADS)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiple parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering. References: [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011 [4] NVIDIA: 'NVIDIA CUDA C Programming Guide', version 5.0, NVIDIA (reference book) [5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013 [6] Konstantaras, A., Varley, M.R., Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, 2002, pp. 303-308, 2002 [7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012 [8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013 [9] Konstantaras, A.J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com) [10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
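
    Of the three algorithms, fuzzy k-means is the most compact to sketch. The NumPy illustration below is ours, not the paper's Cuda C code: each point receives a membership weight in every cluster, and the expert-supplied input is the cluster count k.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, iters=100, rng=None):
    """Minimal fuzzy k-means (c-means) sketch with fuzzifier m > 1."""
    rng = np.random.default_rng(rng)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)         # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]     # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1)))      # standard membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U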

  15. ParallABEL: an R library for generalized parallelization of genome-wide association studies.

    PubMed

    Sangket, Unitsa; Mahasirimongkol, Surakameth; Chantratita, Wasun; Tandayya, Pichaya; Aulchenko, Yurii S

    2010-04-29

    Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics; the input data of this group are the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample; the input data of this group are the individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses; the input data of this group are pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation; the input data of this group are pairs of SNPs. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL's performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
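
    The first group (per-SNP statistics) is the simplest to picture. The sketch below is a hypothetical Python analogue of the idea, not the library's R/Rmpi code: SNP columns are partitioned across worker processes and the per-SNP results merged; the toy correlation statistic and all names are ours.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def snp_statistic(args):
    """Toy per-SNP statistic: correlation of genotype dosage with the trait."""
    genotypes, phenotype = args
    return np.corrcoef(genotypes, phenotype)[0, 1]

def parallel_snp_scan(G, y, workers=8):
    """Partition SNP columns across worker processes and merge the results."""
    tasks = ((G[:, j], y) for j in range(G.shape[1]))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(snp_statistic, tasks, chunksize=256))

if __name__ == "__main__":
    G = np.random.randint(0, 3, size=(500, 10_000))   # individuals x SNPs
    y = np.random.rand(500)                           # hypothetical trait
    stats = parallel_snp_scan(G, y)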

  16. Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.

    We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  17. Reducing the Mental Health-Related Stigma of Social Work Students: A Cluster RCT

    ERIC Educational Resources Information Center

    Rubio-Valera, Maria; Aznar-Lou, Ignacio; Vives-Collet, Mireia; Fernández, Ana; Gil-Girbau, Montserrat; Serrano-Blanco, Antoni

    2018-01-01

    The aim of this study was to evaluate the impact of a social contact and education intervention to improve attitudes to mental illness in first-year social work students. This was a 3-month cluster randomized controlled trial with two parallel arms: an intervention group (87 students) and a control group (79 students). The intervention was a workshop led by an OBERTAMENT…

  18. On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms

    PubMed Central

    He, Li; Zheng, Hao; Wang, Lei

    2017-01-01

    Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, incremental clustering algorithms face a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. These contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified the theoretical conclusions. PMID:29123546

  19. Scalable Parallel Density-based Clustering and Applications

    NASA Astrophysics Data System (ADS)

    Patwary, Mostofa Ali

    2014-04-01

    Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications requiring high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelizing these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, for both DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups of up to 27.5 on 40 cores on a shared memory architecture and up to 5,765 using 8,192 cores on a distributed memory architecture. In our experiments, we found that while achieving this scalability, our algorithms produce clustering results of comparable quality to the classical algorithms.
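
    The connected-components view of DBSCAN mentioned in the abstract can be made concrete: core points (those with at least min_pts neighbors within eps) are linked to neighboring core points, the components of that graph become clusters, and border points attach to a neighboring core point. Below is a serial union-find sketch of that reformulation under our own naming, not the authors' parallel code; the union operations are what a parallel version performs concurrently.

```python
import numpy as np
from scipy.spatial import cKDTree

def dbscan_cc(X, eps, min_pts):
    """DBSCAN recast as connected components over core points (serial sketch)."""
    tree = cKDTree(X)
    neigh = tree.query_ball_point(X, r=eps)          # eps-neighborhood lists
    core = np.array([len(nb) >= min_pts for nb in neigh])

    parent = np.arange(len(X))                       # union-find forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]            # path halving
            i = parent[i]
        return i
    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for i in np.flatnonzero(core):                   # link core points together
        for j in neigh[i]:
            if core[j]:
                union(i, j)

    labels = np.full(len(X), -1)                     # -1 = noise
    roots = {}
    for i in np.flatnonzero(core):                   # one label per component
        labels[i] = roots.setdefault(find(i), len(roots))
    for i in np.flatnonzero(~core):                  # border points join a core neighbor
        for j in neigh[i]:
            if core[j]:
                labels[i] = labels[find(j)]
                break
    return labels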

  20. CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.

    PubMed

    Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B

    2011-08-15

    Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intensive and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library, and its layout is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. Contact: kjk33@cantab.net.

  1. ParallABEL: an R library for generalized parallelization of genome-wide association studies

    PubMed Central

    2010-01-01

    Background Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Results Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics; the input data of this group are the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample; the input data of this group are the individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses; the input data of this group are pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation; the input data of this group are pairs of SNPs. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL's performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Conclusions Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL. PMID:20429914

  2. STABILITY OF SMALL SELF-INTERSTITIAL CLUSTERS IN TUNGSTEN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Setyawan, Wahyu; Nandipati, Giridhar; Kurtz, Richard J.

    2015-12-31

    Density functional theory was employed to explore the stability of interstitial clusters in W up to size seven. For each cluster size, the most stable configuration consists of parallel dumbbells. For clusters larger than size three, parallel dumbbells prefer to form in a multilayer fashion, instead of a planar structure. For size-7 clusters, the most stable configuration is a complete octahedron. The binding energy of a [111] dumbbell to the most stable cluster increases with cluster size, namely 2.49, 3.68, 4.76, 4.82, 5.47, and 6.85 eV for clusters of size 1, 2, 3, 4, 5, and 6, respectively. For a size-2 cluster, collinear dumbbells are still repulsive at the maximum allowable distance of 13.8 Å (the fifth neighbor along [111]). On the other hand, parallel dumbbells are strongly bound together. Two parallel dumbbells in which the axis-to-axis distance is within a cylindrical radius of 5.2 Å still exhibit a considerable binding of 0.28 eV. The most stable cluster in each size will be used to explore interactions with transmutation products.

  3. Parallel k-means++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
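
    The k-means++ seeding rule itself is short: each new seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. Here is a minimal serial NumPy sketch (our naming, not the record's C++ code); the distance update across all points is the data-parallel step targeted on the GPU, multicore CPU, and XMT platforms.

```python
import numpy as np

def kmeans_pp_seeds(X, k, rng=None):
    """k-means++ seeding (Arthur & Vassilvitskii 2007) via D^2 sampling."""
    rng = np.random.default_rng(rng)
    seeds = [X[rng.integers(len(X))]]                 # first seed: uniform
    d2 = ((X - seeds[0]) ** 2).sum(axis=1)            # squared distance to nearest seed
    for _ in range(1, k):
        seeds.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        # data-parallel step: refresh each point's nearest-seed distance
        d2 = np.minimum(d2, ((X - seeds[-1]) ** 2).sum(axis=1))
    return np.array(seeds)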

  4. Reconsidering the "Good Divorce"

    PubMed

    Amato, Paul R; Kane, Jennifer B; James, Spencer

    2011-12-01

    This study attempted to assess the notion that a "good divorce" protects children from the potential negative consequences of marital dissolution. A cluster analysis of data on postdivorce parenting from 944 families resulted in three groups: cooperative coparenting, parallel parenting, and single parenting. Children in the cooperative coparenting (good divorce) cluster had the smallest number of behavior problems and the closest ties to their fathers. Nevertheless, children in this cluster did not score significantly better than other children on 10 additional outcomes. These findings provide only modest support for the good divorce hypothesis.

  5. Reconsidering the “Good Divorce”

    PubMed Central

    Amato, Paul R.; Kane, Jennifer B.; James, Spencer

    2011-01-01

    This study attempted to assess the notion that a “good divorce” protects children from the potential negative consequences of marital dissolution. A cluster analysis of data on postdivorce parenting from 944 families resulted in three groups: cooperative coparenting, parallel parenting, and single parenting. Children in the cooperative coparenting (good divorce) cluster had the smallest number of behavior problems and the closest ties to their fathers. Nevertheless, children in this cluster did not score significantly better than other children on 10 additional outcomes. These findings provide only modest support for the good divorce hypothesis. PMID:22125355

  6. NAS Requirements Checklist for Job Queuing/Scheduling Software

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

    The increasing reliability of parallel systems and clusters of computers has resulted in these systems becoming more attractive for true production workloads. Today, the primary obstacle to production use of clusters of computers is the lack of a functional and robust Job Management System for parallel applications. This document provides a checklist of NAS requirements for job queuing and scheduling in order to make most efficient use of parallel systems and clusters for parallel applications. Future requirements are also identified to assist software vendors with design planning.

  7. Institutional Computing Executive Group Review of Multi-programmatic & Institutional Computing, Fiscal Year 2005 and 2006

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langer, S; Rotman, D; Schwegler, E

    The Institutional Computing Executive Group (ICEG) review of FY05-06 Multiprogrammatic and Institutional Computing (M&IC) activities is presented in the attached report. In summary, we find that the M&IC staff does an outstanding job of acquiring and supporting a wide range of institutional computing resources to meet the programmatic and scientific goals of LLNL. The responsiveness and high quality of support given to users and the programs investing in M&IC reflects the dedication and skill of the M&IC staff. M&IC has successfully managed serial capacity, parallel capacity, and capability computing resources. Serial capacity computing supports a wide range of scientific projects which require access to a few high performance processors within a shared memory computer. Parallel capacity computing supports scientific projects that require a moderate number of processors (up to roughly 1000) on a parallel computer. Capability computing supports parallel jobs that push the limits of simulation science. M&IC has worked closely with Stockpile Stewardship, and together they have made LLNL a premier institution for computational and simulation science. Such a standing is vital to the continued success of laboratory science programs and to the recruitment and retention of top scientists. This report provides recommendations to build on M&IC's accomplishments and improve simulation capabilities at LLNL. We recommend that the institution fully fund (1) operation of the atlas cluster purchased in FY06 to support a few large projects; (2) operation of the thunder and zeus clusters to enable 'mid-range' parallel capacity simulations during normal operation and a limited number of large simulations during dedicated application time; (3) operation of the new yana cluster to support a wide range of serial capacity simulations; (4) improvements to the reliability and performance of the Lustre parallel file system; (5) support for the new GDO petabyte-class storage facility on the green network for use in data-intensive external collaborations; and (6) continued support for visualization and other methods for analyzing large simulations. We also recommend that M&IC begin planning in FY07 for the next upgrade of its parallel clusters. LLNL investments in M&IC have resulted in a world-class simulation capability leading to innovative science. We thank the LLNL management for its continued support and thank the M&IC staff for its vision and dedicated efforts to make it all happen.

  8. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter

    2000-03-01

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  9. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.; Kokelaar, Pieter R.

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 Mflops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  10. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.

  11. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processor) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.

  12. Hydrophobic hydration driven self-assembly of curcumin in water: Similarities to nucleation and growth under large metastability, and an analysis of water dynamics at heterogeneous surfaces

    NASA Astrophysics Data System (ADS)

    Hazra, Milan Kumar; Roy, Susmita; Bagchi, Biman

    2014-11-01

    As the beneficial effects of curcumin have often been reported to be limited to its small concentrations, we have undertaken a study to find the aggregation properties of curcumin in water by varying the number of monomers. Our molecular dynamics simulation results show that the equilibrated structure is always an aggregated state with remarkable structural rearrangements as we vary the number of curcumin monomers from 4 to 16. We find that the curcumin monomers form clusters in a very definite pattern where they tend to aggregate in both parallel and anti-parallel orientations of the phenyl rings, as often seen in the formation of β-sheets in proteins. A considerable enhancement in the population of parallel alignments is observed on increasing the system size from 12 to 16 curcumin monomers. Due to the prevalence of such parallel alignment at large system size, a more closely packed cluster is formed with the maximum number of hydrophobic contacts. We also follow the pathway of cluster growth, in particular the transition from the initial segregated to the final aggregated state. We find the existence of a metastable structural intermediate involving a number of intermediate-sized clusters dispersed in the solution. We have constructed a free energy landscape of aggregation where the metastable state has been identified. The course of aggregation bears similarity to nucleation and growth in a highly metastable state. The final aggregated form remains stable with the total exclusion of water from its sequestered hydrophobic core. We also investigate water structure near the cluster surface along with its orientation. We find that water molecules form a distorted tetrahedral geometry in the first solvation layer of the cluster, interacting rather strongly with the hydrophilic groups at the surface of the curcumin. The dynamics of such quasi-bound water molecules near the surface of the curcumin cluster are considerably slower than in the bulk, signifying restricted motion as often found in protein hydration layers.

  13. Reconsidering the "Good Divorce"

    ERIC Educational Resources Information Center

    Amato, Paul R.; Kane, Jennifer B.; James, Spencer

    2011-01-01

    This study attempted to assess the notion that a "good divorce" protects children from the potential negative consequences of marital dissolution. A cluster analysis of data on postdivorce parenting from 944 families resulted in three groups: cooperative coparenting, parallel parenting, and single parenting. Children in the cooperative coparenting…

  14. Study protocol of Prednisone in episodic Cluster Headache (PredCH): a randomized, double-blind, placebo-controlled parallel group trial to evaluate the efficacy and safety of oral prednisone as an add-on therapy in the prophylactic treatment of episodic cluster headache with verapamil

    PubMed Central

    2013-01-01

    Background Episodic cluster headache (ECH) is a primary headache disorder that severely impairs patients' quality of life. First-line therapy in the initiation of a prophylactic treatment is verapamil. Due to its delayed onset of efficacy and the necessarily slow titration of dosage for tolerability reasons, prednisone is frequently added by clinicians to the initial prophylactic treatment of a cluster episode. This treatment strategy is thought to effectively reduce the number and intensity of cluster attacks in the beginning of a cluster episode (before verapamil is effective). This study will assess the efficacy and safety of oral prednisone as an add-on therapy to verapamil and compare it to a monotherapy with verapamil in the initial prophylactic treatment of a cluster episode. Methods and design PredCH is a prospective, randomized, double-blind, placebo-controlled trial with parallel study arms. Eligible patients with episodic cluster headache will be randomized to a treatment intervention with prednisone or a placebo arm. The multi-center trial will be conducted in eight German headache clinics that specialize in the treatment of ECH. Discussion PredCH is designed to assess whether oral prednisone added to first-line agent verapamil helps reduce the number and intensity of cluster attacks in the beginning of a cluster episode as compared to monotherapy with verapamil. Trial registration German Clinical Trials Register DRKS00004716 PMID:23889923

  15. A highly efficient multi-core algorithm for clustering extremely large datasets

    PubMed Central

    2010-01-01

    Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
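
    For the categorical (SNP) side, the k-modes idea can be sketched briefly. The paper's implementation is in Java with transactional-memory-style sharing; the NumPy version below is our own serial illustration, assuming an integer-coded categorical matrix, with the per-row assignment step being the part distributed across cores.

```python
import numpy as np

def k_modes(X, k, iters=20, rng=None):
    """Minimal k-modes sketch: Hamming distance to per-cluster modes."""
    rng = np.random.default_rng(rng)
    modes = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest mode by number of mismatched categories.
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: column-wise most frequent category per cluster.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                modes[c] = [np.bincount(col).argmax() for col in members.T]
    return labels, modes

X = np.random.randint(0, 4, size=(300, 50))   # hypothetical integer-coded SNP matrix
labels, modes = k_modes(X, k=4, rng=0)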

  16. Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis.

    PubMed

    Turner, Elizabeth L; Prague, Melanie; Gallis, John A; Li, Fan; Murray, David M

    2017-07-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.

  17. Vipie: web pipeline for parallel characterization of viral populations from multiple NGS samples.

    PubMed

    Lin, Jake; Kramna, Lenka; Autio, Reija; Hyöty, Heikki; Nykter, Matti; Cinek, Ondrej

    2017-05-15

    Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of parallel processing of virome sequencing data by exactly identical methods: this is especially important in studies of multifactorial diseases, or in parallel comparison of laboratory protocols. We have developed a web-based application allowing direct upload of sequences from multiple virome samples using custom parameters. The samples are then processed in parallel using an identical protocol, and can be easily reanalyzed. The pipeline performs de-novo assembly, taxonomic classification of viruses as well as sample analyses based on user-defined grouping categories. Tables of virus abundance are produced from cross-validation by remapping the sequencing reads to a union of all observed reference viruses. In addition, read sets and reports are created after processing unmapped reads against known human and bacterial ribosome references. Secured interactive results are dynamically plotted with population and diversity charts, clustered heatmaps and a sortable and searchable abundance table. The Vipie web application is a unique tool for multi-sample metagenomic analysis of viral data, producing searchable hits tables, interactive population maps, alpha diversity measures and clustered heatmaps that are grouped in applicable custom sample categories. Known references such as human genome and bacterial ribosomal genes are optionally removed from unmapped ('dark matter') reads. Secured results are accessible and shareable on modern browsers. Vipie is a freely available web-based tool whose code is open source.

  18. Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results

    NASA Technical Reports Server (NTRS)

    Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    In the last three years extensive performance data have been reported for parallel machines, based both on the NAS Parallel Benchmarks and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS parallel benchmarks, we have also included the peak performance of each machine, and the LINPACK n and n_1/2 values. Some of the results and observations can be summarized as follows: (1) all benchmarks are strongly correlated with peak performance; (2) LINPACK and EP each have a unique signature; (3) the remaining NPB can be grouped into three groups as follows: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format, and will present the data of our statistical analysis in detail.
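
    A minimal sketch of the clustering part of such an analysis (hypothetical data layout, ours rather than the authors' procedure): correlate benchmark results across machines, convert correlation to a distance, and cut a hierarchical clustering into three groups.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical layout: rows = machines, columns = benchmark results.
perf = np.random.rand(30, 8)

corr = np.corrcoef(perf.T)                 # benchmark-by-benchmark correlation
dist = 1.0 - corr                          # high correlation -> small distance
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=3, criterion="maxclust")   # e.g. three benchmark groups
print(groups)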

  19. Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.

    DTIC Science & Technology

    1994-11-01

    Parallel simulation of subsonic fluid dynamics on a cluster of workstations, applied for example to the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Subject terms: AI, MIT, Artificial Intelligence, Distributed Computing, Workstation Cluster, Network, Fluid Dynamics, Musical Instruments.

  20. Feature Clustering for Accelerating Parallel Coordinate Descent

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
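
    A minimal sketch of the idea, not the authors' implementation: correlated features are grouped into blocks up front, so a block-greedy parallel coordinate descent solver can update nearly independent blocks concurrently. Data and block count are arbitrary.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 50))           # hypothetical design matrix

      # cluster features by their correlation profiles
      profiles = np.corrcoef(X, rowvar=False)  # 50 x 50 feature correlations
      blocks = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(profiles)

      # each feature block could then be assigned to one coordinate-descent worker
      for b in range(8):
          print("block", b, "->", np.where(blocks == b)[0].tolist())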

  1. Hubble's View of Little Blue Dots

    NASA Astrophysics Data System (ADS)

    Kohler, Susanna

    2018-02-01

    The recent discovery of a new type of tiny, star-forming galaxy is the latest in a zoo of detections shedding light on our early universe. What can we learn from the unique little blue dots found in archival Hubble data? Peas, Berries, and Dots: As telescope capabilities improve and we develop increasingly deeper large-scale surveys of our universe, we continue to learn more about small, faraway galaxies. In recent years, increasing sensitivity first enabled the detection of green peas: luminous, compact, low-mass (under 10 billion solar masses; compare this to the Milky Way's roughly 1 trillion solar masses!) galaxies with high rates of star formation. [Image: green pea galaxies identified by citizen scientists with Galaxy Zoo; Richard Nowell & Carolin Cardamone] Not long thereafter, we discovered galaxies that form stars similarly rapidly, but are even smaller, only 3-30 million solar masses, spanning less than 3,000 light-years in size. These tiny powerhouses were termed blueberries for their distinctive color. Now, scientists Debra and Bruce Elmegreen (of Vassar College and IBM Research Division, respectively) report the discovery of galaxies that have even higher star formation rates and even lower masses: little blue dots. Exploring Tiny Star Factories: The Elmegreens discovered these unique galaxies by exploring archival Hubble data. The Hubble Frontier Fields data consist of deep images of six distant galaxy clusters and the parallel fields next to them. It was in the archival data for two Frontier Field Parallels, those for clusters Abell 2744 and MACS J0416.1-2403, that the authors noticed several galaxies that stand out as tiny, bright, blue objects that are nearly point sources. [Images: a few examples of the little blue dots identified in the two Frontier Field Parallels, and stacked images for three different groups of little blue dots; Elmegreen & Elmegreen 2017] The authors performed a search through the two Frontier Field Parallels, discovering a total of 55 little blue dots with masses spanning 10^5.8-10^7.4 solar masses, specific star formation rates of about 10^-7.4 per year, and redshifts of 0.5 < z < 5.4. Exploring these little blue dots, the Elmegreens find that the galaxies' sizes tend to be just a few hundred light-years across. They are gas-dominated; gas currently outweighs stars in these galaxies by perhaps a factor of five. Impressively, based on the incredibly high specific star formation rates observed in these little blue dots, they appear to have formed all of their stars within the last 1% of the age of the universe at their epoch. An Origin for Globulars? Intriguingly, this rapid star formation might be the key to answering a long-standing question: where do globular clusters come from? [Figure: log-log plot of star formation rate vs. mass for the three main groups of little blue dots, a fourth group of candidates with different properties, and previously discovered local blueberry galaxies; the three main groups of little blue dots appear to be low-mass analogs of blueberries. Elmegreen & Elmegreen 2017] The Elmegreens propose that little blue dots might actually be an explanation for the origin of these orbiting, spherical, low-metallicity clusters of stars. The authors demonstrate that, if the current star formation rates observed in little blue dots were to persist for another 50 Myr before feedback or gas exhaustion halted star production, the little blue dots could form enough stars to create clusters of roughly a million solar masses, which is large enough to explain the globular clusters we observe today. If little blue dots indeed rapidly produced such star clusters in the past, the clusters could later be absorbed into the halos of today's spiral and elliptical galaxies, appearing to us as the low-metallicity globular clusters that orbit large galaxies today. Citation: Debra Meloy Elmegreen and Bruce G. Elmegreen 2017 ApJL 851 L44. doi:10.3847/2041-8213/aaa0ce

  2. Aggregation and Gelation of Aromatic Polyamides with Parallel and Anti-parallel Alignment of Molecular Dipole Along the Backbone

    NASA Astrophysics Data System (ADS)

    Zhu, Dan; Shang, Jing; Ye, Xiaodong; Shen, Jian

    2016-12-01

    The understanding of macromolecular structures and interactions is important but difficult, because macromolecules adopt versatile conformations and aggregate states that vary with environmental conditions and history. In this work two polyamides with parallel or anti-parallel dipoles along the linear backbone, named ABAB (parallel) and AABB (anti-parallel), have been studied. By using a combination of methods, the phase behaviors of the polymers during aggregation and gelation, i.e., the formation or dissociation of nuclei, fibrils, clusters of fibrils, and cluster-cluster aggregates, have been revealed. These abundant phase behaviors are dominated by the inter-chain interactions, including dispersion, polarity and hydrogen bonding, and correlate with the solubility parameters of the solvents, the temperature, and the polymer concentration. The results of X-ray diffraction and fast-mode dielectric relaxation indicate that AABB possesses a more rigid conformation than ABAB; because of this, AABB aggregates into long fibers whereas ABAB forms hairy fibril clusters, and the gelation concentration in toluene is 1 w/v% for AABB, lower than the 3 w/v% for ABAB.

  3. Special Session 2: Cosmic Evolution of Groups and Clusters

    NASA Astrophysics Data System (ADS)

    Vrtilek, J. M.; David, L. P.

    2015-03-01

    During the past decade, observations across the electromagnetic spectrum have led to broad progress in the understanding of galaxy clusters and their far more abundant smaller siblings, groups. From the X-rays, where Chandra and XMM have illuminated old phenomena such as cooling cores and discovered new ones such as shocks, cold fronts, bubbles and cavities, through rich collections of optical data (including vast and growing arrays of redshifts), to the imaging of AGN outbursts of various ages through radio observations, our access to cluster and group measurements has leaped forward, while parallel advances in theory and modeling have kept pace. This Special Session offered a survey of progress to this point, an assessment of outstanding problems, and a multiwavelength overview of the uses of the next generation of observatories. Holding the symposium in conjunction with the XXVIIIth General Assembly provided the significant advantage of involving not only a specialist audience but also a broad cross-section of the world astronomical community.

  4. Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Rizk, Yehia M.

    1999-01-01

    The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simpler and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary-condition update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is greater than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter touches only the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or a cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which run in parallel. The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during run time the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
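
    A toy sketch of the grouping strategy described above, under the assumption of a simple greedy largest-first heuristic (zone sizes are invented): zones are distributed into groups so that the total number of grid points per group is approximately balanced.

      import heapq

      zone_points = [880000, 640000, 410000, 390000, 270000, 150000, 90000, 60000]
      n_groups = 3

      # heap of (total points, group id, zone list); smallest total gets next zone
      heap = [(0, g, []) for g in range(n_groups)]
      heapq.heapify(heap)
      for zone, pts in sorted(enumerate(zone_points), key=lambda z: -z[1]):
          total, g, zones = heapq.heappop(heap)
          zones.append(zone)
          heapq.heappush(heap, (total + pts, g, zones))

      for total, g, zones in sorted(heap, key=lambda t: t[1]):
          print(f"group {g}: zones {zones}, {total} grid points")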

  5. A flexible algorithm for calculating pair interactions on SIMD architectures

    NASA Astrophysics Data System (ADS)

    Páll, Szilárd; Hess, Berk

    2013-12-01

    Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have traditionally worked well, especially since compilers usually do a good job of unrolling the inner loop. In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD) parallelization has become essential. Avoiding memory bottlenecks is also increasingly important and requires reducing the ratio of memory to arithmetic operations. Moreover, when pairs only interact within a certain cut-off distance, good SIMD utilization can only be achieved by reordering input and output data, which quickly becomes a limiting factor. Here we present an algorithm for SIMD parallelization based on grouping a fixed number of particles, e.g. 2, 4, or 8, into spatial clusters. Calculating all interactions between particles in a pair of such clusters improves data reuse compared to the traditional scheme and results in a more efficient SIMD parallelization. Adjusting the cluster size allows the algorithm to map to SIMD units of various widths. This flexibility not only enables fast and efficient implementation on current CPUs and accelerator architectures like GPUs or Intel MIC, but it also makes the algorithm future-proof. We present the algorithm with an application to molecular dynamics simulations, where we can also make use of the effective buffering the method introduces.
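
    A simplified NumPy sketch of the cluster-pair idea (not the paper's kernel): particles are grouped into clusters of M = 4, and all M x M interactions between a pair of clusters are evaluated in one vectorized, SIMD-like operation. A real implementation groups particles spatially; here they are grouped naively by index.

      import numpy as np

      M = 4
      rng = np.random.default_rng(2)
      pos = rng.uniform(0.0, 5.0, size=(32, 3))   # 32 particles in a 5x5x5 box
      clusters = pos.reshape(-1, M, 3)            # naive grouping into clusters of 4

      def cluster_pair_energy(ci, cj, cutoff=2.5):
          # (M, 1, 3) - (1, M, 3) broadcasts to all M*M pair vectors at once
          d = np.linalg.norm(ci[:, None, :] - cj[None, :, :], axis=-1)
          r6 = (1.0 / d[(d > 0) & (d < cutoff)]) ** 6
          return np.sum(4.0 * (r6 ** 2 - r6))     # Lennard-Jones-style energy

      print(cluster_pair_energy(clusters[0], clusters[1]))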

  6. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
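
    The paradigms being compared can be caricatured in a few lines: a hedged sketch (with a placeholder kernel) in which MPI distributes zones across SMP nodes while a thread pool exploits the shared memory within each node, which is the structure a hybrid implementation follows.

      # Run with e.g.: mpiexec -n 4 python hybrid_sketch.py
      import numpy as np
      from concurrent.futures import ThreadPoolExecutor
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      def smooth(chunk):
          # stand-in for a per-chunk numerical kernel
          return np.convolve(chunk, np.ones(5) / 5, mode="same")

      grid = np.random.default_rng(rank).normal(size=1_000_000)  # this rank's zone
      with ThreadPoolExecutor(max_workers=4) as pool:            # intra-node level
          parts = list(pool.map(smooth, np.array_split(grid, 4)))

      print(f"rank {rank}/{size}: local norm {np.linalg.norm(np.concatenate(parts)):.3f}")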

  7. Efficient implementation of parallel three-dimensional FFT on clusters of PCs

    NASA Astrophysics Data System (ADS)

    Takahashi, Daisuke

    2003-05-01

    In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
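
    The cache-blocking idea can be sketched as follows (a rough illustration, not the paper's implementation): a 3-D FFT decomposes into 1-D FFTs along each axis, and performing those transforms in small batches keeps the working set cache-resident.

      import numpy as np

      def block_fft3d(a, block=8):
          a = np.ascontiguousarray(a, dtype=complex)
          for axis in range(3):
              # bring the transform axis innermost, contiguous in memory
              a = np.ascontiguousarray(np.moveaxis(a, axis, -1))
              flat = a.reshape(-1, a.shape[-1])       # view onto a
              for start in range(0, flat.shape[0], block):
                  flat[start:start + block] = np.fft.fft(flat[start:start + block], axis=-1)
              a = np.moveaxis(a, -1, axis)
          return a

      x = np.random.default_rng(3).normal(size=(16, 16, 16))
      assert np.allclose(block_fft3d(x), np.fft.fftn(x))   # agrees with the library FFT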

  8. Decision making by superimposing information from parallel cognitive channels

    NASA Astrophysics Data System (ADS)

    Aityan, Sergey K.

    1993-08-01

    A theory of decision making with perception through parallel information channels is presented. Decision making is considered a parallel competitive process. Every channel can provide confirmation or rejection of a decision concept. Different channels have different impacts on specific concepts, depending on the goals and individual cognitive features. All concepts are divided into semantic clusters according to the goals and the system defaults. The clusters can be alternative or complementary. 'Winner-take-all' firing of concept nodes takes place within an alternative cluster, while concepts can be independently activated in a complementary cluster. A cognitive channel affects a decision concept by sending an activating or inhibitory signal. Complementary clusters serve to build up complex concepts by superimposing activation received from various channels. Decision making is provided by the alternative clusters: every active concept in an alternative cluster tends to suppress the competing concepts in that cluster by sending inhibitory signals to its other nodes. The model accounts for a time delay in signal transmission between the nodes, and explains the decrease in reaction time when information is confirmed by different channels and the increase in reaction time when conflicting information is received from the channels.

  9. Enhancing Self-Determination in Health: Results of an RCT of the Ask Project, a School-Based Intervention for Adolescents with Intellectual Disability

    ERIC Educational Resources Information Center

    McPherson, Lyn; Ware, Robert S.; Carrington, Suzanne; Lennox, Nicholas

    2017-01-01

    Background: Adolescents with intellectual disability have high levels of unrecognized disease and inadequate health screening/promotion which might be addressed by improving health advocacy skills. Methods: A parallel-group cluster randomized controlled trial was conducted to investigate whether a health intervention package, consisting of…

  10. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    NASA Astrophysics Data System (ADS)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modeling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC computer cluster is discussed. The parallel MPI version of the Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the cluster's computational resources when a larger number of computational cores is used. Simulation results indicate that when the number of cores used is not a multiple of the number of cores per cluster node, some allocation strategies provide notably more efficient calculations than others.

  11. Symmetries and stability of chimera states in small, globally-coupled networks

    NASA Astrophysics Data System (ADS)

    Hart, Joseph D.; Bansal, Kanika; Murphy, Thomas E.; Roy, Rajarshi

    It has recently been demonstrated that symmetries in a network's topology can help predict the patterns of synchronized clusters that can emerge in a network of coupled oscillators. This and related discoveries have led to increased interest in both network symmetries and cluster synchronization. In parallel with these discoveries, interest in chimera states, dynamical patterns in which a network separates into coherent and incoherent portions, has grown, and chimeras have now been observed in a variety of experimental systems. We present an opto-electronic experiment in which both chimera states and synchronized clusters are observed in a small, globally-coupled network. We show that the symmetries and sub-symmetries of the network permit the formation of the chimera and cluster states. A recently developed group-theoretical approach enables us to predict the stability of the observed chimera and cluster states, and highlights the close relationship between chimera and cluster states as belonging to the broader phenomenon of partial synchronization.

  12. Two Fe-S clusters catalyse sulfur insertion by Radical-SAM methylthiotransferases

    PubMed Central

    Forouhar, Farhad; Arragain, Simon; Atta, Mohamed; Gambarelli, Serge; Mouesca, Jean-Marie; Hussain, Munif; Xiao, Rong; Kieffer-Jaquinod, Sylvie; Seetharaman, Jayaraman; Acton, Thomas B.; Montelione, Gaetano T.

    2014-01-01

    How living organisms create carbon-sulfur bonds during the biosynthesis of critical sulfur-containing compounds is still poorly understood. The methylthiotransferases MiaB and RimO catalyze sulfur insertion into tRNAs and ribosomal protein S12, respectively. Both belong to a sub-group of Radical-SAM enzymes that bear two [4Fe-4S] clusters. One cluster binds S-adenosylmethionine and generates an Ado• radical via a well-established mechanism. However, the precise role of the second cluster is unclear. For some sulfur-inserting Radical-SAM enzymes, this cluster has been proposed to act as a sacrificial source of sulfur for the reaction. In this paper, we report parallel enzymological, spectroscopic and crystallographic investigations of RimO and MiaB, which provide the first evidence that these enzymes are true catalysts and support a new sulfur-insertion mechanism involving activation of an exogenous sulfur co-substrate at an exchangeable coordination site on the second cluster, which remains intact during the reaction. PMID:23542644

  13. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    NASA Astrophysics Data System (ADS)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as Matlab, IDL and NCO is not well developed and rarely used, yet parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. There is therefore an increasing need for new analysis platforms that both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain-specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focused on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously, or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including SciPy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser, maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
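
    A minimal sketch of the interactive-parallel workflow described above, using the ipyparallel package (the IPython cluster engines); the profile name, input files and analysis function are placeholders, not part of the JASMIN setup.

      import ipyparallel as ipp

      rc = ipp.Client(profile="jasmin")         # connect to a running IPython cluster
      view = rc.load_balanced_view()

      def mean_field(path):
          # stand-in for a per-file analysis step (e.g. reading one model output chunk)
          import numpy as np
          return float(np.load(path).mean())

      paths = ["chunk_%03d.npy" % i for i in range(64)]   # hypothetical inputs
      results = view.map_sync(mean_field, paths)          # farmed out to the engines
      print(sum(results) / len(results))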

  14. Creating a Parallel Version of VisIt for Microsoft Windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitlock, B J; Biagas, K S; Rawson, P L

    2011-12-07

    VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.

  15. The R package "sperrorest" : Parallelized spatial error estimation and variable importance assessment for geospatial machine learning

    NASA Astrophysics Data System (ADS)

    Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander

    2017-04-01

    Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of the more widely used non-spatial equivalents. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation). The first is parsperrorest(), a parallelized version of sperrorest(). This function features two parallel modes to greatly speed up cross-validation runs; both are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and, depending on the platform, calls parallel::mclapply() or parallel::parApply() in the background: forking is used on Unix systems, while Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package, which performs cluster parallelization in a different way than the parallel package does. In summary, the robustness of parsperrorest() is increased by the implementation of two independent parallel modes. The second feature, partition.factor.cv(), provides a new way of partitioning the data in sperrorest: it gives the user the possibility to perform cross-validation at the level of some grouping structure. As an example, in remote sensing of agricultural land uses, pixels from the same field contain nearly identical information and will thus be jointly placed in either the test set or the training set. Other spatial sampling and resampling strategies are already available and can be extended by the user.

  16. Psychoeducational Intervention for Symptom Management of Fatigue, Pain, and Sleep Disturbance Cluster Among Cancer Patients: A Pilot Quasi-Experimental Study.

    PubMed

    Nguyen, Ly Thuy; Alexander, Kimberly; Yates, Patsy

    2018-06-01

    To assess the feasibility of conducting a trial of a psychoeducational intervention involving the provision of tailored information and coaching to improve management of a cancer-related symptom cluster (fatigue, pain, and sleep disturbance) and reduce symptom cluster impacts on patient health outcomes in the Vietnamese context and to undertake a preliminary evaluation of the intervention. A parallel-group single-blind pilot quasi-experimental trial was conducted with 102 cancer patients in one Vietnamese hospital. The intervention group received one face-to-face session and two phone sessions delivered by a nurse one week apart, and the comparison group received usual care. Patient outcomes were measured at baseline before the chemotherapy cycle and immediately preceding the next chemotherapy cycle. Separate linear mixed models were used to evaluate the impact of the intervention on total symptom cluster severity, symptom scores, functional status, depressive symptoms, and health-related quality of life. The study design was feasible with a recruitment rate of 22.6% and attrition rate of 9.8%. Compared to the control group, the intervention group showed a significant reduction in symptom cluster severity, fatigue severity, fatigue interference, sleep disturbance, depression, and anxiety. Significant differences were not observed for pain severity, pain interference, functional status, and health-related quality of life. The intervention was acceptable to the study population, with a high attendance rate of 78% and adherence rate of 95.7%. On the basis of the present study findings, future randomized controlled trials are needed to test the effectiveness of a symptom cluster psychoeducational intervention in Vietnam. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  17. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary: Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GPL v3. No. of lines in distributed program, including test data, etc.: 59 168. No. of bytes in distributed program, including test data, etc.: 453 409. Distribution format: tar.gz. Programming language: C, CUDA. Computer: parallel computing clusters; individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux. Has the code been vectorised or parallelized?: Yes; tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs. RAM: tested on problems requiring up to 4 GB per compute node. Classification: 12. External routines: MPI, CUDA, IBM Cell SDK. Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides three implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: sub-program numdiff is used for the test run.

  18. Tile-based Level of Detail for the Parallel Age

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Niski, K; Cohen, J D

    Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.

  19. Analysis of ground-motion simulation big data

    NASA Astrophysics Data System (ADS)

    Maeda, T.; Fujiwara, H.

    2016-12-01

    We developed a parallel distributed processing system which applies big-data analysis to large-scale ground-motion simulation data. The system uses ground-motion index values and earthquake scenario parameters as input; we used the peak ground velocity value and velocity response spectra as the ground-motion indices. The ground-motion index values are calculated from our simulation data. We used simulated long-period ground motion waveforms at about 80,000 meshes, calculated by a three-dimensional finite difference method based on 369 earthquake scenarios of a great earthquake in the Nankai Trough. These scenarios were constructed by considering the uncertainty of source model parameters such as source area, rupture starting point, asperity location, rupture velocity, fmax and slip function, and we used these parameters as the earthquake scenario parameters. The system first carries out clustering of the earthquake scenarios in each mesh by the k-means method; the number of clusters is determined in advance using hierarchical clustering with Ward's method. The scenario clustering results are converted to a 1-D feature vector whose dimension is the number of scenario combinations: if two scenarios belong to the same cluster, the corresponding component of the feature vector is 1, and otherwise it is 0. The feature vector thus shows the 'response' of a mesh to the assumed earthquake scenario group. Next, the system performs clustering of the meshes by the k-means method using the feature vector of each mesh obtained previously, where the number of clusters is given arbitrarily. The clustering of scenarios and meshes is performed by parallel distributed processing with Hadoop and Spark, respectively. In this study, we divided the meshes into 20 clusters; the meshes in each cluster are geometrically concentrated. This system can thus extract regions, in which the meshes have a similar 'response', as clusters. For each cluster, it is possible to determine particular scenario parameters which characterize the cluster. In other words, by utilizing this system we can objectively obtain the critical scenario parameters of the ground-motion simulation for each evaluation point. This research was supported by CREST, JST.
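
    A compact sketch of the two-stage clustering described above, with toy sizes and synthetic data: scenarios are clustered per mesh, co-membership is encoded as a binary feature vector over scenario pairs, and the meshes are then clustered on those vectors.

      import numpy as np
      from itertools import combinations
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(4)
      n_mesh, n_scen = 50, 12
      pgv = rng.lognormal(size=(n_mesh, n_scen))   # fake ground-motion index values

      pairs = list(combinations(range(n_scen), 2))
      features = np.zeros((n_mesh, len(pairs)))
      for m in range(n_mesh):
          labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
              pgv[m].reshape(-1, 1))
          # component is 1 when the two scenarios fall into the same cluster
          features[m] = [float(labels[i] == labels[j]) for i, j in pairs]

      mesh_groups = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
      print(np.bincount(mesh_groups))              # mesh count per extracted region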

  20. Accelerating semantic graph databases on commodity clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Haglin, David J.

    We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.

  1. Parallel Clustering Algorithm for Large-Scale Biological Data Sets

    PubMed Central

    Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

    2014-01-01

    Background: The recent explosion of biological data brings a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long time, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods: Two types of parallel architectures are proposed in this paper to accelerate the similarity-matrix construction and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. An appropriate scheme of data partition and reduction is designed in our method in order to minimize the global communication cost among processes. Results: A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
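
    A simplified stand-in for the shared-memory stage described above: rows of the similarity matrix (here the common negative squared Euclidean distance) are computed in parallel worker processes. The data, sizes and process count are arbitrary.

      import numpy as np
      from multiprocessing import Pool

      rng = np.random.default_rng(5)
      data = rng.normal(size=(2000, 16))

      def similarity_row(i):
          diff = data - data[i]
          return -np.sum(diff * diff, axis=1)    # s(i, k) = -||x_i - x_k||^2

      if __name__ == "__main__":
          with Pool(processes=8) as pool:
              S = np.vstack(pool.map(similarity_row, range(len(data))))
          print(S.shape)                         # 2000 x 2000 similarity matrix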

  2. Cluster Analysis of Time-Dependent Crystallographic Data: Direct Identification of Time-Independent Structural Intermediates

    PubMed Central

    Kostov, Konstantin S.; Moffat, Keith

    2011-01-01

    The initial output of a time-resolved macromolecular crystallography experiment is a time-dependent series of difference electron density maps that displays the time-dependent changes in underlying structure as a reaction progresses. The goal is to interpret such data in terms of a small number of crystallographically refinable, time-independent structures, each associated with a reaction intermediate; to establish the pathways and rate coefficients by which these intermediates interconvert; and thereby to elucidate a chemical kinetic mechanism. One strategy toward achieving this goal is to use cluster analysis, a statistical method that groups objects based on their similarity. If the difference electron density at a particular voxel in the time-dependent difference electron density (TDED) maps is sensitive to the presence of one and only one intermediate, then its temporal evolution will exactly parallel the concentration profile of that intermediate with time. The rationale is therefore to cluster voxels with respect to the shapes of their TDEDs, so that each group or cluster of voxels corresponds to one structural intermediate. Clusters of voxels whose TDEDs reflect the presence of two or more specific intermediates can also be identified. From such groupings one can then infer the number of intermediates, obtain their time-independent difference density characteristics, and refine the structure of each intermediate. We review the principles of cluster analysis and clustering algorithms in a crystallographic context, and describe the application of the method to simulated and experimental time-resolved crystallographic data for the photocycle of photoactive yellow protein. PMID:21244840
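
    The voxel-clustering idea can be made concrete with synthetic data: time courses are grouped by shape using a correlation distance, so voxels tracking the same intermediate land in one cluster. The two kinetic profiles below are invented, not taken from the paper.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.spatial.distance import pdist

      rng = np.random.default_rng(6)
      t = np.linspace(0, 5, 40)
      profile_a = np.exp(-t)                   # decaying intermediate
      profile_b = np.exp(-t) - np.exp(-3 * t)  # rise-and-fall intermediate

      # 60 voxels, each following one profile with its own amplitude plus noise
      tded = np.vstack([amp * p + 0.05 * rng.normal(size=t.size)
                        for p in (profile_a, profile_b)
                        for amp in rng.uniform(0.5, 2.0, size=30)])

      # correlation distance ignores amplitude and clusters by shape alone
      labels = fcluster(linkage(pdist(tded, metric="correlation"), "average"),
                        t=2, criterion="maxclust")
      print(np.bincount(labels)[1:])           # two clusters of ~30 voxels each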

  3. Scalable Static and Dynamic Community Detection Using Grappolo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman

    Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency's (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.

  4. High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.

    For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity; HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N^2): once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme, which aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
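
    A toy sketch of the recursive partitioning scheme (with KMeans standing in for the HMAC base step, and without the merge phase a full implementation would need): the dataset is halved until a partition falls below the threshold size, and only then is the base clustering applied.

      import numpy as np
      from sklearn.cluster import KMeans

      def recursive_cluster(data, threshold=500):
          if len(data) <= threshold:
              labels = KMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(data)
              return [data[labels == k] for k in range(2)]
          half = len(data) // 2
          # each half could be handed to a separate core here
          return (recursive_cluster(data[:half], threshold)
                  + recursive_cluster(data[half:], threshold))

      data = np.random.default_rng(7).normal(size=(4000, 3))
      print(len(recursive_cluster(data)), "leaf clusters")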

  5. Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

    PubMed

    Hoffmann, Thomas J

    2011-03-01

    It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to easily pass multiple command line options, including vectors of values in the usual R format, into R. The same script can be set up to run things in parallel via different command line arguments. The R package batch also simplifies this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or a local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate together the results of multiple processes run on a cluster.

  6. A Behavior-Based Intervention That Prevents Sexual Assault: the Results of a Matched-Pairs, Cluster-Randomized Study in Nairobi, Kenya.

    PubMed

    Baiocchi, Michael; Omondi, Benjamin; Langat, Nickson; Boothroyd, Derek B; Sinclair, Jake; Pavia, Lee; Mulinge, Munyae; Githua, Oscar; Golden, Neville H; Sarnquist, Clea

    2017-10-01

    The study's design was a cluster-randomized, matched-pairs, parallel trial of a behavior-based sexual assault prevention intervention in the informal settlements. The participants were primary school girls aged 10-16. Classroom-based interventions for girls and boys were delivered by instructors from the same settlements, at the same time, over six 2-h sessions. The girls' program had components of empowerment, gender relations, and self-defense. The boys' program promotes healthy gender norms. The control arm of the study received a health and hygiene curriculum. The primary outcome was the rate of sexual assault in the prior 12 months at the cluster level (school level). Secondary outcomes included the generalized self-efficacy scale, the distribution of number of times victims were sexually assaulted in the prior period, skills used, disclosure rates, and distribution of perpetrators. Difference-in-differences estimates are reported with bootstrapped confidence intervals. Fourteen schools with 3147 girls from the intervention group and 14 schools with 2539 girls from the control group were included in the analysis. We estimate a 3.7 % decrease, p = 0.03 and 95 % CI = (0.4, 8.0), in risk of sexual assault in the intervention group due to the intervention (initially 7.3 % at baseline). We estimate an increase in mean generalized self-efficacy score of 0.19 (baseline average 3.1, on a 1-4 scale), p = 0.0004 and 95 % CI = (0.08, 0.39). This innovative intervention that combined parallel training for young adolescent girls and boys in school settings showed significant reduction in the rate of sexual assault among girls in this population.

  7. Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem

    NASA Astrophysics Data System (ADS)

    Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa

    A branch-and-bound algorithm (BB for short) is the most general technique for dealing with various combinatorial optimization problems, but even when it is used, computation time is likely to increase exponentially; we therefore consider parallelization to reduce it. It has been reported that the computation time of a parallel BB depends heavily upon node-variable selection strategies. In a parallel BB it is also necessary to keep communication time from growing, so it is important to pay attention to how many, and what kind of, nodes are transferred (called the sending-node selection strategy). In this paper, for the graph coloring problem, we propose several sending-node selection strategies for a parallel BB algorithm, adopting MPI for parallelization, and experimentally evaluate how these strategies affect the computation time of a parallel BB on a PC cluster network.
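
    One possible sending-node selection strategy can be sketched in a few lines (this is an invented example, not necessarily one of the paper's strategies): a worker donates its k most promising open nodes, i.e. those with the smallest lower bounds.

      import heapq

      # open nodes as (lower bound, node id); bounds are made-up values
      open_nodes = [(lb, nid) for nid, lb in enumerate([17, 12, 19, 11, 23, 14, 16])]
      heapq.heapify(open_nodes)

      def select_nodes_to_send(heap, k=2):
          # donate the k most promising nodes; alternatives include donating
          # the least promising ones, or a mixture of both
          return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]

      print("sending:", select_nodes_to_send(open_nodes))
      print("keeping:", sorted(open_nodes))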

  8. Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hirata, So

    2003-11-20

    We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted on by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirements, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
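
    The binary-contraction-order problem at the heart of TCE can be illustrated on a small scale with NumPy's einsum_path, which likewise chooses a pairwise (binary) contraction order that minimizes operation count; the tensors and index labels here are toy stand-ins, not TCE output.

      import numpy as np

      rng = np.random.default_rng(8)
      t1 = rng.normal(size=(8, 16))             # toy singles amplitudes t[i,a]
      v = rng.normal(size=(16, 16, 16, 16))     # toy two-electron integrals v[a,b,c,d]

      # a multiple tensor contraction: sum_ab t[i,a] t[j,b] v[a,b,c,d]
      path, info = np.einsum_path("ia,jb,abcd->ijcd", t1, t1, v, optimize="optimal")
      print(info)                               # reports the chosen binary order and cost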

  9. Embedded cluster metal-polymeric micro interface and process for producing the same

    DOEpatents

    Menezes, Marlon E.; Birnbaum, Howard K.; Robertson, Ian M.

    2002-01-29

    A micro interface between a polymeric layer and a metal layer includes isolated clusters of metal partially embedded in the polymeric layer. The exposed portion of the clusters is smaller than embedded portions, so that a cross section, taken parallel to the interface, of an exposed portion of an individual cluster is smaller than a cross section, taken parallel to the interface, of an embedded portion of the individual cluster. At least half, but not all of the height of a preferred spherical cluster is embedded. The metal layer is completed by a continuous layer of metal bonded to the exposed portions of the discontinuous clusters. The micro interface is formed by heating a polymeric layer to a temperature, near its glass transition temperature, sufficient to allow penetration of the layer by metal clusters, after isolated clusters have been deposited on the layer at lower temperatures. The layer is recooled after embedding, and a continuous metal layer is deposited upon the polymeric layer to bond with the discontinuous metal clusters.

  10. Equalizer: a scalable parallel rendering framework.

    PubMed

    Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

    2009-01-01

    Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop, and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic enough to support various types of data and visualization applications and, at the same time, work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems, ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations, usage scenarios and scalability results.

  11. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    PubMed

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo-random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster, consisting of 800 MHz Intel Pentium III processors, shows an almost linear speedup up to 32 processors for simulating 1 x 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors, based on availability), which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors, when the problem size is increased up to 8 x 10^8 histories. For a smaller number of histories (1 x 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of their 64-bit architecture, Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster than the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
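
    The parallelization pattern described above can be sketched with mpi4py: every rank draws from an independent random stream (the role SPRNG plays in the paper) and simulates its share of histories, with a reduction at the end. The "transport" below is a placeholder, not the DPM physics.

      # Run with e.g.: mpiexec -n 8 python dpm_sketch.py
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_histories = 10 ** 6
      streams = np.random.SeedSequence(42).spawn(size)   # independent per-rank streams
      rng = np.random.default_rng(streams[rank])

      local_dose = np.zeros(100)                # toy 1-D phantom, 100 depth bins
      for _ in range(n_histories // size):
          depth = rng.integers(0, 100)          # stand-in for particle transport
          local_dose[depth] += rng.exponential(1.0)

      total_dose = np.zeros_like(local_dose)
      comm.Reduce(local_dose, total_dose, op=MPI.SUM, root=0)
      if rank == 0:
          print("peak depth bin:", int(np.argmax(total_dose)))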

  12. Implementation of the force decomposition machine for molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2012-09-01

    We present the design and implementation of the force decomposition machine (FDM), a cluster of personal computers (PCs) that is tailored to running molecular dynamics (MD) simulations using the distributed diagonal force decomposition (DDFD) parallelization method. The cluster interconnect architecture is optimized for the communication pattern of the DDFD method. Our implementation of the FDM relies on standard commodity components, even for networking. Although the cluster is meant for DDFD MD simulations, it remains general enough for other parallel computations. An analysis of several MD simulation runs on both the FDM and a standard PC cluster demonstrates that the FDM's interconnect architecture provides greater performance than a more general cluster interconnect. Copyright © 2012 Elsevier Inc. All rights reserved.

  13. The Hubble Space Telescope UV Legacy Survey of Galactic globular clusters - XIII. ACS/WFC parallel-field catalogues

    NASA Astrophysics Data System (ADS)

    Simioni, M.; Bedin, L. R.; Aparicio, A.; Piotto, G.; Milone, A. P.; Nardiello, D.; Anderson, J.; Bellini, A.; Brown, T. M.; Cassisi, S.; Cunial, A.; Granata, V.; Ortolani, S.; van der Marel, R. P.; Vesperini, E.

    2018-05-01

    As part of the Hubble Space Telescope UV Legacy Survey of Galactic globular clusters, 110 parallel fields were observed with the Wide Field Channel of the Advanced Camera for Surveys, in the outskirts of 48 globular clusters, plus the open cluster NGC 6791. Totalling about 0.3 deg2 of observed sky, this is the largest homogeneous Hubble Space Telescope photometric survey of Galactic globular cluster outskirts to date. In particular, two distinct pointings have been obtained for each target on average, all centred at about 6.5 arcmin from the cluster centre, thus covering a mean area of about 23 arcmin2 for each globular cluster. For each field, at least one exposure in both the F475W and F814W filters was collected. In this work, we publicly release the astrometric and photometric catalogues and the astrometrized atlases for each of these fields.

  14. The engine design engine. A clustered computer platform for the aerodynamic inverse design and analysis of a full engine

    NASA Technical Reports Server (NTRS)

    Sanz, J.; Pischel, K.; Hubler, D.

    1992-01-01

    An application for parallel computation on a combined cluster of powerful workstations and supercomputers was developed. Parallel Virtual Machine (PVM) is used as the message-passing layer for a macro-tasking parallelization of the Aerodynamic Inverse Design and Analysis for a Full Engine computer code. The heterogeneous nature of the cluster is handled transparently by the controlling host machine. Communication is established via Ethernet with the TCP/IP protocol over an open network, imposing a reasonable overhead for internode communication and rendering an efficient utilization of the engaged processors. Perhaps one of the most interesting features of the system is its versatility: it permits the use of whichever computational resources happen to be underutilized at a given point in time.

  15. Accessing and Visualizing scientific spatiotemporal data

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.; Bergou, Attila; Berriman, Bruce G.; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia

    2004-01-01

    This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids. These tools do one or more of the following tasks: visualize local data sets for local users, visualize local data sets for remote users, and access and visualize remote data sets. The tools are used for various types of data, including remotely sensed image data, digital elevation models, astronomical surveys, etc. The paper attempts to pull out some common elements of these tools that may be useful for others who have to work with similarly large data sets.

  16. Transformations of the FeS Clusters of the Methylthiotransferases MiaB and RimO, Detected by Direct Electrochemistry

    PubMed Central

    2016-01-01

    The methylthiotransferases (MTTases) represent a subfamily of the S-adenosylmethionine (AdoMet) radical superfamily of enzymes that catalyze the attachment of a methylthioether (-SCH3) moiety on unactivated carbon centers. These enzymes contain two [4Fe-4S] clusters, one of which participates in the reductive fragmentation of AdoMet to generate a 5′-deoxyadenosyl 5′-radical and the other of which, termed the auxiliary cluster, is believed to play a central role in constructing the methylthio group and attaching it to the substrate. Because the redox properties of the bound cofactors within the AdoMet radical superfamily are so poorly understood, we have examined two MTTases in parallel, MiaB and RimO, using protein electrochemistry. We resolve the redox potentials of each [4Fe-4S] cluster, show that the auxiliary cluster has a potential higher than that of the AdoMet-binding cluster, and demonstrate that upon incubation of either enzyme with AdoMet, a unique low-potential state of the enzyme emerges. Our results are consistent with a mechanism whereby the auxiliary cluster is transiently methylated during substrate methylthiolation. PMID:27598886

  17. Parallel and Scalable Clustering and Classification for Big Data in Geosciences

    NASA Astrophysics Data System (ADS)

    Riedel, M.

    2015-12-01

    Machine learning, data mining, and statistical computing are common techniques for performing analysis in the earth sciences. This contribution focuses on two concrete and widely used data analytics methods suitable for analysing 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which enables the identification of outliers or interesting anomalies. A new open-source parallel and scalable DBSCAN implementation is discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with the focus set on the support vector machine (SVM) algorithm, one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation is discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
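
    A toy illustration of the two methods named above, using scikit-learn's serial DBSCAN and SVC on invented data (the contribution itself concerns parallel, scalable implementations; only the API shape carries over):

        import numpy as np
        from sklearn.cluster import DBSCAN
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 0.3, (100, 2)),
                       rng.normal(3, 0.3, (100, 2)),
                       rng.uniform(-2, 5, (10, 2))])  # 10 scattered outliers

        # DBSCAN: density-based clusters; label -1 marks outliers/anomalies,
        # the kind of interesting events the fjord use case looks for.
        labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
        print("outliers found:", int(np.sum(labels == -1)))

        # SVM classification, e.g. land-cover classes in remote sensing.
        y = (X[:, 0] > 1.5).astype(int)  # toy class labels
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
        clf = SVC(kernel="rbf", C=1.0).fit(Xtr, ytr)
        print("test accuracy:", clf.score(Xte, yte))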

  18. Resource Provisioning in SLA-Based Cluster Computing

    NASA Astrophysics Data System (ADS)

    Xiong, Kaiqi; Suh, Sang

    Cluster computing is excellent for parallel computation and has become increasingly popular. In cluster computing, a service level agreement (SLA) is a set of quality-of-service (QoS) requirements and a fee agreed between a customer and an application service provider; it plays an important role in e-business applications. An application service provider uses a set of cluster computing resources to support e-business applications subject to an SLA. In this paper, the QoS includes percentile response time and cluster utilization. We present an approach to resource provisioning in such an environment that minimizes the total cost of the cluster computing resources used by an application service provider for an e-business application, which often requires parallel computation for high service performance, availability, and reliability, while satisfying the QoS and the fee negotiated between the customer and the provider. Simulation experiments demonstrate the applicability of the approach.
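
    As a hedged sketch of the provisioning idea, one can size a cluster against a percentile response-time QoS with the textbook Erlang-C model of an M/M/c queue; this is a stand-in illustration, not the paper's algorithm, and the arrival and service rates below are invented:

        import math

        def erlang_c(c, a):
            """Probability an arriving job must wait; offered load a = lam/mu."""
            rho = a / c
            if rho >= 1.0:
                return 1.0
            top = a**c / (math.factorial(c) * (1 - rho))
            bottom = sum(a**k / math.factorial(k) for k in range(c)) + top
            return top / bottom

        def min_servers(lam, mu, t_target, percentile=0.95):
            """Smallest c with P(wait <= t_target) >= percentile."""
            c = max(1, math.ceil(lam / mu))
            while True:
                p_late = erlang_c(c, lam / mu) * math.exp(-(c * mu - lam) * t_target)
                if p_late <= 1 - percentile:
                    return c
                c += 1

        # 50 requests/s, each server handles 10/s, 95% must wait < 0.2 s.
        print(min_servers(lam=50.0, mu=10.0, t_target=0.2))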

  19. Parallel NGO networks for HIV control: risks and opportunities for NGO contracting.

    PubMed

    Zaidi, Shehla; Gul, Xaher; Nishtar, Noureen Aleem

    2012-12-27

    Policy measures for preventive and promotive services increasingly rely on the contracting of NGOs. Contracting is a neo-liberal response that relies on open market competition for service delivery tenders. In the contracting of health services, a common assumption is a monolithic NGO market. A case study of HIV control in Pakistan shows that in reality the NGO market comprises parallel NGO networks with widely different service packages, approaches, and agendas. These parallel networks evolved over time due to vertical policy agendas. Contracting of NGOs for the provision of HIV services was faced with uneven capacities and turf rivalries across both NGO networks. At the same time, contracting helped NGO providers belonging to different clusters move towards standardized service delivery for HIV prevention. Market-based measures such as contracting need to be accompanied by wider policy measures that help bring NGO groups to a shared understanding of health issues and responses.

  20. Conjunction of anti-parallel and component reconnection at the dayside MP: Cluster and Double Star coordinated observation on 6 April 2004

    NASA Astrophysics Data System (ADS)

    Wang, J.; Pu, Z. Y.; Fu, S. Y.; Wang, X. G.; Xiao, C. J.; Dunlop, M. W.; Wei, Y.; Bogdanova, Y. V.; Zong, Q. G.; Xie, L.

    2011-05-01

    Previous theoretical and simulation studies have suggested that anti-parallel and component reconnection can occur simultaneously on the dayside magnetopause, and certain observations have been reported in support of such a globally conjoined pattern of magnetic reconnection (MR). Here, we show direct evidence for the conjunction of anti-parallel and component MR using coordinated observations of Double Star TC-1 and Cluster under the same IMF condition on 6 April 2004. The global MR X-line configuration constructed is in good agreement with the “S-shape” model.

  1. Scalability and Portability of Two Parallel Implementations of ADI

    NASA Technical Reports Server (NTRS)

    Phung, Thanh; VanderWijngaart, Rob F.

    1994-01-01

    Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.

  2. Immediate versus delayed loading of strategic mini dental implants for the stabilization of partial removable dental prostheses: a patient cluster randomized, parallel-group 3-year trial.

    PubMed

    Mundt, Torsten; Al Jaghsi, Ahmad; Schwahn, Bernd; Hilgert, Janina; Lucas, Christian; Biffar, Reiner; Schwahn, Christian; Heinemann, Friedhelm

    2016-07-30

    Acceptable short-term survival rates (>90%) of mini-implants (diameter < 3.0 mm) are only documented for mandibular overdentures. Sound data for mini-implants as strategic abutments for better retention of a partial removable dental prosthesis (PRDP) are not available. The purpose of this study is to test the hypothesis that immediately loaded mini-implants show more bone loss and less success than strategic mini-implants with delayed loading. In this four-center (one university hospital, three dental practices in Germany), parallel-group, controlled clinical trial, which is cluster-randomized at the patient level, a total of 80 partially edentulous patients with an unfavourable number and distribution of remaining abutment teeth in at least one jaw will receive supplementary mini-implants to stabilize their PRDP. The mini-implants are either loaded immediately after implant placement (test group) or after a four-month delay (control group). Follow-up of the patients will be performed for 36 months. The primary outcome is the radiographic bone level change at the implants. The secondary outcome is implant success as a composite variable. Tertiary outcomes include clinical, subjective (quality of life, satisfaction, chewing ability) and dental or technical complications. Strategic implants under an existing PRDP are so far only documented for standard-diameter implants. Mini-implants could be a minimally invasive and low-cost solution for this treatment modality. The trial is registered at Deutsches Register Klinischer Studien (German register of clinical trials) under DRKS-ID: DRKS00007589 (www.germanctr.de) on January 13th, 2015.

  3. A unified framework for building high performance DVEs

    NASA Astrophysics Data System (ADS)

    Lei, Kaibin; Ma, Zhixia; Xiong, Hua

    2011-10-01

    A unified framework for integrating PC cluster based parallel rendering with distributed virtual environments (DVEs) is presented in this paper. While various scene graphs have been proposed in DVEs, it is difficult to enable collaboration of different scene graphs. This paper proposes a technique for non-distributed scene graphs with the capability of object and event distribution. With the increase of graphics data, DVEs require more powerful rendering ability. But general scene graphs are inefficient in parallel rendering. The paper also proposes a technique to connect a DVE and a PC cluster based parallel rendering environment. A distributed multi-player video game is developed to show the interaction of different scene graphs and the parallel rendering performance on a large tiled display wall.

  4. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    PubMed

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level-3 Basic Linear Algebra Subprograms are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
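
    For flavor, a minimal NumPy sketch of symmetric NMF using the widely cited damped multiplicative rule with beta = 1/2; the paper's Euclidean, α-SNMF, and β-SNMF updates differ in detail, and the Gaussian similarity matrix below is an assumption for the clustering demo:

        import numpy as np

        def snmf(A, r, iters=500, beta=0.5, eps=1e-9):
            """Approximate nonnegative W with A ~ W @ W.T."""
            rng = np.random.default_rng(0)
            W = rng.random((A.shape[0], r))
            for _ in range(iters):
                AW = A @ W
                WWtW = W @ (W.T @ W)
                W *= (1 - beta) + beta * AW / (WWtW + eps)  # multiplicative step
            return W

        # Probabilistic-style clustering: factorize a similarity matrix and
        # assign each point to its dominant factor column.
        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(3, 0.2, (50, 2))])
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        A = np.exp(-d2)                     # Gaussian similarity
        labels = snmf(A, r=2).argmax(axis=1)
        print(np.bincount(labels))          # two clusters of ~50 points each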

  5. Delineation of metabolic gene clusters in plant genomes by chromatin signatures

    PubMed Central

    Yu, Nan; Nützmann, Hans-Wilhelm; MacDonald, James T.; Moore, Ben; Field, Ben; Berriri, Souha; Trick, Martin; Rosser, Susan J.; Kumar, S. Vinod; Freemont, Paul S.; Osbourn, Anne

    2016-01-01

    Plants are a tremendous source of diverse chemicals, including many natural product-derived drugs. It has recently become apparent that the genes for the biosynthesis of numerous different types of plant natural products are organized as metabolic gene clusters, thereby unveiling a highly unusual form of plant genome architecture and offering novel avenues for discovery and exploitation of plant specialized metabolism. Here we show that these clustered pathways are characterized by distinct chromatin signatures of histone 3 lysine trimethylation (H3K27me3) and histone 2 variant H2A.Z, associated with cluster repression and activation, respectively, and represent discrete windows of co-regulation in the genome. We further demonstrate that knowledge of these chromatin signatures along with chromatin mutants can be used to mine genomes for cluster discovery. The roles of H3K27me3 and H2A.Z in repression and activation of single genes in plants are well known. However, our discovery of highly localized operon-like co-regulated regions of chromatin modification is unprecedented in plants. Our findings raise intriguing parallels with groups of physically linked multi-gene complexes in animals and with clustered pathways for specialized metabolism in filamentous fungi. PMID:26895889

  6. The method of parallel-hierarchical transformation for rapid recognition of dynamic images using GPGPU technology

    NASA Astrophysics Data System (ADS)

    Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura

    2016-09-01

    The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. The direct parallel-hierarchical transformations are based on a cluster hardware platform oriented to both CPUs and GPUs. Mathematical models of training of the parallel-hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most relevant to problems of organizing high-performance computations on very large arrays of information, designed to implement multi-stage sensing and processing as well as compaction and recognition of data in informational structures and computer devices. The method has such advantages as high performance through the use of recent advances in parallelization, the ability to work with images of very high dimension, ease of scaling when the number of nodes in the cluster changes, and automatic scanning of the local network to detect compute nodes.

  7. fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.

    PubMed

    Hung, Ling-Hong; Samudrala, Ram

    2014-06-15

    fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.
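
    The package's headline metric, RMSD after optimal superposition, reduces to the Kabsch algorithm; a plain NumPy reference sketch (fast_protein_cluster itself uses hand-vectorized C++, SIMD intrinsics, and OpenCL):

        import numpy as np

        def kabsch_rmsd(P, Q):
            """Minimal RMSD between two (n, 3) coordinate sets after superposition."""
            P = P - P.mean(axis=0)
            Q = Q - Q.mean(axis=0)
            U, _, Vt = np.linalg.svd(P.T @ Q)
            d = np.sign(np.linalg.det(U @ Vt))   # avoid improper rotation
            R = U @ np.diag([1.0, 1.0, d]) @ Vt  # optimal rotation
            return np.sqrt(((P @ R - Q) ** 2).sum() / len(P))

        rng = np.random.default_rng(0)
        P = rng.normal(size=(100, 3))
        t = 0.7
        Rz = np.array([[np.cos(t), -np.sin(t), 0],
                       [np.sin(t),  np.cos(t), 0],
                       [0,          0,         1]])
        print(kabsch_rmsd(P, P @ Rz.T))  # ~0: same structure, just rotated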

  8. Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units

    NASA Astrophysics Data System (ADS)

    Corral, Roque; Gisbert, Fernando; Pueblas, Jesus

    2017-02-01

    The implementation of an edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids, able to run on multiple graphics processing units (GPUs), is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written in a mixture of C++ and OpenCL, to allow execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow parallel execution of the solver on multiple GPUs. A comparative study of the solver's parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded by the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high-bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with equivalent computational power.

  9. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length, database size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive with optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
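
    The kernel being parallelized at all three levels is the Smith-Waterman recurrence; a serial sketch with a linear gap penalty (the production scoring scheme, with substitution matrices and affine gaps, is richer):

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            """Best local alignment score between sequences a and b."""
            n, m = len(a), len(b)
            H = [[0] * (m + 1) for _ in range(n + 1)]
            best = 0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(0,
                                  H[i - 1][j - 1] + s,  # substitution
                                  H[i - 1][j] + gap,    # deletion
                                  H[i][j - 1] + gap)    # insertion
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))

    The anti-diagonals of H are mutually independent, which is what the vector level exploits; whole database sequences parallelize across threads and cluster nodes.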

  10. Gold atoms and dimers on amorphous SiO2: calculation of optical properties and cavity ringdown spectroscopy measurements.

    PubMed

    Del Vitto, Annalisa; Pacchioni, Gianfranco; Lim, Kok Hwa; Rösch, Notker; Antonietti, Jean-Marie; Michalski, Marcin; Heiz, Ulrich; Jones, Harold

    2005-10-27

    We report on the optical absorption spectra of gold atoms and dimers deposited on amorphous silica in size-selected fashion. Experimental spectra were obtained by cavity ringdown spectroscopy. Issues of soft-landing, fragmentation, and thermal diffusion are discussed on the basis of the experimental results. In parallel, cluster and periodic supercell density functional theory (DFT) calculations were performed to model atoms and dimers trapped on various defect sites of amorphous silica. Optically allowed electronic transitions were calculated, and comparisons with the experimental spectra show that silicon dangling bonds (≡Si•), nonbridging oxygen (≡Si-O•), and the silanolate group (≡Si-O⁻) act as trapping centers for the gold particles. The results are not only important for understanding the chemical bonding of atoms and clusters on oxide surfaces, but will also be of fundamental interest for photochemical studies of size-selected clusters on surfaces.

  11. Two schemes for rapid generation of digital video holograms using PC cluster

    NASA Astrophysics Data System (ADS)

    Park, Hanhoon; Song, Joongseok; Kim, Changseob; Park, Jong-Il

    2017-12-01

    Computer-generated holography (CGH), the process of generating digital holograms, is computationally expensive. Recently, several methods and systems for parallelizing the process using graphics processing units (GPUs) have been proposed; indeed, the use of multiple GPUs or a personal computer (PC) cluster (each PC with GPUs) has enabled great improvements in processing speed. However, the literature has less often explored systems for the rapid generation of multiple digital holograms, or systems specialized for the rapid generation of a digital video hologram. This study proposes a PC-cluster-based system that generates a video hologram more efficiently. The proposed system is designed to generate multiple frames simultaneously, parallelizing the CGH computations across a number of frames, as opposed to generating each individual frame separately while parallelizing the CGH computations within each frame. The proposed system also enables the subprocesses for generating each frame to execute in parallel through multithreading. With these two schemes, the proposed system significantly reduced the data communication time for generating a digital hologram when compared with that of the state-of-the-art system.
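
    A minimal sketch of the frame-level scheme, with Python's multiprocessing on one machine standing in for the PC cluster; compute_hologram() is a hypothetical placeholder for the per-frame CGH kernel:

        import numpy as np
        from multiprocessing import Pool

        def compute_hologram(frame_id, n=128):
            # Placeholder: accumulate point-source waves on an n x n plane.
            rng = np.random.default_rng(frame_id)
            pts = rng.uniform(0, n, size=(32, 2))
            y, x = np.mgrid[0:n, 0:n]
            field = np.zeros((n, n), dtype=complex)
            for px, py in pts:
                r = np.hypot(x - px, y - py) + 1.0
                field += np.exp(1j * 2 * np.pi * r / 8.0) / r
            return frame_id, np.angle(field)  # phase hologram of one frame

        if __name__ == "__main__":
            # Frames are mutually independent, so they parallelize with no
            # inter-worker communication, the key point of the first scheme.
            with Pool(processes=4) as pool:
                for fid, holo in pool.imap_unordered(compute_hologram, range(16)):
                    print("frame", fid, holo.shape)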

  12. 3D Kirchhoff depth migration algorithm: A new scalable approach for parallelization on multicore CPU based cluster

    NASA Astrophysics Data System (ADS)

    Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran

    2017-03-01

    In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented for state-of-the-art multicore CPU based clusters. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demands on compute time, memory, storage, and I/O, along with the need for their effective management. The most resource-intensive modules of the algorithm are the traveltime calculations and the migration summation, which exhibit an inherent trade-off between compute time and the other resources. The parallelization strategy of the algorithm largely depends on the storage of the calculated traveltimes and the mechanism for feeding them to the migration process. The presented work is an extension of our previous work, in which a 3D Kirchhoff depth migration application for multicore CPU based parallel systems was developed. Recently, we have improved the parallel performance of this application by re-designing the parallelization approach. The new algorithm can efficiently migrate both prestack and poststack 3D data. It exhibits flexibility for migrating a large number of traces within the available node memory and with minimal requirements on storage, I/O, and inter-node communication. The resulting application is tested using 3D Overthrust data on PARAM Yuva II, a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. The parallel performance of the algorithm is studied in different numerical experiments, and the scalability results show a striking improvement over the previous version. An impressive 49.05× speedup with 76.64% efficiency is achieved for 3D prestack data, and a 32.00× speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm, with high scalability and efficiency on a multicore CPU cluster.

  13. High-performance computing — an overview

    NASA Astrophysics Data System (ADS)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  14. Distributed computing for membrane-based modeling of action potential propagation.

    PubMed

    Porras, D; Rogers, J M; Smith, W M; Pollard, A E

    2000-08-01

    Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
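
    A hedged sketch of the load-balancing flavor described above: assign work units to workstations in proportion to their measured cost and each machine's relative speed, using a greedy longest-processing-time heuristic (the paper's algorithm, driven by measured communication latencies, is more refined):

        def balance(costs, speeds):
            """costs[i]: work of unit i; speeds[w]: relative speed of worker w."""
            loads = [0.0] * len(speeds)
            assignment = [[] for _ in speeds]
            # Place the most expensive units first, each on the worker that
            # would finish it soonest given its current load and speed.
            for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
                w = min(range(len(speeds)),
                        key=lambda w: (loads[w] + costs[i]) / speeds[w])
                loads[w] += costs[i]
                assignment[w].append(i)
            return assignment, [l / s for l, s in zip(loads, speeds)]

        costs = [5, 3, 8, 1, 9, 2, 7, 4]   # e.g. per-region ionic-current work
        assignment, finish = balance(costs, speeds=[1.0, 1.0, 0.5])
        print(assignment, [round(f, 2) for f in finish])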

  15. Applying the Transtheoretical Model to evaluate the effect of a call-recall program in enhancing Pap smear practice: a cluster randomized trial.

    PubMed

    Abdullah, Fauziah; Su, Tin Tin

    2013-01-01

    The objective of this study was to evaluate the effect of a call-recall approach in enhancing Pap smear practice through changes in motivation stage among non-compliant women. A cluster-randomized controlled trial with a parallel, unblinded design was conducted between January and November 2010 in 40 public secondary schools in Malaysia among 403 female teachers who had never or only infrequently attended for a Pap test. Cluster randomization was applied in assigning schools to the two groups. The intervention group received an invitation and reminder (call-recall program) for a Pap test (20 schools with 201 participants), while the control group received usual care from the existing cervical screening program (20 schools with 202 participants). Multivariate logistic regression was performed to determine the effect of the intervention program on the action stage (Pap smear uptake) at 24 weeks. In both groups, the pre-contemplation stage accounted for the highest proportion of stage changes. At 24 weeks, the intervention group showed twice as many women in the action stage as the control group (adjusted odds ratio 2.44, 95% CI 1.29-4.62). The positive effect of a call-recall approach in motivating women to change their screening behavior should be appreciated by policy makers and health care providers in developing countries as an intervention to enhance Pap smear uptake. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. A parallel-processing approach to computing for the geographic sciences; applications and systems enhancements

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Liu, Shu-Guang; Nichols, Erin; Haga, Jim; Maddox, Brian; Bilderback, Chris; Feller, Mark; Homer, George

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting information science research into parallel computing systems and applications.

  17. GRAPE-6A: A Single-Card GRAPE-6 for Parallel PC-GRAPE Cluster Systems

    NASA Astrophysics Data System (ADS)

    Fukushige, Toshiyuki; Makino, Junichiro; Kawai, Atsushi

    2005-12-01

    In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly cost-effective in running parallel tree algorithms. Though the use of parallel tree algorithms was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and to optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24 node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.

  18. Evaluation of Tai Chi Yunshou exercises on community-based stroke patients with balance dysfunction: a study protocol of a cluster randomized controlled trial.

    PubMed

    Tao, Jing; Rao, Ting; Lin, Lili; Liu, Wei; Wu, Zhenkai; Zheng, Guohua; Su, Yusheng; Huang, Jia; Lin, Zhengkun; Wu, Jinsong; Fang, Yunhua; Chen, Lidian

    2015-02-25

    Balance dysfunction after stroke limits patients' general function and participation in daily life. Previous research has suggested that Tai Chi exercise can improve balance function in older individuals and reduce the risk of falls, but convincing evidence for the effectiveness of Tai Chi exercise in enhancing balance function after stroke is still lacking. Considering the difficulty stroke patients have in completing the whole exercise, the current trial evaluates the benefit of Tai Chi Yunshou exercise for patients with balance dysfunction after stroke through a cluster-randomized, parallel-controlled design. A single-blind, cluster-randomized, parallel-controlled trial will be conducted. A total of 10 community health centers (5 per arm) will be selected and randomly allocated to the Tai Chi Yunshou exercise group or the balance rehabilitation training group. Each community health center will be asked to enroll 25 eligible patients into the trial. Sessions last 60 minutes, with one session per day, five times per week, over a total training period of 12 weeks. Primary and secondary outcomes will be measured at baseline, at 4, 8, and 12 weeks after randomization, and at 6-week and 12-week follow-up. Safety and economic evaluations will also be performed. This protocol aims to evaluate the effectiveness of Tai Chi Yunshou exercise for the balance function of patients after stroke. If the outcome is positive, this project will provide an appropriate and economical balance rehabilitation technology for community-based stroke patients. Chinese Clinical Trial Registry: ChiCTR-TRC-13003641. Registration date: 22 August 2013. http://www.chictr.org/usercenter/project/listbycreater.aspx

  19. Application of microarray analysis on computer cluster and cloud platforms.

    PubMed

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  20. Innovative Growth and Defect Analysis of Group III - Nitrides for High Speed Electronics

    DTIC Science & Technology

    2008-02-29

    nitrides have optical transitions from the infrared into the ultraviolet and are used for light generation with a luminous flux of approximately 100...exist below the detection limit of X-Ray Diffraction (XRD). It has been shown that metal clusters could cause resonance in the infrared and affect the...plasmonic (Mie) resonances and the specific interband absorption between the parallel bands in metallic indium [Har66]; the latter starts from 0.6

  1. Chromium: A Stress-Processing Framework for Interactive Rendering on Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humphreys, G,; Houston, M.; Ng, Y.-R.

    2002-01-11

    We describe Chromium, a system for manipulating streams of graphics API commands on clusters of workstations. Chromium's stream filters can be arranged to create sort-first and sort-last parallel graphics architectures that, in many cases, support the same applications while using only commodity graphics accelerators. In addition, these stream filters can be extended programmatically, allowing the user to customize the stream transformations performed by nodes in a cluster. Because our stream processing mechanism is completely general, any cluster-parallel rendering algorithm can be either implemented on top of or embedded in Chromium. In this paper, we give examples of real-world applications that use Chromium to achieve good scalability on clusters of workstations, and describe other potential uses of this stream processing technology. By completely abstracting the underlying graphics architecture, network topology, and API command processing semantics, we allow a variety of applications to run in different environments.

  2. A Parallel Point Matching Algorithm for Landmark Based Image Registration Using Multicore Platform

    PubMed Central

    Yang, Lin; Gong, Leiguang; Zhang, Hong; Nosher, John L.; Foran, David J.

    2013-01-01

    Point matching is crucial for many computer vision applications. Establishing the correspondence between a large number of data points is a computationally intensive process. Some point matching related applications, such as medical image registration, require real-time or near real-time performance when applied to critical clinical applications like image-assisted surgery. In this paper, we report a new multicore platform based parallel algorithm for fast point matching in the context of landmark-based medical image registration. We introduce a non-regular data partition algorithm that uses K-means clustering to group the landmarks according to the number of available processing cores, optimizing memory usage and data transfer. We have tested our method on the IBM Cell Broadband Engine (Cell/B.E.) platform. The results demonstrate a significant speedup over the sequential implementation. The proposed data partition and parallelization algorithm, though tested only on one multicore platform, is generic by design. The parallel algorithm can therefore be extended to other computing platforms, as well as to other point matching related applications. PMID:24308014
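
    A small sketch of the non-regular partition step, assuming scikit-learn's KMeans; the landmark coordinates and core count are invented:

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        landmarks = rng.uniform(0, 512, size=(1000, 2))  # toy 2D landmarks

        n_cores = 8  # one spatially compact chunk per processing core
        km = KMeans(n_clusters=n_cores, n_init=10, random_state=0).fit(landmarks)
        chunks = [landmarks[km.labels_ == c] for c in range(n_cores)]

        # Each chunk would be shipped to one core (an SPE on the Cell/B.E.),
        # so every core matches points that are close together in space.
        print([len(c) for c in chunks])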

  3. Performance Evaluation in Network-Based Parallel Computing

    NASA Technical Reports Server (NTRS)

    Dezhgosha, Kamyar

    1996-01-01

    Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require the use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing network of Sun SPARC workstations with PVM (Parallel Virtual Machine), a software system for linking clusters of machines. Second, a set of three basic applications was selected, consisting of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for the application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor for performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps), which will allow us to extend our study to newer applications, performance metrics, and configurations.

  4. Scalable Unix commands for parallel processors : a high-performance implementation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ong, E.; Lusk, E.; Gropp, W.

    2001-06-22

    We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
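
    In the same spirit, a hedged mpi4py sketch of a parallel directory listing: every rank stats a slice of the entries and rank 0 merges the reports, much as a parallel ls must merge per-node listings (an illustration of the pattern, not the authors' code):

        import os
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        entries = sorted(os.listdir("."))
        my_entries = entries[rank::size]  # round-robin slice of the work
        my_report = [(e, os.path.getsize(e)) for e in my_entries
                     if os.path.isfile(e)]

        reports = comm.gather(my_report, root=0)  # collect at rank 0
        if rank == 0:
            for name, nbytes in sorted(sum(reports, [])):
                print(f"{nbytes:12d}  {name}")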

  5. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia, and Africa. If the DEM is to be applied on fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, comparative results of running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.), and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) are presented. The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the caches and hierarchical memories of modern computers, as well as the performance, speedups, and efficiency achieved, are discussed. The parallel code of DEM, created using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.

  6. Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss the performance differences.

  7. Approximate kernel competitive learning.

    PubMed

    Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang

    2015-03-01

    Kernel competitive learning (KCL) has been successfully used to achieve robust clustering. However, KCL is not scalable for large-scale data processing, because (1) it has to calculate and store the full kernel matrix, which is too large to be computed and kept in memory, and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large-scale datasets. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL) method, which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis of why the proposed approximation model works for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) method based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates the approximate kernel competitive learning for large-scale clustering. Empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL perform comparably to KCL, with a large reduction in computational cost. The proposed methods also achieve more effective clustering performance, in terms of clustering precision, than related approximate clustering approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

    PubMed Central

    Bajaj, Chandrajit

    2009-01-01

    Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools for studying large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by parallel isosurface extraction and rendering, in conjunction with a new specialized piece of image compositing hardware called the Metabuffer. In this paper, we focus on back-end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations needed to load large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission, and storage of isosurfaces. PMID:19756231
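
    The out-of-core idea in miniature: store a (min, max) interval per disk brick and read only bricks whose interval straddles the isovalue. A NumPy sketch in which a linear scan stands in for the paper's I/O-optimal external interval trees:

        import numpy as np

        rng = np.random.default_rng(0)
        volume = rng.normal(size=(64, 64, 64))

        # Pretend each 16^3 brick is one unit of disk I/O and keep only its
        # value interval in memory.
        bricks = {}
        for i in range(0, 64, 16):
            for j in range(0, 64, 16):
                for k in range(0, 64, 16):
                    b = volume[i:i + 16, j:j + 16, k:k + 16]
                    bricks[(i, j, k)] = (b.min(), b.max())

        def active_bricks(isovalue):
            """Bricks that can contain isosurface cells; only these are read."""
            return [key for key, (lo, hi) in bricks.items() if lo <= isovalue <= hi]

        print(len(active_bricks(4.0)), "of", len(bricks), "bricks touched")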

  9. Trace Element Study of H Chondrites: Evidence for Meteoroid Streams.

    NASA Astrophysics Data System (ADS)

    Wolf, Stephen Frederic

    1993-01-01

    Multivariate statistical analyses, both linear discriminant analysis and logistic regression, of the volatile trace elemental concentrations in H4-6 chondrites reveal compositionally distinguishable subpopulations. The observed difference in volatile trace element composition between Antarctic and non-Antarctic H4-6 chondrites (Lipschutz and Samuels, 1991) can be explained by a compositionally distinct subpopulation found in Victoria Land, Antarctica. This population of H4-6 chondrites is compositionally distinct from non-Antarctic H4-6 chondrites and from Antarctic H4-6 chondrites from Queen Maud Land. Comparisons of Queen Maud Land H4-6 chondrites with non-Antarctic H4-6 chondrites give no reason to believe that these two populations are distinguishable on the basis of the ten volatile trace element concentrations measured. ANOVA indicates that these differences are not the result of trivial causes such as weathering and analytical bias. The thermoluminescence properties of these populations parallel the results of the volatile trace element comparisons. Given the differences in terrestrial age between Victoria Land, Queen Maud Land, and modern H4-6 chondrite falls, these results are consistent with a variation in H4-6 chondrite flux on a 300 ky timescale. This conclusion requires the existence of co-orbital meteoroid streams. Statistical analyses of the volatile trace elemental concentrations in non-Antarctic modern falls of H4-6 chondrites also demonstrate that a group of 13 H4-6 chondrites, Cluster 1, selected exclusively for their distinct fall parameters (Dodd, 1992), is compositionally distinguishable from a control group of 45 non-Antarctic modern H4-6 chondrites on the basis of the ten volatile trace element concentrations measured. Model-independent randomization-simulations based on both linear discriminant analysis and logistic regression verify these results. While ANOVA identifies two possible causes for this difference, analytical bias and group classification, a test validation experiment verifies that group classification is the more significant cause of the compositional difference between Cluster 1 and non-Cluster 1 modern H4-6 chondrite falls. The thermoluminescence properties of these populations likewise parallel the results of the volatile trace element comparisons. This suggests that these meteorites are fragments of a co-orbital meteorite stream derived from a single parent body.

  10. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss the performance differences.

  11. Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

    PubMed

    Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

    2018-04-20

    A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.

  12. Multirate parallel distributed compensation of a cluster in wireless sensor and actor networks

    NASA Astrophysics Data System (ADS)

    Yang, Chun-xi; Huang, Ling-yun; Zhang, Hao; Hua, Wang

    2016-01-01

    The stabilisation problem for one of the clusters with bounded multiple random time delays and packet dropouts in wireless sensor and actor networks is investigated in this paper. A new multirate switching model is constructed to describe the features of this single-input multiple-output linear system. Given the difficulty of controller design under the multiple constraints of the multirate switching model, the model is converted to a Takagi-Sugeno fuzzy model. By designing a multirate parallel distributed compensation, a sufficient condition is established to ensure that the closed-loop fuzzy control system is globally exponentially stable. The multirate parallel distributed compensation gains are obtained by solving an auxiliary convex optimisation problem. Finally, two numerical examples show that, compared with solving for a switching controller, the multirate parallel distributed compensation can be obtained easily; furthermore, it has stronger robust stability than an arbitrary switching controller or a single-rate parallel distributed compensation under the same conditions.

  13. An automated workflow for parallel processing of large multiview SPIM recordings

    PubMed Central

    Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel

    2016-01-01

    Summary: Selective Plane Illumination Microscopy (SPIM) makes it possible to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, performed interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated, and the individual time points can be processed independently, which lends itself to trivial parallelization on a high-performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment, processing SPIM data in a fraction of the time required to collect it. Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26628585

  14. An automated workflow for parallel processing of large multiview SPIM recordings.

    PubMed

    Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel

    2016-04-01

    Selective Plane Illumination Microscopy (SPIM) makes it possible to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, performed interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated, and the individual time points can be processed independently, which lends itself to trivial parallelization on a high-performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment, processing SPIM data in a fraction of the time required to collect it. The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  15. Scalability of a Low-Cost Multi-Teraflop Linux Cluster for High-End Classical Atomistic and Quantum Mechanical Simulations

    NASA Technical Reports Server (NTRS)

    Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash

    2003-01-01

    Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: classical atomistic simulation based on the molecular dynamics method, and quantum mechanical calculation based on density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and space-filling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, the 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.

  16. Thermal instability in gravitationally stratified plasmas: implications for multiphase structure in clusters and galaxy haloes

    NASA Astrophysics Data System (ADS)

    McCourt, Michael; Sharma, Prateek; Quataert, Eliot; Parrish, Ian J.

    2012-02-01

    We study the interplay among cooling, heating, conduction and magnetic fields in gravitationally stratified plasmas using simplified, plane-parallel numerical simulations. Since the physical heating mechanism remains uncertain in massive haloes such as groups or clusters, we adopt a simple, phenomenological prescription which enforces global thermal equilibrium and prevents a cooling flow. The plasma remains susceptible to local thermal instability, however, and cooling drives an inward flow of material. For physically plausible heating mechanisms in clusters, the thermal stability of the plasma is independent of its convective stability. We find that the ratio of the cooling time-scale to the dynamical time-scale, t_cool/t_ff, controls the non-linear evolution and saturation of the thermal instability: when t_cool/t_ff ≲ 1, the plasma develops extended multiphase structure, whereas when t_cool/t_ff ≳ 1 it does not. (In a companion paper, we show that the criterion for thermal instability in a more realistic, spherical potential is somewhat less stringent, t_cool/t_ff ≲ 10.) When thermal conduction is anisotropic with respect to the magnetic field, the criterion for multiphase gas is essentially independent of the thermal conductivity of the plasma. Our criterion for local thermal instability to produce multiphase structure is an extension of the cold versus hot accretion modes in galaxy formation that applies at all radii in hot haloes, not just at the virial shock. We show that this criterion is consistent with data on multiphase gas in galaxy groups and clusters; in addition, when t_cool/t_ff ≳ 1, the net cooling rate to low temperatures and the mass flux to small radii are suppressed enough relative to models without heating to be qualitatively consistent with star formation rates and X-ray line emission in groups and clusters.
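
    As a rough illustration of the t_cool/t_ff criterion, the ratio can be estimated from halo gas properties; the sketch below is our back-of-the-envelope version (assuming pure ionized hydrogen and a crude total particle density n ≈ 2 n_e), not the authors' simulation code:

        import numpy as np

        K_B = 1.380649e-16  # Boltzmann constant, erg/K
        G = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2

        def tcool_over_tff(n_e, T, lambda_T, r, m_enc):
            """Cooling time over free-fall time for hot halo gas.

            t_cool ~ (3/2) n k_B T / (n_e n_H Lambda(T)), with n ~ 2 n_e and
            n_H ~ n_e (pure ionized H, a crude assumption);
            t_ff = sqrt(2 r^3 / (G M_enc)).
            """
            t_cool = 1.5 * (2.0 * n_e) * K_B * T / (n_e * n_e * lambda_T)
            t_ff = np.sqrt(2.0 * r**3 / (G * m_enc))
            return t_cool / t_ff

        # Per the abstract, extended multiphase structure is expected when the
        # ratio is <~ 1 (plane-parallel) or <~ 10 (spherical potential).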

  17. Delineation of metabolic gene clusters in plant genomes by chromatin signatures.

    PubMed

    Yu, Nan; Nützmann, Hans-Wilhelm; MacDonald, James T; Moore, Ben; Field, Ben; Berriri, Souha; Trick, Martin; Rosser, Susan J; Kumar, S Vinod; Freemont, Paul S; Osbourn, Anne

    2016-03-18

    Plants are a tremendous source of diverse chemicals, including many natural product-derived drugs. It has recently become apparent that the genes for the biosynthesis of numerous different types of plant natural products are organized as metabolic gene clusters, thereby unveiling a highly unusual form of plant genome architecture and offering novel avenues for discovery and exploitation of plant specialized metabolism. Here we show that these clustered pathways are characterized by distinct chromatin signatures of histone H3 lysine 27 trimethylation (H3K27me3) and the histone variant H2A.Z, associated with cluster repression and activation, respectively, and represent discrete windows of co-regulation in the genome. We further demonstrate that knowledge of these chromatin signatures along with chromatin mutants can be used to mine genomes for cluster discovery. The roles of H3K27me3 and H2A.Z in repression and activation of single genes in plants are well known. However, our discovery of highly localized operon-like co-regulated regions of chromatin modification is unprecedented in plants. Our findings raise intriguing parallels with groups of physically linked multi-gene complexes in animals and with clustered pathways for specialized metabolism in filamentous fungi. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Grid Computing Environment using a Beowulf Cluster

    NASA Astrophysics Data System (ADS)

    Alanis, Fransisco; Mahmood, Akhtar

    2003-10-01

    Custom-made Beowulf clusters built from PCs are currently replacing expensive supercomputers for carrying out complex scientific computations. At the University of Texas - Pan American, we built an 8 Gflops Beowulf cluster for doing HEP research using RedHat Linux 7.3 and the LAM-MPI middleware. We will describe how we built and configured our cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes that were compiled in C on the cluster using the LAM-XMPI graphical user environment. We will demonstrate a "simple" prototype grid environment, in which we will submit and run parallel jobs remotely across multiple cluster nodes over the internet from the presentation room at Texas Tech University. The Sphinx Beowulf Cluster will be used for Monte Carlo grid test-bed studies for the LHC-ATLAS high energy physics experiment. Grid is a new IT concept for the next generation of the "Super Internet" for high-performance computing. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.

  19. High performance data transfer

    NASA Astrophysics Data System (ADS)

    Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.

    2017-10-01

    The exponentially increasing need for high-speed data transfer is driven by big data and cloud computing, together with the needs of data-intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software, which has been developed since 2013 to meet these growing needs by providing high-performance data transfer and encryption in a scalable, balanced, easy-to-deploy-and-use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the-shelf components to test the ZX scalability and its ability to balance services using multiple cores and links. The PoCs are based on SSD flash storage that is managed by a parallel file system. Each cluster occupies 4 rack units. Using the PoCs, we have achieved between clusters almost 200 Gbps memory to memory over two 100 Gbps links, and 70 Gbps parallel file to parallel file with encryption over a 5000-mile 100 Gbps link.

  20. Synchronous parallel spatially resolved stochastic cluster dynamics

    DOE PAGES

    Dunn, Aaron; Dingreville, Rémi; Martínez, Enrique; ...

    2016-04-23

    In this work, a spatially resolved stochastic cluster dynamics (SRSCD) model for radiation damage accumulation in metals is implemented using a synchronous parallel kinetic Monte Carlo algorithm. The parallel algorithm is shown to significantly increase the size of representative volumes achievable in SRSCD simulations of radiation damage accumulation. Additionally, weak scaling performance of the method is tested in two cases: (1) an idealized case of Frenkel pair diffusion and annihilation, and (2) a characteristic example problem including defect cluster formation and growth in α-Fe. For the latter case, weak scaling is tested using both Frenkel pair and displacement cascade damage. To improve scaling of simulations with cascade damage, an explicit cascade implantation scheme is developed for cases in which fast-moving defects are created in displacement cascades. For the first time, simulation of radiation damage accumulation in nanopolycrystals can be achieved with a three-dimensional rendition of the microstructure, allowing demonstration of the effect of grain size on defect accumulation in Frenkel pair-irradiated α-Fe.
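
    The serial building block that SRSCD parallelizes is the residence-time (BKL) kinetic Monte Carlo step; a minimal sketch of that step is shown below (ours, for orientation only; the paper's contribution is executing such steps synchronously across spatial domains):

        import numpy as np

        rng = np.random.default_rng(1)

        def kmc_step(rates, t):
            """One residence-time KMC step: choose an event with probability
            proportional to its rate, then advance time by an exponential
            waiting time with mean 1/sum(rates)."""
            total = rates.sum()
            event = rng.choice(len(rates), p=rates / total)
            t += -np.log(rng.random()) / total
            return event, t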

  1. Parallel family trees for transfer matrices in the Potts model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question of how many ways two layers of a lattice can be connected. Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is on the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space do exist, but they make assumptions on the temperature or the number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while remaining highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared-memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster scenario it was in the range p ∈ [8, 10]. Because of the parallel capabilities of the algorithm, a large-scale execution of the parallel family trees strategy on a supercomputer could contribute to the study of wider strip lattices.
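
    The growth rates quoted in the abstract are easy to check numerically; the snippet below (illustrative only) compares the Catalan-sized configuration space with the 3^m family-tree space:

        from math import comb

        def catalan(m):
            """m-th Catalan number, C_m = C(2m, m) / (m + 1)."""
            return comb(2 * m, m) // (m + 1)

        # Catalan numbers grow ~ 4^m / m^(3/2), so the O(3^m) family-tree
        # space quickly becomes exponentially smaller.
        for m in (4, 8, 12, 16):
            print(m, catalan(m), 3 ** m)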

  2. Automated analysis and reannotation of subcellular locations in confocal images from the Human Protein Atlas.

    PubMed

    Li, Jieyue; Newberg, Justin Y; Uhlén, Mathias; Lundberg, Emma; Murphy, Robert F

    2012-01-01

    The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
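
    The clustering-based reannotation idea can be sketched in a few lines: cluster the image-derived feature vectors, then flag proteins whose label disagrees with the majority label of their cluster. The code below is our illustration of that idea (feature extraction and the authors' exact pipeline are not reproduced):

        from collections import Counter

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        def flag_inconsistent(features, labels, n_clusters=50):
            """Return indices whose annotated label disagrees with the
            majority label of their (hierarchical) cluster."""
            clusters = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)
            flagged = []
            for c in np.unique(clusters):
                idx = np.where(clusters == c)[0]
                majority = Counter(labels[i] for i in idx).most_common(1)[0][0]
                flagged.extend(int(i) for i in idx if labels[i] != majority)
            return flagged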

  3. Branching points in the low-temperature dipolar hard sphere fluid

    NASA Astrophysics Data System (ADS)

    Rovigatti, Lorenzo; Kantorovich, Sofia; Ivanov, Alexey O.; Tavares, José Maria; Sciortino, Francesco

    2013-10-01

    In this contribution, we investigate the low-temperature, low-density behaviour of dipolar hard-sphere (DHS) particles, i.e., hard spheres with dipoles embedded in their centre. We aim at describing the DHS fluid in terms of a network of chains and rings (the fundamental clusters) held together by branching points (defects) of different nature. We first introduce a systematic way of classifying inter-cluster connections according to their topology, and then employ this classification to analyse the geometric and thermodynamic properties of each class of defects, as extracted from state-of-the-art equilibrium Monte Carlo simulations. By computing the average density and energetic cost of each defect class, we find that the relevant contribution to inter-cluster interactions is indeed provided by (rare) three-way junctions and by four-way junctions arising from parallel or anti-parallel locally linear aggregates. All other (numerous) defects are either intra-cluster or associated to low cluster-cluster interaction energies, suggesting that these defects do not play a significant part in the thermodynamic description of the self-assembly processes of dipolar hard spheres.

  4. Applications of colored petri net and genetic algorithms to cluster tool scheduling

    NASA Astrophysics Data System (ADS)

    Liu, Tung-Kuan; Kuo, Chih-Jen; Hsiao, Yung-Chin; Tsai, Jinn-Tsong; Chou, Jyh-Horng

    2005-12-01

    In this paper, we propose a method that uses Coloured Petri Nets (CPN) and genetic algorithms (GA) to obtain an optimal deadlock-free schedule and to solve the re-entrant problem for the flexible process of the cluster tool. The process of a cluster tool for producing a wafer can usually be classified into three types: 1) sequential process, 2) parallel process, and 3) sequential-parallel process. These processes, however, are not economical enough for producing a variety of wafers in small volumes. This paper therefore proposes a flexible process in which the operations for fabricating wafers are randomly arranged to achieve the best utilization of the cluster tool. However, the flexible process may have deadlock and re-entrant problems, which can be detected by CPN. On the other hand, GAs have been applied to find optimal schedules for many types of manufacturing processes. We therefore integrate CPN and GAs to obtain an optimal schedule that avoids the deadlock and re-entrant problems of the flexible process of the cluster tool.

  5. Icosahedral and decagonal quasicrystals of intermetallic compounds are multiple twins of cubic or orthorhombic crystals composed of very large atomic complexes with icosahedral point-group symmetry in cubic close packing or body-centered packing: Structure of decagonal Al6Pd

    PubMed Central

    Pauling, Linus

    1989-01-01

    A doubly icosahedral complex involves roughly spherical clusters of atoms with icosahedral point-group symmetry, which are themselves, in parallel orientation, icosahedrally packed. These complexes may form cubic crystallites; three structures of this sort have been identified. Analysis of electron diffraction photographs of the decagonal quasicrystal Al6Pd has led to its description as involving pentagonal twinning of an orthorhombic crystal with a = 51.6 Å, b = 37.6 Å, and c = 33.24 Å, with about 4202 atoms in the unit, comprising two 1980-atom doubly icosahedral complexes, each involving icosahedral packing of 45 44-atom icosahedral complexes (at 0 0 0 and 1/2 1/2 1/2) and 242 interstitial atoms. The complexes and clusters are oriented with one of their fivefold axes in the c-axis direction. PMID:16594092

  6. Icosahedral and decagonal quasicrystals of intermetallic compounds are multiple twins of cubic or orthorhombic crystals composed of very large atomic complexes with icosahedral point-group symmetry in cubic close packing or body-centered packing: Structure of decagonal Al(6)Pd.

    PubMed

    Pauling, L

    1989-12-01

    A doubly icosahedral complex involves roughly spherical clusters of atoms with icosahedral point-group symmetry, which are themselves, in parallel orientation, icosahedrally packed. These complexes may form cubic crystallites; three structures of this sort have been identified. Analysis of electron diffraction photographs of the decagonal quasicrystal Al(6)Pd has led to its description as involving pentagonal twinning of an orthorhombic crystal with a = 51.6 Å, b = 37.6 Å, and c = 33.24 Å, with about 4202 atoms in the unit, comprising two 1980-atom doubly icosahedral complexes, each involving icosahedral packing of 45 44-atom icosahedral complexes (at 0 0 0 and 1/2 1/2 1/2) and 242 interstitial atoms. The complexes and clusters are oriented with one of their fivefold axes in the c-axis direction.

  7. Parallel and Portable Monte Carlo Particle Transport

    NASA Astrophysics Data System (ADS)

    Lee, S. R.; Cummings, J. C.; Nolen, S. D.; Keen, N. D.

    1997-08-01

    We have developed a multi-group, Monte Carlo neutron transport code in C++ using object-oriented methods and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α eigenvalues of the neutron transport equation on a rectilinear computational mesh. It is portable to and runs in parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities are discussed, along with physics and performance results for several test problems on a variety of hardware, including all three Accelerated Strategic Computing Initiative (ASCI) platforms. Current parallel performance indicates the ability to compute α-eigenvalues in seconds or minutes rather than days or weeks. Current and future work on the implementation of a general transport physics framework (TPF) is also described. This TPF employs modern C++ programming techniques to provide simplified user interfaces, generic STL-style programming, and compile-time performance optimization. Physics capabilities of the TPF will be extended to include continuous energy treatments, implicit Monte Carlo algorithms, and a variety of convergence acceleration techniques such as importance combing.

  8. Understanding the cluster randomised crossover design: a graphical illustration of the components of variation and a sample size tutorial.

    PubMed

    Arnup, Sarah J; McKenzie, Joanne E; Hemming, Karla; Pilcher, David; Forbes, Andrew B

    2017-08-15

    In a cluster randomised crossover (CRXO) design, a sequence of interventions is assigned to a group, or 'cluster', of individuals. Each cluster receives each intervention in a separate period of time, forming 'cluster-periods'. Sample size calculations for CRXO trials need to account for both the cluster randomisation and crossover aspects of the design. Formulae are available for the two-period, two-intervention, cross-sectional CRXO design; however, implementation of these formulae is known to be suboptimal. The aims of this tutorial are to illustrate the intuition behind the design and to provide guidance on performing sample size calculations. Graphical illustrations are used to describe the effect of the cluster randomisation and crossover aspects of the design on the correlation between individual responses in a CRXO trial. Sample size calculations for binary and continuous outcomes are illustrated using parameters estimated from the Australia and New Zealand Intensive Care Society - Adult Patient Database (ANZICS-APD) for patient mortality and length of stay (LOS). The similarity between individual responses in a CRXO trial can be understood in terms of three components of variation: variation in the cluster mean response; variation in the cluster-period mean response; and variation between individual responses within a cluster-period; or equivalently in terms of the correlation between individual responses in the same cluster-period (within-cluster within-period correlation, WPC) and between individual responses in the same cluster but in different periods (within-cluster between-period correlation, BPC). The BPC lies between zero and the WPC. When the WPC and BPC are equal, the precision gained by the crossover aspect of the CRXO design equals the precision lost by cluster randomisation. When the BPC is zero, there is no advantage in a CRXO over a parallel-group cluster randomised trial. Sample size calculations illustrate that small changes in the specification of the WPC or BPC can increase the required number of clusters. By illustrating how the parameters required for sample size calculations arise from the CRXO design, and by providing guidance both on how to choose values for the parameters and on how to perform the calculations, this tutorial may improve the implementation of the sample size formulae for CRXO trials.
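
    In the standard three-level variance-components notation (our notation, not quoted from the paper), the two correlations described above can be written as

        \text{WPC} = \frac{\sigma_C^2 + \sigma_{CP}^2}{\sigma_C^2 + \sigma_{CP}^2 + \sigma_E^2},
        \qquad
        \text{BPC} = \frac{\sigma_C^2}{\sigma_C^2 + \sigma_{CP}^2 + \sigma_E^2},

    where \sigma_C^2 is the between-cluster variance, \sigma_{CP}^2 the cluster-period variance and \sigma_E^2 the individual-level variance. This makes the abstract's statements immediate: BPC \le WPC always, and BPC = 0 exactly when \sigma_C^2 = 0, in which case crossing over within clusters buys nothing relative to a parallel-group cluster trial.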

  9. Parallelization of MRCI based on hole-particle symmetry.

    PubMed

    Suo, Bing; Zhai, Gaohong; Wang, Yubin; Wen, Zhenyi; Hu, Xiangqian; Li, Lemin

    2005-01-15

    The parallel implementation of a multireference configuration interaction program based on the hole-particle symmetry is described. The platform used to implement the parallelization is an Intel-architecture cluster consisting of 12 nodes, each of which is equipped with two 2.4-GHz Xeon processors, 3 GB of memory, and a 36-GB disk, connected by a Gigabit Ethernet switch. The dependence of speedup on molecular symmetries and task granularities is discussed. Test calculations show that the scaling with the number of nodes is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h) when the number of nodes is doubled. The largest calculation performed on this cluster involves 5.6 × 10^8 CSFs.

  10. Reliability Evaluation for Clustered WSNs under Malware Propagation

    PubMed Central

    Shen, Shigen; Huang, Longjun; Liu, Jianhua; Champion, Adam C.; Yu, Shui; Cao, Qiying

    2016-01-01

    We consider a clustered wireless sensor network (WSN) under epidemic-malware propagation conditions and solve the problem of how to evaluate its reliability so as to ensure efficient, continuous, and dependable transmission of sensed data from sensor nodes to the sink. Facing the contradiction between malware intention and continuous-time Markov chain (CTMC) randomness, we introduce a strategic game that can predict malware infection in order to model a successful infection as a CTMC state transition. Next, we devise a novel measure to compute the Mean Time to Failure (MTTF) of a sensor node, which represents the reliability of a sensor node continuously performing tasks such as sensing, transmitting, and fusing data. Since clustered WSNs can be regarded as parallel-serial-parallel systems, the reliability of a clustered WSN can be evaluated via classical reliability theory. Numerical results show the influence of parameters such as the true positive rate and the false positive rate on a sensor node’s MTTF. Furthermore, we validate the method of reliability evaluation for a clustered WSN according to the number of sensor nodes in a cluster, the number of clusters in a route, and the number of routes in the WSN. PMID:27294934

  11. Reliability Evaluation for Clustered WSNs under Malware Propagation.

    PubMed

    Shen, Shigen; Huang, Longjun; Liu, Jianhua; Champion, Adam C; Yu, Shui; Cao, Qiying

    2016-06-10

    We consider a clustered wireless sensor network (WSN) under epidemic-malware propagation conditions and solve the problem of how to evaluate its reliability so as to ensure efficient, continuous, and dependable transmission of sensed data from sensor nodes to the sink. Facing the contradiction between malware intention and continuous-time Markov chain (CTMC) randomness, we introduce a strategic game that can predict malware infection in order to model a successful infection as a CTMC state transition. Next, we devise a novel measure to compute the Mean Time to Failure (MTTF) of a sensor node, which represents the reliability of a sensor node continuously performing tasks such as sensing, transmitting, and fusing data. Since clustered WSNs can be regarded as parallel-serial-parallel systems, the reliability of a clustered WSN can be evaluated via classical reliability theory. Numerical results show the influence of parameters such as the true positive rate and the false positive rate on a sensor node's MTTF. Furthermore, we validate the method of reliability evaluation for a clustered WSN according to the number of sensor nodes in a cluster, the number of clusters in a route, and the number of routes in the WSN.
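
    The parallel-serial-parallel composition mentioned above reduces, in classical reliability theory, to two elementary rules: a parallel block fails only if all members fail, and a series block works only if all members work. A sketch with made-up node reliabilities and counts:

        import numpy as np

        def parallel(rs):
            """Reliability of a parallel block: 1 - product of failure probabilities."""
            rs = np.asarray(rs, dtype=float)
            return 1.0 - np.prod(1.0 - rs)

        def series(rs):
            """Reliability of a series block: product of member reliabilities."""
            return float(np.prod(np.asarray(rs, dtype=float)))

        r_node = 0.9                         # hypothetical node reliability at time t
        r_cluster = parallel([r_node] * 5)   # 5 sensor nodes per cluster, in parallel
        r_route = series([r_cluster] * 4)    # 4 clusters along a route, in series
        r_wsn = parallel([r_route] * 3)      # 3 routes to the sink, in parallel
        print(round(r_wsn, 6))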

  12. The CAnadian NIRISS Unbiased Cluster Survey (CANUCS)

    NASA Astrophysics Data System (ADS)

    Ravindranath, Swara; NIRISS GTO Team

    2017-06-01

    CANUCS GTO program is a JWST spectroscopy and imaging survey of five massive galaxy clusters and ten parallel fields using the NIRISS low-resolution grisms, NIRCam imaging and NIRSpec multi-object spectroscopy. The primary goal is to understand the evolution of low mass galaxies across cosmic time. The resolved emission line maps and line ratios for many galaxies, with some at resolution of 100pc via the magnification by gravitational lensing will enable determining the spatial distribution of star formation, dust and metals. Other science goals include the detection and characterization of galaxies within the reionization epoch, using multiply-imaged lensed galaxies to constrain cluster mass distributions and dark matter substructure, and understanding star-formation suppression in the most massive galaxy clusters. In this talk I will describe the science goals of the CANUCS program. The proposed prime and parallel observations will be presented with details of the implementation of the observation strategy using JWST proposal planning tools.

  13. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    PubMed Central

    Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin

    2018-01-01

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
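
    The serial algorithm that HipMCL parallelizes alternates two matrix operations: expansion (matrix squaring) and inflation (elementwise powering followed by column re-normalization). A minimal dense sketch of MCL is shown below (ours; real implementations, including HipMCL, operate on sparse distributed matrices):

        import numpy as np

        def mcl(adj, inflation=2.0, max_iter=100, tol=1e-6):
            """Minimal dense Markov Clustering on an adjacency matrix."""
            M = adj + np.eye(len(adj))     # add self-loops
            M = M / M.sum(axis=0)          # make columns stochastic
            for _ in range(max_iter):
                E = M @ M                  # expansion: two-step random walk
                E = E ** inflation         # inflation: strengthen strong flows
                E = E / E.sum(axis=0)      # re-normalize columns
                done = np.abs(E - M).max() < tol
                M = E
                if done:
                    break
            # attractor (nonzero) rows define the clusters via their support
            clusters = {tuple(np.nonzero(row > 1e-8)[0]) for row in M if row.max() > 1e-8}
            return [sorted(c) for c in clusters]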

  14. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    DOE PAGES

    Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.; ...

    2018-01-05

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.

  15. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.

  16. Sample size calculations for stepped wedge and cluster randomised trials: a unified approach

    PubMed Central

    Hemming, Karla; Taljaard, Monica

    2016-01-01

    Objectives: To clarify and illustrate sample size calculations for the cross-sectional stepped wedge cluster randomized trial (SW-CRT) and to present a simple approach for comparing the efficiencies of competing designs within a unified framework. Study Design and Setting: We summarize design effects for the SW-CRT, the parallel cluster randomized trial (CRT), and the parallel cluster randomized trial with before and after observations (CRT-BA), assuming cross-sectional samples are selected over time. We present new formulas that enable trialists to determine the required cluster size for a given number of clusters. We illustrate by example how to implement the presented design effects and give practical guidance on the design of stepped wedge studies. Results: For a fixed total cluster size, the choice of study design that provides the greatest power depends on the intracluster correlation coefficient (ICC) and the cluster size. When the ICC is small, the CRT tends to be more efficient; when the ICC is large, the SW-CRT tends to be more efficient and can serve as an alternative when the CRT is infeasible. Conclusion: Our unified approach allows trialists to easily compare the efficiencies of three competing designs to inform the decision about the most efficient design in a given scenario. PMID:26344808
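
    For orientation, the design effect for the parallel CRT is the standard inflation factor, and rearranging it gives the required cluster size m for a fixed number of clusters k per arm (our illustrative rearrangement, not the paper's new SW-CRT formulas):

        n_{\text{CRT}} = n_i\,\bigl[1 + (m - 1)\rho\bigr],
        \qquad
        m = \frac{n_i\,(1 - \rho)}{k - n_i\,\rho} \quad (\text{feasible only if } k > n_i\,\rho),

    where n_i is the per-arm sample size under individual randomization, m the cluster size, k the number of clusters per arm (so n_CRT = k m) and \rho the ICC.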

  17. Effectiveness and micro-costing of the KiVa school-based bullying prevention programme in Wales: study protocol for a pragmatic definitive parallel group cluster randomised controlled trial.

    PubMed

    Clarkson, Suzy; Axford, Nick; Berry, Vashti; Edwards, Rhiannon Tudor; Bjornstad, Gretchen; Wrigley, Zoe; Charles, Joanna; Hoare, Zoe; Ukoumunne, Obioha C; Matthews, Justin; Hutchings, Judy

    2016-02-01

    Bullying refers to verbal, physical or psychological aggression repeated over time that is intended to cause harm or distress to the victims who are unable to defend themselves. It is a key public health priority owing to its widespread prevalence in schools and harmful short- and long-term effects on victims' well-being. There is a need to strengthen the evidence base by testing innovative approaches to preventing bullying. KiVa is a school-based bullying prevention programme with universal and indicated elements and an emphasis on changing bystander behaviour. It achieved promising results in a large trial in Finland, and now requires testing in other countries. This paper describes the protocol for a cluster randomised controlled trial (RCT) of KiVa in Wales. The study uses a two-arm waitlist control pragmatic definitive parallel group cluster RCT design with an embedded process evaluation and calculation of unit cost. Participating schools will be randomised using a 1:1 ratio to KiVa plus usual provision (intervention group) or usual provision only (control group). The trial has one primary outcome, child self-reported victimisation from bullying, dichotomised as 'victimised' (bullied at least twice a month in the last couple of months) versus 'not victimised'. Secondary outcomes are: bullying perpetration; aspects of child social and emotional well-being (including emotional problems, conduct, peer relations, prosocial behaviour); and school attendance. Follow-up is at 12 months post-baseline. Implementation fidelity is measured through teacher-completed lesson records and independent school-wide observation. A micro-costing analysis will determine the costs of implementing KiVa, including recurrent and non-recurrent unit costs. Factors related to the scalability of the programme will be examined in interviews with head teachers and focus groups with key stakeholders in the implementation of school-based bullying interventions. The results from this trial will provide evidence on whether the KiVa programme is transportable from Finland to Wales in terms of effectiveness and implementation. It will provide information about the costs of delivery and generate insights into factors related to the scalability of the programme. Current Controlled Trials ISRCTN23999021; date 10-6-13.

  18. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore + SIMD) on shared-memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full-core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool.

  19. Parallel File System I/O Performance Testing On LANL Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wiens, Isaac Christian; Green, Jennifer Kathleen

    2016-08-18

    These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications, so performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, the IOR-Pavilion test workflow, the build script, IOR parameters, how parameters are passed to IOR, run_ior functionality, the Python IOR-output parser, the Splunk data format, the Splunk dashboard and its features, and future work.

  20. PyPele Rewritten To Use MPI

    NASA Technical Reports Server (NTRS)

    Hockney, George; Lee, Seungwon

    2008-01-01

    A computer program known as PyPele, originally written as a Python-language extension module of a C++ program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI) [an unofficial de facto standard language-independent application programming interface for message passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without the need to rewrite those programs.
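
    A minimal mpi4py sketch of this kind of dispatch pattern is shown below (the task list and work function are hypothetical; PyPele's real interface is not reproduced here):

        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        def run_case(params):
            return sum(params)  # stand-in for one mission-design analysis

        if rank == 0:
            cases = [[i, i + 1] for i in range(97)]         # all independent tasks
            chunks = [cases[r::size] for r in range(size)]  # round-robin split
        else:
            chunks = None

        my_cases = comm.scatter(chunks, root=0)    # embarrassingly parallel fan-out
        my_results = [run_case(c) for c in my_cases]
        results = comm.gather(my_results, root=0)  # fan-in to the coordinator
        if rank == 0:
            print(sum(len(r) for r in results), "cases completed")

    Run with, e.g., mpirun -n 8 python pypele_sketch.py; the same script works unchanged on an MPI-based cluster.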

  1. Development and Application of a Parallel LCAO Cluster Method

    NASA Astrophysics Data System (ADS)

    Patton, David C.

    1997-08-01

    CPU-intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message-passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail, as these parts are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation, which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations are analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.

  2. Cross-species amplification of microsatellites reveals incongruence in the molecular variation and taxonomic limits of the Pilosocereus aurisetus group (Cactaceae).

    PubMed

    Moraes, Evandro M; Perez, Manolo F; Téo, Mariana F; Zappi, Daniela C; Taylor, Nigel P; Machado, Marlon C

    2012-09-01

    The Pilosocereus aurisetus group contains eight cactus species restricted to xeric habitats in eastern and central Brazil that have an archipelago-like distribution. In this study, 5-11 microsatellite markers previously designed for Pilosocereus machrisii were evaluated for cross-amplification and polymorphisms in ten populations from six species of the P. aurisetus group. The genotypic information was subsequently used to investigate the genetic relationships between the individuals, populations, and species analyzed. Only the Pmac101 locus failed to amplify in all of the six analyzed species, resulting in an 88% success rate. The number of alleles per polymorphic locus ranged from 2 to 12, and the most successfully amplified loci showed at least one population with a larger number of alleles than were reported in the source species. The population relationships revealed clear genetic clustering in a neighbor-joining tree that was partially incongruent with the taxonomic limits between the P. aurisetus and P. machrisii species, a fact which parallels the problematic taxonomy of the P. aurisetus group. A Bayesian clustering analysis of the individual genotypes confirmed the observed taxonomic incongruence. These microsatellite markers provide a valuable resource for facilitating large-scale genetic studies on population structures, systematics and evolutionary history in this group.

  3. Expressing Parallelism with ROOT

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
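
    From Python (PyROOT), the implicit multi-threading described above is enabled with a single call; the sketch below assumes a hypothetical file events.root containing a TTree named "tree" with a branch pt:

        import ROOT

        ROOT.EnableImplicitMT()  # let ROOT use all available cores internally

        df = ROOT.RDataFrame("tree", "events.root")
        h = df.Filter("pt > 20").Histo1D("pt")         # declared lazily
        print("entries passing cut:", h.GetEntries())  # event loop runs here, multi-threaded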

  4. Expressing Parallelism with ROOT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piparo, D.; Tejedor, E.; Guiraud, E.

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  5. A Parallel Processing Algorithm for Remote Sensing Classification

    NASA Technical Reports Server (NTRS)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, the Medusa cluster at NASA/GSFC provides supercomputing performance, 130 Gflops (Linpack benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering Earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but which with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
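
    A sketch of the embarrassingly parallel step described above: train one SVM, then classify independent pixel blocks in worker processes (shapes and data below are synthetic; on a cluster like Medusa each block would instead go to a node):

        from multiprocessing import Pool

        import numpy as np
        from sklearn.svm import SVC

        def classify_block(args):
            model, block = args  # block: (n_pixels, n_bands) spectra
            return model.predict(block)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            X = rng.normal(size=(200, 50))    # labeled training spectra
            y = rng.integers(0, 3, size=200)  # ground-truth classes
            model = SVC(kernel="rbf").fit(X, y)
            blocks = [rng.normal(size=(1000, 50)) for _ in range(8)]
            with Pool() as pool:              # one independent task per block
                labels = pool.map(classify_block, [(model, b) for b in blocks])
            print(sum(len(l) for l in labels), "pixels classified")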

  6. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    NASA Astrophysics Data System (ADS)

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with a fast quad-core 1.2 GHz ARMv8 64-bit processor, 1 GB of RAM, and a 32 GB microSD card for local storage. The cluster therefore has a total of 128 GB of RAM distributed across the individual nodes and a flash capacity of 4 TB with 512 processor cores, while benefiting from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between nodes. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance computing (HPC) and the handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows, with the goal of achieving a massively parallelized scalable code. We present benchmarking results for the computational performance across various numbers of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and feasible learning platform for challenging engineering and scientific problems.

  7. Clustering of haemostatic variables and the effect of high cashew and walnut diets on these variables in metabolic syndrome patients.

    PubMed

    Pieters, Marlien; Oosthuizen, Welma; Jerling, Johann C; Loots, Du Toit; Mukuddem-Petersen, Janine; Hanekom, Susanna M

    2005-09-01

    We investigated the effect of a high walnut and cashew diet on haemostatic variables in people with the metabolic syndrome. Factor analysis was used to determine how the haemostatic variables cluster with other components of the metabolic syndrome, and multiple regression to determine possible predictors. This randomized, controlled, parallel, controlled-feeding trial included 68 subjects who met the Third National Cholesterol Education Program Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol criteria. After a 3-week run-in following the control diet, subjects were divided into three groups receiving either walnuts or cashews (20 energy%) or a control diet for 8 weeks. The nut intervention had no significant effect on von Willebrand factor antigen, fibrinogen, factor VII coagulant activity, plasminogen activator inhibitor 1 activity, tissue plasminogen activator activity or thrombin activatable fibrinolysis inhibitor. Statistically, fibrinogen clustered with the body-mass correlates and acute phase response factors, and factor VII coagulant activity clustered with high-density lipoprotein cholesterol (HDL-C). Tissue plasminogen activator activity, plasminogen activator inhibitor 1 activity and von Willebrand factor antigen clustered into a separate endothelial function factor. HDL-C and markers of obesity were the strongest predictors of the haemostatic variables. We conclude that high walnut and cashew diets did not influence haemostatic factors in this group of metabolic syndrome subjects. The HDL-C increase and weight loss may be the main focus of dietary intervention for the metabolic syndrome. Furthermore, diet composition may have only limited effects if weight loss is not achieved.

  8. MOLA: a bootable, self-configuring system for virtual screening using AutoDock4/Vina on computer clusters.

    PubMed

    Abreu, Rui Mv; Froufe, Hugo Jc; Queiroz, Maria João Rp; Ferreira, Isabel Cfr

    2010-10-28

    Virtual screening of small molecules using molecular docking has become an important tool in drug discovery. However, large-scale virtual screening is time demanding and usually requires dedicated computer clusters. There are a number of software tools that perform virtual screening using AutoDock4, but they require access to dedicated Linux computer clusters. Also, no software is available for performing virtual screening with Vina using computer clusters. In this paper we present MOLA, an easy-to-use graphical user interface tool that automates parallel virtual screening using AutoDock4 and/or Vina on bootable, non-dedicated computer clusters. MOLA automates several tasks including: ligand preparation, parallel AutoDock4/Vina job distribution and result analysis. When the virtual screening project finishes, an OpenOffice spreadsheet file opens with the ligands ranked by binding energy and distance to the active site. All results files can automatically be recorded on a USB flash drive or on the hard-disk drive using VirtualBox. MOLA works inside a customized Live CD GNU/Linux operating system, developed by us, that bypasses the original operating system installed on the computers used in the cluster. This operating system boots from a CD on the master node and then clusters other computers as slave nodes via ethernet connections. MOLA is an ideal virtual screening tool for non-experienced users, with a limited number of multi-platform heterogeneous computers available and no access to dedicated Linux computer clusters. When a virtual screening project finishes, the computers can simply be restarted to their original operating system. The originality of MOLA lies in the fact that any available computer, regardless of platform, can be added to the cluster, without ever using the computer's hard-disk drive and without interfering with the installed operating system. With a cluster of 10 processors, and a potential maximum speed-up of 10×, the parallel algorithm of MOLA performed with a speed-up of 8.64× using AutoDock4 and 8.60× using Vina.
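
    The job-distribution core of such a screen can be sketched with a process pool driving the Vina command line (paths and the config file are hypothetical; MOLA itself also handles ligand preparation, node discovery and result collation):

        import glob
        import subprocess
        from multiprocessing import Pool

        def dock(ligand):
            out = ligand.replace(".pdbqt", "_out.pdbqt")
            # Vina reads the receptor and search box from a config file.
            subprocess.run(["vina", "--receptor", "target.pdbqt",
                            "--ligand", ligand, "--out", out,
                            "--config", "box.conf"], check=True)
            return ligand

        if __name__ == "__main__":
            ligands = glob.glob("ligands/*.pdbqt")
            with Pool(processes=10) as pool:   # ~10 processors, as in the paper's cluster
                for done in pool.imap_unordered(dock, ligands):
                    print("finished", done)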

  9. Qualitative profiles of disability.

    PubMed

    Annicchiarico, Roberta; Gibert, Karina; Cortés, Ulises; Campana, Fabio; Caltagirone, Carlo

    2004-01-01

    This study identified profiles of functional disability (FD) paralleled by increasing levels of disability. We assessed 96 subjects using the World Health Organization Disability Assessment Schedule II (WHODAS II). Clustering Based on Rules (ClBR), a hybrid technique of statistics and artificial intelligence, was used in the analysis. Four groups of subjects with different profiles of FD were ordered according to an increasing degree of disability: "Low," self-dependent subjects with no physical or emotional problems; "Intermediate I," subjects with low or moderate physical and emotional disability, with high perception of disability; "Intermediate II," subjects with moderate or severe disability concerning only physical problems related to self-dependency, without emotional problems; and "High," subjects with the highest degree of disability, both physical and emotional. The order of the four classes is paralleled by a significant difference (P < 0.001) in the WHODAS II standardized global score. In this paper, a new ontology for the knowledge of FD, based on the use of ClBR, is proposed. The definition of four classes, qualitatively different and with an increasing degree of FD, helps to appropriately place each patient in a group of individuals with a similar profile of disability and to propose standardized treatments for these groups.

  10. Message Passing and Shared Address Space Parallelism on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.

  11. Methods for sample size determination in cluster randomized trials

    PubMed Central

    Rutterford, Clare; Copas, Andrew; Eldridge, Sandra

    2015-01-01

    Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
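
    The simplest approach described above (inflate an individually randomized sample size by a design effect) is a few lines of code; the sketch below uses the standard design effect 1 + (m - 1) × ICC for a continuous outcome (our illustration, with hypothetical inputs):

        import math
        from statistics import NormalDist

        def n_per_arm_individual(delta, sd, alpha=0.05, power=0.8):
            """Two-arm sample size per arm under individual randomization."""
            z_a = NormalDist().inv_cdf(1 - alpha / 2)
            z_b = NormalDist().inv_cdf(power)
            return 2 * ((z_a + z_b) * sd / delta) ** 2

        def clusters_per_arm(delta, sd, m, icc, **kw):
            """Inflate by the design effect and convert to clusters of size m."""
            deff = 1 + (m - 1) * icc
            n = n_per_arm_individual(delta, sd, **kw) * deff
            return math.ceil(n / m)

        # e.g. detect a 0.3 SD difference with clusters of 20 and ICC = 0.05
        print(clusters_per_arm(delta=0.3, sd=1.0, m=20, icc=0.05))  # -> 18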

  12. A DFT study of the stability of SIAs and small SIA clusters in the vicinity of solute atoms in Fe

    NASA Astrophysics Data System (ADS)

    Becquart, C. S.; Ngayam Happy, R.; Olsson, P.; Domain, C.

    2018-03-01

    The energetics, defect volume and magnetic properties of single SIAs and small SIA clusters up to size 6 have been calculated by DFT for different configurations like the parallel 〈110〉 dumbbell, the non parallel 〈110〉 dumbbell and the C15 structure. The most stable configurations of each type have been further analyzed to determine the influence on their stability of various solute atoms (Ti, V, Cr, Mn, Co, Ni, Cu, Mo, W, Pd, Al, Si, P), relevant for steels used under irradiation. The results show that the presence of solute atoms does not change the relative stability order among SIA clusters. The small SIA clusters investigated can bind to both undersized and oversized solutes. Several descriptors have been considered to derive interesting trends from results. It appears that the local atomic volume available for the solute is the main physical quantity governing the binding energy evolution, whatever the solute type (undersized or oversized) and the cluster configuration (size and type).

  13. Acute Whiplash Injury Study (AWIS): a protocol for a cluster randomised pilot and feasibility trial of an Active Behavioural Physiotherapy Intervention in an insurance private setting.

    PubMed

    Wiangkham, Taweewat; Duda, Joan; Haque, M Sayeed; Price, Jonathan; Rushton, Alison

    2016-07-13

    Whiplash-associated disorder (WAD) causes substantial social and economic burden internationally. Up to 60% of patients with WAD progress to chronicity. Research therefore needs to focus on effective management in the acute stage to prevent the development of chronicity. Approximately 93% of patients are classified as WADII (neck complaint and musculoskeletal sign(s)), and in the UK most are managed in the private sector. In our recent systematic review, a combination of active and behavioural physiotherapy was identified as potentially effective in the acute stage. An Active Behavioural Physiotherapy Intervention (ABPI) was developed by combining empirical (modified Delphi study) and theoretical (social cognitive theory focusing on self-efficacy) evidence. This pilot and feasibility trial has been designed to inform the design of an adequately powered definitive randomised controlled trial. Two parallel phases are planned. (1) An external pilot and feasibility cluster randomised double-blind (assessor and participants), parallel two-arm (ABPI vs standard physiotherapy) clinical trial to evaluate procedures and feasibility. Six UK private physiotherapy clinics will be recruited and cluster randomised by a computer-generated randomisation sequence. Sixty participants (30 in each arm) will be assessed at recruitment (baseline) and at 3 months post-baseline. The planned primary outcome measure is the Neck Disability Index. (2) An embedded exploratory qualitative study using semistructured in-depth interviews (n=3-4 physiotherapists) and a focus group (n=6-8 patients), entailing the recruitment of purposive samples, will explore perceptions of the ABPI. Quantitative data will be analysed descriptively. Qualitative data will be coded and analysed deductively (to identify themes) and inductively (to identify additional themes). This trial is approved by the University of Birmingham Ethics Committee (ERN_15-0542). Trial registration: ISRCTN84528320.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jun

    Our group has been working with ANL collaborators on bridging the gap between parallel file systems and local file systems during this project period. We visited Dr. Robert Ross's group at Argonne National Lab for one week in summer 2007, where we reviewed our project progress and planned activities for the coming years 2008-09. The PI met Dr. Robert Ross several times, at the HEC FSIO workshop 08, SC08, and SC10. We explored opportunities to develop a production system by leveraging our current prototype (SOGP+PVFS) into a new PVFS version, and delivered the SOGP+PVFS codes to the ANL PVFS2 group in 2008. We also discussed a potential project on developing new parallel programming models and runtime systems for data-intensive scalable computing (DISC); the methodology is to evolve MPI towards DISC by incorporating some functions of the Google MapReduce parallel programming model. More recently, we have been jointly exploring how to leverage existing work to perform (1) coordination/aggregation of local I/O operations prior to movement over the WAN, (2) efficient bulk data movement over the WAN, and (3) latency-hiding techniques for latency-intensive operations. Since 2009, we have been applying Hadoop/MapReduce to some HEC applications with LANL scientists John Bent and Salman Habib. Another ongoing effort is to improve checkpoint performance at the I/O forwarding layer for the Roadrunner supercomputer with James Nunez and Gary Grider at LANL. Two senior undergraduates from our research group did summer internships on high-performance file and storage system projects at LANL for three consecutive years starting in 2008. Both are now pursuing Ph.D. degrees in our group, will be fourth-year students in the PhD program in Fall 2011, and will go to LANL to advance the two above-mentioned efforts during this winter break. Since 2009, we have been collaborating with several computer scientists (Gary Grider, John Bent, Parks Fields, James Nunez, Hsing-Bung Chen, etc.) from HPC5 and with James Ahrens from the Advanced Computing Laboratory at Los Alamos National Laboratory. We hold weekly conference and/or video meetings on advancing work on two fronts: the hardware/software infrastructure for building large-scale data-intensive clusters, and research publications. Our group members assist in constructing several onsite LANL data-intensive clusters, and the two parties have been developing software and research papers together using resources from both sides.

  15. Hierarchical Petascale Simulation Framework For Stress Corrosion Cracking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grama, Ananth

    2013-12-18

    A number of major accomplishments resulted from the project. These include: • Data Structures, Algorithms, and Numerical Methods for Reactive Molecular Dynamics. We have developed a range of novel data structures, algorithms, and solvers (amortized ILU, Spike) for use with ReaxFF and charge equilibration. • Parallel Formulations of ReactiveMD (Purdue Reactive Molecular Dynamics Package: PuReMD, PuReMD-GPU, and PG-PuReMD) for Message-Passing, GPU, and GPU-Cluster Platforms. We have developed efficient serial, parallel (MPI), GPU (CUDA), and GPU-cluster (MPI/CUDA) implementations. Our implementations have been demonstrated to be significantly better than the state of the art, both in terms of performance and scalability. • Comprehensive Validation in the Context of Diverse Applications. We have demonstrated the use of our software in diverse systems, including silica-water and silicon-germanium nanorods, and, as part of other projects, extended it to applications ranging from explosives (RDX) to lipid bilayers (biomembranes under oxidative stress). • Open Source Software Packages for Reactive Molecular Dynamics. All versions of our software have been released into the public domain. There are over 100 major research groups worldwide using our software. • Implementation into the Department of Energy LAMMPS Software Package. We have also integrated our software into the Department of Energy LAMMPS software package.

  16. Clustering and flow around a sphere moving into a grain cloud.

    PubMed

    Seguin, A; Lefebvre-Lepot, A; Faure, S; Gondret, P

    2016-06-01

    A two-dimensional simulation of a sphere moving at constant velocity into a cloud of smaller spherical grains, far from any boundaries and without gravity, is presented using a non-smooth contact dynamics method. A dense granular "cluster" zone builds progressively around the moving sphere until a stationary regime appears with a constant upstream cluster size. The key point is that the upstream cluster size increases with the initial solid fraction φ0, whereas the cluster packing fraction takes an approximately constant value independent of φ0. Although the upstream cluster size around the moving sphere diverges when φ0 approaches a critical value, the drag force exerted by the grains on the sphere does not. The detailed analysis of the local strain-rate and local stress fields in the non-parallel granular flow inside the cluster allows us to extract the local invariants of the two tensors: dilation rate, shear rate, pressure and shear stress. Despite the different spatial variations of these invariants, the local friction coefficient μ appears to depend only on the local inertial number I and the local solid fraction, which means that a local rheology does exist in the present non-parallel flow. Notably, the spatial variations of I inside the cluster do not depend on the sphere velocity and explore only a small range around unity.
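
    The local rheology referred to here is commonly expressed through the \mu(I) constitutive law (a standard form from the granular-flow literature, e.g. Jop et al. 2006, quoted for context; the paper's fitted dependence may differ):

        \mu(I) = \mu_s + \frac{\mu_2 - \mu_s}{I_0 / I + 1},
        \qquad
        I = \frac{\dot{\gamma}\, d}{\sqrt{P / \rho_g}}

    where \dot{\gamma} is the local shear rate, d the grain diameter, P the local pressure, \rho_g the grain density, and \mu_s, \mu_2, I_0 material constants. The observation that I stays close to one inside the cluster places the flow in the dense inertial regime.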

  17. Self-Assembly of Parallel Atomic Wires and Periodic Clusters of Silicon on a Vicinal Si(111) Surface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sekiguchi, Takeharu; Yoshida, Shunji; Itoh, Kohei M.

    2005-09-02

    Silicon self-assembly at step edges in the initial stage of homoepitaxial growth on a vicinal Si(111) surface is studied by scanning tunneling microscopy. The resulting atomic structures change dramatically from a parallel array of 0.7 nm wide wires to one-dimensionally aligned periodic clusters of diameter ~2 nm and periodicity 2.7 nm in the very narrow range of growth temperatures between 400 and 300 °C. These nanostructures are expected to play important roles in future developments of silicon quantum computers. Mechanisms leading to such distinct structures are discussed.

  18. Study of phase clustering method for analyzing large volumes of meteorological observation data

    NASA Astrophysics Data System (ADS)

    Volkov, Yu. V.; Krutikov, V. A.; Botygin, I. A.; Sherstnev, V. S.; Sherstneva, A. I.

    2017-11-01

    The article describes an iterative parallel phase-grouping algorithm for temperature field classification. The algorithm is based on a modified method of structure formation using the analytic signal. The developed method solves climate classification tasks, as well as climatic zoning, at any temporal or spatial scale. Applied to surface temperature measurement series, the algorithm finds climatic structures with correlated changes of the temperature field, supports conclusions about climate uniformity in a given area, and gives an overview of climate change over time by analyzing shifts in the type groups. The information on the climate type groups specific to selected geographical areas is supplemented by a genetic scheme of class distribution that depends on changes in the level of mutual correlation between monthly average ground temperature series.
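
    A minimal sketch of the phase-grouping idea (an illustrative reconstruction in Python, assuming SciPy and scikit-learn; the authors' iterative parallel algorithm differs in detail):

        import numpy as np
        from scipy.signal import hilbert
        from sklearn.cluster import KMeans

        def phase_cluster(series, n_groups=5):
            """series: (n_stations, n_months) array of temperature anomalies."""
            analytic = hilbert(series, axis=1)             # analytic signal per station
            phase = np.unwrap(np.angle(analytic), axis=1)  # instantaneous phase
            # Stations whose phases evolve together form one climatic structure.
            features = phase - phase.mean(axis=0)          # remove the common cycle
            return KMeans(n_clusters=n_groups, n_init=10).fit_predict(features)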

  19. Changes in Self-Representations Following Psychoanalytic Psychotherapy for Young Adults: A Comparative Typology.

    PubMed

    Werbart, Andrzej; Brusell, Lars; Iggedal, Rebecka; Lavfors, Kristin; Widholm, Alexander

    2016-10-01

    Changes in dynamic psychological structures are often a treatment goal in psychotherapy. The present study aimed to create a typology of self-representations among young women and men in psychoanalytic psychotherapy, to study longitudinal changes in self-representations, and to compare self-representations in the clinical sample with those of a nonclinical group. Twenty-five women and sixteen men were interviewed according to Blatt's Object Relations Inventory pretreatment, at termination, and at a 1.5-year follow-up. In the comparison group, eleven women and nine men were interviewed at baseline and again 1.5 years and three years later. Typologies of the 123 self-descriptions in the clinical group and 60 in the nonclinical group were constructed by means of ideal-type analysis, for men and women separately. Clusters of self-representations could be depicted on a two-dimensional matrix with the axes Relatedness-Self-definition and Integration-Nonintegration. In most cases, the self-descriptions changed over time in terms of belonging to different ideal-type clusters. In the clinical group, there was a movement toward increased integration in self-representations, but above all toward a better balance between relatedness and self-definition. The changes continued after termination, paralleled by reduced symptoms, improved functioning, and higher developmental levels of representations. No corresponding tendency was observed in the nonclinical group.

  20. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared- and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with more accurate physical models, and improve statistics since more particle tracks can be simulated in a low response time.
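
    The parallelization pattern is essentially independent sampling with non-overlapping random streams per rank; a minimal mpi4py sketch (NumPy generators stand in here for the SPRNG/DCMT libraries, and the "transport" is a toy path-length sample):

        from mpi4py import MPI
        import numpy as np

        def parallel_mc(n_particles, seed=12345):
            comm = MPI.COMM_WORLD
            rank, size = comm.Get_rank(), comm.Get_size()
            # Independent, non-overlapping stream per rank (the role the
            # SPRNG/DCMT libraries play in MC4).
            rng = np.random.default_rng([seed, rank])
            local_n = n_particles // size
            # Toy "transport": sample exponential free paths and tally the total.
            local_tally = rng.exponential(scale=1.0, size=local_n).sum()
            total = comm.reduce(local_tally, op=MPI.SUM, root=0)
            if rank == 0:
                return total / (local_n * size)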

  1. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    PubMed

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.

  2. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

    PubMed Central

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992

  3. Predicting stability limits for pure and doped dicationic noble gas clusters undergoing coulomb explosion: A parallel tempering based study.

    PubMed

    Ghorai, Sankar; Chaudhury, Pinaki

    2018-05-30

    We have used a replica exchange Monte Carlo procedure, popularly known as parallel tempering, to study the problem of Coulomb explosion in homogeneous Ar and Xe dicationic clusters as well as mixed Ar-Xe dicationic clusters of varying sizes with different degrees of relative composition. All the clusters studied carry two units of positive charge. The simulations reveal that in all cases there is a cutoff size below which the clusters fragment. For pure Ar the value is around 95, while for Xe it is 55. For the mixed clusters, with increasing Xe content the cutoff limit for suppression of Coulomb explosion gradually decreases from 95 for a pure Ar cluster to 55 for a pure Xe cluster. The hallmark of this study is this smooth progression. All the clusters are simulated using the reliable potential energy surface developed by Gay and Berne (Gay and Berne, Phys. Rev. Lett. 1982, 49, 194). For the hetero clusters, we have also discussed two different ways of distributing the charge: one in which both positive charges are on Xe atoms, and another in which one charge is on a Xe atom and one on an Ar atom. The fragmentation patterns observed are such that single ionic ejections are the favored dissociation channel. © 2017 Wiley Periodicals, Inc.
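
    For context, the replica-exchange loop has the following generic shape (a toy Python sketch with a hypothetical one-dimensional double-well energy, not the Gay-Berne cluster potential used in the study):

        import numpy as np

        def energy(x):                   # hypothetical 1-D double well
            return (x**2 - 1.0)**2

        def parallel_tempering(temps, n_steps=10000, step=0.5, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.normal(size=len(temps))        # one replica per temperature
            for _ in range(n_steps):
                for i, T in enumerate(temps):      # Metropolis move per replica
                    trial = x[i] + rng.normal(0.0, step)
                    if rng.random() < np.exp(-(energy(trial) - energy(x[i])) / T):
                        x[i] = trial
                i = rng.integers(len(temps) - 1)   # attempt one neighbour swap
                beta_diff = 1.0 / temps[i] - 1.0 / temps[i + 1]
                # Swap acceptance: min(1, exp[(beta_i - beta_j)(E_i - E_j)]).
                if rng.random() < np.exp(beta_diff * (energy(x[i]) - energy(x[i + 1]))):
                    x[i], x[i + 1] = x[i + 1], x[i]
            return x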

  4. Job Management Requirements for NAS Parallel Systems and Clusters

    NASA Technical Reports Server (NTRS)

    Saphir, William; Tanner, Leigh Ann; Traversat, Bernard

    1995-01-01

    A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.

  5. Multi-cluster processor operating only select number of clusters during each phase based on program statistic monitored at predetermined intervals

    DOEpatents

    Balasubramonian, Rajeev [Sandy, UT; Dwarkadas, Sandhya [Rochester, NY; Albonesi, David [Ithaca, NY

    2009-02-10

    In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, each configuration option is run for an interval to determine the optimal configuration, which is then used until the next phase change is detected. The optimal instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.
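
    The interval-selection logic summarized above fits in a few lines (an illustrative software analogue; the values are hypothetical and the patent describes a hardware mechanism):

        MIN_INTERVAL = 1_000        # instructions per window (hypothetical)
        STABILITY_THRESHOLD = 0.05  # max tolerated fraction of unstable windows

        def pick_interval(measure_instability, max_interval=1_000_000):
            """measure_instability(n): fraction of n-instruction windows whose
            program statistics differ significantly from the previous window."""
            interval = MIN_INTERVAL
            while interval < max_interval:
                if measure_instability(interval) < STABILITY_THRESHOLD:
                    return interval   # statistics are stable at this granularity
                interval *= 2         # double the interval and re-measure
            return max_interval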

  6. Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

    PubMed Central

    Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan

    2004-01-01

    Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335
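
    For flavor, one synchronous step of a classic excitable-media cellular automaton is sketched below (the Greenberg-Hastings model in Python; the authors' extended, quantitative automaton and its OpenMP/MPI parallelization are considerably richer):

        import numpy as np

        def greenberg_hastings_step(grid, n_refractory=3):
            """States: 0 = rest, 1 = excited, 2..n_refractory+1 = refractory."""
            excited = (grid == 1)
            # A resting cell fires if any 4-neighbour is excited.
            neigh = (np.roll(excited, 1, 0) | np.roll(excited, -1, 0) |
                     np.roll(excited, 1, 1) | np.roll(excited, -1, 1))
            new = np.where((grid == 0) & neigh, 1, grid)
            # Excited and refractory cells advance one state per step...
            new = np.where((grid >= 1) & (grid <= n_refractory), grid + 1, new)
            # ...and the last refractory state returns to rest.
            return np.where(grid == n_refractory + 1, 0, new)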

  7. Visualization of unsteady computational fluid dynamics

    NASA Astrophysics Data System (ADS)

    Haimes, Robert

    1994-11-01

    A brief summary of the computer environment used for calculating three-dimensional unsteady Computational Fluid Dynamic (CFD) results is presented. This environment requires supercomputers; massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by concurrently working on the same task) provide the required computational bandwidth for CFD calculations of transient problems. Clusters of reduced instruction set computer (RISC) workstations are a recent development, driven by the low cost and high performance that workstation vendors provide. With the proper software, such a cluster can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address the visualization of 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00, make up the bulk of this report.

  8. Visualization of unsteady computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Haimes, Robert

    1994-01-01

    A brief summary of the computer environment used for calculating three-dimensional unsteady Computational Fluid Dynamic (CFD) results is presented. This environment requires supercomputers; massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by concurrently working on the same task) provide the required computational bandwidth for CFD calculations of transient problems. Clusters of reduced instruction set computer (RISC) workstations are a recent development, driven by the low cost and high performance that workstation vendors provide. With the proper software, such a cluster can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address the visualization of 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00, make up the bulk of this report.

  9. Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations

    NASA Astrophysics Data System (ADS)

    Hunsaker, Isaac L.

    Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation-turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal Monte Carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the Message Passing Interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU radiation model and the GPU radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
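
    The decompose/recompose pattern reduces to a gather of patches (an mpi4py illustration; the production combustion/radiation code is not Python, and the function names here are illustrative):

        from mpi4py import MPI
        import numpy as np

        def recompose(local_patch, comm=MPI.COMM_WORLD):
            """Each rank owns one patch of the decomposed domain; afterwards
            every rank holds the full field needed for Monte Carlo ray tracing."""
            patches = comm.allgather(local_patch)    # stitch patches together
            return np.concatenate(patches, axis=0)   # recomposed domain copy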

  10. The Frontier Fields: Survey Design and Initial Results

    NASA Astrophysics Data System (ADS)

    Lotz, J. M.; Koekemoer, A.; Coe, D.; Grogin, N.; Capak, P.; Mack, J.; Anderson, J.; Avila, R.; Barker, E. A.; Borncamp, D.; Brammer, G.; Durbin, M.; Gunning, H.; Hilbert, B.; Jenkner, H.; Khandrika, H.; Levay, Z.; Lucas, R. A.; MacKenty, J.; Ogaz, S.; Porterfield, B.; Reid, N.; Robberto, M.; Royle, P.; Smith, L. J.; Storrie-Lombardi, L. J.; Sunnquist, B.; Surace, J.; Taylor, D. C.; Williams, R.; Bullock, J.; Dickinson, M.; Finkelstein, S.; Natarajan, P.; Richard, J.; Robertson, B.; Tumlinson, J.; Zitrin, A.; Flanagan, K.; Sembach, K.; Soifer, B. T.; Mountain, M.

    2017-03-01

    What are the faintest distant galaxies we can see with the Hubble Space Telescope (HST) now, before the launch of the James Webb Space Telescope? This is the challenge taken up by the Frontier Fields, a Director's discretionary time campaign with HST and the Spitzer Space Telescope to see deeper into the universe than ever before. The Frontier Fields combines the power of HST and Spitzer with the natural gravitational telescopes of massive high-magnification clusters of galaxies to produce the deepest observations of clusters and their lensed galaxies ever obtained. Six clusters (Abell 2744, MACSJ0416.1-2403, MACSJ0717.5+3745, MACSJ1149.5+2223, Abell S1063, and Abell 370) have been targeted by the HST ACS/WFC and WFC3/IR cameras with coordinated parallel fields for over 840 HST orbits. The parallel fields are the second-deepest observations thus far by HST with 5σ point-source depths of ~29th ABmag. Galaxies behind the clusters experience typical magnification factors of a few, with small regions magnified by factors of 10-100. Therefore, the Frontier Field cluster HST images achieve intrinsic depths of ~30-33 mag over very small volumes. Spitzer has obtained over 1000 hr of Director's discretionary imaging of the Frontier Field cluster and parallels in IRAC 3.6 and 4.5 μm bands to 5σ point-source depths of ~26.5, 26.0 ABmag. We demonstrate the exceptional sensitivity of the HST Frontier Field images to faint high-redshift galaxies, and review the initial results related to the primary science goals.

  11. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne, etc.), observational facilities (meteorological, eddy covariance, etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies, like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and, specifically, for large-scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
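
    The data-parallel k-means step at the heart of such a clustering code can be sketched as follows (an mpi4py stand-in; the paper's implementation is hybrid MPI + CUDA/OpenACC):

        from mpi4py import MPI
        import numpy as np

        def kmeans_step(local_X, centroids, comm=MPI.COMM_WORLD):
            """One iteration: assign local points, then update centroids globally."""
            d2 = ((local_X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)            # nearest centroid per point
            k, dim = centroids.shape
            sums, counts = np.zeros((k, dim)), np.zeros(k)
            for j in range(k):
                members = local_X[labels == j]
                sums[j], counts[j] = members.sum(axis=0), len(members)
            gsums = comm.allreduce(sums)          # partial sums from every node
            gcounts = comm.allreduce(counts)
            return gsums / np.maximum(gcounts, 1)[:, None], labels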

  12. pcircle - A Suite of Scalable Parallel File System Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    WANG, FEIYI

    2015-10-01

    Most software related to file systems is written for conventional local file systems; it is serial and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on top of ubiquitous MPI in cluster computing environments and the "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, and integrity checking.
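
    A toy version of the parallel checksumming idea (Python multiprocessing stands in for MPI ranks here; pcircle's actual chunking, work stealing, and signature scheme differ):

        import hashlib
        import os
        from multiprocessing import Pool

        def chunk_digest(args):
            path, offset, size = args
            with open(path, "rb") as f:
                f.seek(offset)
                return offset, hashlib.sha1(f.read(size)).hexdigest()

        def parallel_checksum(path, chunk=64 * 2**20, workers=8):
            size = os.path.getsize(path)
            tasks = [(path, off, chunk) for off in range(0, size, chunk)]
            with Pool(workers) as pool:
                digests = sorted(pool.map(chunk_digest, tasks))  # order by offset
            combined = "".join(d for _, d in digests).encode()
            return hashlib.sha1(combined).hexdigest()  # file-level signature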

  13. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

    PubMed Central

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986

  14. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    PubMed

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  15. Scaling Semantic Graph Databases in Size and Performance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Villa, Oreste

    In this paper we present SGEM, a full software system for accelerating large-scale semantic graph databases on commodity clusters. Unlike current approaches, SGEM addresses semantic graph databases by employing only graph methods at all levels of the stack. On one hand, this allows exploiting the space efficiency of graph data structures and the inherent parallelism of graph algorithms. These features adapt well to the increasing system memory and core counts of modern commodity clusters. On the other hand, however, such systems are optimized for regular computation and batched data transfers, while graph methods usually are irregular and generate fine-grained data accesses with poor spatial and temporal locality. Our framework comprises a SPARQL to data-parallel C compiler, a library of parallel graph methods and a custom, multithreaded runtime system. We introduce our stack, motivate its advantages with respect to other solutions and show how we solved the challenges posed by irregular behaviors. We present the results of our software stack on the Berlin SPARQL benchmarks with datasets up to 10 billion triples (a triple corresponds to a graph edge), demonstrating scaling in dataset size and in performance as more nodes are added to the cluster.

  16. SciSpark: In-Memory Map-Reduce for Earth Science Algorithms

    NASA Astrophysics Data System (ADS)

    Ramirez, P.; Wilson, B. D.; Whitehall, K. D.; Palamuttam, R. S.; Mattmann, C. A.; Shah, S.; Goodman, A.; Burke, W.

    2016-12-01

    We are developing a lightning-fast Big Data technology called SciSpark based on Apache Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk. SciSpark extends Spark to support Earth Science use in three ways: efficient ingest of N-dimensional geo-located arrays (physical variables) from netCDF3/4, HDF4/5, and/or OPeNDAP URLs; array operations for dense arrays in Scala and Java using the ND4S/ND4J or Breeze libraries; and operations to "split" datasets across a Spark cluster by time or space or both. For example, a decade-long time series of geo-variables can be split across time to enable parallel "speedups" of analysis by day, month, or season. Similarly, very high-resolution climate grids can be partitioned into spatial tiles for parallel operations across rows, columns, or blocks. In addition, using Spark's gateway into Python, PySpark, one can utilize the entire ecosystem of numpy, scipy, etc. Finally, SciSpark Notebooks provide a modern eNotebook technology in which Scala, Python, or Spark SQL code is entered into cells in the Notebook and executed on the cluster, with results, plots, or graph visualizations displayed in "live widgets". We have exercised SciSpark by implementing three complex use cases: discovery and evolution of Mesoscale Convective Complexes (MCCs) in storms, yielding a graph of connected components; PDF clustering of atmospheric state using parallel K-Means; and statistical "rollups" of geo-variables or model-to-obs differences (i.e., mean, stddev, skewness, and kurtosis) by day, month, season, year, and multi-year. Geo-variables are ingested and split across the cluster using methods on the sciSparkContext object, including netCDFVariables() for spatial decomposition and wholeNetCDFVariables() for time series. The presentation will cover the architecture of SciSpark, the design of the scientific RDD (sRDD) data structures for N-dimensional arrays, results from the three science use cases, example Notebooks, lessons learned from the algorithm implementations, and parallel performance metrics.
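
    The split-then-rollup pattern reads naturally in PySpark (a sketch with random stand-in data; SciSpark's own entry points are the sciSparkContext methods named above):

        from pyspark.sql import SparkSession
        import numpy as np

        spark = SparkSession.builder.appName("rollup-sketch").getOrCreate()
        sc = spark.sparkContext

        # Pretend each record is (month_index, 2-D grid of a geo-variable).
        months = [(m, np.random.rand(180, 360)) for m in range(120)]  # 10 years
        rdd = sc.parallelize(months)

        # "Split" across the cluster by season, then roll up statistics in parallel.
        by_season = rdd.map(lambda kv: (kv[0] % 12 // 3, kv[1]))
        stats = (by_season
                 .mapValues(lambda g: (g.mean(), g.std()))
                 .groupByKey()
                 .mapValues(lambda ms: tuple(np.mean(list(ms), axis=0))))
        print(stats.collect())  # per-season average of (mean, stddev)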

  17. Optimisation of a parallel ocean general circulation model

    NASA Astrophysics Data System (ADS)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  18. Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code

    NASA Astrophysics Data System (ADS)

    Payne, J.; McCune, D.; Prater, R.

    2010-11-01

    NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. The recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost-effective and high-performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS in double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.

  19. Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure

    NASA Astrophysics Data System (ADS)

    Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen

    2017-02-01

    Irradiation effects in materials are one of the key issues for the use of nuclear power. However, the lack of high-throughput irradiation facilities and of knowledge about the evolution process has led to limited understanding of these issues. With the help of high-performance computing, we can gain a deeper understanding of materials at the micro level. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials in an irradiation environment. Based on the proposed data structure, we developed new molecular dynamics software named Crystal MD. Simulations with Crystal MD achieved over 90% parallel efficiency in test cases, and Crystal MD uses more than 25% less memory on multi-core clusters than LAMMPS and IMD, two popular molecular dynamics simulation packages. Using Crystal MD, a two-trillion-particle simulation has been performed on the Tianhe-2 cluster.

  20. An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

    NASA Technical Reports Server (NTRS)

    Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

    1996-01-01

    We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.

  1. Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

    1997-01-01

    We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of the various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.

  2. Revisiting benzene cluster cations for the chemical ionization of dimethyl sulfide and select volatile organic compounds

    DOE PAGES

    Kim, Michelle J.; Zoerb, Matthew C.; Campbell, Nicole R.; ...

    2016-04-05

    Here, benzene cluster cations were revisited as a sensitive and selective reagent ion for the chemical ionization of dimethyl sulfide (DMS) and a select group of volatile organic compounds (VOCs). Laboratory characterization was performed using both a new set of compounds (i.e., DMS, β-caryophyllene) as well as previously studied VOCs (i.e., isoprene, α-pinene). Using a field-deployable chemical-ionization time-of-flight mass spectrometer (CI-ToFMS), benzene cluster cations demonstrated high sensitivity (> 1 ncps ppt⁻¹) to DMS, isoprene, and α-pinene standards. Parallel measurements conducted using a chemical-ionization quadrupole mass spectrometer, with a much weaker electric field, demonstrated that ion-molecule reactions likely proceed through a combination of ligand-switching and direct charge transfer mechanisms. Laboratory tests suggest that benzene cluster cations may be suitable for the selective ionization of sesquiterpenes, where minimal fragmentation (< 25%) was observed for the detection of β-caryophyllene, a bicyclic sesquiterpene. The in-field stability of benzene cluster cations using CI-ToFMS was examined in the marine boundary layer during the High Wind Gas Exchange Study (HiWinGS). The use of benzene cluster cation chemistry for the selective detection of DMS was validated against an atmospheric pressure ionization mass spectrometer, where measurements from the two instruments were highly correlated (R² > 0.95, 10 s averages) over a wide range of sampling conditions.

  3. The HST Frontier Fields: Complete High-Level Science Data Products for All 6 Clusters

    NASA Astrophysics Data System (ADS)

    Koekemoer, Anton M.; Mack, Jennifer; Lotz, Jennifer M.; Borncamp, David; Khandrika, Harish G.; Lucas, Ray A.; Martlin, Catherine; Porterfield, Blair; Sunnquist, Ben; Anderson, Jay; Avila, Roberto J.; Barker, Elizabeth A.; Grogin, Norman A.; Gunning, Heather C.; Hilbert, Bryan; Ogaz, Sara; Robberto, Massimo; Sembach, Kenneth; Flanagan, Kathryn; Mountain, Matt; HST Frontier Fields Team

    2017-01-01

    The Hubble Space Telescope Frontier Fields program (PI: J. Lotz) is a large Director's Discretionary program of 840 orbits, to obtain ultra-deep observations of six strong lensing clusters of galaxies, together with parallel deep blank fields, making use of the strong lensing amplification by these clusters of distant background galaxies to detect the faintest galaxies currently observable in the high-redshift universe. The entire program has now completed successfully for all 6 clusters, namely Abell 2744, Abell S1063, Abell 370, MACS J0416.1-2403, MACS J0717.5+3745 and MACS J1149.5+2223. Each of these was observed over two epochs, to a total depth of 140 orbits on the main cluster and an associated parallel field, obtaining images in ACS (F435W, F606W, F814W) and WFC3/IR (F105W, F125W, F140W, F160W) on both the main cluster and the parallel field in all cases. Full sets of high-level science products have been generated for all these clusters by the team at STScI, including cumulative-depth data releases during each epoch, as well as full-depth releases after the completion of each epoch. These products include all the full-depth distortion-corrected drizzled mosaics and associated products for each cluster, which are science-ready to facilitate the construction of lensing models as well as enabling a wide range of other science projects. Many improvements beyond default calibration for ACS and WFC3/IR are implemented in these data products, including corrections for persistence, time-variable sky, and low-level dark current residuals, as well as improvements in astrometric alignment to achieve milliarcsecond-level accuracy. The full set of resulting high-level science products and mosaics are publicly delivered to the community via the Mikulski Archive for Space Telescopes (MAST) to enable the widest scientific use of these data, as well as ensuring a public legacy dataset of the highest possible quality that is of lasting value to the entire community.

  4. The HST Frontier Fields: Complete Observations and High-Level Science Data Products for All 6 Clusters

    NASA Astrophysics Data System (ADS)

    Koekemoer, Anton M.; Mack, Jennifer; Lotz, Jennifer M.; Borncamp, David; Khandrika, Harish G.; Lucas, Ray A.; Martlin, Catherine; Porterfield, Blair; Sunnquist, Ben; Anderson, Jay; Avila, Roberto J.; Barker, Elizabeth A.; Grogin, Norman A.; Gunning, Heather C.; Hilbert, Bryan; Ogaz, Sara; Robberto, Massimo; Sembach, Kenneth; Flanagan, Kathryn; Mountain, Matt; HST Frontier Fields Team

    2017-06-01

    The Hubble Space Telescope Frontier Fields program is a large Director's Discretionary program of 840 orbits, to obtain ultra-deep observations of six strong lensing clusters of galaxies, together with parallel deep blank fields, making use of the strong lensing amplification by these clusters of distant background galaxies to detect the faintest galaxies currently observable in the high-redshift universe. The entire program has now completed successfully for all 6 clusters, namely Abell 2744, Abell S1063, Abell 370, MACS J0416.1-2403, MACS J0717.5+3745 and MACS J1149.5+2223. Each of these was observed over two epochs, to a total depth of 140 orbits on the main cluster and an associated parallel field, obtaining images in ACS (F435W, F606W, F814W) and WFC3/IR (F105W, F125W, F140W, F160W) on both the main cluster and the parallel field in all cases. Full sets of high-level science products have been generated for all these clusters by the team at STScI, including cumulative-depth data releases during each epoch, as well as full-depth releases after the completion of each epoch. These products include all the full-depth distortion-corrected drizzled mosaics and associated products for each cluster, which are science-ready to facilitate the construction of lensing models as well as enabling a wide range of other science projects. Many improvements beyond default calibration for ACS and WFC3/IR are implemented in these data products, including corrections for persistence, time-variable sky, and low-level dark current residuals, as well as improvements in astrometric alignment to achieve milliarcsecond-level accuracy. The full set of resulting high-level science products and mosaics are publicly delivered to the community via the Mikulski Archive for Space Telescopes (MAST) to enable the widest scientific use of these data, as well as ensuring a public legacy dataset of the highest possible quality that is of lasting value to the entire community.

  5. WebStruct and VisualStruct: Web interfaces and visualization for Structure software implemented in a cluster environment.

    PubMed

    Jayashree, B; Rajgopal, S; Hoisington, D; Prasanth, V P; Chandra, S

    2008-09-24

    Structure is a widely used software tool to investigate population genetic structure with multi-locus genotyping data. The software uses an iterative algorithm to group individuals into "K" clusters, representing possibly K genetically distinct subpopulations. The serial implementation of this programme is processor-intensive even with small datasets. We describe an implementation of the program within a parallel framework. Speedup was achieved by running different replicates and values of K on each node of the cluster. A web-based user-oriented GUI has been implemented in PHP, through which the user can specify input parameters for the programme. The number of processors to be used can be specified in the background command. A web-based visualization tool, "VisualStruct", written in PHP (with HTML and JavaScript embedded), allows for the graphical display of the population clusters output from Structure, where each individual may be visualized as a line segment with K colors defining its possible genomic composition with respect to the K genetic subpopulations. The advantage over available programs is the increased number of individuals that can be visualized. The analyses of real datasets indicate a speedup of up to four when comparing the speed of execution on clusters of eight processors with the speed of execution on one desktop. The software package is freely available to interested users upon request.
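
    The parallelization strategy (independent runs over replicates and values of K) can be sketched with a simple process pool (invocation details are hypothetical; consult the Structure documentation for the actual command line and parameter files):

        import itertools
        import subprocess
        from multiprocessing import Pool

        def run_structure(job):
            k, rep = job
            out = f"results_K{k}_rep{rep}"
            # Hypothetical invocation; real runs also need mainparams/extraparams.
            subprocess.run(["structure", "-K", str(k), "-o", out], check=True)
            return out

        if __name__ == "__main__":
            jobs = list(itertools.product(range(1, 11), range(5)))  # K=1..10, 5 reps
            with Pool(processes=8) as pool:   # one worker per processor
                for out in pool.imap_unordered(run_structure, jobs):
                    print("finished", out)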

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Michelle J.; Zoerb, Matthew C.; Campbell, Nicole R.

    Here, benzene cluster cations were revisited as a sensitive and selective reagent ion for the chemical ionization of dimethyl sulfide (DMS) and a select group of volatile organic compounds (VOCs). Laboratory characterization was performed using both a new set of compounds (i.e., DMS, β-caryophyllene) as well as previously studied VOCs (i.e., isoprene, α-pinene). Using a field-deployable chemical-ionization time-of-flight mass spectrometer (CI-ToFMS), benzene cluster cations demonstrated high sensitivity (> 1 ncps ppt⁻¹) to DMS, isoprene, and α-pinene standards. Parallel measurements conducted using a chemical-ionization quadrupole mass spectrometer, with a much weaker electric field, demonstrated that ion-molecule reactions likely proceed through a combination of ligand-switching and direct charge transfer mechanisms. Laboratory tests suggest that benzene cluster cations may be suitable for the selective ionization of sesquiterpenes, where minimal fragmentation (< 25%) was observed for the detection of β-caryophyllene, a bicyclic sesquiterpene. The in-field stability of benzene cluster cations using CI-ToFMS was examined in the marine boundary layer during the High Wind Gas Exchange Study (HiWinGS). The use of benzene cluster cation chemistry for the selective detection of DMS was validated against an atmospheric pressure ionization mass spectrometer, where measurements from the two instruments were highly correlated (R² > 0.95, 10 s averages) over a wide range of sampling conditions.

  7. Two Parallel Olfactory Pathways for Processing General Odors in a Cockroach

    PubMed Central

    Watanabe, Hidehiro; Nishino, Hiroshi; Mizunami, Makoto; Yokohari, Fumio

    2017-01-01

    In animals, sensory processing via parallel pathways, including the olfactory system, is a common design. However, the mechanisms that parallel pathways use to encode highly complex and dynamic odor signals remain unclear. In the current study, we examined the anatomical and physiological features of parallel olfactory pathways in an evolutionally basal insect, the cockroach Periplaneta americana. In this insect, the entire system for processing general odors, from olfactory sensory neurons to higher brain centers, is anatomically segregated into two parallel pathways. Two separate populations of secondary olfactory neurons, type1 and type2 projection neurons (PNs), with dendrites in distinct glomerular groups relay olfactory signals to segregated areas of higher brain centers. We conducted intracellular recordings, revealing olfactory properties and temporal patterns of both types of PNs. Generally, type1 PNs exhibit higher odor-specificities to nine tested odorants than type2 PNs. Cluster analyses revealed that odor-evoked responses were temporally complex and varied in type1 PNs, while type2 PNs exhibited phasic on-responses with either early or late latencies to an effective odor. The late responses are 30–40 ms later than the early responses. Simultaneous intracellular recordings from two different PNs revealed that a given odor activated both types of PNs with different temporal patterns, and latencies of early and late responses in type2 PNs might be precisely controlled. Our results suggest that the cockroach is equipped with two anatomically and physiologically segregated parallel olfactory pathways, which might employ different neural strategies to encode odor information. PMID:28529476

  8. NETRA: A parallel architecture for integrated vision systems. 1: Architecture and organization

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is considered to be a system that uses vision algorithms from all levels of processing for a high level application (such as object recognition). A model of computation is presented for parallel processing for an IVS. Using the model, desired features and capabilities of a parallel architecture suitable for IVSs are derived. Then a multiprocessor architecture (called NETRA) is presented. This architecture is highly flexible without the use of complex interconnection schemes. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. It is a recursively defined tree-type hierarchical architecture where each of the leaf nodes consists of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide for desired flexibility. A qualitative evaluation of NETRA is presented. Then general schemes are described to map parallel algorithms onto NETRA. Algorithms are classified according to their communication requirements for parallel processing. An extensive analysis of inter-cluster communication strategies in NETRA is presented, and parameters affecting performance of parallel algorithms when mapped on NETRA are discussed. Finally, a methodology to evaluate performance of algorithms on NETRA is described.

  9. Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

    2006-05-01

    The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. In many applications, however, the desired information must be calculated quickly enough for practical use. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques spanning four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in the selection of parallel hyperspectral algorithms for specific applications.

  10. Using earthquake clusters to identify fracture zones at Puna geothermal field, Hawaii

    NASA Astrophysics Data System (ADS)

    Lucas, A.; Shalev, E.; Malin, P.; Kenedi, C. L.

    2010-12-01

    The actively producing Puna geothermal system (PGS) is located on the Kilauea East Rift Zone (ERZ), which extends out from the active Kilauea volcano on Hawaii. In the Puna area the rift trend is identified as NE-SW from surface expressions of normal faulting with a corresponding strike; at PGS the surface expression offsets in a left step, but no rift-perpendicular faulting is observed. An eight-station borehole seismic network has been installed in the area of the geothermal system. Since June 2006, a total of 6162 earthquakes have been located close to or inside the geothermal system. The spread of earthquake locations follows the rift trend, but down rift to the NE of PGS almost no earthquakes are observed. Most earthquakes located within the PGS lie between 2-3 km depth. Up rift to the SW of PGS the number of events decreases and the depth range increases to 3-4 km. All initial locations used Hypoinverse71 and showed no trends other than the dominant rift-parallel one. Double-difference relocation of all earthquakes, using both catalog and cross-correlation data, identified one large cluster but could not conclusively identify trends within the cluster. A large number of earthquake waveforms showed identifiable shear wave splitting. For five of the six stations where shear wave splitting was observed, the dominant polarization direction was rift-parallel. Two of the five stations also showed a smaller rift-perpendicular signal. The sixth station (located close to the area of the rift offset) displayed a N-S polarization, approximately halfway between rift-parallel and rift-perpendicular. The shear wave splitting time delays indicate that fracture density is higher at the PGS compared to the surrounding ERZ. Correlation coefficient clustering with independent P and S wave windows was used to identify clusters based on similar earthquake waveforms. In total, 40 localized clusters containing ten or more events were identified. The largest cluster was located in the production area for the power plant. Most of the clusters had linear features when their Hypoinverse locations were plotted. The concentration of individual linear features was higher in the PGS than in the surrounding ERZ. The features were resolved further by relocating each individual cluster through the catalog double-difference method. Mapping of the linear features showed that a number of the larger features ran rift-parallel. However, a large number of rift-perpendicular features were also identified. In the area where the anomalous (N-S) shear wave polarization was observed, a number of linear features with a similar orientation were identified. We assume that events occurring on the same fracture zone have similar source mechanisms and thus similar waveforms. It is concluded that the linear features identified by earthquake clustering are fracture zones. The orientation and concentration of the fracture zones are consistent with those of the shear wave splitting polarizations.
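
    As a rough illustration of the correlation-coefficient clustering step described above, the sketch below groups events whose traces correlate strongly, using complete linkage so that every pair within a cluster exceeds the threshold. The threshold and the synthetic waveforms are illustrative assumptions, not values from the study.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def correlation_matrix(waveforms):
        # Pairwise normalized correlation at zero lag;
        # waveforms is an (n_events, n_samples) array, one trace per event.
        w = waveforms - waveforms.mean(axis=1, keepdims=True)
        w /= np.linalg.norm(w, axis=1, keepdims=True)
        return w @ w.T

    def cluster_events(waveforms, cc_threshold=0.8):
        cc = correlation_matrix(waveforms)
        # Complete linkage on (1 - cc) guarantees every pair inside a
        # cluster correlates above the threshold.
        dist = squareform(1.0 - cc, checks=False)
        tree = linkage(dist, method="complete")
        return fcluster(tree, t=1.0 - cc_threshold, criterion="distance")

    rng = np.random.default_rng(0)
    waves = rng.standard_normal((20, 512))
    waves[10:] = waves[10] + 0.1 * rng.standard_normal((10, 512))  # one similar family
    print(cluster_events(waves))
    ```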

  11. SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

    1990-01-01

    Attention is given to such topics as an evaluation of block algorithm variants in LAPACK, a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.

  12. Exploring the role of quantitative feedback in inhaler technique education: a cluster-randomised, two-arm, parallel-group, repeated-measures study.

    PubMed

    Toumas-Shehata, Mariam; Price, David; Basheti, Iman Amin; Bosnic-Anticevich, Sinthia

    2014-11-13

    Feedback is a critical component of any educational intervention. When it comes to feedback associated with inhaler technique education, there is a lack of knowledge on its role or its potential to solve the major issue of poor inhaler technique. This study aims to explore the role of feedback in inhaler technique education and its impact on the inhaler technique of patients over time. A parallel-group, repeated-measures study was conducted in the community pharmacy, in which the effectiveness of current best practice inhaler technique education utilising qualitative visual feedback (Group 1) was compared with a combination of qualitative and quantitative visual feedback (Group 2). The impact of these two interventions on inhaler technique maintenance was evaluated. Community pharmacists were randomly allocated to recruit people with asthma who were using a dry powder inhaler. At Visit 1, their inhaler technique was evaluated and education was delivered; they were followed up at Visit 2 (1 month later). Both educational interventions resulted in an increase in the proportion of patients with correct inhaler technique: from 4% to 51% in Group 1 and from 6% to 83% in Group 2 (Pearson's Chi-Squared, P=0.03, n=49, and Pearson's Chi-Squared, P=0.01, n=48, respectively). The magnitude of improvement was statistically significantly higher for Group 2 compared with Group 1 (n=97, P=0.02, Pearson's Chi-Squared test). The nature of feedback has an impact on the effectiveness of inhaler technique education with regard to correct inhaler technique maintenance over time.

  13. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs.

    PubMed

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-05-28

    Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently require high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use grid computing technology to run non-parallel genetic statistical packages on a centralized HPC system or on distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Analysis of both consecutive and combinational window haplotypes was conducted with the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute nodes, FBAT jobs ran about 14.4-15.9 times faster, while Unphased jobs ran 1.1-18.6 times faster, compared to the accumulated computation duration. Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance.
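
    A minimal sketch of how such a study might farm independent window jobs out to a Grid Engine queue from Python. The `run_fbat.sh` wrapper script, the window-width cap, and the file layout are hypothetical, not the authors' actual setup.

    ```python
    import itertools
    import subprocess

    N_LOCI = 26
    MAX_WIDTH = 5  # cap on consecutive-window width, for illustration only

    jobs = []
    # Consecutive windows: every run of adjacent loci up to MAX_WIDTH wide.
    for width in range(1, MAX_WIDTH + 1):
        for start in range(N_LOCI - width + 1):
            jobs.append(range(start, start + width))
    # Combinational windows: every unordered pair of loci (width 2 shown).
    jobs.extend(itertools.combinations(range(N_LOCI), 2))

    for i, loci in enumerate(jobs):
        markers = ",".join(str(m) for m in loci)
        # One independent, non-parallel analysis per window; the Grid Engine
        # scheduler spreads the jobs across the cluster's compute nodes.
        subprocess.run(["qsub", "-N", f"hap{i}", "run_fbat.sh", markers],
                       check=True)
    ```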

  14. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs

    PubMed Central

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-01-01

    Background Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently require high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use grid computing technology to run non-parallel genetic statistical packages on a centralized HPC system or on distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Results Analysis of both consecutive and combinational window haplotypes was conducted with the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute nodes, FBAT jobs ran about 14.4–15.9 times faster, while Unphased jobs ran 1.1–18.6 times faster, compared to the accumulated computation duration. Conclusion Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance. PMID:18541045

  15. Group hypnosis vs. relaxation for smoking cessation in adults: a cluster-randomised controlled trial.

    PubMed

    Dickson-Spillmann, Maria; Haug, Severin; Schaub, Michael P

    2013-12-23

    Despite the popularity of hypnotherapy for smoking cessation, the efficacy of this method is unclear. We aimed to investigate the efficacy of a single session of group hypnotherapy for smoking cessation compared to relaxation in Swiss adult smokers. This was a cluster-randomised, parallel-group, controlled trial. A single session of hypnosis or relaxation for smoking cessation was delivered to groups of smokers (median size = 11). Participants were 223 smokers consuming ≥ 5 cigarettes per day, willing to quit and not using cessation aids (47.1% females, M = 37.5 years [SD = 11.8], 86.1% Swiss). Nicotine withdrawal, smoking abstinence self-efficacy, and adverse reactions were assessed at a 2-week follow-up. The main outcome, self-reported 30-day point prevalence of smoking abstinence, was assessed at a 6-month follow-up. Abstinence was validated through salivary analysis. Secondary outcomes included number of cigarettes smoked per day, smoking abstinence self-efficacy, and nicotine withdrawal. At the 6-month follow-up, 14.7% in the hypnosis group and 17.8% in the relaxation group were abstinent. The intervention had no effect on smoking status (p = .73) or on the number of cigarettes smoked per day (p = .56). Smoking abstinence self-efficacy did not differ between the interventions (p = .14) at the 2-week follow-up, but non-smokers in the hypnosis group experienced reduced withdrawal (p = .02). Both interventions produced few adverse reactions (p = .81). A single session of group hypnotherapy does not appear to be more effective for smoking cessation than a group relaxation session. Current Controlled Trials ISRCTN72839675.

  16. September epsilon Perseid cluster as a result of orbital fragmentation

    NASA Astrophysics Data System (ADS)

    Koten, P.; Čapek, D.; Spurný, P.; Vaubaillon, J.; Popek, M.; Shrbený, L.

    2017-04-01

    Context. A bright fireball was observed above the Czech Republic on September 9, 2016, at 23:06:59 UT. Moreover, video cameras at two different stations recorded eight fainter meteors flying on parallel atmospheric trajectories within less than 2 s. All the meteors belong to the September epsilon Perseid meteor shower. The measured proximity of all meteors during a very low activity meteor shower suggests that a cluster of meteors was observed. Aims: The goal of the paper is first to determine whether this event was a random occurrence or a real meteor cluster and second, if it was a cluster, to determine when and at what distance from the Earth the separation of the particles occurred. Methods: The atmospheric trajectories, masses, and relative distances of the individual particles were determined using a double-station observation. From the distances and masses of the particles, the most probable distance and time of fragmentation are determined. Results: The observed group of meteors is interpreted as the result of the orbital fragmentation of a bigger meteoroid. The fragmentation happened no earlier than 2 or 3 days before the encounter with the Earth, at a distance smaller than 0.08 AU from the Earth.

  17. CLustre: semi-automated lineament clustering for palaeo-glacial reconstruction

    NASA Astrophysics Data System (ADS)

    Smith, Mike; Anders, Niels; Keesstra, Saskia

    2016-04-01

    Palaeo-glacial reconstructions, or "inversions", using evidence from the palimpsest landscape are increasingly being undertaken with larger and larger databases. Predominant in landform evidence is the lineament (or drumlin), where the biggest datasets number in excess of 50,000 individual forms. One stage in the inversion process requires the identification of lineaments that are generically similar and then their subsequent interpretation into a coherent chronology of events. Here we present CLustre, a semi-automated algorithm that clusters lineaments using a locally adaptive, region-growing method. This is initially tested using 1,500 model runs on a synthetic dataset, before application to two case studies (where manual clustering has been undertaken by independent researchers): (1) Dubawnt Lake, Canada and (2) Victoria Island, Canada. Results using the synthetic data show that classifications are robust in most scenarios, although specific cases of cross-cutting lineaments may lead to incorrect clusters. Application to the case studies showed a very good match to existing published work, with differences related to limited numbers of unclassified lineaments and parallel cross-cutting lineaments. The value of CLustre comes from the semi-automated, objective application of a classification method that is repeatable. Once classified, summary statistics of lineament groups can be calculated and then used in the inversion.
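
    The published CLustre code is not reproduced here, but a toy region-growing clusterer of the general kind described, grouping lineaments that lie near each other and share an azimuth, might look like the following; the search radius and angle tolerance are invented parameters.

    ```python
    import numpy as np

    def grow_clusters(xy, azimuth, radius=2000.0, max_dazi=15.0):
        """xy: (n, 2) lineament centroids; azimuth: orientations in degrees."""
        n = len(xy)
        labels = np.full(n, -1)
        current = 0
        for seed in range(n):
            if labels[seed] != -1:
                continue
            labels[seed] = current
            frontier = [seed]
            while frontier:                       # grow the region outward
                i = frontier.pop()
                d = np.linalg.norm(xy - xy[i], axis=1)
                # Angular difference on a 180-degree (axial) scale.
                dazi = np.abs(azimuth - azimuth[i]) % 180.0
                dazi = np.minimum(dazi, 180.0 - dazi)
                for j in np.where((labels == -1) & (d < radius) & (dazi < max_dazi))[0]:
                    labels[j] = current
                    frontier.append(int(j))
            current += 1
        return labels
    ```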

  18. The Frontier Fields: Survey Design and Initial Results

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lotz, J. M.; Koekemoer, A.; Grogin, N.

    What are the faintest distant galaxies we can see with the Hubble Space Telescope (HST) now, before the launch of the James Webb Space Telescope? This is the challenge taken up by the Frontier Fields, a Director’s discretionary time campaign with HST and the Spitzer Space Telescope to see deeper into the universe than ever before. The Frontier Fields combines the power of HST and Spitzer with the natural gravitational telescopes of massive high-magnification clusters of galaxies to produce the deepest observations of clusters and their lensed galaxies ever obtained. Six clusters—Abell 2744, MACSJ0416.1-2403, MACSJ0717.5+3745, MACSJ1149.5+2223, Abell S1063, and Abell 370—have been targeted by the HST ACS/WFC and WFC3/IR cameras with coordinated parallel fields for over 840 HST orbits. The parallel fields are the second-deepest observations thus far by HST, with 5σ point-source depths of ∼29th ABmag. Galaxies behind the clusters experience typical magnification factors of a few, with small regions magnified by factors of 10–100. Therefore, the Frontier Field cluster HST images achieve intrinsic depths of ∼30–33 mag over very small volumes. Spitzer has obtained over 1000 hr of Director’s discretionary imaging of the Frontier Field clusters and parallels in the IRAC 3.6 and 4.5 μm bands to 5σ point-source depths of ∼26.5 and 26.0 ABmag. We demonstrate the exceptional sensitivity of the HST Frontier Field images to faint high-redshift galaxies, and review the initial results related to the primary science goals.

  19. Ion-Stockmayer clusters: Minima, classical thermodynamics, and variational ground state estimates of Li+(CH3NO2)n (n = 1-20)

    NASA Astrophysics Data System (ADS)

    Curotto, E.

    2015-12-01

    Structural optimizations, classical NVT ensemble, and variational Monte Carlo simulations of ion Stockmayer clusters parameterized to approximate the Li+(CH3NO2)n (n = 1-20) systems are performed. The Metropolis algorithm enhanced by the parallel tempering strategy is used to measure internal energies and heat capacities, and a parallel version of the genetic algorithm is employed to obtain the most important minima. The first solvation sheath is octahedral and this feature remains the dominant theme in the structure of clusters with n ≥ 6. The first "magic number" is identified using the adiabatic solvent dissociation energy, and it marks the completion of the second solvation layer for the lithium ion-nitromethane clusters. It corresponds to the n = 18 system, a solvated ion with the first sheath having octahedral symmetry, weakly bound to an eight-membered and a four-membered ring crowning a vertex of the octahedron. Variational Monte Carlo estimates of the adiabatic solvent dissociation energy reveal that quantum effects further enhance the stability of the n = 18 system relative to its neighbors.
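
    A stripped-down sketch of the parallel tempering idea used in these simulations, applied to a one-dimensional double well rather than the actual ion-solvent potential; the temperature ladder and step sizes are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    V = lambda x: (x**2 - 1.0) ** 2          # stand-in potential, not Li+(CH3NO2)n
    betas = 1.0 / np.linspace(0.1, 2.0, 8)   # temperature ladder
    x = np.zeros(len(betas))                 # one walker per temperature

    for sweep in range(20000):
        # Ordinary Metropolis move at every temperature.
        prop = x + rng.normal(scale=0.5, size=x.shape)
        log_acc = np.minimum(0.0, -betas * (V(prop) - V(x)))
        x = np.where(rng.random(x.shape) < np.exp(log_acc), prop, x)
        # Attempt one swap between a random adjacent pair of replicas, so
        # cold walkers can escape local minima via the hot replicas.
        i = rng.integers(len(betas) - 1)
        delta = (betas[i] - betas[i + 1]) * (V(x[i]) - V(x[i + 1]))
        if delta > 0 or rng.random() < np.exp(delta):
            x[i], x[i + 1] = x[i + 1], x[i]

    print("final replica positions:", np.round(x, 2))
    ```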

  20. Efficient Record Linkage Algorithms Using Complete Linkage Clustering.

    PubMed

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Data from different agencies often contain records for the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times.
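
    A toy sketch of the pipeline's shape, assuming sorted deduplication, single-character blocking, and a hashed n-gram distance; none of these are necessarily the paper's actual choices.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    def ngram_vector(s, n=2, dim=64):
        # Hash character n-grams into a fixed-length count vector.
        v = np.zeros(dim)
        for i in range(len(s) - n + 1):
            v[hash(s[i:i + n]) % dim] += 1
        return v

    # Sorting a set gives cheap elimination of exact duplicates.
    records = sorted({"jon smith 1970", "john smith 1970", "john smyth 1970",
                      "mary jones 1985", "marie jones 1985"})

    blocks = {}
    for r in records:                  # blocking: first letter of the record
        blocks.setdefault(r[0], []).append(r)

    labels = {}
    for key, block in blocks.items():
        if len(block) == 1:
            labels[block[0]] = (key, 1)
            continue
        X = np.array([ngram_vector(r) for r in block])
        # Complete linkage inside each block links only tight groups.
        tree = linkage(pdist(X, metric="cosine"), method="complete")
        for r, c in zip(block, fcluster(tree, t=0.5, criterion="distance")):
            labels[r] = (key, int(c))
    print(labels)
    ```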

  1. Efficient Record Linkage Algorithms Using Complete Linkage Clustering

    PubMed Central

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Data from different agencies often contain records for the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times. PMID:27124604

  2. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gittens, Alex; Kottalam, Jey; Yang, Jiyan

    We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.
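
    For readers unfamiliar with CX, a single-node numpy sketch of a randomized CX factorization (leverage-score column sampling) conveys the algorithm's shape; the paper's distributed Spark and hand-tuned C implementations are far more involved.

    ```python
    import numpy as np

    def randomized_cx(A, k, c, rng):
        """Pick c columns of A by approximate rank-k leverage scores."""
        m, n = A.shape
        # Randomized sketch of the top-k right singular subspace.
        Y = A @ rng.standard_normal((n, k + 10))
        Q, _ = np.linalg.qr(Y)
        _, _, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        lev = (Vt[:k] ** 2).sum(axis=0)          # column leverage scores
        p = lev / lev.sum()
        cols = rng.choice(n, size=c, replace=False, p=p)
        C = A[:, cols]
        X = np.linalg.pinv(C) @ A                 # A is approximated by C @ X
        return C, X, cols

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 200))
    C, X, cols = randomized_cx(A, k=10, c=20, rng=rng)
    print("relative error:", np.linalg.norm(A - C @ X) / np.linalg.norm(A))
    ```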

  3. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
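
    The general database-sharding pattern behind such ports can be sketched with Python's multiprocessing; the 3-mer `score` kernel below is a stand-in for illustration, not BLAST's actual algorithm.

    ```python
    from multiprocessing import Pool

    def score(args):
        query, subject = args
        # Toy kernel: count shared 3-mers (real BLAST does far more).
        q = {query[i:i + 3] for i in range(len(query) - 2)}
        return sum(subject[i:i + 3] in q for i in range(len(subject) - 2))

    if __name__ == "__main__":
        query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
        database = ["MKTAYIAKQR", "AAAAAAAAAA", "QISFVKSHFSRQ"] * 1000
        with Pool(4) as pool:  # each worker scans its slice of the database
            hits = pool.map(score, ((query, s) for s in database))
        print("best score:", max(hits))
    ```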

  4. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The new aCe language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe and present a signal processing application (FFT).

  5. Parallel Implementation of the Discontinuous Galerkin Method

    NASA Technical Reports Server (NTRS)

    Baggag, Abdalkader; Atkins, Harold; Keyes, David

    1999-01-01

    This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time-dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

  6. Molecular identification and cluster analysis of homofermentative thermophilic lactobacilli isolated from dairy products.

    PubMed

    Andrighetto, C; De Dea, P; Lombardi, A; Neviani, E; Rossetti, L; Giraffa, G

    1998-10-01

    Twenty-five strains of thermophilic lactobacilli isolated from yoghurt and from semi-hard and hard cheeses (in parallel with nine type or reference strains) were identified and grouped according to their genetic relatedness. Strains were identified by sugar fermentation patterns using the "API 50 CHL" galleries, by species-specific DNA probes in dot-blot hybridization experiments, by amplification and restriction analysis of the 16S rRNA gene (ARDRA) and by polymerase chain reaction (PCR) using species-specific oligonucleotide primers. Strains were classified as Lactobacillus delbrueckii subsp. lactis and subsp. bulgaricus, L. helveticus, and L. acidophilus. Strains which were atypical by sugar fermentation patterns were also identified. Most of the strains could not be grouped using carbohydrate fermentation profiles. PCR fingerprinting was used to identify DNA profiles for the 25 lactobacilli. Experimentally obtained PCR profiles enabled discrimination of all strains, which were grouped according to the similarities in their combined patterns. In general, the clustering of the strains corresponded well with species delineation obtained by molecular identification. The dendrogram of genetic relatedness enabled the unambiguous identification of most of the strains which were shown to be atypical by the sugar fermentation profile, except for a discrepancy in one L. delbrueckii subsp. lactis strain and one atypical Lactobacillus sp. strain.

  7. Cost/Performance Ratio Achieved by Using a Commodity-Based Cluster

    NASA Technical Reports Server (NTRS)

    Lopez, Isaac

    2001-01-01

    Researchers at the NASA Glenn Research Center acquired a commodity cluster based on Intel Corporation processors to compare its performance with a traditional UNIX cluster in the execution of aeropropulsion applications. Since the cost differential of the clusters was significant, a cost/performance ratio was calculated. After executing a propulsion application on both clusters, the researchers demonstrated a 9.4 cost/performance ratio in favor of the Intel-based cluster. These researchers utilize the Aeroshark cluster as one of the primary testbeds for developing NPSS parallel application codes and system software. The Aeroshark cluster provides 64 Intel Pentium II 400-MHz processors, housed in 32 nodes. Recently, APNASA, a code developed by a Government/industry team for the design and analysis of turbomachinery systems, was used for a simulation on Glenn's Aeroshark cluster.

  8. Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

    PubMed

    Aprà, E; Kowalski, K

    2016-03-08

    In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.

  9. Performance monitoring for new phase dynamic optimization of instruction dispatch cluster configuration

    DOEpatents

    Balasubramonian, Rajeev [Sandy, UT]; Dwarkadas, Sandhya [Rochester, NY]; Albonesi, David [Ithaca, NY]

    2012-01-24

    In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, each configuration option is run for an interval to determine the optimal configuration, which is then used until the next phase change is detected. The optimum instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.
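
    One possible reading of the interval-doubling rule, expressed as a hedged Python sketch in which the `measure` callback stands in for a hardware performance-counter read; the threshold and bounds are invented.

    ```python
    def pick_interval(measure, min_interval=1000, max_interval=10**6,
                      threshold=0.05):
        """measure(n) -> per-interval statistic (e.g. IPC) over n instructions."""
        interval = min_interval
        while interval < max_interval:
            # Compare two consecutive measurements over the same interval
            # length; a small relative change means the phase is stable.
            a, b = measure(interval), measure(interval)
            stability = abs(a - b) / max(abs(a), 1e-12)
            if stability < threshold:   # statistics have settled: stop doubling
                return interval
            interval *= 2
        return max_interval

    # Trivially stable stand-in measurement: returns the minimum interval.
    print(pick_interval(lambda n: 1.0))
    ```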

  10. Path lumping: An efficient algorithm to identify metastable path channels for conformational dynamics of multi-body systems

    NASA Astrophysics Data System (ADS)

    Meng, Luming; Sheong, Fu Kit; Zeng, Xiangze; Zhu, Lizhe; Huang, Xuhui

    2017-07-01

    Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.
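
    Assuming a precomputed pairwise intercrossing-flux matrix, the lumping step can be sketched with off-the-shelf spectral clustering; the random matrix here merely stands in for real pathway fluxes.

    ```python
    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    n_paths = 12
    flux = rng.random((n_paths, n_paths))
    flux = (flux + flux.T) / 2           # symmetrize the intercrossing flux
    np.fill_diagonal(flux, flux.max())   # a path fully "crosses" itself

    # Treat intercrossing flux as an affinity and lump pathways into channels.
    channels = SpectralClustering(
        n_clusters=3, affinity="precomputed", random_state=0
    ).fit_predict(flux)
    print("path channel assignments:", channels)
    ```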

  11. Large-scale parallel genome assembler over cloud computing environment.

    PubMed

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure rather than a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
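
    A de Bruijn graph of the kind GiGA distributes across workers can be illustrated in a few lines of single-process Python; GiGA's Giraph-based version is, of course, distributed and far larger.

    ```python
    from collections import defaultdict

    def de_bruijn(reads, k=4):
        graph = defaultdict(set)
        for read in reads:
            for i in range(len(read) - k):
                # Edge from each k-mer to the overlapping next k-mer.
                graph[read[i:i + k]].add(read[i + 1:i + k + 1])
        return graph

    reads = ["ACGTACGTGA", "CGTGACCT"]
    for node, succs in sorted(de_bruijn(reads).items()):
        print(node, "->", ", ".join(sorted(succs)))
    ```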

  12. High Performance Distributed Computing in a Supercomputer Environment: Computational Services and Applications Issues

    NASA Technical Reports Server (NTRS)

    Kramer, Williams T. C.; Simon, Horst D.

    1994-01-01

    This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and directions in the rapidly growing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and the massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large-scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we utilize the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.

  13. The Hubble Space Telescope Frontier Fields Program

    NASA Astrophysics Data System (ADS)

    Koekemoer, Anton M.; Mack, Jennifer; Lotz, Jennifer M.; Borncamp, David; Khandrika, Harish G.; Lucas, Ray A.; Martlin, Catherine; Porterfield, Blair; Sunnquist, Ben; Anderson, Jay; Avila, Roberto J.; Barker, Elizabeth A.; Grogin, Norman A.; Gunning, Heather C.; Hilbert, Bryan; Ogaz, Sara; Robberto, Massimo; Sembach, Kenneth; Flanagan, Kathryn; Mountain, Matt

    2017-08-01

    The Hubble Space Telescope Frontier Fields program is a large Director's Discretionary program of 840 orbits, to obtain ultra-deep observations of six strong lensing clusters of galaxies, together with parallel deep blank fields, making use of the strong lensing amplification by these clusters of distant background galaxies to detect the faintest galaxies currently observable in the high-redshift universe. The entire program has now completed successfully for all 6 clusters, namely Abell 2744, Abell S1063, Abell 370, MACS J0416.1-2403, MACS J0717.5+3745 and MACS J1149.5+2223. Each of these was observed over two epochs, to a total depth of 140 orbits on the main cluster and an associated parallel field, obtaining images in ACS (F435W, F606W, F814W) and WFC3/IR (F105W, F125W, F140W, F160W) on both the main cluster and the parallel field in all cases. Full sets of high-level science products have been generated for all these clusters by the team at STScI, including cumulative-depth data releases during each epoch, as well as full-depth releases after the completion of each epoch. These products include all the full-depth distortion-corrected drizzled mosaics and associated products for each cluster, which are science-ready to facilitate the construction of lensing models as well as enabling a wide range of other science projects. Many improvements beyond default calibration for ACS and WFC3/IR are implemented in these data products, including corrections for persistence, time-variable sky, and low-level dark current residuals, as well as improvements in astrometric alignment to achieve milliarcsecond-level accuracy. The full set of resulting high-level science products and mosaics are publicly delivered to the community via the Mikulski Archive for Space Telescopes (MAST) to enable the widest scientific use of these data, as well as ensuring a public legacy dataset of the highest possible quality that is of lasting value to the entire community.

  14. HPCC Methodologies for Structural Design and Analysis on Parallel and Distributed Computing Platforms

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel

    1998-01-01

    In this grant, we have proposed a three-year research effort focused on developing High Performance Computation and Communication (HPCC) methodologies for structural analysis on parallel processors and clusters of workstations, with emphasis on reducing the structural design cycle time. Besides consolidating and further improving the FETI solver technology to address plate and shell structures, we have proposed to tackle the following design related issues: (a) parallel coupling and assembly of independently designed and analyzed three-dimensional substructures with non-matching interfaces, (b) fast and smart parallel re-analysis of a given structure after it has undergone design modifications, (c) parallel evaluation of sensitivity operators (derivatives) for design optimization, and (d) fast parallel analysis of mildly nonlinear structures. While our proposal was accepted, support was provided only for one year.

  15. Disparity : scalable anomaly detection for clusters.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Desai, N.; Bradshaw, R.; Lusk, E.

    2008-01-01

    In this paper, we describe disparity, a tool that does parallel, scalable anomaly detection for clusters. Disparity uses basic statistical methods and scalable reduction operations to perform data reduction on client nodes and uses these results to locate node anomalies. We discuss the implementation of disparity and present results of its use on a SiCortex SC5832 system.
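
    The basic shape of the approach, reducing per-node statistics scalably and flagging outliers, might be sketched with mpi4py as follows; the metric and the 3-sigma rule are illustrative assumptions, not disparity's actual internals. Run it under mpirun with several ranks.

    ```python
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    # Stand-in for a real per-node health reading (load, temperature, ...).
    metric = np.array([float(np.random.default_rng(comm.rank).normal(1.0, 0.1))])

    total = np.zeros(1)
    sq = np.zeros(1)
    # Scalable reductions: every node learns the ensemble sum and sum of squares.
    comm.Allreduce(metric, total, op=MPI.SUM)
    comm.Allreduce(metric**2, sq, op=MPI.SUM)

    n = comm.size
    mean = total[0] / n
    std = max(np.sqrt(sq[0] / n - mean**2), 1e-12)

    if abs(metric[0] - mean) > 3 * std:   # simple 3-sigma anomaly rule
        print(f"node {comm.rank} anomalous: {metric[0]:.3f}")
    ```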

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

    CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise to extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)
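
    A toy rendering of the described fix, assuming a simple splitting rule; the numbers and the `max_split` cap are invented for illustration and are not CCFE's actual MCNP modification.

    ```python
    def split_particle(weight, ww_upper, max_split=10):
        """Split a particle arriving above the weight-window upper bound."""
        n_split = int(weight / ww_upper)
        if n_split > max_split:
            # De-optimise: raise the local window so splitting stays bounded,
            # trading variance-reduction quality for parallel efficiency.
            ww_upper = weight / max_split
            n_split = max_split
        return n_split, weight / max(n_split, 1), ww_upper

    # A particle arriving 5000x over the window splits only 10 ways here,
    # instead of into thousands of daughters (a "long history").
    print(split_particle(weight=5000.0, ww_upper=1.0))
    ```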

  17. The formation of magnetic silicide Fe3Si clusters during ion implantation

    NASA Astrophysics Data System (ADS)

    Balakirev, N.; Zhikharev, V.; Gumarov, G.

    2014-05-01

    A simple two-dimensional model of the formation of magnetic silicide Fe3Si clusters during high-dose Fe ion implantation into silicon has been proposed and the cluster growth process has been computer simulated. The model takes into account the interaction between the cluster magnetization and magnetic moments of Fe atoms random walking in the implanted layer. If the clusters are formed in the presence of the external magnetic field parallel to the implanted layer, the model predicts the elongation of the growing cluster in the field direction. It has been proposed that the cluster elongation results in the uniaxial magnetic anisotropy in the plane of the implanted layer, which is observed in iron silicide films ion-beam synthesized in the external magnetic field.

  18. Certification of Completion of Level-2 Milestone 464: Complete Phase 1 Integration of Site-Wide Global Parallel File System (SWGPFS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heidelberg, S T; Fitzgerald, K J; Richmond, G H

    2006-01-24

    There has been substantial development of the Lustre parallel filesystem prior to the configuration described below for this milestone. The initial Lustre filesystems that were deployed were directly connected to the cluster interconnect, i.e. Quadrics Elan3. That is, the clients (OSSes) and Meta-data Servers (MDS) were all directly connected to the cluster's internal high speed interconnect. This configuration serves a single cluster very well, but does not provide sharing of the filesystem among clusters. LLNL funded the development of high-efficiency "portals router" code by CFS (the company that develops Lustre) to enable us to move the Lustre servers to a GigE-connected network configuration, thus making it possible to connect to the servers from several clusters. With portals routing available, here is what changes: (1) another storage-only cluster is deployed to front the Lustre storage devices (these become the Lustre OSSes and MDS), (2) this "Lustre cluster" is attached via GigE connections to a large GigE switch/router cloud, (3) a small number of compute-cluster nodes are designated as "gateway" or "portal router" nodes, and (4) the portals router nodes are GigE-connected to the switch/router cloud. The Lustre configuration is then changed to reflect the new network paths. A typical example of this is a compute cluster and a related visualization cluster: the compute cluster produces the data (writes it to the Lustre filesystem), and the visualization cluster consumes some of the data (reads it from the Lustre filesystem). This process can be expanded by aggregating several collections of Lustre backend storage resources into one or more "centralized" Lustre filesystems, and then arranging to have several "client" clusters mount these centralized filesystems. The "client clusters" can be any combination of compute, visualization, archiving, or other types of cluster. This milestone demonstrates the operation and performance of a scaled-down version of such a large, centralized, shared Lustre filesystem concept.

  19. The effectiveness of a life style modification and peer support home blood pressure monitoring in control of hypertension: protocol for a cluster randomized controlled trial.

    PubMed

    Su, Tin Tin; Majid, Hazreen Abdul; Nahar, Azmi Mohamed; Azizan, Nurul Ain; Hairi, Farizah Mohd; Thangiah, Nithiah; Dahlui, Maznah; Bulgiba, Awang; Murray, Liam J

    2014-01-01

    Death rates due to hypertension in low and middle income countries are higher compared to high income countries. The present study is designed to combine life style modification and home blood pressure monitoring for the control of hypertension in the context of low and middle income countries. The study is a two-armed, parallel-group, un-blinded, cluster randomized controlled trial undertaken within lower income areas in Kuala Lumpur. Two housing complexes will be assigned to the intervention group and the other two housing complexes will be allocated to the control group. Based on power analysis, 320 participants will be recruited. The intervention group (n = 160) will receive three main components: peer support for home blood pressure monitoring, face-to-face health coaching on healthy diet, and demonstration and training for indoor home-based exercise activities, while the control group will receive a pamphlet containing information on hypertension. The primary outcomes are systolic and diastolic blood pressure. Secondary outcome measures include practice of self-blood pressure monitoring, dietary intake, level of physical activity and physical fitness. The present study will evaluate the effect of lifestyle modification and peer support home blood pressure monitoring on blood pressure control, during a 6-month intervention period. Moreover, the study aims to assess whether these effects can be sustained for more than six months after the intervention has ended.

  20. Wrapping up BLAST and other applications for use on Unix clusters.

    PubMed

    Hokamp, Karsten; Shields, Denis C; Wolfe, Kenneth H; Caffrey, Daniel R

    2003-02-12

    We have developed two programs that speed up common bioinformatic applications by spreading them across a UNIX cluster: (1) BLAST.pm, a new module for the 'MOLLUSC' package, and (2) WRAPID, a simple tool for parallelizing large numbers of small instances of programs such as BLAST, FASTA and CLUSTALW. The packages were developed in Perl on a 20-node Linux cluster and are provided together with a configuration script and documentation. They can be freely downloaded from http://wolfe.gen.tcd.ie/wrapper.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ali, Amjad Majid; Albert, Don; Andersson, Par

    SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small computer clusters. As a cluster resource manager, SLURM has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.

  2. Transcriptional profiles of Arabidopsis stomataless mutants reveal developmental and physiological features of life in the absence of stomata

    PubMed Central

    de Marcos, Alberto; Triviño, Magdalena; Pérez-Bueno, María Luisa; Ballesteros, Isabel; Barón, Matilde; Mena, Montaña; Fenoll, Carmen

    2015-01-01

    Loss of function of the positive stomata development regulators SPCH or MUTE in Arabidopsis thaliana renders stomataless plants; spch-3 and mute-3 mutants are extreme dwarfs, but produce cotyledons and tiny leaves, providing a system to interrogate plant life in the absence of stomata. To this end, we compared their cotyledon transcriptomes with that of wild-type plants. K-means clustering of differentially expressed genes generated four clusters: clusters 1 and 2 grouped genes commonly regulated in the mutants, while clusters 3 and 4 contained genes distinctively regulated in mute-3. Classification in functional categories and metabolic pathways of genes in clusters 1 and 2 suggested that both mutants had depressed secondary, nitrogen and sulfur metabolisms, while only a few photosynthesis-related genes were down-regulated. In situ quenching analysis of chlorophyll fluorescence revealed limited inhibition of photosynthesis. This and other fluorescence measurements matched the mutant transcriptomic features. Differential transcriptomes of both mutants were enriched in growth-related genes, including known stomata development regulators, which paralleled their epidermal phenotypes. Analysis of cluster 3 was not informative for developmental aspects of mute-3. Cluster 4 comprised genes differentially up-regulated in mute-3, 35% of which were direct targets for SPCH and may relate to the unique cell types of mute-3. A screen of T-DNA insertion lines in genes differentially expressed in the mutants identified a gene putatively involved in stomata development. A collection of lines for conditional overexpression of transcription factors differentially expressed in the mutants rendered distinct epidermal phenotypes, suggesting that these proteins may be novel stomatal development regulators. Thus, our transcriptome analysis represents a useful source of new genes for the study of stomata development and for characterizing physiology and growth in the absence of stomata. PMID:26157447
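
    For orientation, the k-means step might look like the following sketch, assuming a genes-by-samples matrix of expression values; the data here are random placeholders, not the study's transcriptomes.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 4))   # placeholder: 200 genes x 4 samples
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(expr)
    for c in range(4):
        print(f"cluster {c + 1}: {np.sum(km.labels_ == c)} genes")
    ```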

  3. A communication library for the parallelization of air quality models on structured grids

    NASA Astrophysics Data System (ADS)

    Miehe, Philipp; Sandu, Adrian; Carmichael, Gregory R.; Tang, Youhua; Dăescu, Dacian

    PAQMSG is an MPI-based, Fortran 90 communication library for the parallelization of air quality models (AQMs) on structured grids. It consists of distribution, gathering and repartitioning routines for different domain decompositions implementing a master-worker strategy. The library is architecture and application independent and includes optimization strategies for different architectures. This paper presents the library from a user perspective. Results are shown from the parallelization of STEM-III on Beowulf clusters. The PAQMSG library is available on the web. The communication routines are easy to use, and should allow for an immediate parallelization of existing AQMs. PAQMSG can also be used for constructing new models.
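
    The master-worker distribution and gathering pattern that PAQMSG implements for Fortran AQMs can be illustrated in mpi4py terms; this sketch is an assumption-laden illustration, not the library's actual API.

    ```python
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.rank, comm.size
    NX, NY = 64, 64

    if rank == 0:
        field = np.arange(NX * NY, dtype="d").reshape(NX, NY)
        slabs = np.array_split(field, size, axis=0)   # domain decomposition
    else:
        slabs = None

    local = comm.scatter(slabs, root=0)   # distribution routine (master -> workers)
    local = local * 0.5                   # stand-in for chemistry/transport work
    result = comm.gather(local, root=0)   # gathering routine (workers -> master)

    if rank == 0:
        print("reassembled:", np.vstack(result).shape)
    ```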

  4. Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop

    NASA Astrophysics Data System (ADS)

    Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.

    2018-04-01

    The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster computing nodes and improves the efficiency of data-parallel applications. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it calls many computing nodes to process image storage blocks and pyramids in the background, improving the efficiency of image reading and application and meeting the need for concurrent multi-user high-speed access to remotely sensed data. The rationality, reliability and superiority of the system design were verified by testing the storage efficiency for different image data and multiple users, and by analyzing how the distributed storage architecture improves the application efficiency of remote sensing images, through building an actual Hadoop service system.

  5. Swimming in Sculptor

    NASA Image and Video Library

    2016-03-07

    Peering deep into the early Universe, this picturesque parallel field observation from the NASA/ESA Hubble Space Telescope reveals thousands of colourful galaxies swimming in the inky blackness of space. A few foreground stars from our own galaxy, the Milky Way, are also visible. In October 2013 Hubble’s Wide Field Camera 3 (WFC3) and Advanced Camera for Surveys (ACS) began observing this portion of sky as part of the Frontier Fields programme. This spectacular skyscape was captured during the study of the giant galaxy cluster Abell 2744, otherwise known as Pandora’s Box. While one of Hubble’s cameras concentrated on Abell 2744, the other camera viewed this adjacent patch of sky near to the cluster. Containing countless galaxies of various ages, shapes and sizes, this parallel field observation is nearly as deep as the Hubble Ultra-Deep Field. In addition to showcasing the stunning beauty of the deep Universe in incredible detail, this parallel field — when compared to other deep fields — will help astronomers understand how similar the Universe looks in different directions

  6. Parallel Navier-Stokes computations on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

    1995-01-01

    We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.

  7. When Your Pregnancy Echoes Your Illness: Transition to Motherhood With Inflammatory Bowel Disease.

    PubMed

    Ghorayeb, Jihane; Branney, Peter; Selinger, Christian P; Madill, Anna

    2018-03-01

    Our aim is to provide an understanding of the experience of women with inflammatory bowel disease (IBD) who have made the transition to motherhood. A total of 22 mothers with IBD were recruited from around the United Kingdom. Semi-structured interviews were conducted and analyzed using thematic analysis. The central concept, Blurred Lines, offers a novel frame for understanding the transition to motherhood with IBD through identifying parallels between having IBD and becoming, and being, a mother. Parallels clustered into three main themes: Need for Readiness, Lifestyle Changes, and Monitoring Personal and Physical Development. Hence, women with IBD are in some ways well prepared for the challenges of motherhood even though, as a group, they tend to restrict their reproductive choices. We recommend health professionals initiate conversations about reproduction early and provide a multidisciplinary approach to pregnancy and IBD in which women have confidence that their ongoing treatment will be integrated successfully with their maternity care.

  8. Assessment of phytoplankton class abundance using fluorescence excitation-emission matrix by parallel factor analysis and nonnegative least squares

    NASA Astrophysics Data System (ADS)

    Su, Rongguo; Chen, Xiaona; Wu, Zhenzhen; Yao, Peng; Shi, Xiaoyong

    2015-07-01

    The feasibility of using fluorescence excitation-emission matrix (EEM) along with parallel factor analysis (PARAFAC) and nonnegative least squares (NNLS) method for the differentiation of phytoplankton taxonomic groups was investigated. Forty-one phytoplankton species belonging to 28 genera of five divisions were studied. First, the PARAFAC model was applied to EEMs, and 15 fluorescence components were generated. Second, 15 fluorescence components were found to have a strong discriminating capability based on Bayesian discriminant analysis (BDA). Third, all spectra of the fluorescence component compositions for the 41 phytoplankton species were spectrographically sorted into 61 reference spectra using hierarchical cluster analysis (HCA), and then the reference spectra were used to establish a database. Finally, the phytoplankton taxonomic groups were differentiated against the reference spectra database using the NNLS method. The five phytoplankton groups were differentiated with correct discrimination ratios (CDRs) of 100% for single-species samples at the division level. The CDRs for the mixtures were above 91% for the dominant phytoplankton species and above 73% for the subdominant phytoplankton species. Sixteen of the 85 field samples collected from the Changjiang River estuary were analyzed by both HPLC-CHEMTAX and the fluorometric technique developed. The results of both methods reveal that Bacillariophyta was the dominant algal group in these 16 samples and that the subdominant algal groups comprised Dinophyta, Chlorophyta and Cryptophyta. The differentiation results by the fluorometric technique were in good agreement with those from HPLC-CHEMTAX. The results indicate that the fluorometric technique could differentiate algal taxonomic groups accurately at the division level.

  9. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  10. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    PubMed

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculations of large molecules with high-quality basis sets running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  11. Distributed multitasking ITS with PVM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fan, W.C.; Halbleib, J.A. Sr.

    1995-12-31

    Advances in computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable to or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generated in a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of the MCNP code on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electron/photon transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load-balancing schemes for homogeneous and heterogeneous networks.

  12. Distributed multitasking ITS with PVM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fan, W.C.; Halbleib, J.A. Sr.

    1995-02-01

    Advances in computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable to or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources, and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generated in a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of MCNP on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electron/photon transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load-balancing schemes for homogeneous and heterogeneous networks.

  13. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that the task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  14. An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

    PubMed Central

    Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

    2014-01-01

    With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. As a result, traditional clustering algorithms cannot handle the mass data produced by tunnels. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm that uses the MapReduce model of cloud computing to process data. It not only has the advantage of handling mass data but is also more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
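
    The abstract sketches the MapReduce decomposition only at a high level. As a rough illustration (a minimal Python sketch, not the authors' Hadoop code), one k-means iteration splits into a map step that assigns points to the nearest centroid and a reduce step that recomputes centroids together with each cluster's mean dissimilarity, which can then flag abnormal data; the outlier threshold below is an illustrative assumption:

        from collections import defaultdict

        import numpy as np

        def kmeans_map(points, centroids):
            """Map step: emit (centroid_index, (point, squared_distance))
            pairs, mirroring the key/value shuffle of a MapReduce round."""
            pairs = []
            for p in points:
                d2 = ((centroids - p) ** 2).sum(axis=1)
                k = int(d2.argmin())
                pairs.append((k, (p, float(d2[k]))))
            return pairs

        def kmeans_reduce(pairs):
            """Reduce step: recompute each centroid and the mean dissimilarity
            of its cluster; points far above that mean are flagged."""
            groups = defaultdict(list)
            for k, (p, d2) in pairs:
                groups[k].append((p, d2))
            centroids, outliers = {}, []
            for k, members in groups.items():
                pts = np.array([p for p, _ in members])
                dists = np.array([d for _, d in members])
                centroids[k] = pts.mean(axis=0)
                cutoff = 2.0 * dists.mean()  # illustrative cleaning threshold
                outliers.extend(pts[dists > cutoff])
            return centroids, outliers

        # One iteration on toy data; a real job would run map tasks on
        # partitions of the monitoring data and iterate to convergence.
        rng = np.random.default_rng(0)
        data = rng.normal(size=(1000, 3))
        cents = data[rng.choice(len(data), 4, replace=False)]
        new_cents, flagged = kmeans_reduce(kmeans_map(data, cents))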

  15. Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

    PubMed

    Hielscher, Andreas H; Bartel, Sebastian

    2004-02-01

    Optical tomography (OT) is a fast-developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computationally intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.

  16. Global Relationships Among the Physical Properties of Stellar Systems.

    NASA Astrophysics Data System (ADS)

    Burstein, David; Bender, Ralf; Faber, S.; Nolthenius, R.

    1997-10-01

    The Κ-space three-dimensional parameter system was originally defined to examine the physical properties of dynamically hot elliptical galaxies and bulges (DRGs). The axes of Κ-space are proportional to the logarithm of galaxy mass, mass-to-light ratio, and a third quantity that is mainly surface brightness. In this paper we define self-consistent Κ parameters for disk galaxies, galaxy groups and clusters, and globular clusters and use them to project an integrated view of the major classes of self-gravitating, equilibrium stellar systems in the universe. Each type of stellar system is found to populate its own fundamental plane in Κ-space. At least six different planes are found: (1) the original fundamental plane for DRGs; (2) a nearly parallel plane slightly offset for Sa-Sc spirals; (3) a plane with different tilt but similar zero point for Scd-Irr galaxies; (4) a plane parallel to the DRG plane but offset by a factor of 10 in mass-to-light ratio for rich galaxy clusters; (5) a plane for galaxy groups that bridges the gap between rich clusters and galaxies; and (6) a plane for Galactic globular clusters. We propose the term "cosmic metaplane" to describe this ensemble of interrelated and interconnected fundamental planes. The projection Κ1-Κ3 (M/L vs M) views all planes essentially edge-on. The planes share the common characteristic that M/L is either constant or increasing with mass. The Κ1-Κ2 projection views all of these planes close to face-on, while Κ2-Κ3 shows variable slopes for different groups owing to the slightly different tilts of the individual planes. The Tully-Fisher relation is the correct compromise projection to view the spiral-irregular planes nearly edge-on, analogous to the Dn-σ relation for DRGs. No stellar system yet violates the rule first found from the study of DRGs, namely, Κ1 + Κ2 ≤ constant, here chosen to be 8. In physical terms, this says that the maximum global luminosity density of stellar systems varies as M^(-4/3). Galaxies march away from this "zone of exclusion" (ZOE) in the Κ1-Κ2 projection as a function of Hubble type: DRGs are closest, with Sm-Irr's being furthest away. The distribution of systems in Κ-space is generally consistent with predictions of galaxy formation via hierarchical clustering and merging. The cosmic metaplane is simply the cosmic virial plane common to all self-gravitating stellar systems, tilted and displaced in mass-to-light ratio for various types of systems due to differences in stellar population and amount of baryonic dissipation. Hierarchical clustering from an n = -1.8 power-law density fluctuation spectrum (plus dissipation) comes close to reproducing the slope of the ZOE, and the progressive displacement of Hubble types from this line is consistent with the formation of early-type galaxies from higher n-σ fluctuations than late Hubble types. The M/L values for galaxy groups containing only a few, mostly spiral, galaxies vary the strongest with M. Moreover, it is these groups that bridge the gap between the two planes defined by the brightest galaxies and the lowest-mass rich clusters, giving the cosmic metaplane its striking appearance. Why this is so is but one of four key questions raised by our study. The second question is why the slopes of individual Hubble types in the Κ1-Κ2 plane lie parallel to the ZOE. At face value, this appears to suggest less dissipation of massive galaxies within their dark halos compared to lower-mass galaxies of the same Hubble type. The third is why we find isotropic stellar systems only within an effective mass range of 10^9.5-10^11.75 Msun. This would seem to imply that dissipation only results in galaxy components flattened by rotation in a limited mass range. The fourth question, perhaps the most basic of all, is how does M/L vary so smoothly with M among all stellar systems so as to give the individual tilts of the various fundamental planes, yet preserve the overall appearance of a metaplane? The answer to this last question must await a more thorough knowledge of how galaxies relate to many parameters, including: their environment, structure, angular momentum acquisition, density, dark matter concentration, the physics of star formation in general, and the formation of the initial mass function in particular. The present investigation is limited by existing data to the B passband and is strongly magnitude-limited, not volume-limited. Rare or hard-to-discover galaxy types, such as R II galaxies, starburst galaxies and low-surface-brightness galaxies, are missing or are under-represented, and use of the B band over-emphasizes stellar population differences. A volume-limited Κ-space survey based on Κ-band photometry and complete to low surface brightness and faint magnitudes is highly desirable but requires data yet to be obtained.
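
    For orientation, the Κ-space axes referred to throughout this abstract follow the standard definition of Bender, Burstein & Faber (1992), on which this system is built; the expressions below are quoted from that reference as an aid to the reader, not from the abstract itself, and should be checked against the original paper:

        \kappa_1 \equiv (\log \sigma_0^2 + \log r_e)/\sqrt{2}
        \kappa_2 \equiv (\log \sigma_0^2 + 2\log I_e - \log r_e)/\sqrt{6}
        \kappa_3 \equiv (\log \sigma_0^2 - \log I_e - \log r_e)/\sqrt{3}

    so that Κ1 tracks the logarithm of mass, Κ3 the mass-to-light ratio, and Κ2 mainly surface brightness, matching the axis description above.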

  17. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  18. MPIGeneNet: Parallel Calculation of Gene Co-Expression Networks on Multicore Clusters.

    PubMed

    Gonzalez-Dominguez, Jorge; Martin, Maria J

    2017-10-10

    In this work we present MPIGeneNet, a parallel tool that applies Pearson's correlation and Random Matrix Theory to construct gene co-expression networks. It is based on the state-of-the-art sequential tool RMTGeneNet, which provides networks with high robustness and sensitivity at the expense of relatively long runtimes for large-scale input datasets. MPIGeneNet returns the same results as RMTGeneNet but improves the memory management, reduces the I/O cost, and accelerates the two most computationally demanding steps of co-expression network construction by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on two different systems using three typical input datasets shows that MPIGeneNet is significantly faster than RMTGeneNet. As an example, our tool is up to 175.41 times faster on a cluster with eight nodes, each one containing two 12-core Intel Haswell processors. Source code of MPIGeneNet, as well as a reference manual, are available at https://sourceforge.net/projects/mpigenenet/.
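
    As an indication of what the two heaviest steps involve, the following minimal sketch (not MPIGeneNet's MPI code; it uses a Python process pool and replaces the Random Matrix Theory step with a fixed correlation cutoff, both stated assumptions) standardizes expression rows so that a matrix product yields Pearson correlations, computes them block-wise in parallel, and thresholds the result into an adjacency matrix:

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def corr_block(args):
            """Correlate one block of gene rows against all genes; rows are
            z-scored, so a dot product divided by the sample count is Pearson r."""
            z, lo, hi = args
            return lo, z[lo:hi] @ z.T / z.shape[1]

        def coexpression_network(expr, threshold=0.8, workers=4):
            """expr: genes x samples matrix with non-constant rows.
            Returns a boolean adjacency matrix of the co-expression network."""
            z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
            n = len(z)
            bounds = np.linspace(0, n, workers + 1, dtype=int)
            corr = np.empty((n, n))
            tasks = [(z, a, b) for a, b in zip(bounds, bounds[1:])]
            with ProcessPoolExecutor(workers) as pool:
                for lo, block in pool.map(corr_block, tasks):
                    corr[lo:lo + len(block)] = block
            np.fill_diagonal(corr, 0.0)  # no self-edges
            return np.abs(corr) >= threshold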

  19. Group hypnosis vs. relaxation for smoking cessation in adults: a cluster-randomised controlled trial

    PubMed Central

    2013-01-01

    Background Despite the popularity of hypnotherapy for smoking cessation, the efficacy of this method is unclear. We aimed to investigate the efficacy of a single session of group hypnotherapy for smoking cessation compared to relaxation in Swiss adult smokers. Methods This was a cluster-randomised, parallel-group, controlled trial. A single session of hypnosis or relaxation for smoking cessation was delivered to groups of smokers (median size = 11). Participants were 223 smokers consuming ≥ 5 cigarettes per day, willing to quit and not using cessation aids (47.1% females, M = 37.5 years [SD = 11.8], 86.1% Swiss). Nicotine withdrawal, smoking abstinence self-efficacy, and adverse reactions were assessed at a 2-week follow-up. The main outcome, self-reported 30-day point prevalence of smoking abstinence, was assessed at a 6-month follow-up. Abstinence was validated through salivary analysis. Secondary outcomes included number of cigarettes smoked per day, smoking abstinence self-efficacy, and nicotine withdrawal. Results At the 6-month follow-up, 14.7% in the hypnosis group and 17.8% in the relaxation group were abstinent. The intervention had no effect on smoking status (p = .73) or on the number of cigarettes smoked per day (p = .56). Smoking abstinence self-efficacy did not differ between the interventions (p = .14) at the 2-week follow-up, but non-smokers in the hypnosis group experienced reduced withdrawal (p = .02). Both interventions produced few adverse reactions (p = .81). Conclusions A single session of group hypnotherapy does not appear to be more effective for smoking cessation than a group relaxation session. Trial registration Current Controlled Trials ISRCTN72839675. PMID:24365274

  20. Massively parallel implementations of coupled-cluster methods for electron spin resonance spectra. I. Isotropic hyperfine coupling tensors in large radicals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verma, Prakash; Morales, Jorge A., E-mail: jorge.morales@ttu.edu; Perera, Ajith

    2013-11-07

    Coupled cluster (CC) methods provide highly accurate predictions of molecular properties, but their high computational cost has precluded their routine application to large systems. Fortunately, recent computational developments in the ACES III program by the Bartlett group [the OED/ERD atomic integral package, the super instruction processor, and the super instruction architecture language] permit overcoming that limitation by providing a framework for massively parallel CC implementations. In that scheme, we are further extending those parallel CC efforts to systematically predict the three main electron spin resonance (ESR) tensors (A-, g-, and D-tensors) to be reported in a series of papers. In this paper inaugurating that series, we report our new ACES III parallel capabilities that calculate isotropic hyperfine coupling constants in 38 neutral, cationic, and anionic radicals that include the ¹¹B, ¹⁷O, ⁹Be, ¹⁹F, ¹H, ¹³C, ³⁵Cl, ³³S, ¹⁴N, ³¹P, and ⁶⁷Zn nuclei. Present parallel calculations are conducted at the Hartree-Fock (HF), second-order many-body perturbation theory [MBPT(2)], CC singles and doubles (CCSD), and CCSD with perturbative triples [CCSD(T)] levels using Roos augmented double- and triple-zeta atomic natural orbital basis sets. HF results consistently overestimate isotropic hyperfine coupling constants. However, inclusion of electron correlation effects in the simplest way via MBPT(2) provides significant improvements in the predictions, but not without occasional failures. In contrast, CCSD results are consistently in very good agreement with experimental results. Inclusion of perturbative triples to CCSD via CCSD(T) leads to small improvements in the predictions, which might not compensate for the extra computational effort at a non-iterative N⁷-scaling in CCSD(T). The importance of these accurate computations of isotropic hyperfine coupling constants to elucidate experimental ESR spectra, to interpret spin-density distributions, and to characterize and identify radical species is illustrated with our results from large organic radicals. Those include species relevant for organic chemistry, the petroleum industry, and biochemistry, such as the cyclo-hexyl, 1-adamantyl, and Zn-porphycene anion radicals, inter alia.

  1. Trace-Driven Debugging of Message Passing Programs

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

    1998-01-01

    In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.

  2. Ion-Stockmayer clusters: Minima, classical thermodynamics, and variational ground state estimates of Li⁺(CH₃NO₂)ₙ (n = 1–20)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Curotto, E., E-mail: curotto@arcadia.edu

    2015-12-07

    Structural optimizations, classical NVT ensemble, and variational Monte Carlo simulations of ion-Stockmayer clusters parameterized to approximate the Li⁺(CH₃NO₂)ₙ (n = 1–20) systems are performed. The Metropolis algorithm enhanced by the parallel tempering strategy is used to measure internal energies and heat capacities, and a parallel version of the genetic algorithm is employed to obtain the most important minima. The first solvation sheath is octahedral and this feature remains the dominant theme in the structure of clusters with n ≥ 6. The first “magic number” is identified using the adiabatic solvent dissociation energy, and it marks the completion of the second solvation layer for the lithium ion-nitromethane clusters. It corresponds to the n = 18 system, a solvated ion with the first sheath having octahedral symmetry, weakly bound to an eight-membered and a four-membered ring crowning a vertex of the octahedron. Variational Monte Carlo estimates of the adiabatic solvent dissociation energy reveal that quantum effects further enhance the stability of the n = 18 system relative to its neighbors.

  3. Implementation of MPEG-2 encoder to multiprocessor system using multiple MVPs (TMS320C80)

    NASA Astrophysics Data System (ADS)

    Kim, HyungSun; Boo, Kenny; Chung, SeokWoo; Choi, Geon Y.; Lee, YongJin; Jeon, JaeHo; Park, Hyun Wook

    1997-05-01

    This paper presents the efficient algorithm mapping for real-time MPEG-2 encoding on the KAIST image computing system (KICS), which has a parallel architecture using five multimedia video processors (MVPs). The MVP is a general purpose digital signal processor (DSP) from Texas Instruments. It combines one floating-point processor and four fixed-point DSPs on a single chip. The KICS uses the MVP as a primary processing element (PE). Two PEs form a cluster, and there are two processing clusters in the KICS. The real-time MPEG-2 encoder is implemented through spatial and functional partitioning strategies. The encoding process for a spatially partitioned half of the video input frame is assigned to one processing cluster. Two PEs perform the functionally partitioned MPEG-2 encoding tasks in the pipelined operation mode. One PE of a cluster carries out the transform coding part and the other performs the predictive coding part of the MPEG-2 encoding algorithm. One MVP among the five is used for system control and interface with the host computer. This paper introduces an implementation of the MPEG-2 algorithm with a parallel processing architecture.

  4. How to cluster in parallel with neural networks

    NASA Technical Reports Server (NTRS)

    Kamgar-Parsi, Behzad; Gualtieri, J. A.; Devaney, Judy E.; Kamgar-Parsi, Behrooz

    1988-01-01

    Partitioning a set of N patterns in a d-dimensional metric space into K clusters - in a way that those in a given cluster are more similar to each other than the rest - is a problem of interest in astrophysics, image analysis and other fields. As there are approximately K^N/K! possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. Researchers show that this problem can be formulated as an optimization problem for which very good, but not necessarily optimal, solutions can be found by using a neural network. To do this the network must start from many randomly selected initial states. The network is simulated on the MPP (a 128 x 128 SIMD array machine), where researchers use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also by starting the network from many initial states at once, thus obtaining many solutions in one run. Researchers obtain speedups of two to three orders of magnitude over serial implementations and the promise, through analog VLSI implementations, of speedups commensurate with human perceptual abilities.

  5. Sample size calculations for cluster randomised crossover trials in Australian and New Zealand intensive care research.

    PubMed

    Arnup, Sarah J; McKenzie, Joanne E; Pilcher, David; Bellomo, Rinaldo; Forbes, Andrew B

    2018-06-01

    The cluster randomised crossover (CRXO) design provides an opportunity to conduct randomised controlled trials to evaluate low risk interventions in the intensive care setting. Our aim is to provide a tutorial on how to perform a sample size calculation for a CRXO trial, focusing on the meaning of the elements required for the calculations, with application to intensive care trials. We use all-cause in-hospital mortality from the Australian and New Zealand Intensive Care Society Adult Patient Database clinical registry to illustrate the sample size calculations. We show sample size calculations for a two-intervention, two 12-month period, cross-sectional CRXO trial. We provide the formulae, and examples of their use, to determine the number of intensive care units required to detect a risk ratio (RR) with a designated level of power between two interventions for trials in which the elements required for sample size calculations remain constant across all ICUs (unstratified design); and in which there are distinct groups (strata) of ICUs that differ importantly in the elements required for sample size calculations (stratified design). The CRXO design markedly reduces the sample size requirement compared with the parallel-group, cluster randomised design for the example cases. The stratified design further reduces the sample size requirement compared with the unstratified design. The CRXO design enables the evaluation of routinely used interventions that can bring about small, but important, improvements in patient care in the intensive care setting.
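
    The formulae themselves are in the paper, not the abstract; the sketch below shows only the general shape of such a calculation under commonly used assumptions (stated here, not taken from the paper): a standard two-proportion sample size, inflated by a design effect of 1 + (m-1)*rho for the parallel-group cluster design and 1 + (m-1)*rho - m*eta for a two-period cross-sectional CRXO, where m is the number of subjects per cluster-period, rho the within-period ICC, and eta the between-period ICC:

        import math

        from scipy.stats import norm

        def n_individual(p1, p2, alpha=0.05, power=0.8):
            """Subjects per arm to compare two proportions under individual
            randomisation (two-sided normal approximation)."""
            za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
            pbar = (p1 + p2) / 2
            return (za + zb) ** 2 * 2 * pbar * (1 - pbar) / (p1 - p2) ** 2

        def icus_needed(p1, p2, m, rho, eta=None, **kw):
            """Number of ICUs: per arm for the parallel cluster design (eta=None),
            total for the CRXO design, where every ICU receives both interventions."""
            de = 1 + (m - 1) * rho - (m * eta if eta is not None else 0.0)
            return math.ceil(n_individual(p1, p2, **kw) * de / m)

        # Illustrative numbers: mortality 10% vs 9%, 300 admissions per
        # ICU per 12-month period, rho = 0.03, eta = 0.02.
        print(icus_needed(0.10, 0.09, m=300, rho=0.03))             # parallel CRT
        print(icus_needed(0.10, 0.09, m=300, rho=0.03, eta=0.02))   # CRXO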

  6. Geology of the Brick Flat massive sulfide body, Iron Mountain cluster, West Shasta district, California ( USA).

    USGS Publications Warehouse

    Albers, J.P.

    1985-01-01

    The Brick Flat massive sulfide body is one of a group of 8 individual bodies that constitute the Iron Mountain cluster in the S part of the West Shasta district. Before they were separated by postmineral faulting, 5 of the 8 sulfide bodies formed a single large deposit about 1375 m long with a mass of some 23 million metric tons. The pyritic Brick Flat sulfide body is one of the 5 faulted segments of this deposit. The Brick Flat massive sulfide lies within medium phenocryst rhyolite that is characteristic of the ore-bearing middle unit of the Balaklala Rhyolite. It is interpreted to be downfaulted a vertical distance of 75 to 85 m from the Old Mine sulfide-gossan orebody along the N-dipping Camden South fault. It is bounded in turn on its N side by another parallel fault, the Camden North, which drops the orebody down another 75 m to the level of the Richmond orebody. -from Author

  7. A comparison of queueing, cluster and distributed computing systems

    NASA Technical Reports Server (NTRS)

    Kaplan, Joseph A.; Nelson, Michael L.

    1993-01-01

    Using workstation clusters for distributed computing has become popular with the proliferation of inexpensive, powerful workstations. Workstation clusters offer both a cost-effective alternative to batch processing and an easy entry into parallel computing. However, a number of workstations on a network does not constitute a cluster. Cluster management software is necessary to harness the collective computing power. A variety of cluster management and queuing systems are compared: Distributed Queueing Systems (DQS), Condor, Load Leveler, Load Balancer, Load Sharing Facility (LSF - formerly Utopia), Distributed Job Manager (DJM), Computing in Distributed Networked Environments (CODINE), and NQS/Exec. The systems differ in their design philosophy and implementation. Based on published reports on the different systems and conversations with the systems' developers and vendors, a comparison of the systems is made on the integral issues of clustered computing.

  8. Evaluating SPLASH-2 Applications Using MapReduce

    NASA Astrophysics Data System (ADS)

    Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu

    MapReduce has been prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance, and load balancing from programmers, MapReduce significantly simplifies the programming of large clusters. Owing to these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval and statistical translation, among others.

  9. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
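
    The "poor man's parallelization" mentioned above, running whole legacy programs as independent processes, needs no special software design; a few lines suffice, as in this sketch (the phyml command line is a hypothetical example of one packaged tool; any installed program would do):

        import subprocess
        from concurrent.futures import ThreadPoolExecutor

        # Hypothetical per-scenario commands for a legacy analysis tool.
        jobs = [["phyml", "--input", f"aln_{i}.phy"] for i in range(8)]

        def run(cmd):
            """Launch one whole program as an independent OS process."""
            return cmd, subprocess.run(cmd, capture_output=True).returncode

        with ThreadPoolExecutor(max_workers=4) as pool:  # 4 concurrent jobs
            for cmd, rc in pool.map(run, jobs):
                print(" ".join(cmd), "->", "ok" if rc == 0 else f"failed ({rc})")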

  10. PlantTribes: a gene and gene family resource for comparative genomics in plants

    PubMed Central

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2008-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study. PMID:18073194

  11. High-Performance, Multi-Node File Copies and Checksums for Clustered File Systems

    NASA Technical Reports Server (NTRS)

    Kolano, Paul Z.; Ciotti, Robert B.

    2012-01-01

    Modern parallel file systems achieve high performance using a variety of techniques, such as striping files across multiple disks to increase aggregate I/O bandwidth and spreading disks across multiple servers to increase aggregate interconnect bandwidth. To achieve peak performance from such systems, it is typically necessary to utilize multiple concurrent readers/writers from multiple systems to overcome various single-system limitations, such as the number of processors and network bandwidth. The standard cp and md5sum tools of GNU coreutils found on every modern Unix/Linux system, however, utilize a single execution thread on a single CPU core of a single system, and hence cannot take full advantage of the increased performance of clustered file systems. Mcp and msum are drop-in replacements for the standard cp and md5sum programs that utilize multiple types of parallelism and other optimizations to achieve maximum copy and checksum performance on clustered file systems. Multi-threading is used to ensure that nodes are kept as busy as possible. Read/write parallelism allows individual operations of a single copy to be overlapped using asynchronous I/O. Multinode cooperation allows different nodes to take part in the same copy/checksum. Split-file processing allows multiple threads to operate concurrently on the same file. Finally, hash trees allow inherently serial checksums to be performed in parallel. Mcp and msum provide significant performance improvements over standard cp and md5sum using multiple types of parallelism and other optimizations. Mcp improves cp performance by over 27x, msum improves md5sum performance by almost 19x, and the combination of mcp and msum improves verified copies via cp and md5sum by almost 22x. These improvements come in the form of drop-in replacements for cp and md5sum, so are easily used and are available for download as open source software at http://mutil.sourceforge.net.
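
    To make the hash-tree idea concrete, the sketch below (an illustration of the technique under stated assumptions, not the mutil source) checksums fixed-size chunks of a file in parallel and folds the per-chunk digests into one root digest; note that such a root intentionally differs from a plain serial md5sum of the same file:

        import hashlib
        import os
        from concurrent.futures import ProcessPoolExecutor

        CHUNK = 64 * 1024 * 1024  # 64 MiB chunks, an illustrative split size

        def chunk_digest(args):
            """Checksum one chunk; unlike a serial md5sum pass, chunks of the
            same file can be hashed concurrently."""
            path, offset = args
            h = hashlib.md5()
            with open(path, "rb") as f:
                f.seek(offset)
                h.update(f.read(CHUNK))
            return h.digest()

        def tree_hash(path, workers=8):
            """Combine per-chunk digests into a single root digest (a flat,
            one-level hash tree); the root changes iff any chunk changes."""
            size = os.stat(path).st_size
            tasks = [(path, o) for o in range(0, size, CHUNK)]
            with ProcessPoolExecutor(workers) as pool:
                leaves = pool.map(chunk_digest, tasks)
            return hashlib.md5(b"".join(leaves)).hexdigest()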

  12. Algorithms and software used in selecting structure of machine-training cluster based on neurocomputers

    NASA Astrophysics Data System (ADS)

    Romanchuk, V. A.; Lukashenko, V. V.

    2018-05-01

    A technique for operating the control system of a computing cluster based on neurocomputers is proposed. Particular attention is paid to the method of choosing the structure of the computing cluster, because existing methods are not effective for this specialized hardware base: neurocomputers, which are highly parallel computing devices with an architecture different from the von Neumann architecture. The developed algorithm for choosing the computational structure of the cluster is described; it proceeds from the direction of data transfer in the program's flow-control graph and its adjacency matrix.

  13. Chromatin organization and global regulation of Hox gene clusters

    PubMed Central

    Montavon, Thomas; Duboule, Denis

    2013-01-01

    During development, a properly coordinated expression of Hox genes, within their different genomic clusters is critical for patterning the body plans of many animals with a bilateral symmetry. The fascinating correspondence between the topological organization of Hox clusters and their transcriptional activation in space and time has served as a paradigm for understanding the relationships between genome structure and function. Here, we review some recent observations, which revealed highly dynamic changes in the structure of chromatin at Hox clusters, in parallel with their activation during embryonic development. We discuss the relevance of these findings for our understanding of large-scale gene regulation. PMID:23650639

  14. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

    PubMed Central

    Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan

    2017-01-01

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325

  15. HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.

    PubMed

    Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D

    2002-08-07

    Using iterative three-dimensional (3D) reconstruction techniques for positron emission tomography (PET) data is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C. Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization for our dedicated cluster (a shared-memory fine-grain level on each node utilizing all four processors, and a coarse-grain level allowing for 15 nodes), reducing the time for one core iteration from over 7 h to about 35 min.
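
    The first approach, rebinning the 3D scan into independent 2D sinograms and reconstructing them in parallel, is straightforward to sketch. In the illustration below, a plain unfiltered backprojection stands in for the fast 2D method, Python worker processes stand in for the cluster nodes, and FORE itself is omitted; all of these substitutions are assumptions for brevity, not the paper's implementation:

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def backproject(sino):
            """Unfiltered backprojection of one 2D sinogram of shape
            (n_angles, n_bins); a stand-in for the fast 2D reconstruction."""
            n_ang, n = sino.shape
            xs = np.arange(n) - n / 2
            X, Y = np.meshgrid(xs, xs)
            img = np.zeros((n, n))
            for i, theta in enumerate(np.linspace(0, np.pi, n_ang, endpoint=False)):
                t = X * np.cos(theta) + Y * np.sin(theta)       # detector coordinate
                idx = np.clip(np.round(t + n / 2).astype(int), 0, n - 1)
                img += sino[i][idx]                             # smear along the ray
            return img / n_ang

        def reconstruct_volume(sinograms, workers=8):
            """After rebinning, each 2D sinogram is independent, so slices
            are simply farmed out to worker processes."""
            with ProcessPoolExecutor(workers) as pool:
                return np.stack(list(pool.map(backproject, sinograms)))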

  16. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

    PubMed

    Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

    2017-08-04

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.

  17. Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M.

    2009-09-09

    SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small computer clusters. As a cluster resource manager, SLURM has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.

  18. Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis.

    PubMed

    González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J

    2017-02-01

    The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
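
    A minimal sketch of the hybrid pattern described above, assuming mpi4py for the message-passing layer and using a process pool in place of the paper's C++11 threads; make_map is a placeholder for the per-image tear film computation:

        # Run with e.g.: mpirun -n 4 python tearfilm.py
        from concurrent.futures import ProcessPoolExecutor

        from mpi4py import MPI

        def make_map(image_path):
            """Placeholder for the per-image tear film map definition."""
            return image_path, len(image_path)  # illustrative result only

        if __name__ == "__main__":
            comm = MPI.COMM_WORLD
            rank, size = comm.Get_rank(), comm.Get_size()

            images = [f"img_{i:03d}.png" for i in range(50)]  # 50 input images
            mine = images[rank::size]             # round-robin split across ranks

            with ProcessPoolExecutor() as pool:   # intra-node parallelism
                local = list(pool.map(make_map, mine))

            results = comm.gather(local, root=0)  # inter-node collection
            if rank == 0:
                maps = [r for chunk in results for r in chunk]
                print(f"computed {len(maps)} tear film maps")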

  19. High order parallel numerical schemes for solving incompressible flows

    NASA Technical Reports Server (NTRS)

    Lin, Avi; Milner, Edward J.; Liou, May-Fun; Belch, Richard A.

    1992-01-01

    The use of parallel computers for numerically solving flow fields has gained much importance in recent years. This paper introduces a new high-order numerical scheme for computational fluid dynamics (CFD) specifically designed for parallel computational environments. A distributed MIMD system gives the flexibility of treating different elements of the governing equations with totally different numerical schemes in different regions of the flow field. The parallel decomposition of the governing operator to be solved is the primary parallel split. The primary parallel split was studied using a hypercube-like architecture having clusters of shared-memory processors at each node. The approach is demonstrated using examples of simple steady-state incompressible flows. Future studies should investigate the secondary split because, depending on the numerical scheme that each of the processors applies and the nature of the flow in the specific subdomain, it may be possible for a processor to seek better, or higher-order, schemes for its particular subcase.

  20. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
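
    The rsync-style comparison at the heart of the claim can be illustrated in a few lines: per-block checksums of the previously stored template are compared against a node's current state, and only the blocks that differ are compressed and saved. Block size, hash, and compressor below are illustrative choices, not those of the patent:

        import hashlib
        import zlib

        BLOCK = 1 << 20  # 1 MiB data blocks

        def block_sums(data):
            """Per-block checksums, as stored alongside the template checkpoint."""
            return [hashlib.md5(data[o:o + BLOCK]).digest()
                    for o in range(0, len(data), BLOCK)]

        def delta_checkpoint(state, template_sums):
            """Keep only blocks whose checksum differs from the template,
            compressed losslessly; typically far smaller than the full state."""
            delta = []
            for i, o in enumerate(range(0, len(state), BLOCK)):
                block = state[o:o + BLOCK]
                if i >= len(template_sums) or hashlib.md5(block).digest() != template_sums[i]:
                    delta.append((i, zlib.compress(block)))
            return delta

        # A node whose state differs from the template in exactly one block:
        template = bytes(4 * BLOCK)
        sums = block_sums(template)
        state = bytearray(template); state[BLOCK + 5] = 1
        print(len(delta_checkpoint(bytes(state), sums)), "changed block(s)")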

  1. Astrophysical data mining with GPU. A case study: Genetic classification of globular clusters

    NASA Astrophysics Data System (ADS)

    Cavuoti, S.; Garofalo, M.; Brescia, M.; Paolillo, M.; Pescapè, A.; Longo, G.; Ventre, G.

    2014-01-01

    We present a multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single-band HST images. The GPU version of GAME will be made available to the community by integrating it into the web application DAMEWARE (DAta Mining Web Application REsource, http://dame.dsf.unina.it/beta_info.html), a public data mining service specialized in massive astrophysical data. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm leads to a speedup of a factor of 200× in the training phase with respect to the CPU-based version.

  2. Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram

    2013-04-09

    A novel parallel algorithm for non-iterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [K. Bhaskaran-Nair, J. Brabec, E. Aprà, H. J. J. van Dam, J. Pittner, K. Kowalski, J. Chem. Phys. 137, 094112 (2012)] with the possibility of accelerating numerical calculations using graphics processing units (GPUs), is presented. We discuss the performance of this algorithm on the example of the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD (iterative singles and doubles) effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithm is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.

  3. HFF-DeepSpace Photometric Catalogs of the 12 Hubble Frontier Fields, Clusters, and Parallels: Photometry, Photometric Redshifts, and Stellar Masses

    NASA Astrophysics Data System (ADS)

    Shipley, Heath V.; Lange-Vagle, Daniel; Marchesini, Danilo; Brammer, Gabriel B.; Ferrarese, Laura; Stefanon, Mauro; Kado-Fong, Erin; Whitaker, Katherine E.; Oesch, Pascal A.; Feinstein, Adina D.; Labbé, Ivo; Lundgren, Britt; Martis, Nicholas; Muzzin, Adam; Nedkova, Kalina; Skelton, Rosalind; van der Wel, Arjen

    2018-03-01

    We present Hubble multi-wavelength photometric catalogs, including (up to) 17 filters with the Advanced Camera for Surveys and Wide Field Camera 3 from the ultraviolet to near-infrared for the Hubble Frontier Fields and associated parallels. We have constructed homogeneous photometric catalogs for all six clusters and their parallels. To further expand these data catalogs, we have added ultra-deep K_S-band imaging at 2.2 μm from the Very Large Telescope HAWK-I and Keck-I MOSFIRE instruments. We also add post-cryogenic Spitzer imaging at 3.6 and 4.5 μm with the Infrared Array Camera (IRAC), as well as archival IRAC 5.8 and 8.0 μm imaging when available. We introduce the public release of the multi-wavelength (0.2–8 μm) photometric catalogs, and we describe the unique steps applied for the construction of these catalogs. Particular emphasis is given to the source detection band, the contamination of light from the bright cluster galaxies (bCGs), and intra-cluster light (ICL). In addition to the photometric catalogs, we provide catalogs of photometric redshifts and stellar population properties. Furthermore, this includes all the images used in the construction of the catalogs, including the combined models of bCGs and ICL, the residual images, segmentation maps, and more. These catalogs are a robust data set of the Hubble Frontier Fields and will be an important aid in designing future surveys, as well as planning follow-up programs with current and future observatories to answer key questions remaining about first light, reionization, the assembly of galaxies, and many more topics, most notably by identifying high-redshift sources to target.

  4. StrAuto: automation and parallelization of STRUCTURE analysis.

    PubMed

    Chhatre, Vikram E; Emerson, Kevin J

    2017-03-24

    Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational overload of this analysis, it does not fully automate the use of replicate STRUCTURE analysis runs required for downstream inference of optimal K. There is a pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto, to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation - a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and available to download from http://strauto.popgen.org .
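
    Conceptually, the parallel part amounts to farming replicate runs over a worker pool, as in this sketch (the structure command line is illustrative only and should be checked against the local STRUCTURE installation and parameter files; StrAuto itself adds the Evanno post-processing on top of such runs):

        import itertools
        import subprocess
        from concurrent.futures import ThreadPoolExecutor

        # K values and replicates to explore: K = 1..10, 20 replicates each.
        runs = list(itertools.product(range(1, 11), range(20)))

        def launch(args):
            """Run one STRUCTURE replicate as an independent process."""
            k, rep = args
            # Illustrative invocation; real runs also need mainparams/extraparams.
            cmd = ["structure", "-K", str(k), "-o", f"results/K{k}_rep{rep}"]
            return k, rep, subprocess.run(cmd, capture_output=True).returncode

        with ThreadPoolExecutor(max_workers=16) as pool:  # 16 concurrent runs
            for k, rep, rc in pool.map(launch, runs):
                if rc != 0:
                    print(f"K={k} rep={rep} failed with exit code {rc}")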

  5. Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms, and present a novel analytical model of the trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java simulator. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

  6. Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models.

    PubMed

    Ferrucci, Filomena; Salza, Pasquale; Sarro, Federica

    2017-06-29

    The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Given that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, parallel GA solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model is most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid models, also in terms of costs when executed on a commercial cloud provider.
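
    The island model the study favours can be illustrated without Hadoop: independent subpopulations evolve in parallel and periodically exchange their best individuals, so communication happens only at migration epochs. The sketch below uses Python's multiprocessing and a toy one-max fitness function; all parameters and operators are illustrative assumptions, not the paper's setup.

        # Toy island-model GA: subpopulations evolve in separate processes
        # and exchange their best individuals in a ring after each epoch.
        import random
        from multiprocessing import Pool

        GENES, POP, GENS, EPOCH = 20, 30, 50, 10

        def fitness(ind):                      # one-max: count the ones
            return sum(ind)

        def evolve(args):
            pop, gens = args
            for _ in range(gens):
                pop.sort(key=fitness, reverse=True)
                parents = pop[:POP // 2]
                children = []
                while len(children) < POP - len(parents):
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(1, GENES)
                    child = a[:cut] + b[cut:]   # one-point crossover
                    if random.random() < 0.1:   # bit-flip mutation
                        i = random.randrange(GENES)
                        child[i] ^= 1
                    children.append(child)
                pop = parents + children
            return pop

        if __name__ == "__main__":
            islands = [[[random.randint(0, 1) for _ in range(GENES)]
                        for _ in range(POP)] for _ in range(4)]
            with Pool(4) as pool:
                for _ in range(GENS // EPOCH):
                    islands = pool.map(evolve, [(isl, EPOCH) for isl in islands])
                    # ring migration: best individual moves to the next island
                    bests = [max(isl, key=fitness) for isl in islands]
                    for i, isl in enumerate(islands):
                        isl[-1] = bests[(i - 1) % len(islands)]
            print(max(fitness(ind) for isl in islands for ind in isl))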

  7. UV-light-driven prebiotic synthesis of iron-sulfur clusters

    NASA Astrophysics Data System (ADS)

    Bonfio, Claudia; Valer, Luca; Scintilla, Simone; Shah, Sachin; Evans, David J.; Jin, Lin; Szostak, Jack W.; Sasselov, Dimitar D.; Sutherland, John D.; Mansy, Sheref S.

    2017-12-01

    Iron-sulfur clusters are ancient cofactors that play a fundamental role in metabolism and may have impacted the prebiotic chemistry that led to life. However, it is unclear whether iron-sulfur clusters could have been synthesized on prebiotic Earth. Dissolved iron on early Earth was predominantly in the reduced ferrous state, but ferrous ions alone cannot form polynuclear iron-sulfur clusters. Similarly, free sulfide may not have been readily available. Here we show that UV light drives the synthesis of [2Fe-2S] and [4Fe-4S] clusters through the photooxidation of ferrous ions and the photolysis of organic thiols. Iron-sulfur clusters coordinate to and are stabilized by a wide range of cysteine-containing peptides and the assembly of iron-sulfur cluster-peptide complexes can take place within model protocells in a process that parallels extant pathways. Our experiments suggest that iron-sulfur clusters may have formed easily on early Earth, facilitating the emergence of an iron-sulfur-cluster-dependent metabolism.

  8. The quality of reporting in cluster randomised crossover trials: proposal for reporting items and an assessment of reporting quality.

    PubMed

    Arnup, Sarah J; Forbes, Andrew B; Kahan, Brennan C; Morgan, Katy E; McKenzie, Joanne E

    2016-12-06

    The cluster randomised crossover (CRXO) design is gaining popularity in trial settings where individual randomisation or parallel group cluster randomisation is not feasible or practical. Our aim is to stimulate discussion on the content of a reporting guideline for CRXO trials and to assess the reporting quality of published CRXO trials. We undertook a systematic review of CRXO trials. Searches of MEDLINE, EMBASE, and CINAHL Plus, as well as citation searches of CRXO methodological articles, were conducted to December 2014. Reporting quality was assessed against modified items from the 2010 CONSORT statement and its 2012 cluster trials extension, as well as other proposed quality measures. Of the 3425 records identified through database searching, 83 trials met the inclusion criteria. Trials were infrequently identified as "cluster randomis(z)ed crossover" in the title (n = 7, 8%) or abstract (n = 21, 25%), and a rationale for the design was infrequently provided (n = 20, 24%). Design parameters such as the number of clusters and number of periods were well reported. Discussion of carryover took place in only 17 trials (20%). Sample size methods were reported in only 58% (n = 48) of trials. A range of approaches was used to report baseline characteristics. The analysis method was not adequately reported in 23% (n = 19) of trials. The observed within-cluster within-period intracluster correlation and within-cluster between-period intracluster correlation for the primary outcome data were not reported in any trial. The potential for selection, performance, and detection bias could be evaluated in 30%, 81%, and 70% of trials, respectively. There is a clear need to improve the quality of reporting in CRXO trials. Given the unique features of a CRXO trial, it is important to develop a CONSORT extension. Consensus amongst trialists on the content of such a guideline is essential.

  9. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulating MHD instabilities with low, intermediate, and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of the MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
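
    As a concrete picture of the eigensolver step being parallelized, the sketch below shows plain serial inverse iteration in Python/NumPy: each iteration solves a shifted linear system, and it is this repeated solve that a parallel version distributes. The matrix and shift are illustrative; this is not the MARS code.

        # Inverse iteration: find the eigenpair nearest a given shift sigma.
        import numpy as np

        def inverse_iteration(A, sigma, iters=50):
            n = A.shape[0]
            x = np.random.rand(n)
            M = A - sigma * np.eye(n)
            for _ in range(iters):
                x = np.linalg.solve(M, x)   # the solve is what gets distributed
                x /= np.linalg.norm(x)
            return x @ A @ x, x             # Rayleigh quotient, eigenvector

        A = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.01 * np.random.rand(4, 4)
        A = (A + A.T) / 2                   # symmetrize the toy matrix
        lam, x = inverse_iteration(A, sigma=2.9)
        print(lam)                          # eigenvalue nearest the shift 2.9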

  10. Parallel evolution of image processing tools for multispectral imagery

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.

    2000-11-01

    We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for predicting run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI data covering the recent Cerro Grande fire at Los Alamos, NM, USA.

  11. Ant-like task allocation and recruitment in cooperative robots

    NASA Astrophysics Data System (ADS)

    Krieger, Michael J. B.; Billeter, Jean-Bernard; Keller, Laurent

    2000-08-01

    One of the greatest challenges in robotics is to create machines that are able to interact with unpredictable environments in real time. A possible solution may be to use swarms of robots behaving in a self-organized manner, similar to workers in an ant colony. Efficient mechanisms of division of labour, in particular series-parallel operation and transfer of information among group members, are key components of the tremendous ecological success of ants. Here we show that the general principles regulating division of labour in ant colonies indeed allow the design of flexible, robust and effective robotic systems. Groups of robots using ant-inspired algorithms of decentralized control techniques foraged more efficiently and maintained higher levels of group energy than single robots. But the benefits of group living decreased in larger groups, most probably because of interference during foraging. Intriguingly, a similar relationship between group size and efficiency has been documented in social insects. Moreover, when food items were clustered, groups where robots could recruit other robots in an ant-like manner were more efficient than groups without information transfer, suggesting that group dynamics of swarms of robots may follow rules similar to those governing social insects.

  12. Reactions of Metal-Metal Multiple Bonds. 14. Synthesis and Characterization of Triangulo-W3 and Mo2W Oxo-Capped Alkoxide Clusters. Comproportionation of M-M Triple Bonds, σ2π4, and d0 Metal-Oxo Groups: M≡M + M≡O Yields M3(μ3-O).

    DTIC Science & Technology

    1984-05-02

    ...the syntheses of dinuclear and trinuclear complexes employing metal-alkylidyne or -alkylidene fragments. Reaction 1 also has a parallel with the ... which was previously examined. The mixed metal complex is undoubtedly disordered with respect to the disposition of molybdenum and tungsten atoms ... than for the analogous Mo3 complex suggests greater metal-metal overlap and possibly stronger bonding interactions in the W3 complex, which would not ...

  13. Unique Study Designs in Nephrology: N-of-1 Trials and Other Designs.

    PubMed

    Samuel, Joyce P; Bell, Cynthia S

    2016-11-01

    Alternatives to the traditional parallel-group trial design may be required to answer clinical questions in special populations, in rare conditions, or with limited resources. The N-of-1 trial is a unique design that can inform personalized evidence-based decisions for the patient when data from traditional clinical trials are lacking or not generalizable. A concise overview of factorial designs, cluster randomization, adaptive designs, crossover studies, and n-of-1 trials will be provided, along with pertinent examples in nephrology. The indications for analysis strategies such as equivalence and noninferiority trials will be discussed, as well as analytic pitfalls. Copyright © 2016 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.

  14. To Support Research Activities Under the NASA Experimental Program to Stimulate Competitive Research

    NASA Technical Reports Server (NTRS)

    Gregory, John C.

    2003-01-01

    The Alabama NASA EPSCoR Program is a collaborative venture of the Alabama Space Grant Consortium, the Alabama EPSCoR, and faculty and staff at 10 Alabama colleges and universities, as well as the Alabama School of Math and Science in Mobile. There are two Research Clusters, which include infrastructure-building and outreach elements embedded in their research activities. Each of the two Research Clusters is in an area of clear and demonstrable relevance to NASA's mission, to components of other Alabama EPSCoR projects, and to the State of Alabama's economic development. This Final Report summarizes and reports upon those additional activities occurring after the first report was submitted in March 2000 (included here as Appendix C). Since the nature of the activities and the manner in which they relate to one another differ by cluster, these clusters function independently and are summarized in parallel in this report. They do share a common administration by the Alabama Space Grant Consortium (ASGC), and by this means good ideas from each group were communicated to the other, as appropriate. During the past year these research teams, involving 15 scientists, 16 graduate students, 16 undergraduates, and 7 high school students across 10 Alabama universities, published 14 peer-reviewed scientific journal articles, had 21 others reviewed for publication or published in proceedings, gave 7 formal presentations and numerous informal presentations to well over 3000 people, received 3 patents, and were awarded 14 research proposals for more than $213K in additional research related to these investigations. Each cluster's activities are described, and an Appendix summarizes these achievements.

  15. Solving Coupled Gross--Pitaevskii Equations on a Cluster of PlayStation 3 Computers

    NASA Astrophysics Data System (ADS)

    Edwards, Mark; Heward, Jeffrey; Clark, C. W.

    2009-05-01

    At Georgia Southern University we have constructed an 8+1-node cluster of Sony PlayStation 3 (PS3) computers with the intention of using this computing resource to solve problems related to the behavior of ultra-cold atoms in general, with a particular emphasis on studying Bose-Bose and Bose-Fermi mixtures confined in optical lattices. As a first project that uses this computing resource, we have implemented a parallel solver of the coupled time-dependent, one-dimensional Gross-Pitaevskii (TDGP) equations. These equations govern the behavior of dual-species bosonic mixtures. We chose the split-operator/FFT method to solve the coupled 1D TDGP equations. The fast Fourier transform component of this solver can be readily parallelized on the PS3 CPU, known as the Cell Broadband Engine (CellBE). Each CellBE chip contains a single 64-bit PowerPC Processor Element, known as the PPE, and eight "Synergistic Processor Elements" (SPEs). We report on this algorithm and compare its performance to a non-parallel solver as applied to modeling evaporative cooling in dual-species bosonic mixtures.
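
    The split-operator/FFT scheme is compact enough to sketch. Below is a minimal single-component 1D Gross-Pitaevskii solver in Python/NumPy under illustrative parameters (harmonic trap, ħ = m = 1); the coupled dual-species case adds a cross-interaction term g12|ψ2|² to each potential half-step. This is a sketch of the method only, not the PS3/CellBE implementation.

        # Strang splitting: half potential step, full kinetic step (in k-space),
        # half potential step.  The FFTs are the parallelizable component.
        import numpy as np

        N, L, dt, steps, g = 1024, 40.0, 1e-3, 2000, 1.0
        x = np.linspace(-L / 2, L / 2, N, endpoint=False)
        k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
        V = 0.5 * x**2                                    # harmonic trap
        psi = np.exp(-x**2 / 2).astype(complex)
        psi /= np.sqrt(np.sum(np.abs(psi)**2) * (L / N))  # normalize

        kinetic = np.exp(-0.5j * dt * k**2)
        for _ in range(steps):
            psi *= np.exp(-0.5j * dt * (V + g * np.abs(psi)**2))
            psi = np.fft.ifft(kinetic * np.fft.fft(psi))
            psi *= np.exp(-0.5j * dt * (V + g * np.abs(psi)**2))

        print("norm:", np.sum(np.abs(psi)**2) * (L / N))  # should stay ~1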

  16. Reconstructing evolutionary trees in parallel for massive sequences.

    PubMed

    Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

    2017-12-14

    Building evolutionary trees for massive unaligned DNA sequences is crucial but challenging, and reconstructing an evolutionary tree for ultra-large sequence sets is hard. Massive multiple sequence alignment is likewise challenging and time/space consuming. Hadoop and Spark, developed recently, shed new light on these classical computational biology problems. In this paper, we solve multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, the tool developed in this paper, can process big DNA sequence files quickly. It works well on files larger than 1 GB and achieves better performance than other evolutionary reconstruction tools. Users can apply HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree can support population evolution research and metagenomics analysis. We employ the Hadoop and Spark platforms and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel, and a neighbour-joining model is employed for building the evolutionary tree. We have released our software together with source code via http://lab.malab.cn/soft/HPtree/ .
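
    For the final tree-building step, the neighbour-joining model is a standard method; a minimal serial example using scikit-bio (an assumption for illustration; HPTree's own implementation runs on Hadoop/Spark) looks like this:

        # Serial neighbour-joining on a toy distance matrix (scikit-bio).
        from skbio import DistanceMatrix
        from skbio.tree import nj

        dm = DistanceMatrix([[0, 5, 9, 9],
                             [5, 0, 10, 10],
                             [9, 10, 0, 8],
                             [9, 10, 8, 0]],
                            ids=["a", "b", "c", "d"])
        tree = nj(dm)
        print(tree.ascii_art())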

  17. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPPs), Parallel Vector Processors (PVPs), Symmetric Multi-Processors (SMPs), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, I/O, and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  18. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    Fractional differential equations are very time consuming to solve. The computational complexity of the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2), where M_x and M_y are the numbers of spatial grid points and N is the number of time steps. In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with a virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm's results compare well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed-memory cluster system. We expect parallel computing technology to become a basic method for computationally intensive fractional applications in the near future.
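
    The "virtual boundary" data layout can be pictured as halo rows exchanged between neighboring MPI ranks each time step. The mpi4py sketch below shows only this communication pattern under an assumed row-block decomposition; it is not the paper's fractional-difference solver.

        # Row-block decomposition with one-row halos ("virtual boundaries").
        # Run with, e.g.: mpiexec -n 4 python halo.py  (file name illustrative)
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        nx_local, ny = 100, 400
        u = np.zeros((nx_local + 2, ny))          # +2 halo rows
        u[1:-1, :] = rank                         # dummy initial data

        up = rank - 1 if rank > 0 else MPI.PROC_NULL
        down = rank + 1 if rank < size - 1 else MPI.PROC_NULL
        for _ in range(10):                       # time-stepping loop
            comm.Sendrecv(u[1, :], dest=up, recvbuf=u[-1, :], source=down)
            comm.Sendrecv(u[-2, :], dest=down, recvbuf=u[0, :], source=up)
            # ... local implicit update of u[1:-1, :] would go here ...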

  19. Using Clustering to Establish Climate Regimes from PCM Output

    NASA Technical Reports Server (NTRS)

    Oglesby, Robert; Arnold, James E. (Technical Monitor); Hoffman, Forrest; Hargrove, W. W.; Erickson, D.

    2002-01-01

    A multivariate statistical clustering technique, based on the k-means algorithm of Hartigan, has been used to extract patterns of climatological significance from 200 years of general circulation model (GCM) output. Originally developed and implemented on a Beowulf-style parallel computer constructed by Hoffman and Hargrove from surplus commodity desktop PCs, the high-performance parallel clustering algorithm was previously applied to the derivation of ecoregions from map stacks of 9 and 25 geophysical conditions or variables for the conterminous U.S. at a resolution of 1 sq km. Now applied both across space and through time, the clustering technique yields temporally varying climate regimes predicted by transient runs of the Parallel Climate Model (PCM). Using a business-as-usual (BAU) scenario and clustering four fields of significance to the global water cycle (surface temperature, precipitation, soil moisture, and snow depth) from 1871 through 2098, the authors' analysis shows an increase in the spatial area occupied by the cluster or climate regime which typifies desert regions (i.e., an increase in desertification) and a decrease in the spatial area occupied by the climate regime typifying winter-time high-latitude permafrost regions. The patterns of cluster changes have been analyzed to understand the predicted variability in the water cycle on global and continental scales. In addition, representative climate regimes were determined by taking three 10-year averages of the fields 100 years apart for northern hemisphere winter (December, January, and February) and summer (June, July, and August). The result is global maps of typical seasonal climate regimes for 100 years in the past, for the present, and for 100 years into the future. Using three-dimensional data or phase-space representations of these climate regimes (i.e., the cluster centroids), the authors demonstrate the portion of this phase space occupied by the land surface at all points in space and time. Any single spot on the globe will exist in one of these climate regimes at any single point in time. By incrementing time, that same spot will trace out a trajectory or orbit between and among these climate regimes (or atmospheric states) in phase (or state) space. When a geographic region enters a state it never previously visited, a climatic change is said to have occurred. Tracing out the entire trajectory of a single spot on the globe yields a 'manifold' in state space representing the shape of its predicted climate occupancy. This sort of analysis enables a researcher to more easily grasp the multivariate behavior of the climate system.
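
    A minimal version of the space-time clustering step, with synthetic arrays standing in for the four PCM fields and scikit-learn's k-means in place of the parallel implementation, might look like this:

        # Stack several fields so each (grid cell, month) pair is one sample,
        # then cluster the samples into climate regimes.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        cells, months = 5000, 120
        fields = {name: np.random.rand(cells, months)
                  for name in ("tsurf", "precip", "soilmoist", "snowdepth")}

        # one row per (cell, month); one column per field
        X = np.column_stack([f.reshape(-1) for f in fields.values()])
        X = StandardScaler().fit_transform(X)      # put fields on a common scale

        labels = KMeans(n_clusters=8, n_init=10).fit_predict(X)
        regimes = labels.reshape(cells, months)    # regime per cell, per month
        print(np.bincount(labels))                 # regime sizes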

  20. Experimental Program to Stimulate Competitive Research (EPSCoR)

    NASA Technical Reports Server (NTRS)

    Dingerson, Michael R.

    1997-01-01

    Report includes: (1) CLUSTER: "Studies in Macromolecular Behavior in Microgravity Environment": The Role of Protein Oligomers in Protein Crystallization; Phase Separation Phenomena in Microgravity; Traveling Front Polymerizations; Investigating Mechanisms Affecting Phase Transition Response and Changes in Thermal Transport Properties in ER-Fluids under Normal and Microgravity Conditions. (2) CLUSTER: "Computational/Parallel Processing Studies": Flows in Local Chemical Equilibrium; A Computational Method for Solving Very Large Problems; Modeling of Cavitating Flows.

  1. High-throughput shadow mask printing of passive electrical components on paper by supersonic cluster beam deposition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Caruso, Francesco; Bellacicca, Andrea; Milani, Paolo, E-mail: pmilani@mi.infn.it

    We report the rapid prototyping of passive electrical components (resistors and capacitors) on plain paper by an additive and parallel technology consisting of supersonic cluster beam deposition (SCBD) coupled with shadow mask printing. Cluster-assembled films have a growth mechanism substantially different from that of atom-assembled ones, providing the possibility of fine tuning of their electrical conduction properties around the percolative conduction threshold. Exploiting the precise control of cluster beam intensity and shape typical of SCBD, we produced, in a one-step process, batches of resistors with resistance values spanning a range of two orders of magnitude. Parallel plate capacitors with paper as the dielectric medium were also produced, with capacitance in the range of tens of picofarads. Compared to standard deposition technologies, SCBD allows for a very efficient use of raw materials and the rapid production of components with different shapes and dimensions while controlling the electrical characteristics independently. Discrete electrical components produced by SCBD are very robust against deformation and bending, and they can be easily assembled to build circuits with desired characteristics. The availability of large batches of these components enables the rapid and cheap prototyping and integration of electrical components on paper as building blocks of more complex systems.

  2. schwimmbad: A uniform interface to parallel processing pools in Python

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Foreman-Mackey, Daniel

    2017-09-01

    Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
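
    The pool-switching pattern the package is built around can be shown in a few lines; this is a minimal sketch following the library's documented pool interface, with a toy worker function:

        # The worker and main() stay the same whether the pool is serial,
        # multiprocessing-based, or MPI-based.
        import schwimmbad

        def worker(task):
            return task * task

        def main(pool):
            results = list(pool.map(worker, range(10000)))
            print(sum(results))

        if __name__ == "__main__":
            # mpi=True (on a cluster) or processes=1 (serial) swap in
            # a different pool without touching worker() or main().
            pool = schwimmbad.choose_pool(mpi=False, processes=4)
            main(pool)
            pool.close()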

  3. The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

    NASA Astrophysics Data System (ADS)

    Chen, Xi; Zhou, Liqing

    2015-12-01

    With the development of satellite remote sensing technology and the growth of remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive processing and storage requirements. This article brings cloud computing and parallel computing technology into the remote sensing image segmentation process and builds a cheap and efficient computer cluster system that uses parallel processing to implement the mean shift algorithm for remote sensing image segmentation based on the MapReduce model. This not only preserves the quality of remote sensing image segmentation but also improves segmentation speed, better meeting real-time requirements. The MapReduce-based parallel mean shift segmentation algorithm thus has both practical significance and realizable value.
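
    The map/reduce split for mean shift segmentation can be sketched as: map = run mean shift on each image tile, reduce = merge the tile cluster centers. The sketch below uses scikit-learn's MeanShift on synthetic data as a stand-in for the Hadoop job; tile size, bandwidth, and the merge rule are illustrative choices.

        import numpy as np
        from sklearn.cluster import MeanShift

        def map_tile(tile, r0, c0):
            # features: (global row, global col, intensity) for each pixel
            rows, cols = np.indices(tile.shape)
            X = np.column_stack([rows.ravel() + r0, cols.ravel() + c0, tile.ravel()])
            return MeanShift(bandwidth=20).fit(X).cluster_centers_

        def reduce_centers(all_centers, merge_dist=10.0):
            merged = []
            for c in np.vstack(all_centers):
                for m in merged:
                    if np.linalg.norm(c - m) < merge_dist:
                        break                 # close to an existing center
                else:
                    merged.append(c)
            return np.array(merged)

        image = np.random.randint(0, 255, (128, 128)).astype(float)
        tiles = [(image[r:r + 64, c:c + 64], r, c)
                 for r in (0, 64) for c in (0, 64)]
        centers = reduce_centers([map_tile(t, r, c) for t, r, c in tiles])
        print(len(centers), "merged cluster centers")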

  4. Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

    NASA Astrophysics Data System (ADS)

    Boyko, Oleksiy; Zheleznyak, Mark

    2015-04-01

    The original numerical code TOPKAPI-IMMS, an implementation of the distributed rainfall-runoff model TOPKAPI (Todini et al., 1996-2014), has been developed and applied in Ukraine. A parallel version of the code has recently been developed for use on multiprocessor systems - multicore/multiprocessor PCs and clusters. The algorithm is based on a binary-tree decomposition of the watershed to balance the amount of computation across all processors/cores. The Message Passing Interface (MPI) protocol is used as the parallel computing framework. The numerical efficiency of the parallelization algorithm is demonstrated in case studies of flood predictions for mountain watersheds of the Ukrainian Carpathian region. The modeling results are compared with predictions based on lumped-parameter models.

  5. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to aCe C, a new C-based parallel language for architecture-adaptive programming. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool for teaching parallel programming. In this paper, we focus on some fundamental features of aCe C.

  6. Hospital integrated parallel cluster for fast and cost-efficient image analysis: clinical experience and research evaluation

    NASA Astrophysics Data System (ADS)

    Erberich, Stephan G.; Hoppe, Martin; Jansen, Christian; Schmidt, Thomas; Thron, Armin; Oberschelp, Walter

    2001-08-01

    In the last few years, more and more university hospitals, as well as private hospitals, have changed to digital information systems for patient records, diagnostic files, and digital images. Not only does patient management become easier, it is also remarkable how clinical research can profit from Picture Archiving and Communication Systems (PACS) and diagnostic databases, especially image databases. But while images are available at one's fingertips, difficulties arise when image data needs to be processed, e.g. segmented, classified, or co-registered, which usually demands substantial computational power. Today's clinical environments support PACS very well, but real image processing is still under-developed. The purpose of this paper is to introduce a parallel cluster of standard distributed systems and its software components, and to show how such a system can be integrated into a hospital environment. To demonstrate the cluster technique, we present our clinical experience with the crucial but cost-intensive motion correction of clinical routine and research functional MRI (fMRI) data, as processed in our lab on a daily basis.

  7. Enhancing the early home learning environment through a brief group parenting intervention: study protocol for a cluster randomised controlled trial.

    PubMed

    Nicholson, Jan M; Cann, Warren; Matthews, Jan; Berthelsen, Donna; Ukoumunne, Obioha C; Trajanovska, Misel; Bennetts, Shannon K; Hillgrove, Tessa; Hamilton, Victoria; Westrupp, Elizabeth; Hackworth, Naomi J

    2016-06-02

    The quality of the home learning environment has a significant influence on children's language and communication skills during the early years, with children from disadvantaged families disproportionately affected. This paper describes the protocol and participant baseline characteristics of a community-based effectiveness study. It evaluates the effects of 'smalltalk', a brief group parenting intervention (with or without home coaching), on the quality of the early childhood home learning environment. The study comprises two cluster randomised controlled superiority trials (one for infants and one for toddlers) designed and conducted in parallel. In 20 local government areas (LGAs) in Victoria, Australia, six locations (clusters) were randomised to one of three conditions: standard care (control); smalltalk group-only program; or smalltalk plus (group program plus home coaching). Programs were delivered to parents experiencing socioeconomic disadvantage through two existing age-based services, the maternal and child health service (infant program, ages 6-12 months) and facilitated playgroups (toddler program, ages 12-36 months). Outcomes were assessed by parent report and direct observation at baseline (0 weeks), post-intervention (12 weeks) and follow-up (32 weeks). Primary outcomes were parent verbal responsivity and home activities with the child at 32 weeks. Secondary outcomes included parenting confidence, parent wellbeing, and children's communication, socio-emotional, and general development skills. Analyses will use intention-to-treat random effects ("multilevel") models to account for clustering. Across the 20 LGAs, 986 parents of infants and 1200 parents of toddlers enrolled and completed baseline measures. Eighty-four percent of families demonstrated one or more of the targeted risk factors for poor child development (low income; receives government benefits; single, socially isolated or young parent; culturally or linguistically diverse background). This study will provide unique data on the effectiveness of a brief group parenting intervention for enhancing the early home learning environment of young children from disadvantaged families. It will also provide evidence of the extent to which additional one-on-one support is required to achieve change, and whether there are greater benefits when delivered in the 1st year of life or later. The program has been designed for scale-up across existing early childhood services if proven effective. Trial registration: ACTRN12611000965909, registered 8 September 2011.

  8. Infrared Spectra of M^+(2-AMINO-1-PHENYL ETHANOL)(H_2O)_{n=0-2}Ar (M=Na, K)

    NASA Astrophysics Data System (ADS)

    Nicely, Amy L.; Lisy, James M.

    2009-06-01

    A balance of competing electrostatic and hydrogen bonding interactions directs the structure of hydrated gas-phase cluster ions. Because of this, a biologically relevant model of cluster structures should include the effects of surrounding water molecules and metal ions such as sodium and potassium, which are found in high concentrations in the bloodstream. The molecule 2-amino-1-phenyl ethanol (APE) serves as a model for the neurotransmitters ephedrine and adrenaline. The neutral APE molecule contains an internal hydrogen bond between the amino and hydroxyl groups. In the M^+(APE) complex, the cation can either interrupt the internal hydrogen bond or position itself above the phenyl group, leaving the internal hydrogen bond intact. The former is preferred based on DFT calculations (B3LYP/6-31+G*) for both K^+ and Na^+ across the entire range from 0 to 400 K, but infrared photodissociation (IRPD) spectra indicate a preference for the latter configuration at low temperatures. The IRPD spectra of M^+(H_2O)_{n=1-2} and M^+(H_2O)_{n=0-2}Ar (M=Na, K) will be presented along with parallel DFT and thermodynamics calculations to assist with the identification of the isomers present in each experiment.

  9. Acute Whiplash Injury Study (AWIS): a protocol for a cluster randomised pilot and feasibility trial of an Active Behavioural Physiotherapy Intervention in an insurance private setting

    PubMed Central

    Wiangkham, Taweewat; Duda, Joan; Haque, M Sayeed; Price, Jonathan; Rushton, Alison

    2016-01-01

    Introduction: Whiplash-associated disorder (WAD) causes substantial social and economic burden internationally. Up to 60% of patients with WAD progress to chronicity. Research therefore needs to focus on effective management in the acute stage to prevent the development of chronicity. Approximately 93% of patients are classified as WADII (neck complaint and musculoskeletal sign(s)), and in the UK, most are managed in the private sector. In our recent systematic review, a combination of active and behavioural physiotherapy was identified as potentially effective in the acute stage. An Active Behavioural Physiotherapy Intervention (ABPI) was developed through combining empirical (modified Delphi study) and theoretical (social cognitive theory focusing on self-efficacy) evidence. This pilot and feasibility trial has been designed to inform the design of an adequately powered definitive randomised controlled trial. Methods and analysis: Two parallel phases. (1) An external pilot and feasibility cluster randomised double-blind (assessor and participants), parallel two-arm (ABPI vs standard physiotherapy) clinical trial to evaluate procedures and feasibility. Six UK private physiotherapy clinics will be recruited and cluster randomised by a computer-generated randomisation sequence. Sixty participants (30 in each arm) will be assessed at recruitment (baseline) and at 3 months post-baseline. The planned primary outcome measure is the neck disability index. (2) An embedded exploratory qualitative study using semi-structured in-depth interviews (n=3-4 physiotherapists) and a focus group (n=6-8 patients), entailing the recruitment of purposive samples, will explore perceptions of the ABPI. Quantitative data will be analysed descriptively. Qualitative data will be coded and analysed deductively (to identify themes) and inductively (to identify additional themes). Ethics and dissemination: This trial is approved by the University of Birmingham Ethics Committee (ERN_15-0542). Trial registration number: ISRCTN84528320. PMID:27412105

  10. Fast Whole-Engine Stirling Analysis

    NASA Technical Reports Server (NTRS)

    Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako

    2006-01-01

    This presentation discusses the whole-engine simulation approach for physical consistency; REV regenerator modeling; grid layering for smoothness and quality; conjugate heat transfer method adjustment; a high-speed, low-cost parallel cluster; and debugging.

  11. A scalable PC-based parallel computer for lattice QCD

    NASA Astrophysics Data System (ADS)

    Fodor, Z.; Katz, S. D.; Pappa, G.

    2003-05-01

    A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eötvös Univ. Inst. Theor. Phys. cluster consists of 137 Intel P4 1.7 GHz nodes. Gigabit Ethernet cards are used for nearest-neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop.

  12. A relational structure of voluntary visual-attention abilities

    PubMed Central

    Skogsberg, KatieAnn; Grabowecky, Marcia; Wilt, Joshua; Revelle, William; Iordanescu, Lucica; Suzuki, Satoru

    2015-01-01

    Many studies have examined attention mechanisms involved in specific behavioral tasks (e.g., search, tracking, distractor inhibition). However, relatively little is known about the relationships among those attention mechanisms. Is there a fundamental attention faculty that makes a person superior or inferior at most types of attention tasks, or do relatively independent processes mediate different attention skills? We focused on individual differences in voluntary visual-attention abilities using a battery of eleven representative tasks. An application of parallel analysis, hierarchical-cluster analysis, and multidimensional scaling to the inter-task correlation matrix revealed four functional clusters, representing spatiotemporal attention, global attention, transient attention, and sustained attention, organized along two dimensions, one contrasting spatiotemporal and global attention and the other contrasting transient and sustained attention. Comparison with the neuroscience literature suggests that the spatiotemporal-global dimension corresponds to the dorsal frontoparietal circuit and the transient-sustained dimension corresponds to the ventral frontoparietal circuit, with distinct sub-regions mediating the separate clusters within each dimension. We also obtained highly specific patterns of gender difference, and of deficits for college students with elevated ADHD traits. These group differences suggest that different mechanisms of voluntary visual attention can be selectively strengthened or weakened based on genetic, experiential, and/or pathological factors. PMID:25867505
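
    The clustering step of such an analysis (group tasks by the similarity of their inter-task correlations) can be sketched with SciPy; the data here are synthetic stand-ins for the eleven-task battery, and the linkage method and cluster count are illustrative choices, not the authors' exact procedure.

        # Hierarchical clustering of an inter-task correlation matrix.
        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(0)
        scores = rng.standard_normal((200, 11))       # 200 subjects x 11 tasks
        corr = np.corrcoef(scores, rowvar=False)
        dist = squareform(1 - corr, checks=False)     # correlation -> distance
        Z = linkage(dist, method="average")
        print(fcluster(Z, t=4, criterion="maxclust")) # four functional clusters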

  13. Formation of carbon nanoclusters by implantation of keV carbon ions in fused silica followed by thermal annealing

    NASA Astrophysics Data System (ADS)

    Olivero, P.; Peng, J. L.; Liu, A.; Reichart, P.; McCallum, J. C.; Sze, J. Y.; Lau, S. P.; Tay, B. K.; Kalish, R.; Dhar, S.; Feldman, Leonard; Jamieson, David N.; Prawer, Steven

    2005-02-01

    In the last decade, the synthesis and characterization of nanometer-sized carbon clusters have attracted growing interest within the scientific community. This is due to both scientific interest in the process of diamond nucleation and growth, and to the promising technological applications in nanoelectronics and quantum communications and computing. Our research group has demonstrated that MeV carbon ion implantation in fused silica followed by thermal annealing in the presence of hydrogen leads to the formation of nanocrystalline diamond, with cluster sizes ranging from 5 to 40 nm. In the present paper, we report the synthesis of carbon nanoclusters by the implantation into fused silica of keV carbon ions using the Plasma Immersion Ion Implantation (PIII) technique, followed by thermal annealing in forming gas (4% ²H in Ar). The present study is aimed at evaluating this implantation technique, which has the advantage of allowing high fluence rates on large substrates. The carbon nanostructures have been characterized with optical absorption and Raman spectroscopies, cross-sectional Transmission Electron Microscopy (TEM), and Parallel Electron Energy Loss Spectroscopy (PEELS). Nuclear Reaction Analysis (NRA) has been employed to evaluate the deuterium incorporation during the annealing process, as a key mechanism to stabilize the formation of the clusters.

  14. Bioinformatics algorithm based on a parallel implementation of a machine learning approach using transducers

    NASA Astrophysics Data System (ADS)

    Roche-Lima, Abiel; Thulasiram, Ruppa K.

    2012-02-01

    Finite automata in which each transition is augmented with an output label, in addition to the familiar input label, are known as finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed for pairwise alignments of DNA and protein sequences, as well as for developing kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation, which is calculated using techniques such as pair-database creation, normalization (with Maximum-Likelihood normalization), and parameter optimization (with Expectation-Maximization, EM). These techniques are intrinsically costly to compute, even more so when applied to bioinformatics, because the database sizes are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Several experiments were performed using the parallel and sequential algorithms on WestGrid (specifically, on the Breeze cluster). The results show that our parallel algorithm is scalable: execution times are reduced considerably as the data-size parameter increases. In another experiment, the precision parameter was varied; in this case, we obtained smaller execution times using the parallel algorithm. Finally, the number of threads used to execute the parallel algorithm on the Breeze cluster was varied. In this last experiment, we found that speedup increases considerably when more threads are used; however, there is a convergence for numbers of threads equal to or greater than 16.

  15. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    NASA Technical Reports Server (NTRS)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
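
    The asynchronous idea, update each particle as soon as its own evaluation returns instead of waiting for the whole swarm, can be sketched with Python's concurrent.futures. The objective, swarm parameters, and update rule below are textbook PSO choices for illustration, not the paper's implementation.

        import random
        from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

        DIM, SWARM, EVALS = 8, 16, 400
        W, C1, C2 = 0.7, 1.5, 1.5

        def sphere(x):                        # stand-in for an expensive analysis
            return sum(v * v for v in x)

        def step(x, v, pbest, gbest):         # standard PSO velocity/position update
            v = [W * vi + C1 * random.random() * (pi - xi)
                     + C2 * random.random() * (gi - xi)
                 for xi, vi, pi, gi in zip(x, v, pbest, gbest)]
            return [xi + vi for xi, vi in zip(x, v)], v

        if __name__ == "__main__":
            X = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(SWARM)]
            V = [[0.0] * DIM for _ in range(SWARM)]
            pbest, pval = [x[:] for x in X], [float("inf")] * SWARM
            gbest, gval = X[0][:], float("inf")
            done = 0
            with ProcessPoolExecutor(max_workers=4) as ex:
                pending = {ex.submit(sphere, X[i]): i for i in range(SWARM)}
                while done < EVALS:
                    ready, _ = wait(pending, return_when=FIRST_COMPLETED)
                    for fut in ready:
                        i = pending.pop(fut)
                        f = fut.result()
                        done += 1
                        if f < pval[i]:
                            pval[i], pbest[i] = f, X[i][:]
                        if f < gval:
                            gval, gbest = f, X[i][:]
                        # no barrier: the particle moves and resubmits immediately
                        X[i], V[i] = step(X[i], V[i], pbest[i], gbest)
                        pending[ex.submit(sphere, X[i])] = i
            print("best value:", gval)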

  16. Clustering of velocities in a GPS network spanning the Sierra Nevada Block, the Northern Walker Lane Belt, and the Central Nevada Seismic Belt, California-Nevada

    NASA Astrophysics Data System (ADS)

    Savage, J. C.; Simpson, R. W.

    2013-09-01

    The deformation across the Sierra Nevada Block, the Walker Lane Belt, and the Central Nevada Seismic Belt (CNSB) between 38.5°N and 40.5°N has been analyzed by clustering GPS velocities to identify coherent blocks. Cluster analysis determines the number of clusters required and assigns the GPS stations to the proper clusters. The clusters are shown on a fault map by symbols located at the positions of the GPS stations, each symbol representing the cluster to which the velocity of that GPS station belongs. Fault systems that separate the clusters are readily identified on such a map. Four significant clusters are identified. Those clusters are strips separated by (from west to east) the Mohawk Valley-Genoa fault system, the Pyramid Lake-Wassuk fault system, and the Central Nevada Seismic Belt. The strain rates within the westernmost three clusters approximate simple right-lateral shear (~13 nstrain/a) across vertical planes roughly parallel to the cluster boundaries. Clustering does not recognize the longitudinal segmentation of the Walker Lane Belt into domains dominated by either northwesterly trending, right-lateral faults or northeasterly trending, left-lateral faults.

  17. Clustering of velocities in a GPS network spanning the Sierra Nevada Block, the northern Walker Lane Belt, and the Central Nevada Seismic Belt, California-Nevada

    USGS Publications Warehouse

    Savage, James C.; Simpson, Robert W.

    2013-01-01

    The deformation across the Sierra Nevada Block, the Walker Lane Belt, and the Central Nevada Seismic Belt (CNSB) between 38.5°N and 40.5°N has been analyzed by clustering GPS velocities to identify coherent blocks. Cluster analysis determines the number of clusters required and assigns the GPS stations to the proper clusters. The clusters are shown on a fault map by symbols located at the positions of the GPS stations, each symbol representing the cluster to which the velocity of that GPS station belongs. Fault systems that separate the clusters are readily identified on such a map. Four significant clusters are identified. Those clusters are strips separated by (from west to east) the Mohawk Valley-Genoa fault system, the Pyramid Lake-Wassuk fault system, and the Central Nevada Seismic Belt. The strain rates within the westernmost three clusters approximate simple right-lateral shear (~13 nstrain/a) across vertical planes roughly parallel to the cluster boundaries. Clustering does not recognize the longitudinal segmentation of the Walker Lane Belt into domains dominated by either northwesterly trending, right-lateral faults or northeasterly trending, left-lateral faults.

  18. The evolution of active galactic nuclei in clusters of galaxies from the Dark Energy Survey

    DOE PAGES

    Bufanda, E.; Hollowood, D.; Jeltema, T. E.; ...

    2016-12-13

    The correlation between active galactic nuclei (AGN) and environment provides important clues to AGN fueling and the relationship of black hole growth to galaxy evolution. Here, we analyze the fraction of galaxies in clusters hosting AGN as a function of redshift and cluster richness for X-ray detected AGN associated with clusters of galaxies in Dark Energy Survey (DES) Science Verification data. The present sample includes 33 AGN with L_X > 10^43 erg s^-1 in non-central host galaxies with luminosity greater than 0.5 L*, from a total sample of 432 clusters in the redshift range 0.1-0.7. Our result is in good agreement with previous work and parallels the increase in star formation in cluster galaxies over the same redshift range. However, the AGN fraction in clusters is observed to have no significant correlation with cluster mass. Future analyses with DES Year 1 through Year 3 data will be able to clarify whether AGN activity is correlated to cluster mass and will tightly constrain the relationship between cluster AGN populations and redshift.

  19. Alignments of the galaxies in and around the Virgo cluster with the local velocity shear

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Jounghun; Rey, Soo Chang; Kim, Suk, E-mail: jounghun@astro.snu.ac.kr

    2014-08-10

    Observational evidence is presented for the alignment between the cosmic sheet and the principal axis of the velocity shear field at the position of the Virgo cluster. The galaxies in and around the Virgo cluster from the Extended Virgo Cluster Catalog that was recently constructed by Kim et al. are used to determine the direction of the local sheet. The peculiar velocity field reconstructed from the Sloan Digital Sky Survey Data Release 7 is analyzed to estimate the local velocity shear tensor at the Virgo center. Showing first that the minor principal axis of the local velocity shear tensor is almost parallel to the direction of the line of sight, we detect a clear signal of alignment between the positions of the Virgo satellites and the intermediate principal axis of the local velocity shear projected onto the plane of the sky. Furthermore, the dwarf satellites are found to appear more strongly aligned than their normal counterparts, which is interpreted as an indication of the following. (1) The normal satellites and the dwarf satellites fall in the Virgo cluster preferentially along the local filament and the local sheet, respectively. (2) The local filament is aligned with the minor principal axis of the local velocity shear while the local sheet is parallel to the plane spanned by the minor and intermediate principal axes. Our result is consistent with the recent numerical claim that the velocity shear is a good tracer of the cosmic web.

  20. Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.

    PubMed

    Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar

    2017-03-01

    This study describes a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is an advanced algorithm for phylogenetic analysis using the maximum likelihood approach. D-Phylo exploits the search capacity of k-means while avoiding its main limitation of becoming stuck at locally conserved motifs. The authors tested the behaviour of D-Phylo on an Amazon Linux Amazon Machine Image (Hardware Virtual Machine) i2.4xlarge instance (six central processing units, 122 GiB of memory, 8 × 800 solid-state drive Elastic Block Store volumes, high network performance), with up to 15 processors, for several real-life datasets. Distributing the clusters evenly over all the processors makes it possible to achieve a near-linear speedup when a large number of processors is used.

  1. Evolutionary transitions towards eusociality in snapping shrimps.

    PubMed

    Chak, Solomon Tin Chi; Duffy, J Emmett; Hultgren, Kristin M; Rubenstein, Dustin R

    2017-03-20

    Animal social organization varies from complex societies where reproduction is dominated by a single individual (eusociality) to those where reproduction is more evenly distributed among group members (communal breeding). Yet, how simple groups transition evolutionarily to more complex societies remains unclear. Competing hypotheses suggest that eusociality and communal breeding are alternative evolutionary endpoints, or that communal breeding is an intermediate stage in the transition towards eusociality. We tested these alternative hypotheses in sponge-dwelling shrimps, Synalpheus spp. Although species varied continuously in reproductive skew, they clustered into pair-forming, communal and eusocial categories based on several demographic traits. Evolutionary transition models suggested that eusocial and communal species are discrete evolutionary endpoints that evolved independently from pair-forming ancestors along alternative paths. This 'family-centred' origin of eusociality parallels observations in insects and vertebrates, reinforcing the role of kin selection in the evolution of eusociality and suggesting a general model of animal social evolution.

  2. An Efficient Computational Framework for the Analysis of Whole Slide Images: Application to Follicular Lymphoma Immunohistochemistry

    PubMed Central

    Samsi, Siddharth; Krishnamurthy, Ashok K.; Gurcan, Metin N.

    2012-01-01

    Follicular Lymphoma (FL) is one of the most common non-Hodgkin lymphomas in the United States. Diagnosis and grading of FL are based on the review of histopathological tissue sections under a microscope and are influenced by human factors such as fatigue and reader bias. Computer-aided image analysis tools can help improve the accuracy of diagnosis and grading and act as another tool at the pathologist's disposal. Our group has been developing algorithms for identifying follicles in immunohistochemical images. These algorithms have been tested and validated on small images extracted from whole slide images. However, the use of these algorithms for analyzing the entire whole slide image requires significant changes to the processing methodology, since the images are relatively large (on the order of 100k × 100k pixels). In this paper we discuss the challenges involved in analyzing whole slide images and propose potential computational methodologies for addressing these challenges. We discuss the use of parallel computing tools on commodity clusters and compare the performance of the serial and parallel implementations of our approach. PMID:22962572

  3. Dwarf galaxy populations in present-day galaxy clusters - II. The history of early-type and late-type dwarfs

    NASA Astrophysics Data System (ADS)

    Lisker, Thorsten; Weinmann, Simone M.; Janz, Joachim; Meyer, Hagen T.

    2013-06-01

    How did the dwarf galaxy population of present-day galaxy clusters form and grow over time? We address this question by analysing the history of dark matter subhaloes in the Millennium II cosmological simulation. A semi-analytic model serves as the link to observations. We argue that a reasonable analogue to early morphological types or red-sequence dwarf galaxies are those subhaloes that experienced strong mass-loss, or alternatively those that have spent a long time in massive haloes. This approach reproduces well the observed morphology-distance relation of dwarf galaxies in the Virgo and Coma clusters, and thus provides insight into their history. Over their lifetime, present-day late types have experienced an amount of environmental influence similar to what the progenitors of dwarf ellipticals had already experienced at redshifts above 2. Therefore, dwarf ellipticals are more likely to be a result of early and continuous environmental influence in group- and cluster-size haloes, rather than a recent transformation product. The observed morphological sequences of late-type and early-type galaxies have developed in parallel, not consecutively. Consequently, the characteristics of today's late-type galaxies are not necessarily representative for the progenitors of today's dwarf ellipticals. Studies aiming to reproduce the present-day dwarf population thus need to start at early epochs, model the influence of various environments, and also take into account the evolution of the environments themselves.

  4. A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

    In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of work, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
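
    As a loose illustration of the task-decomposition idea described above (not the DLTC library itself; the tile size and names are invented), a contraction can be split into independent tile tasks and drained by a dynamic worker pool:

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor
        from itertools import product

        TILE = 64  # hypothetical tile edge length

        def contract_tile(args):
            a_blk, b_blk, i, j = args
            # C_ij += sum_k A_ik * B_kj for one tile pair
            return i, j, a_blk @ b_blk

        def tiled_matmul(A, B, workers=4):
            n = A.shape[0]
            C = np.zeros((n, n))
            tasks = [(A[i:i + TILE, k:k + TILE], B[k:k + TILE, j:j + TILE], i, j)
                     for i, j, k in product(range(0, n, TILE), repeat=3)]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                for i, j, c_blk in pool.map(contract_tile, tasks):
                    C[i:i + TILE, j:j + TILE] += c_blk  # accumulate in the parent
            return C

        if __name__ == "__main__":
            A, B = np.random.rand(256, 256), np.random.rand(256, 256)
            assert np.allclose(tiled_matmul(A, B), A @ B)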

  5. Increasing the perceptual salience of relationships in parallel coordinate plots.

    PubMed

    Harter, Jonathan M; Wu, Xunlei; Alabi, Oluwafemi S; Phadke, Madhura; Pinto, Lifford; Dougherty, Daniel; Petersen, Hannah; Bass, Steffen; Taylor, Russell M

    2012-01-01

    We present three extensions to parallel coordinates that increase the perceptual salience of relationships between axes in multivariate data sets: (1) luminance modulation maintains the ability to preattentively detect patterns in the presence of overplotting, (2) adding a one-vs.-all variable display highlights relationships between one variable and all others, and (3) adding a scatter plot within the parallel-coordinates display preattentively highlights clusters and spatial layouts without strongly interfering with the parallel-coordinates display. These techniques can be combined with one another and with existing extensions to parallel coordinates, and two of them generalize beyond cases with known-important axes. We applied these techniques to two real-world data sets (relativistic heavy-ion collision hydrodynamics and weather observations with statistical principal component analysis) as well as the popular car data set. We present relationships discovered in the data sets using these methods.
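
    One of the three extensions, luminance modulation under overplotting, can be approximated in a few lines of matplotlib; this is an illustrative sketch, not the authors' implementation:

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(5)
        data = rng.random((2000, 4))                 # 2000 lines over 4 axes
        # normalize each axis to [0, 1] so all axes share one vertical scale
        norm = (data - data.min(0)) / (data.max(0) - data.min(0))

        fig, ax = plt.subplots()
        xs = np.arange(norm.shape[1])
        for row in norm:
            # low alpha keeps dense regions bright and sparse lines faint
            ax.plot(xs, row, color="black", alpha=0.02, linewidth=0.5)
        ax.set_xticks(xs)
        ax.set_xticklabels([f"axis {i}" for i in xs])
        plt.savefig("pcp_luminance.png")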

  6. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
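
    For readers unfamiliar with the two styles, the following sketch expresses a simple reduction in the distributed-memory (MPI) style via mpi4py; it is a stand-in for the pattern, not the paper's simulator:

        from mpi4py import MPI  # distributed-memory style (MPI)

        def local_work(chunk):
            # stand-in for integrating one block of differential equations
            return sum(x * x for x in chunk)

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        data = list(range(1_000_000))
        chunk = data[rank::size]          # cyclic partition across ranks
        total = comm.allreduce(local_work(chunk), op=MPI.SUM)

        if rank == 0:
            print("distributed-memory result:", total)
        # Run with e.g.:  mpiexec -n 4 python reduce_demo.py
        # A shared-memory (OpenMP-style) analogue would replace the explicit
        # communication with threads operating on one address space.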

  7. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache miss rates. With this loop rewritten, a speedup similar to that of the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves, using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg–Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100–200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.
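
    The parallel Jacobian idea generalizes well beyond groundwater codes: each forward-difference column requires one independent model run, so columns parallelize naturally. A minimal sketch with a hypothetical toy model (not HGC5 or PEST):

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def model(p):
            # hypothetical forward model standing in for a groundwater solve
            return np.array([p[0] ** 2 + p[1], np.sin(p[0]) * p[1]])

        def jac_column(args):
            p, j, h = args
            pj = p.copy()
            pj[j] += h
            # one forward-difference column: one extra model run per parameter
            return j, (model(pj) - model(p)) / h

        def parallel_jacobian(p, h=1e-6, workers=4):
            cols = {}
            with ProcessPoolExecutor(max_workers=workers) as pool:
                for j, col in pool.map(jac_column,
                                       [(p, j, h) for j in range(len(p))]):
                    cols[j] = col
            return np.column_stack([cols[j] for j in range(len(p))])

        if __name__ == "__main__":
            print(parallel_jacobian(np.array([1.0, 2.0])))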

  8. Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke; Wu, Kesheng; Bethel, E. Wes

    2009-06-02

    The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The increase of cores at exponential rates is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread and all threads are processed simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture--for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU and GPU-based projection index. Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
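
    The core of the bin-based strategy can be sketched in a few lines: bin numbers resolve most of a range query, and only the two boundary bins need their base values re-examined. This simplified NumPy version omits DP-BIS's encoding and GPU execution:

        import numpy as np

        values = np.random.rand(1_000_000)
        edges = np.linspace(0.0, 1.0, 65)            # 64 equal-width bins
        bin_ids = np.digitize(values, edges) - 1     # per-record bin number

        def range_count(lo, hi):
            lo_bin, hi_bin = np.digitize([lo, hi], edges) - 1
            # interior bins are answered from bin numbers alone
            inner = np.count_nonzero((bin_ids > lo_bin) & (bin_ids < hi_bin))
            # only the two boundary bins require a candidate check
            edge_mask = (bin_ids == lo_bin) | (bin_ids == hi_bin)
            edge_hits = np.count_nonzero((values[edge_mask] >= lo) &
                                         (values[edge_mask] < hi))
            return inner + edge_hits

        assert range_count(0.2, 0.7) == np.count_nonzero(
            (values >= 0.2) & (values < 0.7))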

  9. A Parallel Neuromorphic Text Recognition System and Its Implementation on a Heterogeneous High-Performance Computing Cluster

    DTIC Science & Technology

    2013-01-01

    …research in computational intelligence has entered a new era. In this paper, we present an HPC-based context-aware intelligent text recognition… (Report dates covered: SEP 2011 – SEP 2013.)

  10. Mass spectrometric identification of intermediates in the O2-driven [4Fe-4S] to [2Fe-2S] cluster conversion in FNR

    PubMed Central

    Crack, Jason C.; Thomson, Andrew J.

    2017-01-01

    The iron-sulfur cluster containing protein Fumarate and Nitrate Reduction (FNR) is the master regulator for the switch between anaerobic and aerobic respiration in Escherichia coli and many other bacteria. The [4Fe-4S] cluster functions as the sensory module, undergoing reaction with O2 that leads to conversion to a [2Fe-2S] form with loss of high-affinity DNA binding. Here, we report studies of the FNR cluster conversion reaction using time-resolved electrospray ionization mass spectrometry. The data provide insight into the reaction, permitting the detection of cluster conversion intermediates and products, including a [3Fe-3S] cluster and persulfide-coordinated [2Fe-2S] clusters [[2Fe-2S](S)n, where n = 1 or 2]. Analysis of kinetic data revealed a branched mechanism in which cluster sulfide oxidation occurs in parallel with cluster conversion and not as a subsequent, secondary reaction to generate [2Fe-2S](S)n species. This methodology shows great potential for broad application to studies of protein cofactor–small molecule interactions. PMID:28373574

  11. Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains.

    PubMed

    Jha, Ashwani; Flurchick, K M; Bikdash, Marwan; Kc, Dukka B

    2016-01-01

    Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10-15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
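
    The core idea, scoring a structure against every circular permutation of itself, parallelizes naturally because the permutations are independent. A toy sketch (a crude RMSD score without optimal superposition, so not the published algorithm):

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def perm_score(args):
            coords, k = args
            shifted = np.roll(coords, k, axis=0)   # circular permutation by k
            # crude similarity stand-in: negative RMSD after the shift
            return k, -np.sqrt(np.mean(np.sum((coords - shifted) ** 2, axis=1)))

        def best_permutation(coords, workers=4):
            tasks = [(coords, k) for k in range(1, len(coords))]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                return max(pool.map(perm_score, tasks), key=lambda kv: kv[1])

        if __name__ == "__main__":
            theta = np.linspace(0, 2 * np.pi, 120, endpoint=False)
            ring = np.column_stack([np.cos(theta), np.sin(theta),
                                    np.zeros_like(theta)])
            k, score = best_permutation(ring)
            print(f"best shift {k}, score {score:.3f}")  # a ring is symmetric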

  12. Efficacy of a web- and text messaging-based intervention to reduce problem drinking in adolescents: Results of a cluster-randomized controlled trial.

    PubMed

    Haug, Severin; Paz Castro, Raquel; Kowatsch, Tobias; Filler, Andreas; Dey, Michelle; Schaub, Michael P

    2017-02-01

    To test the efficacy of a combined web- and text messaging-based intervention to reduce problem drinking in young people compared to assessment only. Two-arm, parallel-group, cluster-randomized controlled trial with assessments at baseline and 6-month follow-up. The automated intervention included online feedback, based on the social norms approach, and individually tailored text messages addressing social norms, outcome expectations, motivation, self-efficacy, and planning processes, provided over 3 months. The main outcome criterion was the prevalence of risky single-occasion drinking (RSOD, defined as drinking at least 5 standard drinks on a single occasion in men and 4 in women) in the past 30 days. Irrespective of alcohol consumption, 1,355 students from 80 Swiss vocational and upper secondary school classes, all of whom owned a mobile phone, were invited to participate in the study. Of these, 1,041 (76.8%) students participated in the study. Based on intention-to-treat analyses, RSOD prevalence decreased by 5.9% in the intervention group and increased by 2.6% in the control group, relative to that of baseline assessment (odds ratio [OR] = 0.62, 95% confidence interval [CI] = 0.44-0.87). No significant group differences were observed for the following secondary outcomes: RSOD frequency, quantity of alcohol consumed, estimated peak blood alcohol concentration, and overestimation of peer drinking norms. The intervention program effectively reduced RSOD, a major indicator of problem drinking in young people. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  13. Performance comparison analysis library communication cluster system using merge sort

    NASA Astrophysics Data System (ADS)

    Wulandari, D. A. R.; Ramadhan, M. E.

    2018-04-01

    Computing began with single processors; to increase computing speed, multi-processor systems were introduced. This second paradigm is known as parallel computing, of which clusters are one example. A cluster must have a communication protocol for processing; one such protocol is the Message Passing Interface (MPI), which has several library implementations, among them OpenMPI and MPICH2. The performance of a cluster machine depends on how well the performance characteristics of the communication library match the characteristics of the problem, so this study aims to analyze the comparative performance of such libraries in handling parallel computing processes. The case studies in this research are MPICH2 and OpenMPI, which execute a sorting problem to gauge the performance of the cluster system; the sorting problem uses the merge sort method. The research method is to implement OpenMPI and MPICH2 on a Linux-based cluster of five virtual computers and then analyze the performance of the system under different test scenarios with three parameters: execution time, speedup, and efficiency. The results of this study show that, as data size grows, OpenMPI and MPICH2 both show average speedup and efficiency that tend to increase, but at large data sizes these decrease; an increased data size does not necessarily increase speedup and efficiency, only execution time, as seen for example at a data size of 100,000. At a data size of 1,000, the average execution time with MPICH2 is 0.009721 and with OpenMPI is 0.003895; OpenMPI can adapt communication to the needs of the computation.
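
    A rough process-based analogue of the experiment (multiprocessing standing in for MPI) shows how the three reported parameters can be measured:

        import heapq, random, time
        from multiprocessing import Pool

        def parallel_sort(data, p):
            chunks = [data[i::p] for i in range(p)]   # split across p workers
            with Pool(p) as pool:
                sorted_chunks = pool.map(sorted, chunks)
            return list(heapq.merge(*sorted_chunks))  # merge step of merge sort

        if __name__ == "__main__":
            data = [random.random() for _ in range(2_000_000)]
            t0 = time.perf_counter(); serial = sorted(data)
            t_serial = time.perf_counter() - t0
            for p in (2, 4):
                t0 = time.perf_counter(); result = parallel_sort(data, p)
                t_par = time.perf_counter() - t0
                assert result == serial
                speedup = t_serial / t_par
                print(f"p={p}: speedup={speedup:.2f}, "
                      f"efficiency={speedup / p:.2f}")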

  14. A Multiple Sphere T-Matrix Fortran Code for Use on Parallel Computer Clusters

    NASA Technical Reports Server (NTRS)

    Mackowski, D. W.; Mishchenko, M. I.

    2011-01-01

    A general-purpose Fortran-90 code for calculation of the electromagnetic scattering and absorption properties of multiple sphere clusters is described. The code can calculate the efficiency factors and scattering matrix elements of the cluster for either fixed or random orientation with respect to the incident beam and for plane wave or localized-approximation Gaussian incident fields. In addition, the code can calculate maps of the electric field both interior and exterior to the spheres. The code is written with message passing interface instructions to enable use on distributed-memory compute clusters, and for such platforms the code can make feasible the calculation of absorption, scattering, and general EM characteristics of systems containing several thousand spheres.

  15. Evaluation of Job Queuing/Scheduling Software: Phase I Report

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

    The recent proliferation of high-performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, the Numerical Aerodynamic Simulation (NAS) supercomputer facility compiled a requirements checklist for job queuing/scheduling software. Next, NAS began an evaluation of the leading job management system (JMS) software packages against the checklist. This report describes the three-phase evaluation process and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still insufficient, even in the leading JMSs. However, by ranking each JMS evaluated against the requirements, we provide data that will be useful to other sites in selecting a JMS.

  16. Interactive visual exploration and analysis of origin-destination data

    NASA Astrophysics Data System (ADS)

    Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.

    2018-05-01

    In this paper, we propose a visual analytics approach for the exploration of spatiotemporal interaction patterns of massive origin-destination data. Firstly, we visually query the movement database for data at certain time windows. Secondly, we conduct interactive clustering to allow the users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g. distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Thirdly, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experiment results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
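
    The clustering step alone can be sketched with SciPy (the feature columns and the threshold below are placeholders, not the paper's configuration):

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        # columns: origin_x, origin_y, dest_x, dest_y, distance, duration
        trips = rng.random((500, 6))

        Z = linkage(trips, method="ward")                  # agglomerative tree
        labels = fcluster(Z, t=1.5, criterion="distance")  # tunable threshold
        print(f"{labels.max()} clusters at this threshold")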

  17. A distal earthquake cluster concurrent with the 2006 explosive eruption of Augustine Volcano, Alaska

    USGS Publications Warehouse

    Fisher, M.A.; Ruppert, N.A.; White, R.A.; Wilson, Frederic H.; Comer, D.; Sliter, R.W.; Wong, F.L.

    2009-01-01

    Clustered earthquakes located 25 km northeast of Augustine Volcano began about 6 months before and ceased soon after the volcano's 2006 explosive eruption. This distal seismicity formed a dense cluster less than 5 km across, in map view, located at depths between 11 km and 16 km. This seismicity was contemporaneous with sharply increased shallow earthquake activity directly below the volcano's vent. Focal mechanisms for five events within the distal cluster show strike-slip fault movement. Cluster seismicity best defines a plane when it is projected onto a northeast-southwest cross section, suggesting that the seismogenic fault strikes northwest. However, two major structural trends intersect near Augustine Volcano, making it difficult to put the seismogenic fault into a regional-geologic context. Specifically, interpretation of marine multichannel seismic-reflection (MCS) data shows reverse faults, directly above the seismicity cluster, that trend northeast, parallel to the regional geologic strike but perpendicular to the fault suggested by the clustered seismicity. The seismogenic fault could be a reactivated basement structure.

  18. Inference of median difference based on the Box-Cox model in randomized clinical trials.

    PubMed

    Maruo, K; Isogawa, N; Gosho, M

    2015-05-10

    In randomized clinical trials, many medical and biological measurements are not normally distributed and are often skewed. The Box-Cox transformation is a powerful procedure for comparing two treatment groups for skewed continuous variables in terms of a statistical test. However, it is difficult to directly estimate and interpret the location difference between the two groups on the original scale of the measurement. We propose a helpful method that infers the difference of the treatment effect on the original scale in a more easily interpretable form. We also provide statistical analysis packages that consistently include an estimate of the treatment effect, covariance adjustments, standard errors, and statistical hypothesis tests. The simulation study that focuses on randomized parallel group clinical trials with two treatment groups indicates that the performance of the proposed method is equivalent to or better than that of the existing non-parametric approaches in terms of the type-I error rate and power. We illustrate our method with cluster of differentiation 4 data in an acquired immune deficiency syndrome clinical trial. Copyright © 2015 John Wiley & Sons, Ltd.
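
    A small numerical illustration of the general idea, not the authors' estimator: because the Box-Cox transform is monotone, medians estimated on the transformed scale back-transform to medians on the original scale.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        grp_a = rng.lognormal(mean=0.0, sigma=0.6, size=200)  # skewed outcomes
        grp_b = rng.lognormal(mean=0.3, sigma=0.6, size=200)

        pooled, lam = stats.boxcox(np.concatenate([grp_a, grp_b]))
        za, zb = pooled[:200], pooled[200:]

        def inv_boxcox(z, lam):
            # inverse of the Box-Cox transform
            return np.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)

        med_diff = inv_boxcox(np.median(zb), lam) - inv_boxcox(np.median(za), lam)
        print(f"lambda={lam:.3f}, median difference on original scale="
              f"{med_diff:.3f}")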

  19. Effect of Deploying Trained Community Based Reproductive Health Nurses (CORN) on Long-Acting Reversible Contraception (LARC) Use in Rural Ethiopia: A Cluster Randomized Community Trial.

    PubMed

    Zerfu, Taddese Alemu; Ayele, Henok Taddese; Bogale, Tariku Nigatu

    2018-06-01

    To investigate the effect of innovative means to distribute LARC on contraceptive use, we implemented a three-arm, parallel-group, cluster-randomized community trial design. The intervention consisted of placing trained community-based reproductive health nurses (CORN) within health centers or health posts. The nurses provided counseling to encourage women to use LARC and distributed all contraceptive methods. A total of 282 villages were randomly selected and assigned to a control arm (n = 94) or 1 of 2 treatment arms (n = 94 each). The treatment groups differed by where the new service providers were deployed, health post or health center. We calculated difference-in-difference (DID) estimates to assess program impacts on LARC use. After nine months of intervention, the use of LARC methods increased significantly by 72.3 percent, while the use of short-acting methods declined by 19.6 percent. The proportion of women using LARC methods increased by 45.9 percent and 45.7 percent in the health post and health center based intervention arms, respectively. Compared to the control group, the DID estimates indicate that the use of LARC methods increased by 11.3 and 12.3 percentage points in the health post and health center based intervention arms. Given the low use of LARC methods in similar settings, deployment of contextually trained nurses at the grassroots level could substantially increase utilization of these methods. © 2018 The Population Council, Inc.
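
    The arithmetic behind a difference-in-difference estimate is simple enough to show directly; the four proportions below are invented, not the trial's data:

        baseline = {"treatment": 0.10, "control": 0.11}   # share using LARC
        endline = {"treatment": 0.56, "control": 0.44}

        effect_treat = endline["treatment"] - baseline["treatment"]
        effect_ctrl = endline["control"] - baseline["control"]
        did = effect_treat - effect_ctrl                  # net of secular trend
        print(f"DID estimate: {did * 100:+.1f} percentage points")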

  1. Large-scale variation in subsurface stream biofilms: a cross-regional comparison of metabolic function and community similarity.

    PubMed

    Findlay, S; Sinsabaugh, R L

    2006-10-01

    We examined bacterial metabolic activity and community similarity in shallow subsurface stream sediments distributed across three regions of the eastern United States to assess whether there were parallel changes in functional and structural attributes at this large scale. Bacterial growth, oxygen consumption, and a suite of extracellular enzyme activities were assayed to describe functional variability. Community similarity was assessed using randomly amplified polymorphic DNA (RAPD) patterns. There were significant differences in streamwater chemistry, metabolic activity, and bacterial growth among regions with, for instance, twofold higher bacterial production in streams near Baltimore, MD, compared to Hubbard Brook, NH. Five of eight extracellular enzymes showed significant differences among regions. Cluster analyses of individual streams by metabolic variables showed clear groups with significant differences in representation of sites from different regions among groups. Clustering of sites based on randomly amplified polymorphic DNA banding resulted in groups with generally less internal similarity although there were still differences in distribution of regional sites. There was a marginally significant (p = 0.09) association between patterns based on functional and structural variables. There were statistically significant but weak (r² ≈ 30%) associations between landcover and measures of both structure and function. These patterns imply a large-scale organization of biofilm communities and this structure may be imposed by factor(s) such as landcover and covariates such as nutrient concentrations, which are known to also cause differences in macrobiota of stream ecosystems.

  2. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency.

    PubMed

    Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A

    2010-01-01

    Especially in the life-science and health-care sectors, the IT requirements are immense, owing to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role here for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number-crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters with very fast network interconnects within grid infrastructures now enables efficient parallel high-performance grid computing, and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands of this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combine classic with parallel high-performance grid usage, but, more importantly, can also increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity for sectors such as life science and health care, as well as for grid infrastructures, by reaching a higher level of resource efficiency.

  3. Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

    NASA Technical Reports Server (NTRS)

    O'Connell, Matthew; Karman, Steve L.

    2015-01-01

    Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
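
    A very small top-down refinement sketch conveys the oct-tree idea (cells straddling a surface are subdivided); it is a toy under invented parameters, far from the paper's parallel generator:

        import math

        def refine(center, half, depth, max_depth, cells):
            # subdivide cells whose bounding sphere straddles the unit sphere
            cx, cy, cz = center
            d = math.sqrt(cx * cx + cy * cy + cz * cz)
            near_surface = abs(d - 1.0) <= half * math.sqrt(3)
            if depth == max_depth or not near_surface:
                cells.append((center, half))        # keep as a leaf cell
                return
            h = half / 2
            for dx in (-h, h):
                for dy in (-h, h):
                    for dz in (-h, h):
                        refine((cx + dx, cy + dy, cz + dz), h, depth + 1,
                               max_depth, cells)

        cells = []
        refine((0.0, 0.0, 0.0), 2.0, 0, 5, cells)
        print(len(cells), "leaf cells")   # fine near the surface, coarse away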

  4. From Surveillance to Intervention: Overview and Baseline Findings for the Active City of Liverpool Active Schools and SportsLinx (A-CLASS) Project

    PubMed Central

    McWhannell, Nicola; Henaghan, Jayne L.

    2018-01-01

    This paper outlines the implementation of a programme of work that started with the development of a population-level children’s health, fitness and lifestyle study in 1996 (SportsLinx), leading to selected interventions, one of which is described in detail: the Active City of Liverpool, Active Schools and SportsLinx (A-CLASS) Project. The A-CLASS Project aimed to quantify the effectiveness of structured and unstructured physical activity (PA) programmes on children’s PA, fitness, body composition, bone health, cardiac and vascular structures, fundamental movement skills, physical self-perception and self-esteem. The study was a four-arm parallel-group school-based cluster randomised controlled trial (clinical trials no. NCT02963805), and compared different exposure groups: a high intensity PA (HIPA) group, a fundamental movement skill (FMS) group, a PA signposting (PASS) group and a control group, in a two-schools-per-condition design. Baseline findings indicate that children’s fundamental movement skill competence levels are low-to-moderate, yet these skills are inversely associated with percentage body fat. Outcomes of this project will make an important contribution to the design and implementation of children’s PA promotion initiatives.

  5. FLY MPI-2: a parallel tree code for LSS

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Comparato, M.; Antonuccio-Delogu, V.

    2006-04-01

    New version program summaryProgram title: FLY 3.1 Catalogue identifier: ADSC_v2_0 Licensing provisions: yes Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland No. of lines in distributed program, including test data, etc.: 158 172 No. of bytes in distributed program, including test data, etc.: 4 719 953 Distribution format: tar.gz Programming language: Fortran 90, C Computer: Beowulf cluster, PC, MPP systems Operating system: Linux, Aix RAM: 100M words Catalogue identifier of previous version: ADSC_v1_0 Journal reference of previous version: Comput. Phys. Comm. 155 (2003) 159 Does the new version supersede the previous version?: yes Nature of problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force Solution method: FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986) Reasons for the new version: The new version of FLY is implemented by using the MPI-2 standard: the distributed version 3.1 was developed by using the MPICH2 library on a PC Linux cluster. Today the FLY performance allows us to consider the FLY code among the most powerful parallel codes for tree N-body simulations. Another important new feature regards the availability of an interface with hydrodynamical Paramesh based codes. Simulations must follow a box large enough to accurately represent the power spectrum of fluctuations on very large scales so that we may hope to compare them meaningfully with real data. The number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible. The idea to build an interface between two codes, that have different and complementary cosmological tasks, allows us to execute complex cosmological simulations with FLY, specialized for DM evolution, and a code specialized for hydrodynamical components that uses a Paramesh block structure. Summary of revisions: The parallel communication schema was totally changed. The new version adopts the MPICH2 library. Now FLY can be executed on all Unix systems having an MPI-2 standard library. The main data structure, is declared in a module procedure of FLY (fly_h.F90 routine). FLY creates the MPI Window object for one-sided communication for all the shared arrays, with a call like the following: CALL MPI_WIN_CREATE(POS, SIZE, REAL8, MPI_INFO_NULL, MPI_COMM_WORLD, WIN_POS, IERR) the following main window objects are created: win_pos, win_vel, win_acc: particles positions velocities and accelerations, win_pos_cell, win_mass_cell, win_quad, win_subp, win_grouping: cells positions, masses, quadrupole momenta, tree structure and grouping cells. Other windows are created for dynamic load balance and global counters. Restrictions: The program uses the leapfrog integrator schema, but could be changed by the user. Unusual features: FLY uses the MPI-2 standard: the MPICH2 library on Linux systems was adopted. To run this version of FLY the working directory must be shared among all the processors that execute FLY. Additional comments: Full documentation for the program is included in the distribution in the form of a README file, a User Guide and a Reference manuscript. Running time: IBM Linux Cluster 1350, 512 nodes with 2 processors for each node and 2 GB RAM for each processor, at Cineca, was adopted to make performance tests. 
Processor type: Intel Xeon Pentium IV 3.0 GHz and 512 KB cache (128 nodes have Nocona processors). Internal Network: Myricom LAN Card "C" Version and "D" Version. Operating System: Linux SuSE SLES 8. The code was compiled using the mpif90 compiler version 8.1 with basic optimization options, in order to obtain performance figures that can be meaningfully compared with other generic clusters.
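
    The MPI_WIN_CREATE call quoted above sets up MPI-2 one-sided communication; the same pattern looks like this in mpi4py (an illustrative analogue, not FLY code):

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        local = np.full(4, rank, dtype="d")      # each rank exposes 4 doubles
        win = MPI.Win.Create(local, comm=comm)   # counterpart of MPI_WIN_CREATE

        buf = np.empty(4, dtype="d")
        target = (rank + 1) % comm.Get_size()
        win.Lock(target, MPI.LOCK_SHARED)
        win.Get(buf, target)                     # read neighbour's array directly
        win.Unlock(target)
        win.Free()
        print(f"rank {rank} read {buf} from rank {target}")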

  6. A tailored multicomponent program to reduce discomfort in critically ill patients: a cluster-randomized controlled trial.

    PubMed

    Kalfon, Pierre; Baumstarck, Karine; Estagnasie, Philippe; Geantot, Marie-Agnès; Berric, Audrey; Simon, Georges; Floccard, Bernard; Signouret, Thomas; Boucekine, Mohamed; Fromentin, Mélanie; Nyunga, Martine; Sossou, Achille; Venot, Marion; Robert, René; Follin, Arnaud; Audibert, Juliette; Renault, Anne; Garrouste-Orgeas, Maïté; Collange, Olivier; Levrat, Quentin; Villard, Isabelle; Thevenin, Didier; Pottecher, Julien; Patrigeon, René-Gilles; Revel, Nathalie; Vigne, Coralie; Azoulay, Elie; Mimoz, Olivier; Auquier, Pascal

    2017-12-01

    Critically ill patients are exposed to stressful conditions and experience several discomforts. The primary objective was to assess whether a tailored multicomponent program is effective for reducing self-perceived discomfort. In a cluster-randomized two-arm parallel trial, 34 French adult intensive care units (ICUs) without planned interventions to reduce discomfort were randomized, 17 to the arm including a 6-month period of program implementation followed by a 6-month period without the program (experimental group), and 17 to the arm with an inversed sequence (control group). The tailored multicomponent program consisted of assessment of ICU-related self-perceived discomforts, immediate and monthly feedback to healthcare teams, and site-specific tailored interventions. The primary outcome was the overall discomfort score derived from the 16-item IPREA questionnaire (0, minimal, 100, maximal overall discomfort) and the secondary outcomes were the discomfort scores of each IPREA item. IPREA was administered on the day of ICU discharge with a considered timeframe from the ICU admission until ICU discharge. During a 1-month assessment period, 398 and 360 patients were included in the experimental group and the control group, respectively. The difference (experimental minus control) of the overall discomfort score between groups was - 7.00 (95% CI - 9.89 to - 4.11, p < 0.001). After adjustment (age, gender, ICU duration, mechanical ventilation duration, and type of admission), the program effect was still positive for the overall discomfort score (difference - 6.35, SE 1.23, p < 0.001) and for 12 out of 16 items. This tailored multicomponent program decreased self-perceived discomfort in adult critically ill patients. Clinicaltrials.gov Identifier NCT02442934.

  7. Impact of a Brief Group Intervention to Enhance Parenting and the Home Learning Environment for Children Aged 6-36 Months: a Cluster Randomised Controlled Trial.

    PubMed

    Hackworth, N J; Berthelsen, D; Matthews, J; Westrupp, E M; Cann, W; Ukoumunne, O C; Bennetts, S K; Phan, T; Scicluna, A; Trajanovska, M; Yu, M; Nicholson, J M

    2017-04-01

    This study evaluated the effectiveness of a group parenting intervention designed to strengthen the home learning environment of children from disadvantaged families. Two cluster randomised controlled superiority trials were conducted in parallel and delivered within existing services: a 6-week parenting group (51 locations randomised; 986 parents) for parents of infants (aged 6-12 months), and a 10-week facilitated playgroup (58 locations randomised; 1200 parents) for parents of toddlers (aged 12-36 months). Each trial had three conditions: intervention (smalltalk group-only); enhanced intervention with home coaching (smalltalk plus); and 'standard'/usual practice controls. Parent-report and observational measures were collected at baseline and at 12 and 32 weeks of follow-up. Primary outcomes were parent verbal responsivity and home learning activities at 32 weeks. In the infant trial, there were no differences by trial arm for the primary outcomes at 32 weeks. In the toddler trial at 32 weeks, participants in the smalltalk group-only arm showed improvement compared to the standard program for parent verbal responsivity (effect size (ES) = 0.16; 95% CI 0.01, 0.36) and home learning activities (ES = 0.17; 95% CI 0.01, 0.38) but smalltalk plus did not. For the secondary outcomes in the infant trial, several initial differences favouring smalltalk plus were evident at 12 weeks, but not maintained to 32 weeks. For the toddler trial, differences in secondary outcomes favouring smalltalk plus were evident at 12 weeks and maintained to 32 weeks. These trials provide some evidence of the benefits of a parenting intervention focused on the home learning environment for parents of toddlers but not infants. Trial registration: 8 September 2011; ACTRN12611000965909.

  8. UBO Detector - A cluster-based, fully automated pipeline for extracting white matter hyperintensities.

    PubMed

    Jiang, Jiyang; Liu, Tao; Zhu, Wanlin; Koncz, Rebecca; Liu, Hao; Lee, Teresa; Sachdev, Perminder S; Wen, Wei

    2018-07-01

    We present 'UBO Detector', a cluster-based, fully automated pipeline for extracting and calculating variables for regions of white matter hyperintensities (WMH) (available for download at https://cheba.unsw.edu.au/group/neuroimaging-pipeline). It takes T1-weighted and fluid attenuated inversion recovery (FLAIR) scans as input, and SPM12 and FSL functions are utilised for pre-processing. The candidate clusters are then generated by FMRIB's Automated Segmentation Tool (FAST). A supervised machine learning algorithm, k-nearest neighbor (k-NN), is applied to determine whether the candidate clusters are WMH or non-WMH. UBO Detector generates both image and text (volumes and the number of WMH clusters) outputs for whole brain, periventricular, deep, and lobar WMH, as well as WMH in arterial territories. The computation time for each brain is approximately 15 min. We validated the performance of UBO Detector by showing a) high segmentation (similarity index (SI) = 0.848) and volumetric (intraclass correlation coefficient (ICC) = 0.985) agreement between the UBO Detector-derived and manually traced WMH; b) highly correlated (r² > 0.9) and steadily increasing WMH volumes over time; and c) significant associations of periventricular (t = 22.591, p < 0.001) and deep (t = 14.523, p < 0.001) WMH volumes generated by UBO Detector with Fazekas rating scores. With parallel computing enabled in UBO Detector, the processing can take advantage of the multi-core CPUs that are commonly available on workstations. In conclusion, UBO Detector is a reliable, efficient and fully automated WMH segmentation pipeline. Copyright © 2018 Elsevier Inc. All rights reserved.
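
    The supervised step can be sketched with scikit-learn; the three features and the labelling rule below are invented stand-ins, not UBO Detector's actual feature set:

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(2)
        # hypothetical features per candidate cluster: FLAIR intensity,
        # volume, distance to ventricles; labels 1 = WMH, 0 = non-WMH
        X_train = rng.random((300, 3))
        y_train = (X_train[:, 0] > 0.5).astype(int)   # toy labelling rule

        clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
        candidates = rng.random((10, 3))
        print(clf.predict(candidates))                # 1 -> keep as WMH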

  9. The HST Large Programme on ω Centauri. II. Internal Kinematics

    NASA Astrophysics Data System (ADS)

    Bellini, Andrea; Libralato, Mattia; Bedin, Luigi R.; Milone, Antonino P.; van der Marel, Roeland P.; Anderson, Jay; Apai, Dániel; Burgasser, Adam J.; Marino, Anna F.; Rees, Jon M.

    2018-01-01

    In this second installment of the series, we look at the internal kinematics of the multiple stellar populations of the globular cluster ω Centauri in one of the parallel Hubble Space Telescope (HST) fields, located at about 3.5 half-light radii from the center of the cluster. Thanks to the over 15 yr long baseline and the exquisite astrometric precision of the HST cameras, well-measured stars in our proper-motion catalog have errors as low as ∼10 μas yr⁻¹, and the catalog itself extends to near the hydrogen-burning limit of the cluster. We show that second-generation (2G) stars are significantly more radially anisotropic than first-generation (1G) stars. The latter are instead consistent with an isotropic velocity distribution. In addition, 1G stars have excess systemic rotation in the plane of the sky with respect to 2G stars. We show that the six populations below the main-sequence (MS) knee identified in our first paper are associated with the five main population groups recently isolated on the upper MS in the core of the cluster. Furthermore, we find both 1G and 2G stars in the field to be far from energy equipartition, with η_1G = −0.007 ± 0.026 for the former and η_2G = 0.074 ± 0.029 for the latter, where η is defined so that the velocity dispersion σ_μ scales with stellar mass as σ_μ ∝ m^(−η). The kinematical differences reported here can help constrain the formation mechanisms for the multiple stellar populations in ω Centauri and other globular clusters. We make our astro-photometric catalog publicly available.
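
    As a quick check of how such a scaling exponent can be recovered, one can fit η by least squares in log space (synthetic numbers, not the HST catalog):

        import numpy as np

        rng = np.random.default_rng(6)
        m = np.linspace(0.2, 0.8, 40)                 # stellar masses (Msun)
        eta_true = 0.074
        sigma = m ** (-eta_true) * (1 + rng.normal(0, 0.01, m.size))

        # sigma ∝ m**(-eta)  =>  log(sigma) = -eta * log(m) + const
        slope, _ = np.polyfit(np.log(m), np.log(sigma), 1)
        print(f"recovered eta = {-slope:.3f}")        # close to 0.074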

  10. The Cluster Population of UGC 2885

    NASA Astrophysics Data System (ADS)

    Holwerda, Benne

    2017-08-01

    UGC 2885 was discovered by Vera Rubin in the 1980s to be the most extended disk galaxy [250 kpc diameter]. We ask for HST observations of UGC 2885 as it is close enough to resolve the GC population with HST, but it is a substantially more extended disk than any studied before. LCDM galaxy assembly implies that the GC population comes from small accreted systems and the disk (and the clusters associated with it) predominantly from gas accretion (matching angular momentum to the disk). Several scaling relations between the GC population and parent galaxy have been observed, but these differ for disk and spheroidal (massive) galaxies. We propose to observe this galaxy with HST in a 4-point WFC3 mosaic with coordinated ACS parallels to probe both the disk and outer halo component of the GC population. GC populations have been studied extensively using HST color mosaics of local disk galaxies, and these can serve as comparison samples. How the UGC 2885 cluster populations relate to its stellar and halo mass, luminosity and radius will reveal the formation history of this extraordinary disk. Our goals are twofold: our science goal is to map the luminosity, (some) size, and color distributions of the stellar and globular clusters in and around this disk. In absolute terms, we expect to find many GCs, but the relative relation of the GC population to this galaxy's mass (stellar and halo) and size will shed light on its formation history; similar to a group or cluster central elliptical, or to a field galaxy (albeit one with a disk 10x the Milky Way's size)? Our secondary motive is to make an HST tribute image to the late Vera Rubin.

  11. Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

    PubMed

    Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

    2012-01-01

    We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
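
    The point being parallelized is the population's fitness evaluation; a minimal sketch with processes standing in for the paper's CUDA implementation (the cost function is a toy, not channel kinetics):

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        TARGET = np.array([1.5, -0.5])   # hypothetical "true" parameters

        def fitness(params):
            # toy cost: distance to a known optimum (higher is better)
            return -np.sum((params - TARGET) ** 2)

        def evolve(pop, generations=50, workers=4):
            rng = np.random.default_rng(3)
            with ProcessPoolExecutor(max_workers=workers) as pool:
                for _ in range(generations):
                    scores = list(pool.map(fitness, pop))    # parallel section
                    order = np.argsort(scores)[::-1]
                    elite = pop[order[: len(pop) // 2]]      # selection
                    children = elite + rng.normal(0, 0.1, elite.shape)  # mutation
                    pop = np.vstack([elite, children])
            return pop[np.argmax([fitness(p) for p in pop])]

        if __name__ == "__main__":
            best = evolve(np.random.default_rng(4).normal(size=(40, 2)))
            print("best parameters:", best)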

  12. KNBD: A Remote Kernel Block Server for Linux

    NASA Technical Reports Server (NTRS)

    Becker, Jeff

    1999-01-01

    I am developing a prototype of a Linux remote disk block server whose purpose is to serve as a lower-level component of a parallel file system. Parallel file systems are an important component of high-performance supercomputers and clusters. Although supercomputer vendors such as SGI and IBM have their own custom solutions, there has been a void and hence a demand for such a system on Beowulf-type PC clusters. Recently, the Parallel Virtual File System (PVFS) project at Clemson University has begun to address this need (1). Although their system provides much of the functionality of (and indeed was inspired by) the equivalent file systems in the commercial supercomputer market, their system is all in user-space. Migrating their I/O services to the kernel could provide a performance boost, by obviating the need for expensive system calls. Thanks to Pavel Machek, the Linux kernel has provided the network block device (2) with kernels 2.1.101 and later. You can configure this block device to redirect reads and writes to a remote machine's disk. This can be used as a building block for constructing a striped file system across several nodes.

  13. CO adsorption on (111) and (100) surfaces of the Pt3Ti alloy. Evidence for parallel binding and strong activation of CO

    NASA Technical Reports Server (NTRS)

    Mehandru, S. P.; Anderson, A. B.; Ross, P. N.

    1985-01-01

    CO adsorption was studied on a 40-atom cluster model of the (111) surface and a 36-atom cluster model of the (100) surface of the Pt3Ti alloy. Parallel binding to high-coordinate sites associated with Ti and low CO bond scission barriers are predicted for both surfaces. The binding of CO to Pt sites occurs in an upright orientation. These orientations are a consequence of the nature of the CO pi-donation interactions with the surface. On the Ti sites the orbitals donate to the nearly empty Ti 3d band and the antibonding counterpart orbitals are empty. On the Pt sites, however, they are in the filled Pt 5d region of the alloy band, which causes CO to bond in a vertical orientation by 5σ donation from the carbon end.

  14. A parallel-processing approach to computing for the geographic sciences

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.

  15. The HST Frontier Fields: High-Level Science Data Products for the First 4 Completed Clusters, and Latest Data on the Remaining Clusters

    NASA Astrophysics Data System (ADS)

    Koekemoer, Anton M.; Mack, Jennifer; Lotz, Jennifer; Anderson, Jay; Avila, Roberto J.; Barker, Elizabeth A.; Borncamp, David; Gunning, Heather C.; Hilbert, Bryan; Khandrika, Harish G.; Lucas, Ray A.; Ogaz, Sara; Porterfield, Blair; Grogin, Norman A.; Robberto, Massimo; Flanagan, Kathryn; Mountain, Matt; HST Frontier Fields Team

    2016-01-01

    The Hubble Space Telescope Frontier Fields program is a large Director's Discretionary program of 840 orbits, to obtain ultra-deep observations of six strong lensing clusters of galaxies, together with parallel deep blank fields, making use of the strong lensing amplification by these clusters of distant background galaxies to detect the faintest galaxies currently observable in the high-redshift universe. The first four of these clusters are now complete, namely Abell 2744, MACS J0416.1-2403, MACS J0717.5+3745 and MACS J1149.5+2223, with each of these having been observed over two epochs, to a total depth of 140 orbits on the main cluster and an associated parallel field, using ACS (F435W, F606W, F814W) and WFC3/IR (F105W, F125W, F140W, F160W). The remaining two clusters, Abell 370 and Abell S1063, are currently in progress. Full sets of high-level science products have been generated for all these clusters by the team at STScI, including a total of 24 separate cumulative-depth data releases during each epoch, as well as full-depth version 1.0 releases at the end of each completed epoch. These products include all the full-depth distortion-corrected mosaics and associated products for each cluster, which are science-ready to facilitate the construction of lensing models as well as enabling a wide range of other science projects. Many improvements beyond default calibration for ACS and WFC3/IR are implemented in these data products, including corrections for persistence, time-variable sky, and low-level dark current residuals, as well as improvements in astrometric alignment to achieve milliarcsecond-level accuracy. The resulting high-level science products are delivered via the Mikulski Archive for Space Telescopes (MAST) to the community on a rapid timescale to enable the widest scientific use of these data, as well as ensuring a public legacy dataset of the highest possible quality that is of lasting value to the entire community.

  16. The HST Frontier Fields: High-Level Science Data Products for the First 4 Completed Clusters, and for the Last 2 Clusters Currently in Progress

    NASA Astrophysics Data System (ADS)

    Koekemoer, Anton M.; Mack, Jennifer; Lotz, Jennifer M.; Anderson, Jay; Avila, Roberto J.; Barker, Elizabeth A.; Borncamp, David; Gunning, Heather C.; Hilbert, Bryan; Khandrika, Harish G.; Lucas, Ray A.; Ogaz, Sara; Porterfield, Blair; Sunnquist, Ben; Grogin, Norman A.; Robberto, Massimo; Sembach, Kenneth; Flanagan, Kathryn; Mountain, Matt; HST Frontier Fields Team

    2016-06-01

    The Hubble Space Telescope Frontier Fields program (PI: J. Lotz) is a large Director's Discretionary program of 840 orbits, to obtain ultra-deep observations of six strong lensing clusters of galaxies, together with parallel deep blank fields, making use of the strong lensing amplification by these clusters of distant background galaxies to detect the faintest galaxies currently observable in the high-redshift universe. The first four of these clusters are now complete, namely Abell 2744, MACS J0416.1-2403, MACS J0717.5+3745 and MACS J1149.5+2223, with each of these having been observed over two epochs, to a total depth of 140 orbits on the main cluster and an associated parallel field, using ACS (F435W, F606W, F814W) and WFC3/IR (F105W, F125W, F140W, F160W). The remaining two clusters, Abell 370 and Abell S1063, are currently in progress, with the first epoch for each having been completed. Full sets of high-level science products have been generated for all these clusters by the team at STScI, including cumulative-depth v0.5 data releases during each epoch, as well as full-depth version 1.0 releases after the completion of each epoch. These products include all the full-depth distortion-corrected mosaics and associated products for each cluster, which are science-ready to facilitate the construction of lensing models as well as enabling a wide range of other science projects. Many improvements beyond default calibration for ACS and WFC3/IR are implemented in these data products, including corrections for persistence, time-variable sky, and low-level dark current residuals, as well as improvements in astrometric alignment to achieve milliarcsecond-level accuracy. The full set of resulting high-level science products are publicly delivered to the community via the Mikulski Archive for Space Telescopes (MAST) to enable the widest scientific use of these data, as well as ensuring a public legacy dataset of the highest possible quality that is of lasting value to the entire community.

  17. OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.

    2016-12-01

    The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets. Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: parallel generation (Spark on a compute cluster) of 15 to 30-year ocean climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; and scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three-decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
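
    The climatology-plus-anomaly idea maps naturally onto Spark; a hedged PySpark sketch with invented field names and toy inline data (not the NEXUS/OceanXtremes code):

        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("sst-anomalies").getOrCreate()
        # tiny inline stand-in for an SST tile archive
        rows = [(10.0, 20.0, 1, 25.0), (10.0, 20.0, 1, 28.9),
                (10.0, 20.0, 2, 24.8), (10.0, 20.0, 2, 25.1)]
        sst = spark.createDataFrame(rows, ["lat", "lon", "day_of_year", "sst"])

        # climatology: per-location, per-day mean over the archive
        clim = (sst.groupBy("lat", "lon", "day_of_year")
                   .agg(F.avg("sst").alias("climatology")))

        # anomaly field: daily value minus climatology, thresholded
        anomalies = (sst.join(clim, ["lat", "lon", "day_of_year"])
                        .withColumn("anom", F.col("sst") - F.col("climatology"))
                        .filter(F.abs(F.col("anom")) > 1.5))  # threshold, deg C
        anomalies.show()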

  18. Parallel Fault Strands at 9-km Depth Resolved on the Imperial Fault, Southern California

    NASA Astrophysics Data System (ADS)

    Shearer, P. M.

    2001-12-01

    The Imperial Fault is one of the most active faults in California, with several M>6 events during the 20th century and geodetic results suggesting that it currently carries almost 80% of the total plate motion between the Pacific and North American plates. We apply waveform cross-correlation to a group of ~1500 microearthquakes along the Imperial Fault and find that about 25% of the events form similar-event clusters. Event relocation based on precise differential times among events in these clusters reveals multiple streaks of seismicity up to 5 km in length that lie at a nearly constant depth of ~9 km but are spaced about 0.5 km apart in map view. These multiple streaks are unlikely to be a location artifact because they are spaced more widely than the computed location errors, and different streaks can be resolved within individual similar-event clusters. The streaks are parallel to the mapped surface rupture of the 1979 Mw=6.5 Imperial Valley earthquake. No obvious temporal migration of the event locations is observed. Limited focal mechanism data for the events within the streaks are consistent with right-lateral slip on vertical fault planes. The seismicity not contained in similar-event clusters cannot be located as precisely; our locations for these events scatter between 7 and 11 km depth, but their true locations could be much more tightly clustered. The observed streaks have some similarities to those previously observed in northern California along the San Andreas and Hayward faults (e.g., Rubin et al., 1999; Waldhauser et al., 1999); however, those streaks were imaged within a single fault plane rather than the multiple faults resolved on the Imperial Fault. The apparent constant depth of the Imperial streaks is similar to that seen in Hawaii, at much shallower depth, by Gillard et al. (1996). Geodetic results (e.g., Lyons et al., 2001) suggest that the Imperial Fault is currently slipping at 45 mm/yr below a locked portion that extends to ~10 km depth. We interpret the observed seismicity streaks as activity on multiple fault strands at the transition depths between the locked shallow part of the Imperial Fault and the slipping portion at greater depths. These strands likely extend into the aseismic region below, suggesting that the lower crustal shear zone is at least 2 km wide.
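    For readers unfamiliar with the similar-event technique, the grouping step reduces to thresholding peak waveform correlations. A schematic NumPy sketch follows; it uses greedy single-pass grouping against each cluster's first member, a simplification of the study's method, and the precise differential-time relocation is not shown.

```python
import numpy as np

def peak_normalized_xcorr(a, b):
    """Peak of the normalized cross-correlation between equal-length waveforms."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return float(np.correlate(a, b, mode="full").max())

def similar_event_clusters(waveforms, threshold=0.8):
    """Greedily group events whose waveforms correlate above the threshold."""
    clusters = []
    for i, w in enumerate(waveforms):
        for members in clusters:
            if peak_normalized_xcorr(w, waveforms[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy data: two noisy copies of one pulse plus one unrelated noise trace.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
pulse = np.exp(-((t - 0.5) ** 2) / 0.001)
events = [pulse + 0.05 * rng.standard_normal(t.size),
          pulse + 0.05 * rng.standard_normal(t.size),
          rng.standard_normal(t.size)]
print(similar_event_clusters(events))  # e.g. [[0, 1], [2]]
```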

  19. Application of a parallel genetic algorithm to the global optimization of medium-sized Au-Pd sub-nanometre clusters

    NASA Astrophysics Data System (ADS)

    Hussein, Heider A.; Demiroglu, Ilker; Johnston, Roy L.

    2018-02-01

    To contribute to the discussion of the high activity and reactivity of the Au-Pd system, we have adopted the BPGA-DFT approach to study the structural and energetic properties of medium-sized Au-Pd sub-nanometre clusters with 11-18 atoms. We have examined the structural behaviour and stability as a function of cluster size and composition. The study suggests 2D-3D crossover points for pure Au clusters at 14 and 16 atoms, whereas pure Pd clusters are all found to be 3D. For Au-Pd nanoalloys, the role of cluster size and the influence of doping were found to be extensive and non-monotonic in altering cluster structures. Various stability criteria (e.g. binding energies, second differences in energy, and mixing energies) are used to evaluate the energetics, structures, and tendency towards segregation in sub-nanometre Au-Pd clusters. HOMO-LUMO gaps were calculated to give additional information on cluster stability, and a systematic homotop search was used to evaluate the energies of the generated global minima of mono-substituted clusters and the preferred doping sites, as well as to confirm the validity of the BPGA-DFT approach.
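    As a rough illustration of the pool-based genetic-algorithm search that BPGA-style methods perform, here is a hedged Python sketch in which a cheap Lennard-Jones potential stands in for the DFT energy evaluations and a simple one-point coordinate crossover replaces the cut-and-splice operator normally used for clusters; none of this is the authors' BPGA code.

```python
import numpy as np
from scipy.optimize import minimize

def lj_energy(flat_coords):
    """Lennard-Jones energy: a cheap stand-in for the DFT evaluations in BPGA-DFT."""
    xyz = np.asarray(flat_coords).reshape(-1, 3)
    diff = xyz[:, None, :] - xyz[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    r = d[np.triu_indices(len(xyz), k=1)]
    return float(np.sum(4.0 * (r ** -12 - r ** -6)))

def relax(flat_coords):
    """Local geometry relaxation (the local-minimisation step of a cluster GA)."""
    return minimize(lj_energy, flat_coords, method="L-BFGS-B").x

def genetic_search(n_atoms=13, pool_size=8, generations=20, seed=0):
    rng = np.random.default_rng(seed)
    pool = [relax(rng.uniform(-1.5, 1.5, 3 * n_atoms)) for _ in range(pool_size)]
    for _ in range(generations):
        pa, pb = rng.choice(pool_size, size=2, replace=False)
        cut = 3 * rng.integers(1, n_atoms)               # one-point crossover
        child = np.concatenate([pool[pa][:cut], pool[pb][cut:]])
        child += 0.1 * rng.standard_normal(child.shape)  # mutation
        child = relax(child)
        worst = max(range(pool_size), key=lambda i: lj_energy(pool[i]))
        if lj_energy(child) < lj_energy(pool[worst]):
            pool[worst] = child                          # steady-state replacement
    return min(pool, key=lj_energy)

best = genetic_search()
# For reference, the known LJ13 global minimum is about -44.33.
print(f"best energy found: {lj_energy(best):.3f}")
```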

  20. Merging K-means with hierarchical clustering for identifying general-shaped groups.

    PubMed

    Peterson, Anna D; Ghosh, Arka P; Maitra, Ranjan

    2018-01-01

    Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and K-means clustering are two common approaches with complementary strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but suffers from computational complexity in large datasets, while K-means clustering is efficient but designed to identify homogeneous, spherically-shaped clusters. We present a hybrid non-parametric clustering approach that amalgamates the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using K-means. We next merge these groups using hierarchical methods, with a data-driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets.
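    A minimal scikit-learn sketch of the K-means-then-merge idea follows; it fixes the final number of clusters at two instead of using the paper's data-driven stopping criterion, and single linkage is one plausible choice for the merge step.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, AgglomerativeClustering

# Step 1: over-partition the data into many small spherical groups with K-means.
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(X)

# Step 2: merge the K-means centers hierarchically; single linkage follows
# elongated shapes well.  The final count is fixed at 2 here instead of the
# paper's data-driven stopping criterion.
merge = AgglomerativeClustering(n_clusters=2, linkage="single").fit(km.cluster_centers_)

# Map each observation to the merged cluster of its K-means center.
labels = merge.labels_[km.labels_]
print(np.bincount(labels))  # two general-shaped (non-spherical) groups
```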

  1. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

    2008-01-01

    High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next-generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor, applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM Cell Broadband Engine, applied to a 2-D discrete Fourier transform (DFT) kernel for image processing and frequency-domain processing. And the third is the NVIDIA graphical processor, applied to document feature clustering. EnLight Optical Core Processor: optical processing is inherently capable of a high degree of parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small-form-factor signal processing chip (5×5 cm²) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at a rate of 16 TeraOPS at 8-bit precision; this is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256×256. The system clock is 125 MHz; at each clock cycle, 128K multiply-and-add operations are carried out, which yields the peak performance of 16 TeraOPS. IBM Cell Broadband Engine: the Cell processor is the product of five years of sustained, intensive R&D collaboration (involving an investment of over $400M) among IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit-wide single-instruction multiple-data (SIMD) streams. An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units: the ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate the clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
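    The matched-filter-via-FFT pattern mentioned for the EnLight processor is straightforward to sketch on a CPU with NumPy (plain floating point here, not the optical core's 8-bit fixed-point pipeline):

```python
import numpy as np

def matched_filter(signal, template):
    """Correlate a template against a signal via the frequency domain.

    Correlation is convolution with a time-reversed (conjugated) template,
    so it becomes one spectral multiplication: ifft(fft(s) * conj(fft(t))).
    """
    n = len(signal) + len(template) - 1
    s = np.fft.rfft(signal, n)
    t = np.fft.rfft(template, n)
    return np.fft.irfft(s * np.conj(t), n)

# Toy example: locate a chirp-like template buried in noise.
rng = np.random.default_rng(1)
template = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 128) ** 2)
signal = rng.standard_normal(1024)
signal[400:528] += template
out = matched_filter(signal, template)
print(int(np.argmax(out)))  # ~400, the template's offset in the signal
```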

  2. Young Galaxy Candidates in the Hubble Frontier Fields. IV. MACS J1149.5+2223

    NASA Astrophysics Data System (ADS)

    Zheng, Wei; Zitrin, Adi; Infante, Leopoldo; Laporte, Nicolas; Huang, Xingxing; Moustakas, John; Ford, Holland C.; Shu, Xinwen; Wang, Junxian; Diego, Jose M.; Bauer, Franz E.; Troncoso Iribarren, Paulina; Broadhurst, Tom; Molino, Alberto

    2017-02-01

    We search for high-redshift dropout galaxies behind the Hubble Frontier Fields (HFF) galaxy cluster MACS J1149.5+2223, a powerful cosmic lens that has revealed a number of unique objects in its field. Using the deep images from the Hubble and Spitzer space telescopes, we find 11 galaxies at z > 7 in the MACS J1149.5+2223 cluster field, and 11 in its parallel field. The high-redshift nature of the bright z ≃ 9.6 galaxy MACS1149-JD, previously reported by Zheng et al., is further supported by non-detection in the extremely deep optical images from the HFF campaign. With the new photometry, the best photometric redshift solution for MACS1149-JD reduces slightly to z = 9.44 ± 0.12. The young galaxy has an estimated stellar mass of (7 ± 2) × 10⁸ M⊙, and was formed at z = 13.2 (+1.9, −1.6), when the universe was ≈300 Myr old. Data available for the first four HFF clusters have already enabled us to find faint galaxies to an intrinsic magnitude of M_UV ≃ −15.5, approximately a factor of 10 deeper than the parallel fields.

  3. Mitochondrial DNA Diversity of Modern, Ancient and Wild Sheep (Ovis gmelinii anatolica) from Turkey: New Insights on the Evolutionary History of Sheep

    PubMed Central

    Pişkin, Evangelia; Engin, Atilla; Özer, Füsun; Yüncü, Eren; Doğan, Şükrü Anıl; Togan, İnci

    2013-01-01

    In the present study, to contribute to the understanding of the evolutionary history of sheep, the mitochondrial (mt) DNA polymorphisms occurring in modern Turkish native domestic sheep (n = 628), modern wild sheep (Ovis gmelinii anatolica) (n = 30) and ancient domestic sheep from Oylum Höyük in Kilis (n = 33) were examined comparatively with the accumulated data in the literature. The lengths (75 bp/76 bp) of the second and subsequent repeat units of the mtDNA control region (CR) sequences differentiated the five haplogroups (HPGs) observed in the domestic sheep into two genetic clusters, as was already implied by other mtDNA markers: the first cluster comprising HPGs A, B and D, and the second cluster harboring HPGs C and E. To manifest the genetic relatedness between wild Ovis gmelinii and the domestic sheep haplogroups, their partial cytochrome B sequences were examined together on a median-joining network. The two parallel but wider aforementioned clusters were also observed on the network of Ovis gmelinii individuals, within which the domestic haplogroups were embedded. Present-day wild Ovis gmelinii appeared to be distributed over two partially overlapping geographic areas, parallel to the genetic clusters to which they belong (the first cluster lying in the western part of the overall distribution). Thus, the analyses suggested that domestic sheep may be the products of two maternally distinct ancestral Ovis gmelinii populations. Furthermore, Ovis gmelinii anatolica individuals exhibited one haplotype of HPG A (n = 22) and another haplotype (n = 8) from the second cluster that was not observed among the modern domestic sheep. HPG E, with the newly observed members (n = 11), showed signs of expansion. Studies of ancient and modern mtDNA suggest that the HPG C frequency increased in Southeast Anatolia from 6% to 22% some time after the beginning of the Hellenistic period, 500 years Before Common Era (BCE). PMID:24349158

  4. Mitochondrial DNA diversity of modern, ancient and wild sheep (Ovis gmelinii anatolica) from Turkey: new insights on the evolutionary history of sheep.

    PubMed

    Demirci, Sevgin; Koban Baştanlar, Evren; Dağtaş, Nihan Dilşad; Pişkin, Evangelia; Engin, Atilla; Ozer, Füsun; Yüncü, Eren; Doğan, Sükrü Anıl; Togan, Inci

    2013-01-01

    In the present study, to contribute to the understanding of the evolutionary history of sheep, the mitochondrial (mt) DNA polymorphisms occurring in modern Turkish native domestic sheep (n = 628), modern wild sheep (Ovis gmelinii anatolica) (n = 30) and ancient domestic sheep from Oylum Höyük in Kilis (n = 33) were examined comparatively with the accumulated data in the literature. The lengths (75 bp/76 bp) of the second and subsequent repeat units of the mtDNA control region (CR) sequences differentiated the five haplogroups (HPGs) observed in the domestic sheep into two genetic clusters, as was already implied by other mtDNA markers: the first cluster comprising HPGs A, B and D, and the second cluster harboring HPGs C and E. To manifest the genetic relatedness between wild Ovis gmelinii and the domestic sheep haplogroups, their partial cytochrome B sequences were examined together on a median-joining network. The two parallel but wider aforementioned clusters were also observed on the network of Ovis gmelinii individuals, within which the domestic haplogroups were embedded. Present-day wild Ovis gmelinii appeared to be distributed over two partially overlapping geographic areas, parallel to the genetic clusters to which they belong (the first cluster lying in the western part of the overall distribution). Thus, the analyses suggested that domestic sheep may be the products of two maternally distinct ancestral Ovis gmelinii populations. Furthermore, Ovis gmelinii anatolica individuals exhibited one haplotype of HPG A (n = 22) and another haplotype (n = 8) from the second cluster that was not observed among the modern domestic sheep. HPG E, with the newly observed members (n = 11), showed signs of expansion. Studies of ancient and modern mtDNA suggest that the HPG C frequency increased in Southeast Anatolia from 6% to 22% some time after the beginning of the Hellenistic period, 500 years Before Common Era (BCE).

  5. Parallel Wavefront Analysis for a 4D Interferometer

    NASA Technical Reports Server (NTRS)

    Rao, Shanti R.

    2011-01-01

    This software provides a programming interface for automating data collection with a PhaseCam interferometer from 4D Technology, and for distributing the image-processing algorithm across a cluster of general-purpose computers. Multiple instances of 4Sight (4D Technology's proprietary software) run on a networked cluster of computers. Each connects to a single server (the controller) and waits for instructions. The controller directs the interferometer to capture several images, then assigns each image to a different computer for processing. When the image processing is finished, the server directs one of the computers to collate and combine the processed images, saving the resulting measurement in a file on disk. The existing software captures approximately 100 images and analyzes them immediately. This software separates the capture and analysis processes, so that analysis can be done at a different time and faster, by running the algorithm in parallel across several processors. The PhaseCam family of interferometers can measure an optical system in milliseconds, but it takes many seconds to process the data so that they are usable. In characterizing an adaptive optics system, like those in the next generation of astronomical observatories, thousands of measurements are required, and the processing time quickly becomes excessive. A programming interface distributes data processing for a PhaseCam interferometer across a Windows computing cluster. A scriptable controller program coordinates data acquisition from the interferometer, storage on networked hard disks, and parallel processing, minimizing idle time of the interferometer. This architecture is implemented in Python and JavaScript, and may be altered to fit a customer's needs.
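    A hedged sketch of the controller/worker split follows, with a local process pool standing in for the networked 4Sight instances; process_frame is a hypothetical placeholder for the proprietary per-image computation, not 4D Technology's API.

```python
import numpy as np
from multiprocessing import Pool

def process_frame(frame):
    """Hypothetical per-image computation (a placeholder, not 4Sight's algorithm)."""
    return frame - frame.mean()  # e.g. remove the piston term from each frame

def collate(results):
    """Combine the processed frames into one measurement, as the controller does."""
    return np.mean(results, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    frames = [rng.standard_normal((64, 64)) for _ in range(100)]  # ~100 captures
    with Pool() as pool:                      # one worker per CPU core
        processed = pool.map(process_frame, frames)
    print(collate(processed).shape)           # (64, 64) combined measurement
```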

  6. Evidence for dike emplacement beneath Iliamna Volcano, Alaska in 1996

    USGS Publications Warehouse

    Roman, D.C.; Power, J.A.; Moran, S.C.; Cashman, K.V.; Doukas, M.P.; Neal, C.A.; Gerlach, T.M.

    2004-01-01

    Two earthquake swarms, comprising 88 and 2833 locatable events, occurred beneath Iliamna Volcano, Alaska, in May and August of 1996. Swarm earthquakes ranged in magnitude from -0.9 to 3.3. Increases in SO2 and CO2 emissions detected during the fall of 1996 were coincident with the second swarm. No other physical changes were observed in or around the volcano during this time period. No eruption occurred, and seismicity and measured gas emissions have remained at background levels since mid-1997. Earthquake hypocenters recorded during the swarms form a cluster in a previously aseismic volume of crust located to the south of Iliamna's summit at a depth of -1 to 4 km below sea level. This cluster is elongated NNW-SSE, parallel to the trend of the summit and southern vents at Iliamna and to the regional axis of maximum compressive stress determined through inversion of fault-plane solutions for regional earthquakes. Fault-plane solutions calculated for 24 swarm earthquakes located at the top of the new cluster suggest a heterogeneous stress field acting during the second swarm, characterized by normal faulting and strike-slip faulting with p-axes parallel to the axis of regional maximum compressive stress. The increase in earthquake rates, the appearance of a new seismic volume, and the elevated gas emissions at Iliamna Volcano indicate that new magma intruded beneath the volcano in 1996. The elongation of the 1996-1997 earthquake cluster parallel to the direction of regional maximum compressive stress and the accelerated occurrence of both normal and strike-slip faulting in a small volume of crust at the top of the new seismic volume may be explained by the emplacement and inflation of a subvertical planar dike beneath the summit of Iliamna and its southern satellite vents. © 2003 Elsevier B.V. All rights reserved.

  7. Using Cluster Bootstrapping to Analyze Nested Data With a Few Clusters.

    PubMed

    Huang, Francis L

    2018-04-01

    Cluster randomized trials involving participants nested within intact treatment and control groups are commonly performed in various educational, psychological, and biomedical studies. However, recruiting and retaining intact groups presents various practical, financial, and logistical challenges to evaluators, and cluster randomized trials are often performed with a low number of clusters (~20 groups). Although multilevel models are often used to analyze nested data, researchers may be concerned about potentially biased results due to having only a few groups under study. Cluster bootstrapping has been suggested as an alternative procedure for analyzing clustered data, though it has seen very little use in educational and psychological studies. Using a Monte Carlo simulation that varied the number of clusters, average cluster size, and intraclass correlations, we compared standard errors from cluster bootstrapping with those derived using ordinary least squares regression and multilevel models. Results indicate that cluster bootstrapping, though more computationally demanding, can be used as an alternative procedure for the analysis of clustered data when treatment effects at the group level are of primary interest. Supplementary material showing how to perform cluster-bootstrapped regressions in R is also provided.
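    The paper's supplementary material demonstrates the procedure in R; as an illustration of the idea itself (resample whole clusters, not observations, then take the spread of the refit coefficient), here is a hedged NumPy/pandas sketch with invented toy data:

```python
import numpy as np
import pandas as pd

def cluster_bootstrap_se(df, cluster_col, x_col, y_col, n_boot=1000, seed=0):
    """Bootstrap the slope of y ~ x by resampling whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    clusters = df[cluster_col].unique()
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.choice(clusters, size=len(clusters), replace=True)
        sample = pd.concat([df[df[cluster_col] == c] for c in picked])
        slopes[b] = np.polyfit(sample[x_col], sample[y_col], 1)[0]
    return slopes.std(ddof=1)

# Toy nested data: 15 groups of 20 observations, with a group random effect.
rng = np.random.default_rng(3)
g = np.repeat(np.arange(15), 20)
x = rng.standard_normal(g.size)
y = 0.5 * x + rng.standard_normal(15)[g] + rng.standard_normal(g.size)
df = pd.DataFrame({"g": g, "x": x, "y": y})
print(f"cluster-bootstrapped SE of slope: {cluster_bootstrap_se(df, 'g', 'x', 'y'):.3f}")
```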

  8. 75 FR 41521 - Delphi Corporation, Automotive Holding Group, Instrument Cluster Plant, Currently Known as...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-16

    ..., Automotive Holding Group, Instrument Cluster Plant, Currently Known as General Motors Corporation, Including... Corporation, Automotive Holding Group, Instrument Cluster Plant, including on-site leased workers from... Material Management working on-site at Delphi Corporation, Automotive Holding Group, Instrument Cluster...

  9. Parallel k-means++ for Multiple Shared-Memory Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mackey, Patrick S.; Lewis, Robert R.

    2016-09-22

    In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high-performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition, we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
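    For reference, the exact k-means++ seeding that the paper parallelizes can be written in a few lines of NumPy; the per-point squared-distance update below is the data-parallel step that maps onto multicore CPUs, GPUs, or the XMT (this serial sketch is not the authors' implementation):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Exact (non-approximate) k-means++ seeding.

    Each new center is drawn with probability proportional to the squared
    distance from each point to its nearest center chosen so far.
    """
    centers = [X[rng.integers(len(X))]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(1, k):
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
        # Data-parallel step: update each point's nearest-center distance.
        d2 = np.minimum(d2, np.sum((X - centers[-1]) ** 2, axis=1))
    return np.array(centers)

X = np.random.default_rng(4).standard_normal((10_000, 8))
print(kmeans_pp_init(X, k=16, rng=np.random.default_rng(5)).shape)  # (16, 8)
```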

  10. Vascular system modeling in parallel environment - distributed and shared memory approaches

    PubMed Central

    Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

    2011-01-01

    The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891

  11. Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions

    NASA Astrophysics Data System (ADS)

    Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey

    2014-01-01

    The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.

  12. Eigensolver for a Sparse, Large Hermitian Matrix

    NASA Technical Reports Server (NTRS)

    Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

    2003-01-01

    A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
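    A small SciPy sketch of the same Lanczos idea follows, finding a few extremal eigenvalues of a large sparse symmetric matrix through matrix-vector products only (SciPy wraps ARPACK here; the JPL eigensolver itself is C plus MPI plus PARPACK, and the random test matrix is an invented stand-in):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# A large random sparse symmetric matrix stands in for the physics problem.
n = 100_000
A = sp.random(n, n, density=1e-4, format="csr", random_state=42)
H = (A + A.T) * 0.5  # symmetrize: a real-symmetric (Hermitian) matrix

# Lanczos iteration (ARPACK, wrapped by SciPy) for a few extremal eigenvalues;
# only matrix-vector products with H are needed, so memory scales with the
# number of nonzeros rather than with n**2.
vals = eigsh(H, k=4, which="LA", return_eigenvectors=False)
print(np.sort(vals))
```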

  13. A parallel solver for huge dense linear systems

    NASA Astrophysics Data System (ADS)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies, leveraging secondary memory to solve huge linear systems on the order of 100,000 equations. The API is based on the parallel linear algebra library PLAPACK and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to users, hiding almost all the technical aspects related to the parallel execution of the code and to the use of secondary memory. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes, and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOPS with 64 cores to solve a system with more than 200,000 equations and more than 10,000 right-hand side vectors.
    New version program summary:
    Program title: Huge Dense System Solver (HDSS)
    Catalogue identifier: AEHU_v1_1
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 87,062
    No. of bytes in distributed program, including test data, etc.: 1,069,110
    Distribution format: tar.gz
    Programming language: Fortran 90, C
    Computer: parallel architectures: multiprocessors, computer clusters
    Operating system: Linux/Unix
    Has the code been vectorized or parallelized?: Yes, includes MPI primitives
    RAM: tested for up to 190 GB
    Classification: 6.5
    External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution)
    Catalogue identifier of previous version: AEHU_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533
    Does the new version supersede the previous version?: Yes
    Nature of problem: huge-scale dense systems of linear equations, Ax = B, beyond standard LAPACK capabilities.
    Solution method: the linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary-storage algorithms when the available main memory is insufficient.
    Reasons for new version: in many applications a high accuracy must be guaranteed in the solution of very large linear systems, which can be achieved by using double-precision arithmetic.
    Summary of revisions: Version 1.1 can be used to solve linear systems using double-precision arithmetic; it includes a new version of the initialization routine, and the user can choose the kind of arithmetic and the values of several parameters of the environment.
    Running time: about 5 hours to solve a system with more than 200,000 equations and more than 10,000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
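    In miniature, the HDSS workload shape is an LU factorization reused across many right-hand sides; a hedged in-core, serial SciPy sketch of that pattern follows (not HDSS's out-of-core, MPI-parallel implementation):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# The HDSS workload in miniature: factorize A once, then reuse the factors
# for many right-hand sides (in-core and serial here, unlike HDSS itself).
rng = np.random.default_rng(7)
n, nrhs = 2_000, 100
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, nrhs))

lu, piv = lu_factor(A)        # O(n^3), done once
X = lu_solve((lu, piv), B)    # O(n^2) per right-hand side
print(np.allclose(A @ X, B))  # True
```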

  14. Collaborative Simulation Grid: Multiscale Quantum-Mechanical/Classical Atomistic Simulations on Distributed PC Clusters in the US and Japan

    NASA Technical Reports Server (NTRS)

    Kikuchi, Hideaki; Kalia, Rajiv; Nakano, Aiichiro; Vashishta, Priya; Iyetomi, Hiroshi; Ogata, Shuji; Kouno, Takahisa; Shimojo, Fuyuki; Tsuruta, Kanji; Saini, Subhash; hide

    2002-01-01

    A multidisciplinary, collaborative simulation has been performed on a Grid of geographically distributed PC clusters. The multiscale simulation approach seamlessly combines i) atomistic simulation based on the molecular dynamics (MD) method and ii) quantum mechanical (QM) calculation based on density functional theory (DFT), so that accurate but less scalable computations are performed only where they are needed. The multiscale MD/QM simulation code has been Grid-enabled using i) a modular, additive hybridization scheme, ii) multiple QM clustering, and iii) computation/communication overlapping. The Gridified MD/QM simulation code has been used to study the environmental effects of water molecules on fracture in silicon. A preliminary run of the code has achieved a parallel efficiency of 94% on 25 PCs distributed over 3 PC clusters in the US and Japan, and a larger test involving 154 processors on 5 distributed PC clusters is in progress.

  15. Cross sectional TEM analysis of duplex HIPIMS and DC magnetron sputtered Mo and W doped carbon coatings

    NASA Astrophysics Data System (ADS)

    Sharp, J.; Castillo Muller, I.; Mandal, P.; Abbas, A.; West, G.; Rainforth, W. M.; Ehiasarian, A.; Hovsepian, P.

    2015-10-01

    A FIB lift-out sample was made from a wear-resistant carbon coating deposited by high power impulse magnetron sputtering (HIPIMS) with Mo and W. TEM analysis found columnar grains extending through the whole ∼1800 nm thickness of the film. Within the grains, the carbon was found to be organised into clusters showing some onion-like structure, with amorphous material between them; energy-dispersive X-ray spectroscopy (EDS) found these clusters to be Mo- and W-rich in a later, thinner sample of the same material. Electron energy-loss spectroscopy (EELS) showed no difference in the C-K edge, implying the bonding type to be the same in cluster and matrix. These clusters were arranged into stripes parallel to the film plane with a spacing of 7-8 nm; a modulation in the spacing between clusters within these stripes produced a second, coarser set of striations of spacing ∼37 nm.

  16. ClusCo: clustering and comparison of protein models.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej

    2013-02-22

    The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for the development of the Clusco software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. Clusco is fast and easy-to-use software for the high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and for clustering of the comparison results with standard methods: K-means clustering or hierarchical agglomerative clustering. The application is highly optimized and written in C/C++, including code for parallel execution on CPU and GPU, which results in a significant speedup over similar clustering and scoring programs.

  17. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal-processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline-length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a data-intensive and computation-intensive task. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near-real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiple Processor) servers. Another high-speed prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators are characterized by a flexible structure, scalability, and the ability to correlate data from 10 stations.
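    A toy single-threaded sketch of the FX-style correlation at the heart of such software correlators is shown below: FFT each station's stream (the "F" step), then cross-multiply and average spectra per baseline (the "X" step). The real correlators pipeline this across threads or MPI ranks, and the delay handling here is deliberately naive.

```python
import numpy as np

def fx_correlate(stations, nchan=256):
    """Toy FX correlator over (n_stations, n_samples) sampled voltage streams.

    Returns the time-averaged cross-spectrum for every station pair, i.e. the
    raw ingredient of the visibility function mentioned above.
    """
    n_st, n_samp = stations.shape
    segs = stations[:, : n_samp - n_samp % nchan].reshape(n_st, -1, nchan)
    spec = np.fft.rfft(segs, axis=-1)                      # "F": per-segment FFT
    pairs = [(i, j) for i in range(n_st) for j in range(i + 1, n_st)]
    return np.array([(spec[i] * np.conj(spec[j])).mean(axis=0) for i, j in pairs])

# Two stations observing the same noise signal with a relative delay.
rng = np.random.default_rng(8)
sky = rng.standard_normal(2 ** 16)
s1, s2 = sky[8:], sky[:-8]                                 # 8-sample delay
out = fx_correlate(np.stack([s1, s2]))
phase = np.unwrap(np.angle(out[0]))                        # slope encodes the delay
print(out.shape)  # (1, 129): one baseline, nchan // 2 + 1 channels
```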

  18. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  19. Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.

    PubMed

    Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M

    2013-01-01

    This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large, high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing compared with an optimized method running on a cluster of CPUs, and bring a hitherto impossible k-NNG generation for a dataset of twenty million images with 15k dimensionality into the realm of practical possibility.
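    The brute-force k-NNG construction is easy to state precisely; a chunked NumPy sketch follows, where each chunk-versus-all distance block is the unit of work that a GPU-cluster implementation would distribute across nodes, GPUs, and threads (this CPU sketch is not the authors' code):

```python
import numpy as np

def knn_graph(X, k, chunk=1024):
    """Brute-force exact k-NN graph; row i holds the indices of i's k neighbors."""
    sq = np.sum(X * X, axis=1)
    nn = np.empty((len(X), k), dtype=np.int64)
    for start in range(0, len(X), chunk):
        block = X[start : start + chunk]
        # Squared Euclidean distances of this chunk against all points.
        d2 = sq[start : start + chunk, None] + sq[None, :] - 2.0 * block @ X.T
        d2[np.arange(len(block)), start + np.arange(len(block))] = np.inf  # no self
        nn[start : start + chunk] = np.argpartition(d2, k, axis=1)[:, :k]
    return nn

X = np.random.default_rng(9).standard_normal((20_000, 64))
print(knn_graph(X, k=10).shape)  # (20000, 10)
```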

  20. High-performance dynamic quantum clustering on graphics processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wittek, Peter, E-mail: peterwittek@acm.org

    2013-01-15

    Clustering methods in machine learning may benefit from borrowing metaphors from physics. Dynamic quantum clustering associates a Gaussian wave packet with each multidimensional data point and regards the packets as eigenfunctions of the Schrödinger equation. The clustering structure emerges by letting the system evolve, and the visual nature of the algorithm has been shown to be useful in a range of applications. Furthermore, the method only uses matrix operations, which readily lend themselves to parallelization. In this paper, we develop an implementation on graphics hardware and investigate how this approach can accelerate the computations. We achieve a speedup of up to two orders of magnitude over a multicore CPU implementation, which demonstrates that quantum-like methods accelerated by graphics processing units have great relevance to machine learning.

  1. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures; currently, it is supported on Linux clusters. aCe uses parallel programming constructs that facilitate the writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  2. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism, but this parallelism can be difficult to exploit, particularly on complex problems. One approach to extracting this parallelism is the use of special-purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm that maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids that are locally regular, permitting a one-to-one mapping of grids to systolic-style processor arrays, at least over small regions; all local parallelism can be extracted by this approach. Second, though there may be no regular global structure to the grids constructed, there will still be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  3. An open cluster-randomized, 18-month trial to compare the effectiveness of educational outreach visits with usual guideline dissemination to improve family physician prescribing

    PubMed Central

    2014-01-01

    Background The Portuguese National Health Directorate has issued clinical practice guidelines on the prescription of anti-inflammatory drugs, acid-suppressive therapy, and antiplatelets. However, their effectiveness in changing actual practice is unknown. Methods The study will compare the effectiveness of educational outreach visits in improving compliance with clinical guidelines in primary care against usual dissemination strategies. A cost-benefit analysis will also be conducted. We will carry out a parallel, open, superiority, randomized trial directed at primary care physicians. Physicians will be recruited and allocated at the cluster level (primary care unit) by minimization, and data will be analyzed at the physician level. Primary care units will be eligible if they use electronic prescribing and have at least four physicians willing to participate. Physicians in intervention units will be offered individual educational outreach visits (one for each guideline) at their workplace during a six-month period. Physicians in the control group will be offered a single unrelated group training session. Primary outcomes will be the proportion of cyclooxygenase-2 inhibitors prescribed within the anti-inflammatory class, and the proportion of omeprazole within the proton pump inhibitor class, at 18 months post-intervention. Prescription data will be collected from the regional pharmacy claims database. We estimated a sample size of 110 physicians in each group, corresponding to 19 clusters with a mean size of 6 physicians. Outcome collection and data analysis will be blinded to allocation, but owing to the nature of the intervention, physicians and detailers cannot be blinded. Discussion This trial will attempt to address unresolved issues in the literature, namely the long-term persistence of effect, the importance of sequential visits in an outreach program, and cost issues. If successful, this trial may be the cornerstone for deploying large-scale educational outreach programs within the Portuguese National Health Service. Trial registration ClinicalTrials.gov number NCT01984034. PMID:24423370

  4. Parallel and Serial Grouping of Image Elements in Visual Perception

    ERIC Educational Resources Information Center

    Houtkamp, Roos; Roelfsema, Pieter R.

    2010-01-01

    The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some…

  5. "GET-UP" study rationale and protocol: a cluster randomised controlled trial to evaluate the effects of reduced sitting on toddlers' cognitive development.

    PubMed

    Santos, Rute; Cliff, Dylan P; Howard, Steven J; Veldman, Sanne L; Wright, Ian M; Sousa-Sá, Eduarda; Pereira, João R; Okely, Anthony D

    2016-11-09

    The educational and cognitive differences associated with low socioeconomic status begin early in life and tend to persist throughout life. Coupled with the findings that levels of sedentary time are negatively associated with cognitive development and that time spent active tends to be lower in disadvantaged circumstances, this highlights the need for interventions that reduce the amount of time children spend sitting and sedentary during childcare. The proposed study aims to assess the effects of reducing sitting time during Early Childhood Education and Care (ECEC) services on cognitive development in toddlers from low socioeconomic-status families. We will implement a 12-month, 2-arm, parallel-group cluster randomised controlled trial (RCT) with Australian toddlers aged 12 to 26 months at baseline. Educators from the ECEC services allocated to the intervention group will receive professional development on how to reduce sitting time while children attend ECEC. Participants' cognitive development will be assessed as the primary outcome, at baseline and post-intervention, using the cognitive sub-test from the Bayley Scales of Infant and Toddler Development. This trial has the potential to inform programs and policies designed to optimize developmental and health outcomes in toddlers, specifically those from disadvantaged backgrounds. Australian New Zealand Clinical Trials Registry: ACTRN12616000471482, 11/04/2016, retrospectively registered.

  6. Cluster analysis of accelerated molecular dynamics simulations: A case study of the decahedron to icosahedron transition in Pt nanoparticles.

    PubMed

    Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F; Perez, Danny

    2017-10-21

    Modern molecular-dynamics-based techniques are extremely powerful tools for investigating the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory, which contains thousands of unique states and tens of thousands of transitions.
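    As a rough illustration of the eigenvector-based grouping that PCCA-style methods perform on a Markov (transition) matrix, here is a simplified spectral sketch; clustering states by their coordinates in the dominant eigenvector subspace with k-means is a stand-in for, not a faithful implementation of, the actual Perron Cluster Cluster Analysis algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_macrostate_labels(T, n_macro):
    """Group the states of a row-stochastic transition matrix into metastable sets.

    Simplified stand-in for PCCA: cluster states by their coordinates in the
    subspace of the eigenvectors whose eigenvalues are closest to 1.
    """
    vals, vecs = np.linalg.eig(T)
    order = np.argsort(-vals.real)                 # eigenvalues near 1 first
    chi = vecs[:, order[:n_macro]].real            # dominant eigenvector coords
    return KMeans(n_clusters=n_macro, n_init=10, random_state=0).fit_predict(chi)

# Two weakly coupled 3-state blocks -> two metastable macrostates.
T = np.full((6, 6), 0.01)
T[:3, :3] += 0.3
T[3:, 3:] += 0.3
T /= T.sum(axis=1, keepdims=True)
print(spectral_macrostate_labels(T, 2))  # e.g. [0 0 0 1 1 1]
```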

  7. Cluster analysis of accelerated molecular dynamics simulations: A case study of the decahedron to icosahedron transition in Pt nanoparticles

    NASA Astrophysics Data System (ADS)

    Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F.; Perez, Danny

    2017-10-01

    Modern molecular-dynamics-based techniques are extremely powerful tools for investigating the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory, which contains thousands of unique states and tens of thousands of transitions.

  8. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes, and programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution, and that it increases with the number of participating cores. We also present a communication model that approximates the overhead from communication in OpenMP loops. Our results are striking and apply to a large variety of input data files. We have developed our own load-balancing and cache-optimization techniques for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.

  9. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes, and programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution, and that it increases with the number of participating cores. We also present a communication model that approximates the overhead from communication in OpenMP loops. Our results are striking and apply to a large variety of input data files. We have developed our own load-balancing and cache-optimization techniques for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  10. Development and parallelization of a direct numerical simulation to study the formation and transport of nanoparticle clusters in a viscous fluid

    NASA Astrophysics Data System (ADS)

    Sloan, Gregory James

    The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computational cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD) and nanofluid thermal energy storage (TES), as well as in other fields where experiments are necessary but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality as the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with Message Passing Interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect; decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a master-worker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and a varying number of processes, owing to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid phases.

  11. Node Resource Manager: A Distributed Computing Software Framework Used for Solving Geophysical Problems

    NASA Astrophysics Data System (ADS)

    Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.

    2011-12-01

    With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPU's, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  12. Parallel evolutionary computation in bioinformatics applications.

    PubMed

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as evolutionary algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation to several architectures, by making use of Aspect-Oriented Programming. The pluggable nature of the parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  13. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large-scale, complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  14. Parallel computing in genomic research: advances and applications

    PubMed Central

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature, surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801

  15. Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

    NASA Technical Reports Server (NTRS)

    Weed, Richard Allen; Sankar, L. N.

    1994-01-01

    An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance-to-price ratio of engineering workstations has led to research into procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.

  16. Parallel computing in genomic research: advances and applications.

    PubMed

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature, surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  17. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    NASA Astrophysics Data System (ADS)

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian (ALE) scheme. The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea-ice forecasting and geophysical process studies over geographical domains of several million square kilometers, like the Arctic region.

  18. Pre-processing and post-processing in group-cluster mergers

    NASA Astrophysics Data System (ADS)

    Vijayaraghavan, R.; Ricker, P. M.

    2013-11-01

    Galaxies in clusters are more likely to be of early type and to have lower star formation rates than galaxies in the field. Recent observations and simulations suggest that cluster galaxies may be `pre-processed' by group or filament environments and that galaxies that fall into a cluster as part of a larger group can stay coherent within the cluster for up to one orbital period (`post-processing'). We investigate these ideas by means of a cosmological N-body simulation and idealized N-body plus hydrodynamics simulations of a group-cluster merger. We find that group environments can contribute significantly to galaxy pre-processing by means of enhanced galaxy-galaxy merger rates, removal of galaxies' hot halo gas by ram pressure stripping and tidal truncation of their galaxies. Tidal distortion of the group during infall does not contribute to pre-processing. Post-processing is also shown to be effective: galaxy-galaxy collisions are enhanced during a group's pericentric passage within a cluster, the merger shock enhances the ram pressure on group and cluster galaxies and an increase in local density during the merger leads to greater galactic tidal truncation.

  19. Structures of 38-atom gold-platinum nanoalloy clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ong, Yee Pin; Yoon, Tiem Leong; Lim, Thong Leng

    2015-04-24

    Bimetallic nanoclusters, such as gold-platinum nanoclusters, are nanomaterials promising a wide range of applications. We perform a numerical study of 38-atom gold-platinum nanoalloy clusters, Au(n)Pt(38−n) (0 ≤ n ≤ 38), to elucidate the geometrical structures of these clusters. The lowest-energy structures of these bimetallic nanoclusters at the semi-empirical level are obtained via a global-minimum search algorithm known as parallel tempering multi-canonical basin hopping plus genetic algorithm (PTMBHGA), in which the empirical Gupta many-body potential is used to describe the inter-atomic interactions among the constituent atoms. The structures of gold-platinum nanoalloy clusters are predicted to be core-shell segregated nanoclusters. Gold atoms are observed to preferentially occupy the surface of the clusters, while platinum atoms tend to occupy the core due to the slightly smaller atomic radius of platinum as compared to gold’s. The evolution of the geometrical structure of 38-atom Au-Pt clusters displays striking similarity with that of 38-atom Au-Cu nanoalloy clusters as reported in the literature.

  20. Combinations of SNP genotypes from the Wellcome Trust Case Control Study of bipolar patients.

    PubMed

    Mellerup, Erling; Jørgensen, Martin Balslev; Dam, Henrik; Møller, Gert Lykke

    2018-04-01

    Combinations of genetic variants are the basis for polygenic disorders. We examined combinations of SNP genotypes taken from the 446 729 SNPs in The Wellcome Trust Case Control Study of bipolar patients. Parallel computing by graphics processing units, cloud computing, and data mining tools were used to scan The Wellcome Trust data set for combinations. Two clusters of combinations were significantly associated with bipolar disorder. One cluster contained 68 combinations, each of which included five SNP genotypes. Of the 1998 patients, 305 had combinations from this cluster in their genome, but none of the 1500 controls had any of these combinations in their genome. The other cluster contained six combinations, each of which included five SNP genotypes. Of the 1998 patients, 515 had combinations from the cluster in their genome, but none of the 1500 controls had any of these combinations in their genome. Clusters of combinations of genetic variants can be considered general risk factors for polygenic disorders, whereas accumulation of combinations from the clusters in the genome of a patient can be considered a personal risk factor.

  1. Stochastic dynamics of small ensembles of non-processive molecular motors: The parallel cluster model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erdmann, Thorsten; Albert, Philipp J.; Schwarz, Ulrich S.

    2013-11-07

    Non-processive molecular motors have to work together in ensembles in order to generate appreciable levels of force or movement. In skeletal muscle, for example, hundreds of myosin II molecules cooperate in thick filaments. In non-muscle cells, by contrast, small groups with a few tens of non-muscle myosin II motors contribute to essential cellular processes such as transport, shape changes, or mechanosensing. Here we introduce a detailed and analytically tractable model for this important situation. Using a three-state crossbridge model for the myosin II motor cycle and exploiting the assumptions of fast power stroke kinetics and equal load sharing between motors in equivalent states, we reduce the stochastic reaction network to a one-step master equation for the binding and unbinding dynamics (parallel cluster model) and derive the rules for ensemble movement. We find that for constant external load, ensemble dynamics is strongly shaped by the catch bond character of myosin II, which leads to an increase of the fraction of bound motors under load and thus to firm attachment even for small ensembles. This adaptation to load results in a concave force-velocity relation described by a Hill relation. For external load provided by a linear spring, myosin II ensembles dynamically adjust themselves towards an isometric state with constant average position and load. The dynamics of the ensembles is now determined mainly by the distribution of motors over the different kinds of bound states. For increasing stiffness of the external spring, there is a sharp transition beyond which myosin II can no longer perform the power stroke. Slow unbinding from the pre-power-stroke state protects the ensembles against detachment.
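
    The reduction described above ends in a one-step master equation for the number i of bound motors, with a binding rate g(i) and a load-dependent (catch-bond) unbinding rate r(i). Below is a minimal sketch of simulating such a one-step process with the Gillespie algorithm; the rate constants and the exponential catch-bond form are illustrative assumptions, not the parameters of the cited model.

    ```python
    import math, random

    # One-step master equation for the number i of bound motors in an
    # ensemble of N non-processive motors, sampled with the Gillespie
    # algorithm. All rate constants below are illustrative placeholders.
    N, F = 50, 25.0                     # ensemble size, constant load
    k_on, k0_off, Fd = 40.0, 80.0, 12.6 # binding rate, unloaded unbinding, force scale

    def rates(i):
        g = (N - i) * k_on                      # one more motor binds
        if i == 0:
            return g, 0.0
        f = F / i                               # equal load sharing per bound motor
        r = i * k0_off * math.exp(-f / Fd)      # catch bond: load suppresses unbinding
        return g, r

    def gillespie(t_end=1.0):
        t, i, occupancy = 0.0, 0, 0.0
        while t < t_end:
            g, r = rates(i)
            total = g + r
            dt = random.expovariate(total)      # waiting time to next event
            occupancy += i * min(dt, t_end - t)
            t += dt
            i += 1 if random.random() < g / total else -1
        return occupancy / t_end                # time-averaged number of bound motors

    print(gillespie())
    ```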

  2. LoCuSS: The infall of X-ray groups onto massive clusters

    NASA Astrophysics Data System (ADS)

    Haines, C. P.; Finoguenov, A.; Smith, G. P.; Babul, A.; Egami, E.; Mazzotta, P.; Okabe, N.; Pereira, M. J.; Bianconi, M.; McGee, S. L.; Ziparo, F.; Campusano, L. E.; Loyola, C.

    2018-03-01

    Galaxy clusters are expected to form hierarchically in a ΛCDM universe, growing primarily through mergers with lower mass clusters and the continual accretion of group-mass halos. Galaxy clusters assemble late, doubling their masses since z ~ 0.5, and so the outer regions of clusters should be replete with accreting group-mass systems. We present an XMM-Newton survey to search for X-ray groups in the infall regions of 23 massive galaxy clusters (~10^15 M⊙) at z ~ 0.2, identifying 39 X-ray groups that have been spectroscopically confirmed to lie at the cluster redshift. These groups have mass estimates in the range 2 × 10^13-7 × 10^14 M⊙, and group-to-cluster mass ratios as low as 0.02. The comoving number density of X-ray groups in the infall regions is ~25× higher than that seen for isolated X-ray groups from the XXL survey. The average mass per cluster contained within these X-ray groups is 2.2 × 10^14 M⊙, or 19 ± 5% of the mass within the primary cluster itself. We estimate that ~10^15 M⊙ clusters increase their masses by 16 ± 4% between z = 0.223 and the present day due to the accretion of groups with M_200 ≥ 10^13.2 M⊙. This represents about half of the expected mass growth rate of clusters at these late epochs. The other half is likely to come from smooth accretion of matter not bound within halos. The mass function of the infalling X-ray groups appears significantly top heavy with respect to that of "field" X-ray systems, consistent with expectations from numerical simulations, and the basic consequences of collapsed massive dark matter halos being biased tracers of the underlying large-scale density distribution.

  3. LoCuSS: The infall of X-ray groups on to massive clusters

    NASA Astrophysics Data System (ADS)

    Haines, C. P.; Finoguenov, A.; Smith, G. P.; Babul, A.; Egami, E.; Mazzotta, P.; Okabe, N.; Pereira, M. J.; Bianconi, M.; McGee, S. L.; Ziparo, F.; Campusano, L. E.; Loyola, C.

    2018-07-01

    Galaxy clusters are expected to form hierarchically in a Λ cold dark matter (ΛCDM) universe, growing primarily through mergers with lower mass clusters and the continual accretion of group-mass haloes. Galaxy clusters assemble late, doubling their masses since z ~ 0.5, and so the outer regions of clusters should be replete with accreting group-mass systems. We present an XMM-Newton survey to search for X-ray groups in the infall regions of 23 massive galaxy clusters (~10^15 M⊙) at z ~ 0.2, identifying 39 X-ray groups that have been spectroscopically confirmed to lie at the cluster redshift. These groups have mass estimates in the range 2 × 10^13-7 × 10^14 M⊙, and group-to-cluster mass ratios as low as 0.02. The comoving number density of X-ray groups in the infall regions is ~25× higher than that seen for isolated X-ray groups from the XXL survey. The average mass per cluster contained within these X-ray groups is 2.2 × 10^14 M⊙, or 19 ± 5 per cent of the mass within the primary cluster itself. We estimate that ~10^15 M⊙ clusters increase their masses by 16 ± 4 per cent between z = 0.223 and the present day due to the accretion of groups with M_200 ≥ 10^13.2 M⊙. This represents about half of the expected mass growth rate of clusters at these late epochs. The other half is likely to come from smooth accretion of matter not bound within haloes. The mass function of the infalling X-ray groups appears significantly top heavy with respect to that of `field' X-ray systems, consistent with expectations from numerical simulations, and the basic consequences of collapsed massive dark matter haloes being biased tracers of the underlying large-scale density distribution.

  4. Counties eliminating racial disparities in colorectal cancer mortality.

    PubMed

    Rust, George; Zhang, Shun; Yu, Zhongyuan; Caplan, Lee; Jain, Sanjay; Ayer, Turgay; McRoy, Luceta; Levine, Robert S

    2016-06-01

    Although colorectal cancer (CRC) mortality rates are declining, racial-ethnic disparities in CRC mortality nationally are widening. Herein, the authors attempted to identify county-level variations in this pattern, and to characterize counties with improving disparity trends. The authors examined 20-year trends in US county-level black-white disparities in CRC age-adjusted mortality rates during the study period between 1989 and 2010. Using a mixed linear model, counties were grouped into mutually exclusive patterns of black-white racial disparity trends in age-adjusted CRC mortality across 20 three-year rolling average data points. County-level characteristics from census data and from the Area Health Resources File were normalized and entered into a principal component analysis. Multinomial logistic regression models were used to test the relation between these factors (clusters of related contextual variables) and the disparity trend pattern group for each county. Counties were grouped into 4 disparity trend pattern groups: 1) persistent disparity (parallel black and white trend lines); 2) diverging (widening disparity); 3) sustained equality; and 4) converging (moving from disparate outcomes toward equality). The initial principal component analysis clustered the 82 independent variables into a smaller number of components, 6 of which explained 47% of the county-level variation in disparity trend patterns. County-level variation in social determinants, health care workforce, and health systems all were found to contribute to variations in cancer mortality disparity trend patterns from 1990 through 2010. Counties sustaining equality over time or moving from disparities to equality in cancer mortality suggest that disparities are not inevitable, and provide hope that more communities can achieve optimal and equitable cancer outcomes for all. Cancer 2016;122:1735-48. © 2016 American Cancer Society.

  5. Efficacy of a technology-based, integrated smoking cessation and alcohol intervention for smoking cessation in adolescents: Results of a cluster-randomised controlled trial.

    PubMed

    Haug, Severin; Paz Castro, Raquel; Kowatsch, Tobias; Filler, Andreas; Schaub, Michael P

    2017-11-01

    To test the efficacy of a technology-based integrated smoking cessation and alcohol intervention versus a smoking cessation only intervention in adolescents. This was a two-arm, parallel-group, cluster-randomised controlled trial with assessments at baseline and at six-month follow-up. Subjects in both groups received tailored mobile phone text messages to support smoking cessation for 3 months, and the option of registering for a program incorporating strategies for smoking cessation centred around a self-defined quit date. Subjects in the integrated intervention group also received tailored feedback regarding their consumption of alcohol and, for binge drinkers, tailored mobile phone text messages encouraging them to maintain their drinking within low-risk limits over a 3-month period. Primary outcome measures were the 7-day point prevalence of smoking abstinence and change in cigarette consumption. In 360 Swiss vocational and upper secondary school classes, 2127 students who smoked tobacco regularly and owned a mobile phone were invited to participate in the study. Of these, 1471 (69.2%) participated and 6-month follow-up data were obtained for 1116 (75.9%). No significant group differences were observed for any of the primary or secondary outcomes. Moderator analyses revealed beneficial intervention effects concerning 7-day smoking abstinence in participants with higher versus lower alcohol consumption. Overall, the integrated smoking cessation and alcohol intervention exhibited no advantages over a smoking cessation only intervention, but it might be more effective for the subgroup of adolescent smokers with higher alcohol consumption. Providing a combined smoking cessation and alcohol intervention might be recommended for adolescent smokers with higher-level alcohol consumption. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Multi-gene phylogenetic analysis reveals that shochu-fermenting Saccharomyces cerevisiae strains form a distinct sub-clade of the Japanese sake cluster.

    PubMed

    Futagami, Taiki; Kadooka, Chihiro; Ando, Yoshinori; Okutsu, Kayu; Yoshizaki, Yumiko; Setoguchi, Shinji; Takamine, Kazunori; Kawai, Mikihiko; Tamaki, Hisanori

    2017-10-01

    Shochu is a traditional Japanese distilled spirit. The formation of the distinguishing flavour of shochu produced in individual distilleries is attributed to putative indigenous yeast strains. In this study, we performed the first (to our knowledge) phylogenetic classification of shochu strains based on nucleotide gene sequences. We performed phylogenetic classification of 21 putative indigenous shochu yeast strains isolated from 11 distilleries. All of these strains were shown or confirmed to be Saccharomyces cerevisiae, sharing species identification with 34 known S. cerevisiae strains (including commonly used shochu, sake, ale, whisky, bakery, bioethanol and laboratory yeast strains and a clinical isolate) that were tested in parallel. Our analysis used five genes that reflect genome-level phylogeny for the strain-level classification. In a first step, we demonstrated that partial regions of the ZAP1, THI7, PXL1, YRR1 and GLG1 genes were sufficient to reproduce previous sub-species classifications. In a second step, these five analysed regions from each of 25 strains (four commonly used shochu strains and the 21 putative indigenous shochu strains) were concatenated and used to generate a phylogenetic tree. Further analysis revealed that the putative indigenous shochu yeast strains form a monophyletic group that includes both the shochu yeasts and a subset of the sake group strains; this cluster is a sister group to other sake yeast strains, together comprising a sake-shochu group. Differences among shochu strains were small, suggesting that it may be possible to correlate subtle phenotypic differences among shochu flavours with specific differences in genome sequences. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Improving the quality of hospital care for children by supportive supervision: a cluster randomized trial, Kyrgyzstan

    PubMed Central

    Shukurova, Venera; Davletbaeva, Marina; Monolbaev, Kubanychbek; Kulichenko, Tatiana; Akoev, Yuri; Bakradze, Maya; Margieva, Tea; Mityushino, Ilya; Namazova-Baranova, Leyla; Boronbayeva, Elnura; Kuttumuratova, Aigul; Weber, Martin Willy; Tamburlini, Giorgio

    2017-01-01

    Objective To determine whether periodic supportive supervision after a training course improved the quality of paediatric hospital care in Kyrgyzstan, where inappropriate care was common but in-hospital postnatal mortality was low. Methods In a cluster-randomized, parallel-group trial, 10 public hospitals were allocated to a 4-day World Health Organization (WHO) course on hospital care for children followed by periodic supportive supervision by paediatricians for 1 year, while 10 hospitals had no intervention. We prospectively assessed 10 key indicators of inappropriate paediatric case management, as indicated by WHO guidelines. The primary indicator was the combination of three indicators: unnecessary hospitalization, increased iatrogenic risk and unnecessary painful procedures. An independent team evaluated the overall quality of care. Findings We prospectively reviewed the medical records of 4626 hospitalized children aged 2 to 60 months. In the intervention hospitals, the mean proportion of the primary indicator decreased from 46.9% (95% confidence interval, CI: 24.2 to 68.9) at baseline to 6.8% (95% CI: 1.1 to 12.1) at 1 year, but was unchanged in the control group (45.5%, 95% CI: 25.2 to 67.9, to 64.7%, 95% CI: 43.3 to 86.1). At 1 year, the risk ratio for the primary indicator in the intervention versus the control group was 0.09 (95% CI: 0.06 to 0.13). The proportions of the other nine indicators also decreased in the intervention group (P < 0.0001 for all). Overall quality of care improved significantly in intervention hospitals. Conclusion Periodic supportive supervision for 1 year after a training course improved both adherence to WHO guidelines on hospital care for children and the overall quality of paediatric care. PMID:28603306

  8. Intracluster medium cooling, AGN feedback, and brightest cluster galaxy properties of galaxy groups. Five properties where groups differ from clusters

    NASA Astrophysics Data System (ADS)

    Bharadwaj, V.; Reiprich, T. H.; Schellenberger, G.; Eckmiller, H. J.; Mittal, R.; Israel, H.

    2014-12-01

    Aims: We aim to investigate cool-core and non-cool-core properties of galaxy groups through X-ray data and compare them to the AGN radio output to understand the network of intracluster medium (ICM) cooling and feedback by supermassive black holes. We also aim to investigate the brightest cluster galaxies (BCGs) to see how they are affected by cooling and heating processes, and compare the properties of groups to those of clusters. Methods: Using Chandra data for a sample of 26 galaxy groups, we constrained the central cooling times (CCTs) of the ICM and classified the groups as strong cool-core (SCC), weak cool-core (WCC), and non-cool-core (NCC) based on their CCTs. The total radio luminosity of the BCG was obtained using radio catalogue data and/or literature, which in turn was compared to the cooling time of the ICM to understand the link between gas cooling and radio output. We determined K-band luminosities of the BCG with 2MASS data, and used a scaling relation to constrain the masses of the supermassive black holes, which were then compared to the radio output. We also tested for correlations between the BCG luminosity and the overall X-ray luminosity and mass of the group. The results obtained for the group sample were also compared to previous results for clusters. Results: The observed cool-core/non-cool-core fractions for groups are comparable to those of clusters. However, notable differences are seen: 1) for clusters, all SCCs have a central temperature drop, but for groups this is not the case as some have centrally rising temperature profiles despite very short cooling times; 2) while for the cluster sample, all SCC clusters have a central radio source as opposed to only 45% of the NCCs, for the group sample, all NCC groups have a central radio source as opposed to 77% of the SCC groups; 3) for clusters, there are indications of an anticorrelation trend between radio luminosity and CCT. However, for groups this trend is absent; 4) the indication of a trend of radio luminosity with black hole mass observed in SCC clusters is absent for groups; and 5) similarly, the strong correlation observed between the BCG luminosity and the cluster X-ray luminosity/cluster mass weakens significantly for groups. Conclusions: We conclude that there are important differences between clusters and groups within the ICM cooling/AGN feedback paradigm and speculate that more gas is fueling star formation in groups than in clusters where much of the gas is thought to feed the central AGN. Table 6 and Appendices A-C are available in electronic form at http://www.aanda.org

  9. Parallel Calculations in LS-DYNA

    NASA Astrophysics Data System (ADS)

    Vartanovich Mkrtychev, Oleg; Aleksandrovich Reshetov, Andrey

    2017-11-01

    Structural mechanics today exhibits a trend towards numerical solutions of increasingly extensive and detailed problems, which requires that the capacity of computing systems be enhanced. Such enhancement can be achieved in different ways. For example, if the computing system is a single workstation, its components (CPU, memory, etc.) can be replaced or extended. In essence, such modification eventually entails replacement of the entire workstation, since replacement of certain components necessitates the exchange of others (faster CPUs and memory devices require buses with higher throughput, etc.). Special consideration must be given to the capabilities of modern video cards, which constitute powerful computing systems capable of processing data in parallel. Interestingly, tools originally designed to render high-performance graphics can be applied to problems not immediately related to graphics (CUDA, OpenCL, shaders, etc.). However, not all software suites utilize video cards’ capacities. Another way to increase the capacity of a computing system is to implement a cluster architecture: to add cluster nodes (workstations) and to increase the network communication speed between the nodes. The advantage of this approach is extensive scalability: a quite powerful system can be obtained by combining individually modest nodes. Moreover, separate nodes may possess different capacities. This paper considers the use of a clustered computing system for solving problems of structural mechanics with LS-DYNA software. To establish the relevant scaling dependencies, a mere 2-node cluster proved sufficient.

  10. Seismotectonics of the Nicobar Swarm and the geodynamic implications for the 2004 Great Sumatran Earthquake

    NASA Astrophysics Data System (ADS)

    Lister, Gordon

    2017-04-01

    The Great Sumatran Earthquake took place on 26th December 2004. One month into the aftershock sequence, a dense swarm of earthquakes took place beneath the Andaman Sea, northeast of the Nicobar Islands. The swarm continued for ˜11 days, rapidly decreasing in intensity towards the end of that period. Unlike most earthquake swarms, the Nicobar cluster was characterised by a large number of shocks with moment magnitude exceeding five. This meant that centroid moment tensor data could be determined, and this data in turn allows geometric analysis of inferred fault plane motions. The classification obtained using program eQuakes shows aftershocks falling into distinct spatial groups. Thrusts dominate in the south (in the Sumatran domain), and normal faults dominate in the north (in the Andaman domain). Strike-slip faults are more evenly spread. They occur on the Sumatran wrench system, for example, but also on the Indian plate itself. Orientation groups readily emerge from such an analysis. Temporal variation in behaviour is immediately evident, changing after ˜12 months. Orientation groups in the first twelve months are consistent with margin perpendicular extension beneath the Andaman Sea (i.e. mode II megathrust behaviour) whereas afterward the pattern of deformation appears to have reverted to that expected in consequence of relative plate motion. In the first twelve months, strike-slip motion appears to have taken place on faults that are sub-parallel to spreading segments in the Andaman Sea. By early 2006 however normal fault clusters formed that showed ˜N-S extension across these spreading segments had resumed, while the overall density of aftershocks in the Andaman segment had considerably diminished. Throughout this entire period the Sumatran segment exhibited aftershock sequences consistent with ongoing Mode I megathrust behaviour. The Nicobar Swarm marks the transition from one sort of slab dynamics to the other. The earthquake swarm may have been facilitated by hydrothermal activity related to a seamount, or by magma intrusion. However, the swarm is located where the transpressional regime of the Sumatran strike-slip fault system changes to that of the 'microplate-bounding' transtensional wrench involved in the Andaman Sea spreading centre. The swarm thus may be the result of the confluence of two tectonic modes of afterslip on the main rupture, with arc-normal compression to the south, and arc-normal extension to the north. The orientations of the controlling faults can be related to the right-lateral Sumatran strike-slip system, and to oceanic transforms in the spreading system. Faults parallel to the Andaman Sea spreading system axis reactivated as left-lateral strike-slip faults during the period of afterslip. Analysis of the orientation groups shows that the swarm involved synchronous but geometrically incompatible movements on opposing but conjugate fault plane sets with trends that are consistent with Mohr-Coulomb failure, even though the orientation groups delineated require slip in many different directions on these planes. The fault planes allow inference of regional deviatoric stress axes with the principal compressive stress parallel to the prior distortion inferred using satellite geodesy.

  11. irGPU.proton.Net: Irregular strong charge interaction networks of protonatable groups in protein molecules--a GPU solver using the fast multipole method and statistical thermodynamics.

    PubMed

    Kantardjiev, Alexander A

    2015-04-05

    A cluster of strongly interacting ionization groups in protein molecules with irregular ionization behavior is suggestive of a specific structure-function relationship. However, the computational treatment of such clusters is unconventional (e.g., lack of convergence in the naive self-consistent iterative algorithm). Stringent treatment requires evaluation of Boltzmann-averaged statistical mechanics sums and estimation of the electrostatic energy of each microstate. irGPU: Irregular strong interactions in proteins--a GPU solver is a novel solution to a versatile problem in protein biophysics--atypical protonation behavior of coupled groups. The computational severity of the problem is alleviated by parallelization (via GPU kernels), which is applied to the electrostatic interaction evaluation (including explicit electrostatics via the fast multipole method) as well as to estimation of the statistical mechanics sums (partition function). Special attention is given to ease of use and encapsulation of theoretical details without sacrificing the rigor of the computational procedures. irGPU is not just a solution-in-principle but a promising practical application with the potential to entice the community into deeper understanding of the principles governing biomolecule mechanisms. © 2015 Wiley Periodicals, Inc.
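
    The quantity at the heart of the problem, a Boltzmann-averaged protonation probability obtained by summing over all 2^N microstates of N coupled titratable groups, can be written out directly for small N. The sketch below is a brute-force CPU illustration with invented intrinsic pKa values and pairwise couplings; irGPU evaluates the same kind of sums with GPU kernels and computes the energies electrostatically via the fast multipole method.

    ```python
    import itertools, math

    # Brute-force Boltzmann average over all 2^N protonation microstates.
    # Energies below are invented illustrative numbers, not real data.
    kT = 0.593                      # kcal/mol at ~298 K
    pKa_int = [4.0, 6.5, 10.2]      # intrinsic pKa of each group (assumed)
    W = {(0, 1): 1.2, (0, 2): -0.4, (1, 2): 0.8}   # pairwise couplings (kcal/mol)
    pH = 7.0

    def energy(state):
        # Cost of protonating group i: kT ln(10) (pH - pKa_i), plus couplings
        # between simultaneously protonated groups.
        e = sum(s * kT * math.log(10) * (pH - pKa_int[i])
                for i, s in enumerate(state))
        e += sum(w for (i, j), w in W.items() if state[i] and state[j])
        return e

    states = list(itertools.product([0, 1], repeat=len(pKa_int)))
    weights = [math.exp(-energy(s) / kT) for s in states]
    Z = sum(weights)                                   # partition function
    probs = [sum(w * s[i] for s, w in zip(states, weights)) / Z
             for i in range(len(pKa_int))]             # per-group protonation
    print(probs)
    ```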

  12. Hierarchical clustering of HPV genotype patterns in the ASCUS-LSIL triage study

    PubMed Central

    Wentzensen, Nicolas; Wilson, Lauren E.; Wheeler, Cosette M.; Carreon, Joseph D.; Gravitt, Patti E.; Schiffman, Mark; Castle, Philip E.

    2010-01-01

    Anogenital cancers are associated with about 13 carcinogenic HPV types in a broader group that cause cervical intraepithelial neoplasia (CIN). Multiple concurrent cervical HPV infections are common, which complicates the attribution of HPV types to different grades of CIN. Here we report the analysis of HPV genotype patterns in the ASCUS-LSIL triage study using unsupervised hierarchical clustering. Women who underwent colposcopy at baseline (n = 2780) were grouped into 20 disease categories based on histology and cytology. Disease groups and HPV genotypes were clustered using complete linkage. Risk of 2-year cumulative CIN3+, viral load, colposcopic impression, and age were compared between disease groups and major clusters. Hierarchical clustering yielded four major disease clusters: Cluster 1 included all CIN3 histology with abnormal cytology; Cluster 2 included CIN3 histology with normal cytology and combinations with either CIN2 or high-grade squamous intraepithelial lesion (HSIL) cytology; Cluster 3 included older women with normal or low grade histology/cytology and low viral load; Cluster 4 included younger women with low grade histology/cytology, multiple infections, and the highest viral load. Three major groups of HPV genotypes were identified: Group 1 included only HPV16; Group 2 included nine carcinogenic types plus non-carcinogenic HPV53 and HPV66; and Group 3 included non-carcinogenic types plus carcinogenic HPV33 and HPV45. Clustering results suggested that colposcopy missed a prevalent precancer in many women with no biopsy/normal histology and HSIL. This result was confirmed by an elevated 2-year risk of CIN3+ in these groups. Our novel approach to study multiple genotype infections in cervical disease using unsupervised hierarchical clustering can address complex genotype distributions on a population level. PMID:20959485
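
    Complete-linkage hierarchical clustering, the unsupervised method used here, merges at each step the two clusters whose farthest members are closest. A minimal sketch on an invented toy matrix of genotype frequencies per disease group (SciPy, for illustration only):

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    # Toy matrix: rows = disease categories, columns = genotype frequencies.
    # Values are invented; in the study each row summarized one of the 20
    # histology/cytology groups of the ~2780 colposcopy patients.
    profiles = np.array([
        [0.60, 0.10, 0.05],
        [0.55, 0.15, 0.05],
        [0.10, 0.40, 0.30],
        [0.05, 0.35, 0.40],
    ])

    # Complete linkage merges the two clusters whose farthest members are closest.
    Z = linkage(pdist(profiles, metric="euclidean"), method="complete")
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut tree into 2 clusters
    print(labels)                                     # e.g. [1 1 2 2]
    ```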

  13. A MULTICORE BASED PARALLEL IMAGE REGISTRATION METHOD

    PubMed Central

    Yang, Lin; Gong, Leiguang; Zhang, Hong; Nosher, John L.; Foran, David J.

    2012-01-01

    Image registration is a crucial step for many image-assisted clinical applications such as surgery planning and treatment evaluation. In this paper we propose a landmark-based nonlinear image registration algorithm for matching 2D image pairs. The algorithm is shown to be effective and robust under conditions of large deformations. In landmark-based registration, the most important step is establishing the correspondence among the selected landmark points. This usually requires an extensive search, which is often computationally expensive. We introduce a nonregular data partition algorithm using K-means clustering to group the landmarks based on the number of available processing cores. This step optimizes memory usage and data transfer. We have tested our method on the IBM Cell Broadband Engine (Cell/B.E.) platform. PMID:19964921
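
    The partition step described above, grouping landmarks into as many spatially compact chunks as there are cores, is a direct use of k-means. A minimal sketch with synthetic landmarks (scikit-learn is used here for brevity; the paper's code targeted the Cell/B.E. platform):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Group 2D landmark points into one spatially compact chunk per core,
    # so each core searches correspondences in a local neighbourhood.
    rng = np.random.default_rng(0)
    landmarks = rng.uniform(0, 512, size=(200, 2))   # synthetic landmarks
    n_cores = 8

    km = KMeans(n_clusters=n_cores, n_init=10, random_state=0).fit(landmarks)
    for core in range(n_cores):
        chunk = landmarks[km.labels_ == core]
        # ...dispatch `chunk` to one core's local memory for matching
        print(core, len(chunk))
    ```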

  14. Posttraumatic idioms of distress among Darfur refugees: Hozun and Majnun.

    PubMed

    Rasmussen, Andrew; Katoni, Basila; Keller, Allen S; Wilkinson, John

    2011-09-01

    Although psychosocial programming is seen as essential to the humanitarian response to the Darfur conflict, aid groups lack culturally-appropriate assessment instruments for monitoring and evaluation. The current study used an emic-etic integrated approach to: (i) create a culturally-appropriate measure of distress (Study 1), and (ii) test the measure in structured interviews of 848 Darfuris living in two refugee camps in Chad (Study 2). Traditional healers identified two trauma-related idioms, hozun and majnun, which shared features with but were not identical to posttraumatic stress disorder and depression. Measures of these constructs were reliable and correlated with trauma, loss, and functional impairment. Exploratory factor analysis resulted in empirical symptom clusters conceptually parallel to general Western psychiatric constructs. Findings are discussed in terms of their implications for psychosocial programming.

  15. GREEN SUPERCOMPUTING IN A DESKTOP BOX

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    HSU, CHUNG-HSING; FENG, WU-CHUN; CHING, AVERY

    2007-01-17

    The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicated computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren - a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer - a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, we present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than our reference SMP platform.

  16. SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Palamuttam, R. S.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; Verma, R.; Waliser, D. E.; Lee, H.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning-fast Big Data technology called SciSpark based on Apache Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 10 to 1000 compute nodes. This 2nd-generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesoscale Convective Complexes. We have implemented a parallel data ingest capability in which the user specifies desired variables (arrays) as several time-sorted lists of URLs (i.e. using OPeNDAP model.nc?varname, or local files). The specified variables are partitioned by time/space and then each Spark node pulls its bundle of arrays into memory to begin a computation pipeline. We also investigated the performance of several N-dim. array libraries (scala breeze, java jblas & netlib-java, and ND4J). We are currently developing science codes using ND4J and studying memory behavior on the JVM. On the pyspark side, many of our science codes already use the numpy and SciPy ecosystems. The talk will cover: the architecture of SciSpark, the design of the scientific RDD (sRDD) data structure, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDFs, and first metrics quantifying parallel speedups and memory & disk usage.
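
    The parallel ingest pattern described above can be sketched in a few lines of PySpark: partition the time-sorted list of variable references, let each node materialize its bundle of arrays in memory, and run a per-grid computation. The file names and the random-array loader below are placeholders for the OPeNDAP/netCDF readers SciSpark actually uses.

    ```python
    from pyspark import SparkContext
    import numpy as np

    sc = SparkContext(appName="SciSparkIngestSketch")

    # Time-sorted list of dataset references (placeholder names).
    urls = ["file%03d.nc" % i for i in range(100)]

    def load_grid(url):
        # Stand-in for the netCDF/OPeNDAP read: one 2D grid per time step.
        rng = np.random.default_rng(abs(hash(url)) % (2**32))
        return rng.normal(size=(90, 180))

    # Partition the URL list across the cluster; each node pulls its bundle
    # of arrays into memory and computes a per-grid statistic.
    stats = (sc.parallelize(urls, numSlices=10)
               .map(load_grid)
               .map(lambda g: float(g.mean()))
               .collect())
    print(len(stats), sum(stats) / len(stats))
    sc.stop()
    ```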

  17. A compositional reservoir simulator on distributed memory parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Delshad, M.

    1995-12-31

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as a tracer flood and a polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
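
    The domain decomposition approach described above boils down to each processor owning a block of cells plus a ghost layer that is refreshed from its neighbours before every stencil update. Below is a minimal 1D sketch using mpi4py as a stand-in for the message-passing layer of the original code; the Jacobi update is an illustrative placeholder for the simulator's stencil.

    ```python
    # Run with e.g.: mpiexec -n 4 python halo.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_local = 100                          # interior cells owned by this rank
    u = np.zeros(n_local + 2)              # plus one ghost cell on each side
    u[1:-1] = rank                         # toy initial condition

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for _ in range(10):
        # Refresh ghost layers: send edge cells, receive neighbours' edges.
        comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
        comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
        # Stencil update on owned cells only (placeholder for the real solver).
        u[1:-1] = 0.5 * (u[:-2] + u[2:])

    print(rank, u[1], u[-2])
    ```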

  18. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

    PubMed Central

    Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

    2015-01-01

    Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for facilitating data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation. PMID:26681933
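
    The first of these scenarios, in which each mapper handles one partition of the training data and a reducer combines the partial results, can be mimicked without a Hadoop cluster using Python's built-in map/reduce primitives. The sketch below trains a single linear neuron on synthetic data; it is a functional stand-in, not the paper's actual MapReduce jobs.

    ```python
    from functools import reduce
    import numpy as np

    # One linear neuron trained by batch gradient descent, with the gradient
    # computed MapReduce-style: map = per-partition partial gradient,
    # reduce = elementwise sum. Data are synthetic.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10000, 5))
    true_w = np.array([1.0, -2.0, 3.0, 0.5, -1.0])
    y = X @ true_w + 0.01 * rng.normal(size=10000)

    partitions = np.array_split(np.arange(10000), 8)   # 8 "mappers"
    w = np.zeros(5)

    def partial_grad(idx):
        err = X[idx] @ w - y[idx]
        return X[idx].T @ err                          # map: local gradient

    for epoch in range(200):
        grads = map(partial_grad, partitions)          # run the mappers
        g = reduce(lambda a, b: a + b, grads)          # reduce: sum partials
        w -= 0.1 * g / len(y)                          # gradient step

    print(np.round(w, 2))                              # recovers true_w
    ```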

  19. Second Evaluation of Job Queuing/Scheduling Software. Phase 1

    NASA Technical Reports Server (NTRS)

    Jones, James Patton; Brickell, Cristy; Chancellor, Marisa (Technical Monitor)

    1997-01-01

    The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, NAS compiled a requirements checklist for job queuing/scheduling software. Next, NAS evaluated the leading job management system (JMS) software packages against the checklist. A year has now elapsed since the first comparison was published, and NAS has repeated the evaluation. This report describes this second evaluation, and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still lacking; however, definite progress has been made by the vendors to correct the deficiencies. This report is supplemented by a WWW interface to the data collected, to aid other sites in extracting the evaluation information on specific requirements of interest.

  20. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.

    PubMed

    Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

    2015-01-01

    Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for facilitating data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.

  1. The Wang Landau parallel algorithm for the simple grids. Optimizing OpenMPI parallel implementation

    NASA Astrophysics Data System (ADS)

    Kussainov, A. S.

    2017-12-01

    We implemented the Wang-Landau Monte Carlo algorithm to calculate the density of states for different simple spin lattices. The energy space was split between the individual threads and balanced according to the expected runtime of the individual processes. A custom spin-clustering mechanism, necessary to overcome the critical slowdown in certain energy subspaces, was devised. Stable reconstruction of the density of states was of primary importance. Some data post-processing techniques were involved to produce the expected smooth density of states.
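
    For reference, the serial core of the Wang-Landau scheme that such an implementation parallelizes is compact: propose spin flips accepted with probability min(1, g(E)/g(E')), add ln f to the running estimate of ln g at each visited energy, and halve ln f whenever the energy histogram is sufficiently flat. A minimal sketch for a small 2D Ising lattice follows; the lattice size, sweep length, and 0.8 flatness criterion are illustrative choices.

    ```python
    import numpy as np

    # Minimal serial Wang-Landau sketch for a small 2D Ising lattice
    # (periodic boundaries, J = 1). All parameters are toy choices.
    L = 8
    spins = np.random.choice([-1, 1], size=(L, L))

    def total_energy(s):
        # Each nearest-neighbour bond counted once via rolled arrays.
        return -np.sum(s * (np.roll(s, 1, axis=0) + np.roll(s, 1, axis=1)))

    E_min = -2 * L * L                 # energies lie on a grid of step 4
    bin_of = lambda E: int((E - E_min) // 4)
    n_bins = L * L + 1

    log_g = np.zeros(n_bins)           # running estimate of ln g(E)
    hist = np.zeros(n_bins)
    log_f = 1.0                        # modification factor ln f
    E = total_energy(spins)

    while log_f > 1e-4:                # stop once f is very close to 1
        for _ in range(10000):
            i, j = np.random.randint(L, size=2)
            dE = 2 * spins[i, j] * (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                                    + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            # Accept with probability min(1, g(E)/g(E')) to flatten the walk.
            if np.log(np.random.rand()) < log_g[bin_of(E)] - log_g[bin_of(E + dE)]:
                spins[i, j] *= -1
                E += dE
            log_g[bin_of(E)] += log_f  # update ln g and histogram at current E
            hist[bin_of(E)] += 1
        visited = hist[hist > 0]
        if visited.min() > 0.8 * visited.mean():   # flatness criterion
            hist[:] = 0
            log_f /= 2.0               # refine: f -> sqrt(f)

    print(log_g - log_g.min())         # ln g(E) up to an additive constant
    ```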

  2. A Massively Parallel Code for Polarization Calculations

    NASA Astrophysics Data System (ADS)

    Akiyama, Shizuka; Höflich, Peter

    2001-03-01

    We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres on massively parallel computers, utilizing both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version based purely on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.

  3. Dynamics of Oxidation of Aluminum Nanoclusters using Variable Charge Molecular-Dynamics Simulations on Parallel Computers

    NASA Astrophysics Data System (ADS)

    Campbell, Timothy; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Ogata, Shuji; Rodgers, Stephen

    1999-06-01

    Oxidation of aluminum nanoclusters is investigated with a parallel molecular-dynamics approach based on dynamic charge transfer among atoms. Structural and dynamic correlations reveal that significant charge transfer gives rise to large negative pressure in the oxide which dominates the positive pressure due to steric forces. As a result, aluminum moves outward and oxygen moves towards the interior of the cluster with the aluminum diffusivity 60% higher than that of oxygen. A stable 40 Å thick amorphous oxide is formed; this is in excellent agreement with experiments.

  4. Content-addressable read/write memories for image analysis

    NASA Technical Reports Server (NTRS)

    Snyder, W. E.; Savage, C. D.

    1982-01-01

    The commonly encountered image analysis problems of region labeling and clustering are found to be cases of the search-and-rename problem, which can be solved in parallel by a system architecture that is inherently suitable for VLSI implementation. This architecture is a novel form of content-addressable memory (CAM) which provides parallel search and update functions, allowing the time per operation to be reduced to a constant. It has been proposed in related investigations by Hall (1981) that, with VLSI, CAM-based structures with enhanced instruction sets for general purpose processing will be feasible.
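
    Region labeling is the canonical instance of this search-and-rename problem: every pixel of a connected foreground region must end up carrying the same label. A compact sequential reference using union-find makes the rename semantics concrete; the proposed CAM performs the equivalent search/update steps in parallel hardware, so this sketch illustrates the problem rather than the architecture.

    ```python
    # Sequential union-find reference for connected-component (region)
    # labeling with 4-connectivity on a binary image.
    def label_regions(img):
        h, w = len(img), len(img[0])
        parent = {}

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]   # path halving
                a = parent[a]
            return a

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra                 # "rename" one region into the other

        for y in range(h):
            for x in range(w):
                if not img[y][x]:
                    continue
                parent.setdefault((y, x), (y, x))
                if x > 0 and img[y][x - 1]:
                    union((y, x - 1), (y, x))
                if y > 0 and img[y - 1][x]:
                    union((y - 1, x), (y, x))

        labels = {}
        return [[labels.setdefault(find((y, x)), len(labels) + 1)
                 if img[y][x] else 0
                 for x in range(w)] for y in range(h)]

    print(label_regions([[1, 1, 0, 1],
                         [0, 1, 0, 1],
                         [0, 0, 0, 1]]))
    ```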

  5. Clustering and group selection of multiple criteria alternatives with application to space-based networks.

    PubMed

    Malakooti, Behnam; Yang, Ziyong

    2004-02-01

    In many real-world problems, the ranges of consequences of different alternatives are considerably different. In addition, selection of a group of alternatives (instead of only one best alternative) is sometimes necessary. Traditional decision making approaches treat the set of alternatives with the same method of analysis and selection. In this paper, we propose clustering alternatives into different groups so that different methods of analysis, selection, and implementation can be applied to each group. As an example, consider the selection of a group of functions (or tasks) to be processed by a group of processors. The set of tasks can be grouped according to similar criteria, and hence each cluster of tasks can be processed by one processor. The selection of the best alternative for each clustered group can be performed using existing methods; however, the process of selecting groups is different from the process of selecting alternatives within a group. We develop theories and procedures for clustering discrete multiple criteria alternatives. We also demonstrate how the set of alternatives is clustered into mutually exclusive groups based on 1) similar features among alternatives; 2) ideal (or most representative) alternatives given by the decision maker; and 3) other preferential information of the decision maker. The clustering of multiple criteria alternatives also has the following advantages. 1) It decreases the set of alternatives to be considered by the decision maker (for example, different decision makers are assigned to different groups of alternatives). 2) It decreases the number of criteria. 3) It may provide a different approach for analyzing multiple decision makers problems. Each decision maker may cluster alternatives differently, and hence clustering of alternatives may provide a basis for negotiation. The developed approach is applicable for solving a class of telecommunication network problems where a set of objects (such as routers, processors, or intelligent autonomous vehicles) is to be clustered into similar groups. Objects are clustered based on several criteria and the decision maker's preferences.

  6. The Portals 4 network programming interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan

    This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

  7. [Parallel virtual reality visualization of extreme large medical datasets].

    PubMed

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the Intranet and common-configuration computers of hospitals. This paper introduces several kernel techniques, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection (MIP) algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time renderings that can play the role of a good assistant in making clinical diagnoses.
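
    The Maximum Intensity Projection itself is a one-line reduction (keep the brightest voxel along each ray), which is what makes it easy to parallelize across a cluster: each node projects its slab of the volume and the partial images are merged with another elementwise maximum. A minimal NumPy sketch on a synthetic volume:

    ```python
    import numpy as np

    # Maximum Intensity Projection along one axis of a synthetic volume,
    # plus the slab-parallel equivalent a PC cluster would execute.
    volume = np.random.rand(128, 256, 256)            # stand-in for CT/MR data

    mip = volume.max(axis=0)                          # direct projection

    slabs = np.array_split(volume, 4, axis=0)         # one slab per node
    partials = [slab.max(axis=0) for slab in slabs]   # each node's local MIP
    combined = np.maximum.reduce(partials)            # merge partial projections

    assert np.allclose(mip, combined)                 # same image either way
    ```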

  8. The Portals 4.0 network programming interface.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

    2012-11-01

    This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

  9. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.

  10. MHD Code Optimizations and Jets in Dense Gaseous Halos

    NASA Astrophysics Data System (ADS)

    Gaibler, Volker; Vigelius, Matthias; Krause, Martin; Camenzind, Max

    We have further optimized and extended the 3D MHD code NIRVANA. The magnetized part runs in parallel, reaching 19 Gflops per SX-6 node, and has a passively advected particle population. In addition, the code is now MPI-parallel, on top of the shared-memory parallelization. On a 512^3 grid, we reach 561 Gflops with 32 nodes on the SX-8. We have also successfully used FLASH on the Opteron cluster. Scientific results are preliminary so far. We report one computation of highly resolved cocoon turbulence. While we find some similarities to earlier 2D work by us and others, we note a strange reluctance of cold material to enter the low-density cocoon, which has to be investigated further.

  11. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    NASA Astrophysics Data System (ADS)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and two-dimensional principal component analysis (PCA) were used to represent the genetic relations among Amomum tsao-ko using simple sequence repeat (SSR) markers. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) of 10 individuals; Cluster I, as the main group, contained multiple sub-clusters. PCA also showed two groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of the cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  12. Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    2003-01-01

    A hierarchical set of image segmentations is a set of several segmentations of the same image at different levels of detail, in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton et al. described an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g., Horowitz and Pavlidis [3]). In addition, HSEG optionally interjects, between HSWO region-growing iterations, merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region-growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included, comparing RHSEG with classic region growing.

  13. Using the GeoFEST Faulted Region Simulation System

    NASA Technical Reports Server (NTRS)

    Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

    2004-01-01

    GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip, and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load balance and low communication, and on careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than (order of) 4000 elements per processor. The source code and documentation for GeoFEST are available at no cost from the Open Channel Foundation. In addition, GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.

  14. Specialized Computer Systems for Environment Visualization

    NASA Astrophysics Data System (ADS)

    Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.

    2018-06-01

    The need for real-time image generation of landscapes arises in various fields as part of the tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware-software tools for increasing the realism and efficiency of environment visualization in 3D visualization systems are proposed. This paper discusses a modified path-tracing algorithm with a two-level hierarchy of bounding volumes that finds intersections with Axis-Aligned Bounding Boxes. The proposed algorithm eliminates branching and is hence better suited to implementation on multi-threaded CPUs and GPUs. A modified ROAM algorithm is used to solve the problem of high-quality visualization of reliefs and landscapes. The algorithm is implemented on parallel systems: clusters and Compute Unified Device Architecture networks. Results show that the implementation on MPI clusters is more efficient than on Graphics Processing Units/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of a parallel GPU system for 3D pseudo-stereo image/video synthesis are proposed, and the possibility of realizing each stage of 3D pseudo-stereo synthesis on a parallel GPU architecture is analyzed. An experimental prototype of a specialized hardware-software system for 3D pseudo-stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo-stereo imaging to the architecture of GPU systems is efficient. It also accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for the test GPUs.

  15. Exploiting Symmetry on Parallel Architectures.

    NASA Astrophysics Data System (ADS)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs and discovered a number of new results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  16. PREFACE: 3rd International Workshop on "State of the Art in Nuclear Cluster Physics"

    NASA Astrophysics Data System (ADS)

    Yamada, Taiichi; Kanada-En'yo, Yoshiko

    2014-12-01

    The 3rd International Workshop on "State of the Art in Nuclear Cluster Physics" (SOTANCP3) was held at the KGU Kannai Media Center, Kanto Gakuin University, Yokohama, Japan, from May 26 to 30, 2014. Yokohama is the second largest city in Japan, about 25 km south of Tokyo. The first workshop of the series was held in Strasbourg, France, in 2008, and the second in Brussels, Belgium, in 2010. The purpose of SOTANCP3 was to discuss the present status and future perspectives of nuclear cluster physics. The following nine topics were selected in order to cover most of the scientific programme and to highlight areas where new ideas have emerged over recent years: (1) Cluster structures and many-body correlations in stable and unstable nuclei (2) Clustering aspects of nuclear reactions and resonances (3) Alpha condensates and analogy with condensed matter approaches (4) Role of the tensor force in cluster physics and ab initio approaches (5) Clustering in hypernuclei (6) Nuclear fission, superheavy nuclei, and cluster decay (7) Cluster physics and nuclear astrophysics (8) Clustering in nuclear matter and neutron stars (9) Clustering in hadron and atomic physics. There were 122 participants, including 53 from 17 foreign countries. In addition to invited talks, we had many talks selected from contributed papers. There were plenary, parallel, and poster sessions. Poster contributions were also presented as four-minute talks in the parallel sessions. These proceedings contain the papers presented in invited and selected talks, together with those presented in the poster sessions. We would like to express our gratitude to the members of the International Advisory Committee and of the Organizing Committee for their efforts, which made this workshop successful. In particular we would like to express our great thanks to Drs. Y. Funaki, W. Horiuchi, N. Itagaki, M. Kimura, T. Myo, and T. Yoshida. We would also like to thank the following organizations for their sponsorship: RCNP (Research Center for Nuclear Physics, Osaka University), CNS (Center for Nuclear Study, University of Tokyo), JICFuS (Joint Institute for Computational Fundamental Science), and RIKEN (Nishina Center for Accelerator-Based Science, Institute of Physical and Chemical Research). This workshop was also supported by the Yokohama Convention & Visitors Bureau and Kanto Gakuin University. Finally, we are pleased to announce that the next workshop in this series, SOTANCP4, will be held in Galveston, Texas, USA, in 2018.

  17. Graph-theoretic quantum system modelling for neuronal microtubules as hierarchical clustered quantum Hopfield networks

    NASA Astrophysics Data System (ADS)

    Srivastava, D. P.; Sahni, V.; Satsangi, P. S.

    2014-08-01

    Graph-theoretic quantum system modelling (GTQSM) is facilitated by considering the fundamental unit of quantum computation and information, viz. a quantum bit, or qubit, as a basic building block. Unit directional vectors "ket 0" and "ket 1" constitute two distinct fundamental quantum across-variable orthonormal basis vectors for the Hilbert space, specifying the direction of propagation of information, or computation data, while complementary fundamental quantum through, or flow rate, variables specify probability parameters, or amplitudes, as surrogates for a scalar quantum information measure (von Neumann entropy). This paper applies GTQSM to a continuum of protein heterodimer tubulin molecules of self-assembling polymers, viz. microtubules in the brain, as a holistic system of interacting components representing a hierarchical clustered quantum Hopfield network of networks (hQHN). The quantum input/output ports of the constituent elemental interaction components, or processes, of tunnelling interactions and Coulombic bidirectional interactions are in cascade and parallel interconnections with each other, while the classical output ports of all elemental components are interconnected in parallel to accumulate the micro-energy functions generated in the system as a Hamiltonian, or Lyapunov, energy function. The paper presents an insight, otherwise difficult to gain, into the complex system of systems represented by the clustered quantum Hopfield network, hQHN, through the application of the GTQSM construct.

  18. DOVIS: an implementation for high-throughput virtual screening using AutoDock.

    PubMed

    Zhang, Shuxing; Kumar, Kamal; Jiang, Xiaohui; Wallqvist, Anders; Reifman, Jaques

    2008-02-27

    Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to screen millions of compounds in silico in a reasonable time. To meet this challenge, it is necessary to use high-performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial. We have developed an application termed DOVIS that uses AutoDock (version 3) as the docking engine and runs in parallel on a Linux cluster. DOVIS can efficiently dock large numbers (millions) of small molecules (ligands) to a receptor, screening 500 to 1,000 compounds per processor per day. Furthermore, in DOVIS, the docking session is fully integrated and automated in that the inputs are specified via a graphical user interface, the calculations are fully integrated with a Linux cluster queuing system for parallel processing, and the results can be visualized and queried. DOVIS removes most of the complexities and organizational problems associated with large-scale high-throughput virtual screening, and provides a convenient and efficient solution for AutoDock users to run this software on a Linux cluster platform.

  19. A Structure-Based Distance Metric for High-Dimensional Space Exploration with Multi-Dimensional Scaling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla

    2014-03-01

    Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS) where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
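
    The abstract gives the idea but not the formula. One plausible toy reading is that a point's "structure" is the sequence of segment slopes its polyline traces across adjacent (normalized) parallel-coordinate axes, with the bi-scale rule keeping plain Euclidean distance for near pairs. The sketch below is that guess, under those stated assumptions, not the paper's actual metric:

```python
import numpy as np

def structure_vector(x):
    """The 'structure' of a parallel-coordinates polyline, read here as
    the sequence of segment slopes between adjacent dimension axes
    (axes assumed unit-spaced, data assumed normalized)."""
    return np.diff(np.asarray(x, dtype=float))

def bi_scale_distance(a, b, near_threshold=1.0):
    """Bi-scale rule: Euclidean for near pairs, structural similarity
    for far pairs (offset so far pairs stay farther than near ones)."""
    d_euc = np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))
    if d_euc <= near_threshold:
        return d_euc
    return near_threshold + np.linalg.norm(structure_vector(a) - structure_vector(b))

# two points far apart in space but with identical polyline shapes
print(bi_scale_distance([0.1, 0.5, 0.2], [0.7, 1.1, 0.8]))  # small structural term
```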

  20. A comprehensive HST BVI catalogue of star clusters in five Hickson compact groups of galaxies

    NASA Astrophysics Data System (ADS)

    Fedotov, K.; Gallagher, S. C.; Durrell, P. R.; Bastian, N.; Konstantopoulos, I. S.; Charlton, J.; Johnson, K. E.; Chandar, R.

    2015-05-01

    We present a photometric catalogue of star cluster candidates in Hickson compact groups (HCGs) 7, 31, 42, 59, and 92, based on observations with the Advanced Camera for Surveys and the Wide Field Camera 3 on the Hubble Space Telescope. The catalogue contains precise cluster positions (right ascension and declination), magnitudes, and colours in the BVI filters. The number of detected sources ranges from 2200 to 5600 per group, from which we construct a high-confidence sample by applying a number of criteria designed to reduce foreground and background contaminants. Furthermore, the high-confidence cluster candidates for each of the 16 galaxies in our sample are split into two subpopulations: one that may contain young star clusters and one that is dominated by older globular clusters. The ratio of young star cluster to globular cluster candidates varies from group to group, from equal numbers to the extreme of HCG 31, which has a ratio of 8 to 1 due to a recent starburst induced by interactions in the group. We find that the number of blue clusters with MV < -9 correlates well with the current star formation rate in an individual galaxy, while the number of globular cluster candidates with MV < -7.8 correlates well (though with large scatter) with the stellar mass. Analyses of the high-confidence sample presented in this paper show that star clusters can be successfully used to infer the gross star formation history of the host groups and therefore determine their placement in a proposed evolutionary sequence for compact galaxy groups.

  1. High Performance Geostatistical Modeling of Biospheric Resources

    NASA Astrophysics Data System (ADS)

    Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.

    2004-12-01

    We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostatistics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open-source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
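
    Each kriged grid point requires only the solution of a small dense linear system, which is why the interpolation parcels out so naturally as independent tasks to MPI or Xgrid workers. A serial sketch of ordinary kriging under an illustrative (not fitted) Gaussian variogram; the parallel task farming itself is omitted:

```python
import numpy as np

def gaussian_variogram(h, sill=1.0, rng=10.0, nugget=0.0):
    """Gaussian semivariogram model; parameters are illustrative."""
    return nugget + sill * (1.0 - np.exp(-(h / rng) ** 2))

def ordinary_kriging(points, values, target):
    """Ordinary kriging of one target location from scattered samples:
    solve the (n+1)x(n+1) system with a Lagrange multiplier row/column."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gaussian_variogram(d)
    A[n, n] = 0.0                       # Lagrange-multiplier corner
    b = np.ones(n + 1)
    b[:n] = gaussian_variogram(np.linalg.norm(points - target, axis=1))
    w = np.linalg.solve(A, b)
    return w[:n] @ values               # kriged estimate at `target`

pts = np.random.rand(50, 2) * 100.0     # hypothetical sample locations
vals = np.sin(pts[:, 0] / 10) + np.cos(pts[:, 1] / 10)
print(ordinary_kriging(pts, vals, np.array([50.0, 50.0])))
```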

  2. Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks.

    PubMed

    Mulder, Samuel A; Wunsch, Donald C

    2003-01-01

    The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete, and is an often-used benchmark for new optimization techniques. One of the main challenges for new techniques is that standard, non-AI heuristic approaches such as the Lin-Kernighan algorithm (LK) and the chained-LK variant are currently very effective and in wide use for the common fully connected, Euclidean variant considered here. This paper presents an algorithm that uses adaptive resonance theory (ART) in combination with a variation of the Lin-Kernighan local optimization algorithm to solve very large instances of the TSP. The primary advantage of this algorithm over traditional LK and chained-LK approaches is the increased scalability and parallelism allowed by the divide-and-conquer clustering paradigm. Tours obtained by the algorithm are of lower quality, but scaling is much better and there is high potential for increasing performance using parallel hardware.
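
    A stripped-down version of the divide-and-conquer idea, with k-means standing in for ART clustering and greedy nearest-neighbour standing in for Lin-Kernighan (both substitutions are mine, for brevity); each cluster's subtour is independent and could be solved on a separate processor:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def nn_tour(pts):
    """Greedy nearest-neighbour tour over a small point set."""
    todo = list(range(len(pts)))
    tour = [todo.pop(0)]
    while todo:
        last = pts[tour[-1]]
        nxt = min(todo, key=lambda i: np.linalg.norm(pts[i] - last))
        todo.remove(nxt)
        tour.append(nxt)
    return tour

def cluster_tsp(cities, k=10):
    """Divide-and-conquer TSP: cluster the cities, tour each cluster
    independently, then stitch the subtours in centroid order."""
    centroids, labels = kmeans2(cities, k, minit='++')
    tour = []
    for c in nn_tour(centroids):            # visit clusters cheaply
        idx = np.where(labels == c)[0]
        if len(idx):
            tour.extend(idx[nn_tour(cities[idx])])
    return tour

cities = np.random.rand(2000, 2)            # hypothetical instance
print(len(cluster_tsp(cities, k=20)))       # 2000: every city visited once
```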

  3. A microRNA-mRNA expression network during oral siphon regeneration in Ciona.

    PubMed

    Spina, Elijah J; Guzman, Elmer; Zhou, Hongjun; Kosik, Kenneth S; Smith, William C

    2017-05-15

    Here we present a parallel study of mRNA and microRNA expression during oral siphon (OS) regeneration in Ciona robusta, and the derived network of their interactions. In the process of identifying 248 mRNAs and 15 microRNAs as differentially expressed, we also identified 57 novel microRNAs, several of which are among the most highly differentially expressed. Analysis of functional categories identified enriched transcripts related to stress responses and apoptosis at the wound healing stage, signaling pathways including Wnt and TGFβ during early regrowth, and negative regulation of extracellular proteases in late stage regeneration. Consistent with the expression results, we found that inhibition of TGFβ signaling blocked OS regeneration. A correlation network was subsequently inferred for all predicted microRNA-mRNA target pairs expressed during regeneration. Network-based clustering associated transcripts into 22 non-overlapping groups, the functional analysis of which showed enrichment of stress response, signaling pathway and extracellular protease categories that could be related to specific microRNAs. Predicted targets of the miR-9 cluster suggest a role in regulating differentiation and the proliferative state of neural progenitors through regulation of the cytoskeleton and cell cycle. © 2017. Published by The Company of Biologists Ltd.

  4. A microRNA-mRNA expression network during oral siphon regeneration in Ciona

    PubMed Central

    Spina, Elijah J.; Guzman, Elmer; Zhou, Hongjun; Kosik, Kenneth S.

    2017-01-01

    Here we present a parallel study of mRNA and microRNA expression during oral siphon (OS) regeneration in Ciona robusta, and the derived network of their interactions. In the process of identifying 248 mRNAs and 15 microRNAs as differentially expressed, we also identified 57 novel microRNAs, several of which are among the most highly differentially expressed. Analysis of functional categories identified enriched transcripts related to stress responses and apoptosis at the wound healing stage, signaling pathways including Wnt and TGFβ during early regrowth, and negative regulation of extracellular proteases in late stage regeneration. Consistent with the expression results, we found that inhibition of TGFβ signaling blocked OS regeneration. A correlation network was subsequently inferred for all predicted microRNA-mRNA target pairs expressed during regeneration. Network-based clustering associated transcripts into 22 non-overlapping groups, the functional analysis of which showed enrichment of stress response, signaling pathway and extracellular protease categories that could be related to specific microRNAs. Predicted targets of the miR-9 cluster suggest a role in regulating differentiation and the proliferative state of neural progenitors through regulation of the cytoskeleton and cell cycle. PMID:28432214

  5. Parallel and serial grouping of image elements in visual perception.

    PubMed

    Houtkamp, Roos; Roelfsema, Pieter R

    2010-12-01

    The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some situations, but we demonstrate that there are also situations where Gestalt grouping becomes serial. We observe substantial time delays when image elements have to be grouped indirectly through a chain of local groupings. We call this chaining process incremental grouping and demonstrate that it can occur for only a single object at a time. We suggest that incremental grouping requires the gradual spread of object-based attention so that eventually all the object's parts become grouped explicitly by an attentional labeling process. Our findings inspire a new incremental grouping theory that relates the parallel, local grouping process to feedforward processing and the serial, incremental grouping process to recurrent processing in the visual cortex.

  6. Self-assembled three-dimensional chiral colloidal architecture

    NASA Astrophysics Data System (ADS)

    Ben Zion, Matan Yah; He, Xiaojin; Maass, Corinna C.; Sha, Ruojie; Seeman, Nadrian C.; Chaikin, Paul M.

    2017-11-01

    Although stereochemistry has been a central focus of the molecular sciences since Pasteur, its province has previously been restricted to the nanometric scale. We have programmed the self-assembly of micron-sized colloidal clusters with structural information stemming from a nanometric arrangement. This was done by combining DNA nanotechnology with colloidal science. Using the functional flexibility of DNA origami in conjunction with the structural rigidity of colloidal particles, we demonstrate the parallel self-assembly of three-dimensional microconstructs, evincing highly specific geometry that includes control over position, dihedral angles, and cluster chirality.

  7. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens.

    PubMed

    Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D; Volz, Kerstin

    2017-06-01

    We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. A rank-sum test for clustered data when the number of subjects in a group within a cluster is informative.

    PubMed

    Dutta, Sandipan; Datta, Somnath

    2016-06-01

    The Wilcoxon rank-sum test is a popular nonparametric test for comparing two independent populations (groups). In recent years, there have been renewed attempts in extending the Wilcoxon rank sum test for clustered data, one of which (Datta and Satten, 2005, Journal of the American Statistical Association 100, 908-915) addresses the issue of informative cluster size, i.e., when the outcomes and the cluster size are correlated. We are faced with a situation where the group specific marginal distribution in a cluster depends on the number of observations in that group (i.e., the intra-cluster group size). We develop a novel extension of the rank-sum test for handling this situation. We compare the performance of our test with the Datta-Satten test, as well as the naive Wilcoxon rank sum test. Using a naturally occurring simulation model of informative intra-cluster group size, we show that only our test maintains the correct size. We also compare our test with a classical signed rank test based on averages of the outcome values in each group paired by the cluster membership. While this test maintains the size, it has lower power than our test. Extensions to multiple group comparisons and the case of clusters not having samples from all groups are also discussed. We apply our test to determine whether there are differences in the attachment loss between the upper and lower teeth and between mesial and buccal sites of periodontal patients. © 2015, The International Biometric Society.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Setyawan, Wahyu; Nandipati, Giridhar; Kurtz, Richard J.

    The stability of tungsten self-interstitial atom (SIA) clusters is studied using first-principles methods. Clusters of one to seven SIAs are systematically explored across 1264 unique configurations. The finite-size effect of the simulation cell is corrected based on the scaling of formation energy versus inverse cell volume. Furthermore, the accuracy of the calculations is improved by treating the 5p semicore states as valence states. Configurations of the three most stable clusters at each cluster size n are presented, which consist of parallel [111] dumbbells. The evolution of these clusters into small dislocation loops is discussed. The binding energy of size-n clusters is analyzed relative to an n → (n-1) + 1 dissociation and is shown to increase with size. Extrapolation for n > 7 is presented using a dislocation-loop model. In addition, the interaction of these clusters with a substitutional Re, Os, or Ta solute is explored by replacing one of the dumbbells with the solute. Re and Os strongly attract these clusters, but Ta strongly repels them. The strongest interaction is found when the solute is located on the periphery of the cluster rather than in the middle. The magnitude of this interaction decreases with cluster size. Empirical fits describing the trend of the solute binding energy are presented.

  10. GRAVIDY, a GPU modular, parallel direct-summation N-body integrator: dynamics with softening

    NASA Astrophysics Data System (ADS)

    Maureira-Fredes, Cristián; Amaro-Seoane, Pau

    2018-01-01

    A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets, and sources of gravitational radiation. The direct summation of N gravitational forces is a complex problem with no analytical solution and can only be tackled with approximations and numerical methods. To this end, the Hermite scheme is a widely used integration method. With different numerical techniques and special-purpose hardware, it can be used to speed up the calculations, but these methods tend to be computationally slow and cumbersome to work with. We present a new graphics processing unit (GPU), direct-summation N-body integrator written from scratch and based on this scheme, which includes relativistic corrections for sources of gravitational radiation. GRAVIDY is highly modular, allowing users to readily introduce new physics; it exploits available computational resources and will be maintained by regular updates. GRAVIDY can be used in parallel on multiple CPUs and GPUs, with a considerable speed-up benefit. The single-GPU version is between one and two orders of magnitude faster than the single-CPU version. A test run using four GPUs in parallel shows a speed-up factor of about 3 as compared to the single-GPU version. The conception and design of this first release are aimed at users with access to traditional parallel CPU clusters or computational nodes with one or a few GPU cards.

  11. Clustering on Magnesium Surfaces - Formation and Diffusion Energies.

    PubMed

    Chu, Haijian; Huang, Hanchen; Wang, Jian

    2017-07-12

    The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and the formation of faulted structures, which in turn affect the mechanical deformation of Mg. This paper reports first-principles density functional theory (DFT) based quantum mechanical calculations of atomic clustering on the low-energy surfaces {0001} and [Formula: see text]. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers face-centered-cubic stacking, serving as a nucleus of a stacking fault. On a [Formula: see text] surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on the [Formula: see text] surface is highly anisotropic, while that on the (0001) surface is isotropic. Three-dimensional Ehrlich-Schwoebel barriers converge once the step height reaches three atomic layers or more. Adatom diffusion along steps proceeds via a hopping mechanism, and diffusion down steps via an exchange mechanism.

  12. NGC 2548: clumpy spatial and kinematic structure in an intermediate-age Galactic cluster

    NASA Astrophysics Data System (ADS)

    Vicente, Belén; Sánchez, Néstor; Alfaro, Emilio J.

    2016-09-01

    NGC 2548 is a ˜400-500 Myr old open cluster with evidence of spatial substructure likely caused by its interaction with the Galactic disc. In this work we use precise astrometric data from the Carte du Ciel - San Fernando (CdC-SF) catalogue to study the clumpy structure of this cluster. We confirm the fragmented structure of NGC 2548, and the relatively high precision of our kinematic data additionally leads us to the first detection of substructure in the proper-motion space of a stellar cluster. There are three spatially separated cores, each of which has its own counterpart in the proper-motion distribution. The two main cores lie nearly parallel to the Galactic plane, whereas the third is significantly fainter than the others and moves towards the Galactic plane, separating from the rest of the cluster. We derive core positions and proper motions, as well as the stars belonging to each core.

  13. LoCuSS: pre-processing in galaxy groups falling into massive galaxy clusters at z = 0.2

    NASA Astrophysics Data System (ADS)

    Bianconi, M.; Smith, G. P.; Haines, C. P.; McGee, S. L.; Finoguenov, A.; Egami, E.

    2018-01-01

    We report direct evidence of pre-processing of the galaxies residing in galaxy groups falling into galaxy clusters drawn from the Local Cluster Substructure Survey (LoCuSS). 34 groups have been identified via their X-ray emission in the infall regions of 23 massive (~10^15 M⊙) clusters at 0.15 < z < 0.3. Highly complete spectroscopic coverage combined with 24 μm imaging from Spitzer allows us to make a consistent and robust selection of cluster and group members, including star-forming galaxies, down to a stellar mass limit of M⋆ = 2 × 10^10 M⊙. The fraction fSF of star-forming galaxies in infalling groups is lower, and shows a flatter trend with clustercentric radius, than in the rest of the cluster galaxy population. At R ≈ 1.3 r200, the fraction of star-forming galaxies in infalling groups is half that in the cluster galaxy population. This is direct evidence that star-formation quenching is effective in galaxies already prior to them settling in the cluster potential, and that groups are favourable locations for this process.

  14. Exploring asynchronous brainstorming in large groups: a field comparison of serial and parallel subgroups.

    PubMed

    de Vreede, Gert-Jan; Briggs, Robert O; Reiter-Palmon, Roni

    2010-04-01

    The aim of this study was to compare the results of two different modes of using multiple groups (instead of one large group) to identify problems and develop solutions. Many of the complex problems facing organizations today require the use of very large groups or collaborations of groups from multiple organizations. There are many logistical problems associated with the use of such large groups, including the ability to bring everyone together at the same time and location. A field study involving two different organizations compared the productivity and satisfaction of groups under two approaches: (a) multiple small groups, each completing the entire process from start to end and combining the results at the end (parallel mode); and (b) multiple subgroups, each building on the work provided by previous subgroups (serial mode). Groups using the serial mode produced more elaborations compared with parallel groups, whereas parallel groups produced more unique ideas compared with serial groups. No significant differences were found related to satisfaction with process and outcomes between the two modes. The preferred mode depends on the type of task facing the group. Parallel groups are more suited to tasks for which a variety of new ideas is needed, whereas serial groups are best suited when elaboration and in-depth thinking on the solution are required. Results of this research can guide the development of facilitated sessions of large groups or "teams of teams."

  15. Effect of data truncation in an implementation of pixel clustering on a custom computing machine

    NASA Astrophysics Data System (ADS)

    Leeser, Miriam E.; Theiler, James P.; Estlick, Michael; Kitaryeva, Natalya V.; Szymanski, John J.

    2000-10-01

    We investigate the effect of truncating the precision of hyperspectral image data for the purpose of more efficiently segmenting the image using a variant of k-means clustering. We describe the implementation of the algorithm on field-programmable gate array (FPGA) hardware. Truncating the data to only a few bits per pixel in each spectral channel permits a more compact hardware design, enabling greater parallelism, and ultimately a more rapid execution. It also enables the storage of larger images in the onboard memory. In exchange for faster clustering, however, one trades off the quality of the produced segmentation. We find, however, that the clustering algorithm can tolerate considerable data truncation with little degradation in cluster quality. This robustness to truncated data can be extended by computing the cluster centers to a few more bits of precision than the data. Since there are so many more pixels than centers, the more aggressive data truncation leads to significant gains in the number of pixels that can be stored in memory and processed in hardware concurrently.
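
    The trade-off is easy to reproduce in software: quantize each spectral channel to a few bits and cluster the truncated pixels while keeping the centers at slightly higher precision, as the text recommends. A toy sketch (array shapes and the 3-bit choice are illustrative assumptions):

```python
import numpy as np

def truncate_bits(data, bits):
    """Keep only `bits` bits per spectral value (data scaled to [0, 1)),
    mimicking the FPGA design's reduced-precision pixels."""
    levels = 1 << bits
    return np.minimum((data * levels).astype(np.uint8), levels - 1)

rng = np.random.default_rng(0)
pixels = rng.random((10_000, 8))       # 10k pixels, 8 spectral channels
coarse = truncate_bits(pixels, 3)      # 3 bits per channel
# cluster `coarse`, but accumulate and store the cluster centers in
# float (a few more bits of precision than the data), which the paper
# reports extends the robustness to aggressive truncation
centers = coarse[rng.choice(len(coarse), 16, replace=False)].astype(np.float64)
```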

  16. Blocked inverted indices for exact clustering of large chemical spaces.

    PubMed

    Thiel, Philipp; Sach-Peltason, Lisa; Ottmann, Christian; Kohlbacher, Oliver

    2014-09-22

    The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Methods for efficient calculation of compound similarities are of the utmost importance for various applications like similarity searching or library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable, but often computationally prohibitively expensive. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given data set. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application, which has been designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space thus can be done on modern multicore machines within a few days.
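
    The core observation behind the inverted index is that molecules sharing no fingerprint bits never need to meet: indexing from each on-bit to the molecules containing it restricts the pairwise accumulation to bit-sharing pairs. A minimal single-threaded sketch of that idea (the paper's algorithm adds blocking and multicore parallelism, which are omitted here):

```python
from collections import defaultdict

def all_pairs_tanimoto(fingerprints, threshold=0.7):
    """All-pairs Tanimoto similarity via an inverted index.
    `fingerprints` is a list of sets of on-bit positions."""
    index = defaultdict(list)
    for mol, fp in enumerate(fingerprints):
        for bit in fp:
            index[bit].append(mol)
    common = defaultdict(int)            # shared-bit counts per pair
    for mols in index.values():
        for i in range(len(mols)):
            for j in range(i + 1, len(mols)):
                common[(mols[i], mols[j])] += 1
    hits = []
    for (i, j), c in common.items():
        # Tanimoto = |A & B| / (|A| + |B| - |A & B|)
        t = c / (len(fingerprints[i]) + len(fingerprints[j]) - c)
        if t >= threshold:
            hits.append((i, j, t))
    return hits

fps = [{1, 5, 9, 42}, {1, 5, 9, 40}, {7, 8}]
print(all_pairs_tanimoto(fps, threshold=0.5))  # pair (0, 1): 3/5 = 0.6
```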

  17. Some facts about aftershocks to large earthquakes in California

    USGS Publications Warehouse

    Jones, Lucile M.; Reasenberg, Paul A.

    1996-01-01

    Earthquakes occur in clusters. After one earthquake happens, we usually see others at nearby (or identical) locations. To talk about this phenomenon, seismologists coined three terms: foreshock, mainshock, and aftershock. In any cluster of earthquakes, the one with the largest magnitude is called the mainshock; earthquakes that occur before the mainshock are called foreshocks while those that occur after the mainshock are called aftershocks. A mainshock will be redefined as a foreshock if a subsequent event in the cluster has a larger magnitude. Aftershock sequences follow predictable patterns. That is, a sequence of aftershocks follows certain global patterns as a group, but the individual earthquakes comprising the group are random and unpredictable. This relationship between the pattern of a group and the randomness (stochastic nature) of the individuals has a close parallel in actuarial statistics. We can describe the pattern that aftershock sequences tend to follow with well-constrained equations. However, we must keep in mind that the actual aftershocks are only probabilistically described by these equations. Once the parameters in these equations have been estimated, we can determine the probability of aftershocks occurring in various space, time and magnitude ranges as described below. Clustering of earthquakes usually occurs near the location of the mainshock. The stress on the mainshock's fault changes drastically during the mainshock and that fault produces most of the aftershocks. This causes a change in the regional stress, the size of which decreases rapidly with distance from the mainshock. Sometimes the change in stress caused by the mainshock is great enough to trigger aftershocks on other, nearby faults. While there is no hard "cutoff" distance beyond which an earthquake is totally incapable of triggering an aftershock, the vast majority of aftershocks are located close to the mainshock. As a rule of thumb, we consider earthquakes to be aftershocks if they are located within a characteristic distance from the mainshock. This distance is usually taken to be one or two times the length of the fault rupture associated with the mainshock. For example, if the mainshock ruptured a 100 km length of a fault, subsequent earthquakes up to 100-200 km away from the mainshock rupture would be considered aftershocks. The fault rupture length was approximately 15 km in the 1994 Northridge earthquake, and 430 km in the great 1906 earthquake.
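
    The rule of thumb in the text translates directly into code; a tiny illustration using the Northridge figures quoted above (function name and the factor of two are taken from the "one or two rupture lengths" heuristic):

```python
def is_aftershock(distance_km, rupture_length_km, factor=2.0):
    """Rule of thumb from the text: an event counts as an aftershock if
    it lies within one to two rupture lengths of the mainshock."""
    return distance_km <= factor * rupture_length_km

# 1994 Northridge: ~15 km rupture, so events within ~30 km qualify
print(is_aftershock(25.0, 15.0))   # True
print(is_aftershock(80.0, 15.0))   # False
```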

  18. Cross-language information retrieval using PARAFAC2.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bader, Brett William; Chew, Peter; Abdelali, Ahmed

    A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages - or more precisely, retrieving the most similar document in one language to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language-independently', so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar to other documents in the same language than they are to documents in a different language, but on the same topic. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (which is a variant of PARAFAC, a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the 'concepts' in all documents in the parallel corpus are the same regardless of language. Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem. From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in cross-language information retrieval.
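
    For readers who want to experiment, the tensorly library provides a PARAFAC2 decomposition that couples one mode across irregular slices, which matches the irregular three-way array described above (term counts differ per language, the aligned documents are shared). A sketch with random stand-in matrices; the use of tensorly here is my assumption and is unrelated to the original work's implementation:

```python
import numpy as np
from tensorly.decomposition import parafac2

rng = np.random.default_rng(0)
# one term-by-document matrix per language: rows (terms) vary by
# language, columns (the 40 aligned documents) are shared
slices = [rng.random((n_terms, 40)) for n_terms in (500, 620, 480)]

# rank-10 PARAFAC2: the factor on the shared document mode is
# constrained to be common across slices, mirroring the shared-concept
# constraint the abstract describes
decomposition = parafac2(slices, rank=10, n_iter_max=100)
```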

  19. SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Loikith, P.; Lee, H.; McGibbney, L. J.; Whitehall, K. D.

    2014-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning-fast Big Data technology called SciSpark based on Apache Spark. Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk, and makes iterative algorithms feasible. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 100 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning (ML) based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesoscale Convective Complexes. The goals of SciSpark are to: (1) decrease the time to compute comparison statistics and plots from minutes to seconds; (2) allow for interactive exploration of time-series properties over seasons and years; (3) decrease the time for satellite data ingestion into RCMES to hours; (4) allow for Level-2 comparisons with higher-order statistics or PDFs in minutes to hours; and (5) move RCMES into a near-real-time decision-making platform. We will report on: the architecture and design of SciSpark, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning (sharding) of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDFs, and first metrics quantifying parallel speedups and memory and disk usage.
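
    The cache-and-iterate style that gives Spark its edge over Hadoop for this workload looks roughly like the following PySpark fragment; the data, grid shapes, and statistics are hypothetical stand-ins for the RCMES pipeline, not SciSpark code:

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("grid-compare").getOrCreate()
sc = spark.sparkContext

# stand-in for per-timestep (obs, model) grid pairs; in RCMES these
# would be read from HDF/netCDF files and partitioned by time
pairs = sc.parallelize([(t, np.random.rand(90, 180), np.random.rand(90, 180))
                        for t in range(120)])

# per-timestep bias and RMSE, computed in parallel and cached in memory
# so later interactive queries avoid re-reading from disk
stats = pairs.map(lambda rec: (rec[0],
                               float((rec[1] - rec[2]).mean()),
                               float(np.sqrt(((rec[1] - rec[2]) ** 2).mean())))).cache()
print(stats.take(3))
```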

  20. Balancing computation and communication power in power constrained clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piga, Leonardo; Paul, Indrani; Huang, Wei

    Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.
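
    Stripped of the hardware details, the prediction-and-gate rule reduces to a per-node threshold test; the sketch below is only that decision logic, with hypothetical names, numbers, and units:

```python
def plan_power_state(predicted_wait_s, gate_threshold_s=0.5):
    """Decision rule from the description: a node that predicts a wait
    longer than a threshold enters a reduced-power state."""
    return "power-gated" if predicted_wait_s > gate_threshold_s else "active"

# freed power budget would then be reassigned to nodes still computing
nodes = {"n0": 0.05, "n1": 2.0, "n2": 3.5}   # predicted waits, seconds
states = {n: plan_power_state(w) for n, w in nodes.items()}
gated = [n for n, s in states.items() if s == "power-gated"]
print(states, "-> reassign the budget freed by", gated, "to active nodes")
```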

  1. Ultrawideband asynchronous tracking system and method

    NASA Technical Reports Server (NTRS)

    Arndt, G. Dickey (Inventor); Ngo, Phong H. (Inventor); Phan, Chau T. (Inventor); Gross, Julia A. (Inventor); Ni, Jianjun (Inventor); Dusl, John (Inventor)

    2012-01-01

    A passive tracking system is provided with a plurality of ultrawideband (UWB) receivers that is asynchronous with respect to a UWB transmitter. A geometry of the tracking system may utilize a plurality of clusters with each cluster comprising a plurality of antennas. Time Difference of Arrival (TDOA) may be determined for the antennas in each cluster and utilized to determine Angle of Arrival (AOA) based on a far field assumption regarding the geometry. Parallel software communication sockets may be established with each of the plurality of UWB receivers. Transfer of waveform data may be processed by alternately receiving packets of waveform data from each UWB receiver. Cross Correlation Peak Detection (CCPD) is utilized to estimate TDOA information to reduce errors in a noisy, multipath environment.
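
    Under the far-field assumption, the angle of arrival at a two-antenna cluster follows from elementary geometry: a plane wavefront travels an extra c·Δt to reach the second antenna, so sin(theta) = c·Δt/d for antenna baseline d. A small sketch (numbers are illustrative):

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def aoa_from_tdoa(tdoa_s, baseline_m):
    """Far-field angle of arrival (degrees) from the time difference of
    arrival at two antennas separated by `baseline_m`."""
    s = np.clip(C * tdoa_s / baseline_m, -1.0, 1.0)  # guard numeric noise
    return np.degrees(np.arcsin(s))

# e.g. a 1 ns TDOA across a 0.5 m antenna baseline
print(aoa_from_tdoa(1e-9, 0.5))   # ~36.8 degrees
```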

  2. Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI

    USGS Publications Warehouse

    Donato, David I.

    2017-01-01

    In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
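
    A minimal mpi4py rendering of the worker-initiated pull described above (task payloads and tags are illustrative; the paper's reference implementation is in C):

```python
# run with e.g.: mpiexec -n 8 python bag_of_tasks.py
from mpi4py import MPI

TASK, STOP = 0, 1
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:                                   # coordinator holds the bag
    tasks = list(range(100))                    # stand-ins for model runs
    n_workers = comm.Get_size() - 1
    while n_workers:
        # a worker announces it is idle; send it the next task, or STOP
        src = comm.recv(source=MPI.ANY_SOURCE)
        if tasks:
            comm.send(tasks.pop(), dest=src, tag=TASK)
        else:
            comm.send(None, dest=src, tag=STOP)
            n_workers -= 1
else:
    while True:
        comm.send(rank, dest=0)                 # worker-initiated pull
        status = MPI.Status()
        task = comm.recv(source=0, status=status)
        if status.Get_tag() == STOP:
            break
        _ = task * task                         # placeholder for a model run
```

    Because each worker pulls a new task only when it becomes idle, fast nodes naturally take on more tasks than slow ones, which is what keeps makespans short on heterogeneous clusters.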

  3. A Hundred-Year-Old Experiment Re-evaluated: Accurate Ab-Initio Monte-Carlo Simulations of the Melting of Radon.

    PubMed

    Schwerdtfeger, Peter; Smits, Odile; Pahl, Elke; Jerabek, Paul

    2018-06-12

    State-of-the-art relativistic coupled-cluster theory is used to construct many-body potentials for the rare gas element radon in order to determine its bulk properties, including the solid-to-liquid phase transition, from parallel tempering Monte Carlo simulations through either direct sampling of the bulk or a finite cluster approach. The calculated melting temperatures are 201(3) K and 201(6) K from bulk simulations and from extrapolation of finite-cluster values, respectively. This is in excellent agreement with the often debated (but widely cited) and only available value of 202 K, dating back to measurements by Gray and Ramsay in 1909. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
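
    The heart of parallel tempering is the replica-exchange test: configurations at neighbouring temperatures swap with probability min(1, exp[(1/T_i - 1/T_j)(E_i - E_j)]), which lets hot replicas carry the system over barriers while cold replicas sample the solid. A sketch with illustrative numbers (Boltzmann's constant folded into the temperatures):

```python
import numpy as np

def attempt_swap(E_i, E_j, T_i, T_j, rng):
    """Replica-exchange acceptance test for parallel tempering."""
    delta = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
    return delta >= 0 or rng.random() < np.exp(delta)

rng = np.random.default_rng(1)
# neighbouring replicas near a hypothetical melting region
print(attempt_swap(E_i=-310.0, E_j=-305.0, T_i=190.0, T_j=200.0, rng=rng))
```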

  4. Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students

    ERIC Educational Resources Information Center

    Valero-Mora, Pedro M.; Ledesma, Ruben D.

    2011-01-01

    This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…

  5. Rapid Disaster Damage Estimation

    NASA Astrophysics Data System (ADS)

    Vu, T. T.

    2012-07-01

    The experiences from recent disaster events showed that detailed information derived from high-resolution satellite images could accommodate the requirements of damage analysts and disaster management practitioners. The richer information contained in such high-resolution images, however, increases the complexity of image analysis. As a result, few image analysis solutions can be practically used under time pressure in the context of post-disaster and emergency response. To fill the gap in the employment of remote sensing in disaster response, this research develops a rapid high-resolution satellite mapping solution built upon a dual-scale contextual framework to support damage estimation after a catastrophe. The target objects are buildings (or building blocks) and their condition. On the coarse processing level, statistical region merging is deployed to group pixels into a number of coarse clusters. Based on a majority rule over vegetation, water, and shadow indices, it is possible to eliminate the irrelevant clusters. The remaining clusters are likely to consist of building structures and other objects. On the fine processing level, within each remaining cluster, smaller objects are formed using morphological analysis. Numerous indicators, including spectral, textural, and shape indices, are computed to be used in a rule-based object classification. The computation time of raster-based analysis depends strongly on the image size, in other words the number of processed pixels. Breaking the analysis into two processing levels helps to reduce the number of processed pixels and the redundancy of processing irrelevant information. In addition, it allows a data- and task-based parallel implementation. The performance is demonstrated with QuickBird images of a disaster-affected area of Phanga, Thailand, captured after the 2004 Indian Ocean tsunami. The developed solution will be implemented on different platforms as well as a web processing service for operational use.
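
    The coarse-level elimination step can be illustrated with a vegetation index alone: compute NDVI per pixel and drop any cluster whose majority is vegetated (the water and shadow indices would be applied identically). A sketch with hypothetical band arrays and thresholds:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index per pixel."""
    return (nir - red) / (nir + red + eps)

def drop_vegetated_clusters(labels, nir, red, ndvi_threshold=0.3):
    """Majority rule over a spectral index, as described above: discard
    any coarse cluster in which most pixels look like vegetation."""
    vegetated = ndvi(nir, red) > ndvi_threshold
    keep = []
    for c in np.unique(labels):
        mask = labels == c
        if vegetated[mask].mean() <= 0.5:   # not majority-vegetation
            keep.append(c)
    return keep

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=(100, 100))   # stand-in coarse clusters
nir, red = rng.random((100, 100)), rng.random((100, 100))
print(drop_vegetated_clusters(labels, nir, red))
```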

  6. High Performance Input/Output for Parallel Computer Systems

    NASA Technical Reports Server (NTRS)

    Ligon, W. B.

    1996-01-01

    The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2 and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.

  7. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

    NASA Astrophysics Data System (ADS)

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-12-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
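
    The assembly step described above is the standard AdaBoost weighting scheme. A hedged sketch follows: binary labels in {-1, +1} and weak learners exposed as plain predict functions are simplifying assumptions (the record's weak classifiers are BP networks trained and applied under MapReduce).

    ```python
    import numpy as np

    def adaboost_combine(weak_learners, X, y):
        """Weight each trained weak learner by its accuracy on the
        training set and return a weighted-vote strong classifier."""
        w = np.full(len(y), 1.0 / len(y))        # sample weights
        alphas = []
        for h in weak_learners:
            pred = h(X)                          # predictions in {-1, +1}
            err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            w *= np.exp(-alpha * y * pred)       # up-weight misclassified samples
            w /= w.sum()
            alphas.append(alpha)

        def strong(X_new):
            votes = sum(a * h(X_new) for a, h in zip(alphas, weak_learners))
            return np.sign(votes)                # ties return 0
        return strong
    ```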

  8. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-12-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.

  9. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

    PubMed Central

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-01-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520

  10. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared on performance using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.

  11. Ontology-based topic clustering for online discussion data

    NASA Astrophysics Data System (ADS)

    Wang, Yongheng; Cao, Kening; Zhang, Xiaoming

    2013-03-01

    With the rapid development of online communities, mining and extracting quality knowledge from online discussions becomes very important for the industrial and marketing sector, as well as for e-commerce applications and government. Most of the existing techniques model a discussion as a social network of users represented by a user-based graph without considering the content of the discussion. In this paper we propose a new multilayered model to analyse online discussions. The user-based and message-based representations are combined in this model. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into topic space. Domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large scale online discussion data.

  12. Sample size calculations for the design of cluster randomized trials: A summary of methodology.

    PubMed

    Gao, Fei; Earnest, Arul; Matchar, David B; Campbell, Michael J; Machin, David

    2015-05-01

    Cluster randomized trial designs are growing in popularity in, for example, cardiovascular medicine research and other clinical areas, and this has stimulated parallel statistical developments concerned with the design and analysis of such trials. Nevertheless, reviews suggest that design issues associated with cluster randomized trials are often poorly appreciated and there remain inadequacies in, for example, describing how the trial size is determined and how the associated results are presented. In this paper, our aim is to provide pragmatic guidance for researchers on the methods of calculating sample sizes. We focus attention on designs with the primary purpose of comparing two interventions with respect to continuous, binary, ordered categorical, incidence rate and time-to-event outcome variables. Issues of aggregate and non-aggregate cluster trials, adjustment for variation in cluster size and the effect size are detailed. The problem of establishing the anticipated magnitude of between- and within-cluster variation, to enable planning values of the intra-cluster correlation coefficient and the coefficient of variation, is also described. Illustrative examples of calculations of trial sizes for each endpoint type are included. Copyright © 2015 Elsevier Inc. All rights reserved.
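
    A standard ingredient of these calculations is the design effect that inflates an individually randomised sample size to account for clustering. A minimal sketch, using the common unequal-cluster-size form of the design effect (the function name is illustrative, not the paper's notation):

    ```python
    import math

    def clusters_per_arm(n_individual, m, icc, cv=0.0):
        """n_individual: per-arm size from a standard (non-clustered) calculation;
        m: average cluster size; icc: intra-cluster correlation coefficient;
        cv: coefficient of variation of cluster sizes (0 = equal clusters)."""
        design_effect = 1 + ((cv ** 2 + 1) * m - 1) * icc
        return math.ceil(n_individual * design_effect / m)

    # e.g. 131 subjects per arm, clusters of 20, ICC = 0.05
    # -> design effect 1.95 -> 13 clusters per arm
    print(clusters_per_arm(131, 20, 0.05))
    ```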

  13. The Auroral Field-aligned Acceleration - Cluster Results

    NASA Astrophysics Data System (ADS)

    Vaivads, A.; Cluster Auroral Team

    The four Cluster satellites cross the auroral field lines at altitudes well above most of the acceleration region. Thus, the orbit is appropriate for studies of the generator side of this region. We consider the energy transport towards the acceleration region and different mechanisms for generating the potential drop. Using data from Cluster we can also, for the first time, study the dynamics of the generator on a minute scale. We present data from a few auroral field crossings where Cluster is in conjunction with DMSP satellites. We use electric and magnetic field data to estimate the electrostatic potential along the satellite orbit and the Poynting flux, as well as the presence of plasma waves. These we can compare with data from particle and wave instruments on Cluster and on low latitude satellites to try to make a consistent picture of the acceleration region formation in these cases. Preliminary results show close agreement both between integrated potential values at Cluster and electron peak energies at DMSP, and between the integrated Poynting flux values at Cluster and the electron energy flux at DMSP. At the end we draw a parallel between auroral electron acceleration and electron acceleration at the magnetopause.

  14. Accounting for One-Group Clustering in Effect-Size Estimation

    ERIC Educational Resources Information Center

    Citkowicz, Martyna; Hedges, Larry V.

    2013-01-01

    In some instances, intentionally or not, study designs are such that there is clustering in one group but not in the other. This paper describes methods for computing effect size estimates and their variances when there is clustering in only one group and the analysis has not taken that clustering into account. The authors provide the effect size…

  15. Measurement Error Correction Formula for Cluster-Level Group Differences in Cluster Randomized and Observational Studies

    ERIC Educational Resources Information Center

    Cho, Sun-Joo; Preacher, Kristopher J.

    2016-01-01

    Multilevel modeling (MLM) is frequently used to detect cluster-level group differences in cluster randomized trial and observational studies. Group differences on the outcomes (posttest scores) are detected by controlling for the covariate (pretest scores) as a proxy variable for unobserved factors that predict future attributes. The pretest and…

  16. Visualization of Unsteady Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Haimes, Robert

    1997-01-01

    The current compute environment that most researchers are using for the calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a super-computer class machine. The Massively Parallel Processors (MPP's) such as the 160 node IBM SP2 at NAS and clusters of workstations acting as a single MPP (like NAS's SGI Power-Challenge array and the J90 cluster) provide the required computation bandwidth for CFD calculations of transient problems. If we follow the traditional computational analysis steps for CFD (and we wish to construct an interactive visualizer) we need to be aware of the following: (1) Disk space requirements. A single snap-shot must contain at least the values (primitive variables) stored at the appropriate locations within the mesh. For most simple 3D Euler solvers that means 5 floating point words. Navier-Stokes solutions with turbulence models may contain 7 state-variables. (2) Disk speed vs. computational speeds. The time required to read the complete solution of a saved time frame from disk is now longer than the compute time for a set number of iterations from an explicit solver. Depending on the hardware and solver, an iteration of an implicit code may also take less time than reading the solution from disk. If one examines the performance improvements of the last decade or two, it is easy to see that relying on disk performance (vs. CPU improvement) may not be the best method for enhancing interactivity. (3) Cluster and parallel machine I/O problems. Disk access time is much worse within current parallel machines and clusters of workstations that are acting in concert to solve a single problem. In this case we are not trying to read the volume of data; rather, the solver runs and outputs the solution, and traditional network interfaces must be used for the file system. (4) Numerics of particle traces. Most visualization tools can work upon a single snap-shot of the data, but some visualization tools for transient problems require dealing with time.

  17. The portals 4.0.1 network programming interface.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

    2013-04-01

    This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

  18. Parallel scalability of Hartree-Fock calculations

    NASA Astrophysics Data System (ADS)

    Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.

    2015-03-01

    Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
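
    Purification replaces eigendecomposition with repeated matrix multiplication, an operation that maps naturally onto distributed dense linear algebra. A serial sketch of the classic McWeeny iteration, assuming an orthogonal basis and a starting guess whose eigenvalues already lie in [0, 1] (the communication-aware parallel variant analyzed in the paper is more involved):

    ```python
    import numpy as np

    def mcweeny_purify(P, tol=1e-10, max_iter=100):
        """Drive an approximate density matrix to idempotency:
        P -> 3P^2 - 2P^3 pushes eigenvalues above 1/2 to 1 and
        those below 1/2 to 0."""
        for _ in range(max_iter):
            P2 = P @ P
            if np.linalg.norm(P2 - P) < tol:   # idempotent => converged
                break
            P = 3 * P2 - 2 * (P2 @ P)
        return P
    ```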

  19. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
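
    The 1 TB figure follows directly from state-vector simulation: n qubits require 2^n complex amplitudes, and 2^36 amplitudes at 16 bytes each is about 1 TB. A single-node sketch of applying a one-qubit gate (the axis-per-qubit layout and qubit-ordering convention here are assumptions; a distributed simulator must additionally exchange amplitude blocks between nodes):

    ```python
    import numpy as np

    def apply_single_qubit_gate(state, gate, target, n_qubits):
        """Apply a 2x2 gate to one qubit of a 2**n state vector by
        reshaping so the target qubit becomes its own axis."""
        psi = state.reshape([2] * n_qubits)
        psi = np.tensordot(gate, psi, axes=([1], [target]))
        psi = np.moveaxis(psi, 0, target)   # restore axis order
        return psi.reshape(-1)

    # Example: Hadamard on qubit 0 of a 3-qubit register
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    state = np.zeros(8, dtype=complex)
    state[0] = 1.0
    state = apply_single_qubit_gate(state, H, target=0, n_qubits=3)
    ```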

  20. Automated Production of Movies on a Cluster of Computers

    NASA Technical Reports Server (NTRS)

    Nail, Jasper; Le, Duong; Nail, William L.; Nail, William

    2008-01-01

    A method of accelerating and facilitating production of video and film motion-picture products, and software and generic designs of computer hardware to implement the method, are undergoing development. The method provides for automation of most of the tedious and repetitive tasks involved in editing and otherwise processing raw digitized imagery into final motion-picture products. The method was conceived to satisfy requirements, in industrial and scientific testing, for rapid processing of multiple streams of simultaneously captured raw video imagery into documentation in the form of edited video imagery and video-derived data products for technical review and analysis. In the production of such video technical documentation, unlike in production of motion-picture products for entertainment, (1) it is often necessary to produce multiple video-derived data products, (2) there are usually no second chances to repeat acquisition of raw imagery, (3) it is often desired to produce final products within minutes rather than hours, days, or months, and (4) consistency and quality, rather than aesthetics, are the primary criteria for judging the products. In the present method, the workflow has both serial and parallel aspects: processing can begin before all the raw imagery has been acquired, each video stream can be subjected to different stages of processing simultaneously on different computers that may be grouped into one or more cluster(s), and the final product may consist of multiple video streams. Results of processing on different computers are shared, so that workers can collaborate effectively.

  1. Multimorbidity and patterns of chronic conditions in a primary care population in Switzerland: a cross-sectional study

    PubMed Central

    Déruaz-Luyet, Anouk; N'Goran, A Alexandra; Senn, Nicolas; Bodenmann, Patrick; Pasquier, Jérôme; Widmer, Daniel; Tandjung, Ryan; Rosemann, Thomas; Frey, Peter; Streit, Sven; Zeller, Andreas; Excoffier, Sophie; Burnand, Bernard; Herzig, Lilli

    2017-01-01

    Objective To characterise in detail a random sample of multimorbid patients in Switzerland and to evaluate the clustering of chronic conditions in that sample. Methods 100 general practitioners (GPs) each enrolled 10 randomly selected multimorbid patients aged ≥18 years old and suffering from at least three chronic conditions. The prevalence of 75 separate chronic conditions from the International Classification of Primary Care-2 (ICPC-2) was evaluated in these patients. Clusters of chronic conditions were studied in parallel. Results The final database included 888 patients. Mean (SD) patient age was 73.0 (12.0) years. They suffered from 5.5 (2.2) chronic conditions and were prescribed 7.7 (3.5) drugs; 25.7% suffered from depression. Psychological conditions were more prevalent among younger individuals (≤66 years old). Cluster analysis of chronic conditions with a prevalence ≥5% in the sample revealed four main groups of conditions: (1) cardiovascular risk factors and conditions, (2) general age-related and metabolic conditions, (3) tobacco and alcohol dependencies, and (4) pain, musculoskeletal and psychological conditions. Conclusion Given the emerging epidemic of multimorbidity in industrialised countries, accurately depicting the multiple expressions of multimorbidity in family practices’ patients is a high priority. Indeed, even in a setting where patients have direct access to medical specialists, GPs nevertheless retain a key role as coordinators and often as the sole medical reference for multimorbid patients. PMID:28674127

  2. Evidence-based care of older people with suspected cognitive impairment in general practice: protocol for the IRIS cluster randomised trial.

    PubMed

    McKenzie, Joanne E; French, Simon D; O'Connor, Denise A; Mortimer, Duncan S; Browning, Colette J; Russell, Grant M; Grimshaw, Jeremy M; Eccles, Martin P; Francis, Jill J; Michie, Susan; Murphy, Kerry; Kossenas, Fiona; Green, Sally E

    2013-08-19

    Dementia is a common and complex condition. Evidence-based guidelines for the management of people with dementia in general practice exist; however, detection, diagnosis and disclosure of dementia have been identified as potential evidence-practice gaps. Interventions to implement guidelines into practice have had varying success. The use of theory in designing implementation interventions has been limited, but is advocated because of its potential to yield more effective interventions and aid understanding of factors modifying the magnitude of intervention effects across trials. This protocol describes methods of a randomised trial that tests a theory-informed implementation intervention that, if effective, may provide benefits for patients with dementia and their carers. This trial aims to estimate the effectiveness of a theory-informed intervention to increase the adherence of GPs (in Victoria, Australia) to a clinical guideline for the detection, diagnosis, and management of dementia in general practice, compared with providing GPs with a printed copy of the guideline. Primary objectives include testing if the intervention is effective in increasing the percentage of patients with suspected cognitive impairment who receive care consistent with two key guideline recommendations: receipt of (i) a formal cognitive assessment, and (ii) a depression assessment using a validated scale (the primary outcomes for the trial). The design is a parallel cluster randomised trial, with clusters being general practices. We aim to recruit 60 practices per group. Practices will be randomised to the intervention and control groups using restricted randomisation. Patients meeting the inclusion criteria, and GPs' detection and diagnosis behaviours directed toward these patients, will be identified and measured via an electronic search of the medical records nine months after the start of the intervention. Practitioners in the control group will receive a printed copy of the guideline. In addition to receipt of the printed guideline, practitioners in the intervention group will be invited to participate in an interactive, opinion leader-led, educational face-to-face workshop. The theory-informed intervention aims to address identified barriers to and enablers of implementation of recommendations. Researchers responsible for identifying the cohort of patients with suspected cognitive impairment, and their detection and diagnosis outcomes, will be blind to group allocation. Australian New Zealand Clinical Trials Registry: ACTRN12611001032943 (date registered 28 September, 2011).

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ogden, K; O’Dwyer, R; Bradford, T

    Purpose: To reduce differences in features calculated from MRI brain scans acquired at different field strengths with or without Gadolinium contrast. Methods: Brain scans were processed for 111 epilepsy patients to extract hippocampus and thalamus features. Scans were acquired on 1.5 T scanners with Gadolinium contrast (Group A), 1.5 T scanners without Gd (Group B), and 3.0 T scanners without Gd (Group C). A total of 72 features were extracted. Features were extracted from original scans and from scans where the image pixel values were rescaled to the mean of the hippocampi and thalami values. For each data set, cluster analysis was performed on the raw feature set and on feature sets with normalization (conversion to Z scores). Two methods of normalization were used: the first was over all values of a given feature, and the second normalized within the patient group membership. The clustering software was configured to produce 3 clusters. Group fractions in each cluster were calculated. Results: For features calculated from both the non-rescaled and rescaled data, cluster membership was identical for both the non-normalized and normalized data sets. Cluster 1 consisted entirely of Group A data, Cluster 2 contained data from all three groups, and Cluster 3 contained data from only Groups A and B. For the categorically normalized data sets there was a more uniform distribution of group data in the three clusters. A less pronounced effect was seen in the rescaled image data features. Conclusion: Image rescaling and feature renormalization can have a significant effect on the results of clustering analysis. These effects are also likely to influence the results of supervised machine learning algorithms. It may be possible to partly remove the influence of scanner field strength and the presence of Gadolinium-based contrast in feature extraction for radiomics applications.

  4. Chronology of the halo globular cluster system formation.

    NASA Astrophysics Data System (ADS)

    Salaris, M.; Weiss, A.

    1997-11-01

    Using up-to-date stellar models and isochrones we determine the age of 25 galactic halo clusters. The clusters are distributed into four groups according to metallicity. We measure the absolute age of a reference cluster in each group, and then determine the ages of the other clusters relative to it. This combination yields the most reliable results. We find that the oldest cluster group on average is 11.8+/-0.9 Gyr or 12.3+/-0.3 Gyr old, depending on whether we include Arp 2 and Rup 106. The average age of all clusters is about 10.5 Gyr. Questions concerning a common age for all clusters and a relation between metallicity and age are addressed. The groups of lower metallicity appear to be coeval, but our results indicate that globally the sample has an age spread, and that age and metallicity are correlated, but not by a simple linear relation.

  5. Highly efficient spatial data filtering in parallel using the opensource library CPPPO

    NASA Astrophysics Data System (ADS)

    Municchi, Federico; Goniva, Christoph; Radl, Stefan

    2016-10-01

    CPPPO is a compilation of parallel data processing routines developed with the aim of creating a library for "scale bridging" (i.e., connecting different scales by means of closure models) in a multi-scale approach. CPPPO features a number of parallel filtering algorithms designed for use with structured and unstructured Eulerian meshes, as well as Lagrangian data sets. In addition, data can be processed on the fly, allowing the collection of relevant statistics without saving individual snapshots of the simulation state. Our library is provided with an interface to the widely-used CFD solver OpenFOAM®, and can be easily connected to any other software package via interface modules. Also, we introduce a novel, extremely efficient approach to parallel data filtering, and show that our algorithms scale super-linearly on multi-core clusters. Furthermore, we provide a guideline for choosing the optimal Eulerian cell selection algorithm depending on the number of CPU cores used. Finally, we demonstrate the accuracy and the parallel scalability of CPPPO in a showcase focusing on heat and mass transfer from a dense bed of particles.

  6. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
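
    As a toy illustration of the idea (the serpentine ordering below is one well-known way to obtain a Hamiltonian path on a 2D mesh; the patent's network hardware and programming interface are not reproduced here):

    ```python
    def serpentine_path(rows, cols):
        """A Hamiltonian path over a rows x cols mesh: traverse each row,
        alternating direction, so consecutive nodes are always neighbours."""
        path = []
        for r in range(rows):
            line = [(r, c) for c in range(cols)]
            path.extend(line if r % 2 == 0 else reversed(line))
        return path

    def broadcast_along_path(path, message):
        """Each node takes the message from its predecessor and passes it
        to its successor -- point-to-point hops only."""
        delivered = {}
        for node in path:       # on real hardware: recv from prev, send to next
            delivered[node] = message
        return delivered

    # Logical root at one corner of a 4x4 plane
    assert len(broadcast_along_path(serpentine_path(4, 4), "payload")) == 16
    ```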

  7. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

    PubMed Central

    Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2016-01-01

    Attrition is a common occurrence in cluster randomised trials, which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group. PMID:27177885
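
    For concreteness, the unadjusted cluster-level analysis compared in this paper reduces to a two-sample t-test on cluster means computed from complete records. A minimal sketch (the function name and the missing-value encoding are illustrative):

    ```python
    import numpy as np
    from scipy import stats

    def cluster_level_analysis(outcomes, clusters, arms):
        """Collapse individual outcomes to one mean per cluster, then
        compare arms with a t-test on the cluster means.  Missing
        outcomes (None) are dropped: a complete-records analysis."""
        means, arm_of = {}, {}
        for y, c, a in zip(outcomes, clusters, arms):
            if y is None:
                continue
            means.setdefault(c, []).append(y)
            arm_of[c] = a
        m0 = [np.mean(v) for c, v in means.items() if arm_of[c] == 0]
        m1 = [np.mean(v) for c, v in means.items() if arm_of[c] == 1]
        return stats.ttest_ind(m1, m0)
    ```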

  8. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

    PubMed

    Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2017-06-01

    Attrition is a common occurrence in cluster randomised trials, which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group.

  9. Cluster management.

    PubMed

    Katz, R

    1992-11-01

    Cluster management is a management model that fosters decentralization of management, develops the leadership potential of staff, and creates ownership of unit-based goals. Unlike shared governance models, there is no formal structure created by committees and it is less threatening for managers. There are two parts to the cluster management model. One is the formation of cluster groups, consisting of all staff and facilitated by a cluster leader. The cluster groups function for communication and problem-solving. The second part of the cluster management model is the creation of task forces. These task forces are designed to work on short-term goals, usually in response to solving one of the unit's goals. Sometimes the task forces are used for quality improvement or system problems. Clusters are groups of not more than five or six staff members, facilitated by a cluster leader. A cluster is made up of individuals who work the same shift. For example, staff of all job titles who work days would form one cluster; it would include registered nurses, licensed practical nurses, nursing assistants, and unit clerks. The cluster leader is chosen by the manager based on certain criteria and is trained for this specialized role. The concept of cluster management, criteria for choosing leaders, training for leaders, using cluster groups to solve quality improvement issues, and the learning process necessary for manager support are described.

  10. Camino Verde (The Green Way): evidence-based community mobilisation for dengue control in Nicaragua and Mexico: feasibility study and study protocol for a randomised controlled trial.

    PubMed

    Andersson, Neil; Arostegui, Jorge; Nava-Aguilera, Elizabeth; Harris, Eva; Ledogar, Robert J

    2017-05-30

    Since the Aedes aegypti mosquitoes that transmit dengue virus can breed in clean water, WHO-endorsed vector control strategies place sachets of organophosphate pesticide, temephos (Abate), in household water storage containers. These and other pesticide-dependent approaches have failed to curb the spread of dengue and multiple dengue virus serotypes continue to spread throughout tropical and subtropical regions worldwide. A feasibility study in Managua, Nicaragua, generated instruments, intervention protocols, training schedules and impact assessment tools for a cluster randomised controlled trial of community-based approaches to vector control comprising an alternative strategy for dengue prevention and control in Nicaragua and Mexico. The Camino Verde (Green Way) is a pragmatic parallel group trial of pesticide-free dengue vector control, adding effectiveness to the standard government dengue control. A random sample from the most recent census in three coastal regions of Guerrero state in Mexico will generate 90 study clusters and the equivalent sampling frame in Managua, Nicaragua will generate 60 clusters, making a total of 150 clusters each of 137-140 households. After a baseline study, computer-driven randomisation will allocate to intervention one half of the sites, stratified by country, evidence of recent dengue virus infection in children aged 3-9 years and, in Nicaragua, level of community organisation. Following a common evidence-based education protocol, each cluster will develop and implement its own collective interventions including house-to-house visits, school-based programmes and inter-community visits. After 18 months, a follow-up study will compare dengue history, serological evidence of recent dengue virus infection (via measurement of anti-dengue virus antibodies in saliva samples) and entomological indices between intervention and control sites. Our hypothesis is that informed community mobilisation adds effectiveness in controlling dengue. ISRCTN27581154.

  11. The need to balance merits and limitations from different disciplines when considering the stepped wedge cluster randomized trial design.

    PubMed

    de Hoop, Esther; van der Tweel, Ingeborg; van der Graaf, Rieke; Moons, Karel G M; van Delden, Johannes J M; Reitsma, Johannes B; Koffijberg, Hendrik

    2015-10-30

    Various papers have addressed the pros and cons of the stepped wedge cluster randomized trial design (SWD). However, some issues have not been addressed, or only to a limited extent. Our aim was to provide a comprehensive overview of all merits and limitations of the SWD to assist researchers, reviewers and medical ethics committees when deciding on the appropriateness of the SWD for a particular study. We performed an initial search to identify articles with a methodological focus on the SWD, and categorized and discussed all reported advantages and disadvantages of the SWD. Additional aspects were identified during multidisciplinary meetings in which ethicists, biostatisticians, clinical epidemiologists and health economists participated. All aspects of the SWD were compared to the parallel group cluster randomized design. We categorized the merits and limitations of the SWD by the distinct phases in the design and conduct of such studies, highlighting that their impact may vary depending on the context of the study or that benefits may be offset by drawbacks across study phases. Furthermore, a real-life illustration is provided. New aspects are identified within all disciplines. Examples of newly identified aspects of an SWD are: the possibility to measure a treatment effect in each cluster to examine the (in)consistency in effects across clusters, the detrimental effect of lower than expected inclusion rates, deviation from the ordinary informed consent process, and the question whether studies using the SWD are likely to have sufficient social value. Discussions are provided on, for example, clinical equipoise, social value, health-economic decision making, the number of study arms, and interim analyses. Deciding on the use of the SWD involves aspects and considerations from different disciplines, not all of which have been discussed before. Pros and cons of this design should be balanced against other feasible design options so as to choose the optimal design for a particular intervention study.

  12. Scalable Visual Analytics of Massive Textual Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.

    2007-04-01

    This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.

  13. Hypercluster - Parallel processing for computational mechanics

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1988-01-01

    An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.

  14. Parallel Implementation of Numerical Solution of Few-Body Problem Using Feynman's Continual Integrals

    NASA Astrophysics Data System (ADS)

    Naumenko, Mikhail; Samarin, Viacheslav

    2018-02-01

    A modern parallel computing algorithm has been applied to the solution of the few-body problem. The approach is based on Feynman's continual integrals method implemented in the C++ programming language using NVIDIA CUDA technology. A wide range of 3-body and 4-body bound systems has been considered, including nuclei described as consisting of protons and neutrons (e.g., 3,4He) and nuclei described as consisting of clusters and nucleons (e.g., 6He). The correctness of the results was checked by comparison with the exactly solvable 4-body oscillatory system and with experimental data.

  15. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

    We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered, resulting in corresponding formulas that are accurate to order O(h), O(h^2), and O(h^4), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems, and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summary: Program title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OpenMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
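
    The step-size balance named in the solution method can be made concrete for a central difference: with round-off error O(eps/h) and truncation error O(h^2), the total error is minimized near h ~ eps^(1/3). A sketch under that textbook rule (NDL's actual step selection, higher-order formulas and bound-constraint handling are richer):

    ```python
    import numpy as np

    def central_diff(f, x, i):
        """Central-difference estimate of df/dx_i with a step that
        balances truncation error O(h^2) against round-off O(eps/h)."""
        eps = np.finfo(float).eps
        h = eps ** (1.0 / 3.0) * max(1.0, abs(x[i]))
        e = np.zeros_like(x)
        e[i] = h
        return (f(x + e) - f(x - e)) / (2 * h)

    # Example: d/dx0 of f(x) = x0^2 * x1 at (3, 2); analytic value is 12
    g = central_diff(lambda v: v[0] ** 2 * v[1], np.array([3.0, 2.0]), 0)
    ```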

  16. The Scale Sizes of Globular Clusters: Tidal Limits, Evolution, and the Outer Halo

    NASA Astrophysics Data System (ADS)

    Harris, William

    2011-10-01

    The physical factors that determine the linear sizes of massive star clusters are not well understood. Their scale sizes were long thought to be governed by the tidal field of the parent galaxy, but major questions are now emerging. Globular clusters, for example, have mean sizes nearly independent of location in the halo. Paradoxically, the recently discovered "anomalous extended clusters" in M31 and elsewhere have scale sizes that fit much better with tidal theory, but they are puzzlingly rare. Lastly, the persistent size difference between metal-poor and metal-rich clusters still lacks a quantitative explanation. Many aspects of these observations call for better modelling of dynamical evolution in the outskirts of clusters, and also their conditions of formation including the early rapid mass loss phase of protoclusters. A new set of accurate measurements of scale sizes and structural parameters, for a large and homogeneous set of globular clusters, would represent a major advance in this subject. We propose to carry out a WFC3+ACS imaging survey of the globular clusters in the supergiant Virgo elliptical M87 to cover the complete run of the halo. M87 is an optimum target system because of its huge numbers of clusters and HST's ability to resolve the cluster profiles accurately. We will derive cluster effective radii, central concentrations, luminosities, and colors for more than 4000 clusters using PSF-convolved King-model profile fitting. In parallel, we are developing theoretical tools to model the expected distribution of cluster sizes versus galactocentric distance as functions of cluster mass, concentration, and orbital anisotropy.

  17. Parallel k-Means Clustering for Quantitative Ecoregion Delineation Using Large Data Sets

    Treesearch

    Jitendra Kumar; Richard T. Mills; Forrest M Hoffman; William W Hargrove

    2011-01-01

    Identification of geographic ecoregions has long been of interest to environmental scientists and ecologists for identifying regions of similar ecological and environmental conditions. Such classifications are important for predicting suitable species ranges, for stratification of ecological samples, and to help prioritize habitat preservation and remediation efforts....
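
    The method named in the title rests on the embarrassingly parallel assignment step of k-means, in which each data chunk can be labelled against the shared centers independently. A generic data-parallel sketch (not the authors' implementation, which targets supercomputer-scale data sets):

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def _assign(args):
        chunk, centers = args
        # squared distance from every point in the chunk to every center
        d = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    def parallel_assign(points, centers, n_workers=4):
        """Split the points into chunks, label each chunk in a separate
        process, and concatenate.  (Run under a __main__ guard on
        platforms that spawn worker processes.)"""
        chunks = np.array_split(points, n_workers)
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            labels = pool.map(_assign, [(c, centers) for c in chunks])
        return np.concatenate(list(labels))
    ```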

  18. rfpipe: Radio interferometric transient search pipeline

    NASA Astrophysics Data System (ADS)

    Law, Casey J.

    2017-10-01

    rfpipe supports Python-based analysis of radio interferometric data (especially from the Very Large Array) and searches for fast radio transients. It extends the rtpipe library (ascl:1706.002) with new approaches to parallelization, acceleration, and more portable data products. rfpipe can run in standalone mode or in a cluster environment.

  19. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    PubMed

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. The etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate allergen sensitization characteristics according to gender. The multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. When compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize the female group more frequently than the male group.

  20. Model selection for clustering of pharmacokinetic responses.

    PubMed

    Guerra, Rui P; Carvalho, Alexandra M; Mateus, Paulo

    2018-08-01

    Pharmacokinetics comprises the study of drug absorption, distribution, metabolism and excretion over time. Clinical pharmacokinetics, focusing on therapeutic management, offers important insights towards personalised medicine through the study of the efficacy and toxicity of drug therapies. This study is hampered by subjects' high variability in drug blood concentration when starting a therapy with the same drug dosage. Clustering of pharmacokinetic responses has been addressed recently as a way to stratify subjects and provide different drug doses for each stratum. The existing clustering method, however, is not able to automatically determine the correct number of clusters, relying on a user-defined parameter for collapsing clusters that are closer than a given heuristic threshold. We aim to use information-theoretical approaches to address parameter-free model selection. We propose two model selection criteria for clustering pharmacokinetic responses, founded on the Minimum Description Length and on the Normalised Maximum Likelihood. Experimental results show the ability of the model selection schemes to unveil the correct number of clusters underlying the mixture of pharmacokinetic responses. In this work we were able to devise two model selection criteria to determine the number of clusters in a mixture of pharmacokinetic curves, advancing over previous works. A cost-efficient parallel implementation in Java of the proposed method is publicly available for the community. Copyright © 2018 Elsevier B.V. All rights reserved.
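
    The flavour of such information-theoretic selection can be sketched with the classical two-part code: choose the number of mixture components that minimizes BIC, the large-sample approximation to MDL. This is a stand-in sketch only; the paper's MDL and NML criteria for pharmacokinetic curves are more refined, and GaussianMixture here substitutes for its curve model.

    ```python
    from sklearn.mixture import GaussianMixture

    def select_n_clusters(X, k_max=8):
        """Fit mixtures with k = 1..k_max components and return the k
        minimizing BIC = -2 log L + p log n (shorter description wins)."""
        scores = {}
        for k in range(1, k_max + 1):
            gm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
            scores[k] = gm.bic(X)
        return min(scores, key=scores.get)
    ```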

  1. Simulations of the Formation and Evolution of X-ray Clusters

    NASA Astrophysics Data System (ADS)

    Bryan, G. L.; Klypin, A.; Norman, M. L.

    1994-05-01

    We describe results from a set of Omega = 1 Cold plus Hot Dark Matter (CHDM) and Cold Dark Matter (CDM) simulations. We examine the formation and evolution of X-ray clusters in a cosmological setting with sufficient numbers to perform statistical analysis. We find that CDM, normalized to COBE, seems to produce too many large clusters, both in terms of the luminosity (dn/dL) and temperature (dn/dT) functions. The CHDM simulation produces fewer clusters, and the temperature distribution (our numerically most secure result) matches observations where they overlap. The computed cluster luminosity function drops below observations, but we are almost surely underestimating the X-ray luminosity. Because of the lower fluctuations in CHDM, there are only a small number of bright clusters in our simulation volume; however, we can use the simulated clusters to fix the relation between temperature and velocity dispersion, allowing us to use collisionless N-body codes to probe larger length scales with correspondingly brighter clusters. The hydrodynamic simulations have been performed with a hybrid particle-mesh scheme for the dark matter and a high resolution grid-based piecewise parabolic method for the adiabatic gas dynamics. This combination has been implemented for massively parallel computers, allowing us to achieve grids as large as 512^3.

  2. Regional health care planning: a methodology to cluster facilities using community utilization patterns

    PubMed Central

    2013-01-01

    Background Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state’s Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. Methods The clustering methodology employs a 2-step K-means + Ward’s clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Results Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Conclusions Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units. PMID:23964905
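
    One common arrangement of such a two-step scheme compresses the data with K-means and then merges the resulting centroids with Ward's hierarchical method. A sketch under those assumptions (the paper's exact ordering, utilization-pattern features and cluster-count heuristic are as described in the record above):

    ```python
    import numpy as np
    from scipy.cluster.vq import kmeans2
    from scipy.cluster.hierarchy import linkage, fcluster

    def two_step_cluster(features, n_micro=50, n_groups=10, seed=0):
        """K-means compresses facilities into micro-cluster centroids;
        Ward's method then merges the centroids into final groups."""
        centroids, micro_label = kmeans2(features, n_micro, minit='++', seed=seed)
        tree = linkage(centroids, method='ward')
        group_of_centroid = fcluster(tree, t=n_groups, criterion='maxclust')
        return group_of_centroid[micro_label]   # one group label per facility
    ```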

  3. Regional health care planning: a methodology to cluster facilities using community utilization patterns.

    PubMed

    Delamater, Paul L; Shortridge, Ashton M; Messina, Joseph P

    2013-08-22

    Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state's Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. The clustering methodology employs a 2-step K-means + Ward's clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units.
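
    The 2-step procedure is straightforward to sketch: K-means first compresses facilities into micro-clusters, then Ward's method merges the centroids while scanning candidate cluster counts. In the illustration below the silhouette score stands in for the paper's composite fit-plus-characteristics heuristic, and the feature matrix is hypothetical.

```python
# Minimal sketch of a 2-step "K-means + Ward's" grouping (illustrative only).
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

def two_step_groups(X, n_micro=50, k_range=range(10, 41)):
    # Step 1: K-means compresses facilities into micro-cluster centroids.
    km = KMeans(n_clusters=n_micro, n_init=10, random_state=0).fit(X)
    best = (None, -1.0, None)
    for k in k_range:
        # Step 2: Ward's hierarchical clustering merges the centroids.
        ward = AgglomerativeClustering(n_clusters=k, linkage="ward")
        micro_labels = ward.fit_predict(km.cluster_centers_)
        labels = micro_labels[km.labels_]          # map back to facilities
        score = silhouette_score(X, labels)        # stand-in fit measure
        if score > best[1]:
            best = (k, score, labels)
    return best[0], best[2]

# X: rows = facilities, columns = standardized utilization shares + coordinates.
X = np.random.default_rng(0).normal(size=(300, 12))
k, labels = two_step_groups(X)
```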

  4. The preBötzinger complex as a hub for network activity along the ventral respiratory column in the neonate rat.

    PubMed

    Gourévitch, Boris; Mellen, Nicholas

    2014-09-01

    In vertebrates, respiratory control is ascribed to heterogeneous respiration-modulated neurons along the Ventral Respiratory Column (VRC) in the medulla, which includes the preBötzinger Complex (preBötC), the putative respiratory rhythm generator. Here, the functional anatomy of the VRC was characterized via optical recordings in the sagittally sectioned neonate rat hindbrain, at sampling rates permitting coupling estimation between neuron pairs, so that each neuron was described using unitary, neuron-system, and coupling attributes. Structured coupling relations in local networks, significantly oriented coupling in the peri-inspiratory interval detected in pooled data, and significant correlations between firing rate and expiratory duration in subsets of neurons revealed network regulation at multiple timescales. Spatially averaged neuronal attributes, including coupling vectors, revealed a sharp boundary at the rostral margin of the preBötC, as well as other functional anatomical features congruent with identified structures, including the parafacial respiratory group and the nucleus ambiguus. Cluster analysis of attributes identified two spatially compact, homogeneous groups: the first overlapped with the preBötC, and was characterized by strong respiratory modulation and dense bidirectional coupling with itself and other groups, consistent with a central role for the preBötC in respiratory control; the second lay between the preBötC and the facial nucleus, and was characterized by weak respiratory modulation and weak coupling with other respiratory neurons, which is congruent with the cardiovascular regulatory networks found in this region. Other groups identified using cluster analysis suggested that networks along the VRC regulate expiratory duration, and the transition to and from inspiration, but these groups were heterogeneous and anatomically dispersed. Thus, by recording local networks in parallel, this study found evidence for respiratory regulation at multiple timescales along the VRC, as well as a role for the preBötC in the integration of functionally disparate respiratory neurons. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review

    NASA Astrophysics Data System (ADS)

    Kim, Tai-Hoon

    The goal of clustering is to decompose a dataset into similar groups based on an objective function. Several well-established algorithms exist for data clustering. The objective of these algorithms is to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria is satisfied. This article presents a comparative study of the effectiveness and efficiency of traditional data clustering algorithms. To evaluate the performance of the clustering algorithms, the Minkowski score is used here on different data sets.
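
    For reference, the Minkowski score compares the pairwise co-membership of a computed clustering C with that of a reference partition T: with n11 pairs co-clustered in both, n01 co-clustered only in T, and n10 only in C, the score is sqrt((n01 + n10) / (n11 + n01)); 0 is a perfect match and lower is better. A minimal, generic implementation (not the article's code) follows.

```python
# Minkowski score over all unordered pairs of points (illustrative sketch).
import numpy as np

def minkowski_score(true_labels, pred_labels):
    t, c = np.asarray(true_labels), np.asarray(pred_labels)
    i, j = np.triu_indices(len(t), k=1)      # all unordered pairs
    same_t, same_c = t[i] == t[j], c[i] == c[j]
    n11 = np.sum(same_t & same_c)            # co-clustered in both
    n01 = np.sum(same_t & ~same_c)           # co-clustered only in reference
    n10 = np.sum(~same_t & same_c)           # co-clustered only in solution
    return np.sqrt((n01 + n10) / (n11 + n01))

print(minkowski_score([0, 0, 1, 1], [1, 1, 0, 0]))   # 0.0: identical partitions
```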

  6. Parallel Numerical Simulations of Water Reservoirs

    NASA Astrophysics Data System (ADS)

    Torres, Pedro; Mangiavacchi, Norberto

    2010-11-01

    The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. To this end, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hydropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as pre-processing, solving and post-processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
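
    The projection step can be read as an approximate block LU factorization of the discrete saddle-point system [[A, G], [D, 0]] [u; p] = [f; 0], with D = G^T the discrete divergence. The sketch below illustrates this on a small dense system, using a lumped (diagonal) approximation of A^{-1}; the operators are random stand-ins, not the paper's MINI-element matrices.

```python
# Projection method as an approximate block LU solve (illustrative sketch).
import numpy as np

def projection_solve(A, G, f):
    D = G.T                                   # discrete divergence
    u_star = np.linalg.solve(A, f)            # intermediate velocity
    Ainv = np.diag(1.0 / np.diag(A))          # lumped approximation of A^{-1}
    S = D @ Ainv @ G                          # approximate pressure Schur complement
    p = np.linalg.solve(S, D @ u_star)        # pressure from divergence of u*
    u = u_star - Ainv @ G @ p                 # correction projects out divergence
    return u, p

rng = np.random.default_rng(0)
n_u, n_p = 12, 4
M = rng.normal(size=(n_u, n_u))
A = M @ M.T + n_u * np.eye(n_u)               # SPD velocity operator
G = rng.normal(size=(n_u, n_p))               # discrete gradient
u, p = projection_solve(A, G, rng.normal(size=n_u))
print(np.linalg.norm(G.T @ u))                # ~1e-15: u is discretely divergence-free
```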

  7. CloudMC: a cloud computing application for Monte Carlo simulation.

    PubMed

    Miras, H; Jiménez, R; Miras, C; Gomà, C

    2013-04-21

    This work presents CloudMC, a cloud computing application, developed in Windows Azure®, the platform of the Microsoft® cloud, for the parallelization of Monte Carlo simulations in a dynamic virtual cluster. CloudMC is a web application designed to be independent of the Monte Carlo code on which the simulations are based; the simulations just need to be of the form: input files → executable → output files. To study the performance of CloudMC in Windows Azure®, Monte Carlo simulations with penelope were performed on different instance (virtual machine) sizes and for different numbers of instances. The instance size was found to have no effect on the simulation runtime. It was also found that the decrease in time with the number of instances followed Amdahl's law, with a slight deviation due to the increase in the fraction of non-parallelizable time with increasing number of instances. A simulation that would have required 30 h of CPU time on a single instance was completed in 48.6 min when executed on 64 instances in parallel (a speedup of 37×). Furthermore, the use of cloud computing for parallel computing offers some advantages over conventional clusters: high accessibility, scalability, and pay-per-usage. Therefore, it is strongly believed that cloud computing will play an important role in making Monte Carlo dose calculation a reality in future clinical practice.
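
    The reported figures are easy to sanity-check against Amdahl's law, S(N) = 1 / ((1 - p) + p/N). Inverting the quoted 37× speedup at 64 instances gives the implied parallel fraction; the numbers come from the abstract, the code is only illustrative.

```python
# Back-of-the-envelope Amdahl's-law check of the reported speedup.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

s, n = 37.0, 64
p = (1.0 - 1.0 / s) * n / (n - 1)         # solve S(n) = s for p
print(round(p, 4))                        # ~0.9884: roughly 1% serial fraction
print(round(amdahl_speedup(p, 128), 1))   # ~51.8: diminishing returns at 128
```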

  8. High performance computing aspects of a dimension independent semi-Lagrangian discontinuous Galerkin code

    NASA Astrophysics Data System (ADS)

    Einkemmer, Lukas

    2016-05-01

    The recently developed semi-Lagrangian discontinuous Galerkin approach is used to discretize hyperbolic partial differential equations (usually first order equations). Since these methods are conservative, local in space, and able to limit numerical diffusion, they are considered a promising alternative to more traditional semi-Lagrangian schemes (which are usually based on polynomial or spline interpolation). In this paper, we consider a parallel implementation of a semi-Lagrangian discontinuous Galerkin method for distributed memory systems (so-called clusters). Both strong and weak scaling studies are performed on the Vienna Scientific Cluster 2 (VSC-2). In the case of weak scaling we observe a parallel efficiency above 0.8 for both two and four dimensional problems and up to 8192 cores. Strong scaling results show good scalability to at least 512 cores (we consider problems that can be run on a single processor in reasonable time). In addition, we study the scaling of a two dimensional Vlasov-Poisson solver that is implemented using the framework provided. All of the simulations are conducted in the context of worst case communication overhead; i.e., in a setting where the CFL (Courant-Friedrichs-Lewy) number increases linearly with the problem size. The framework introduced in this paper facilitates a dimension independent implementation of scientific codes (based on C++ templates) using both an MPI and a hybrid approach to parallelization. We describe the essential ingredients of our implementation.
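
    For reference, the two scaling measures quoted above are computed as below; the timing numbers in the example are hypothetical, not taken from the VSC-2 runs.

```python
# Strong scaling fixes the total problem size; weak scaling fixes the
# problem size per core (illustrative numbers only).
def strong_efficiency(t1, tn, n):
    return t1 / (n * tn)     # ideal: tn = t1 / n  ->  efficiency 1.0

def weak_efficiency(t1, tn):
    return t1 / tn           # ideal: tn = t1      ->  efficiency 1.0

print(strong_efficiency(t1=1000.0, tn=2.2, n=512))   # ~0.89
print(weak_efficiency(t1=10.0, tn=12.0))             # ~0.83, the above-0.8 regime
```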

  9. The novel implicit LU-SGS parallel iterative method based on the diffusion equation of a nuclear reactor on a GPU cluster

    NASA Astrophysics Data System (ADS)

    Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya

    2017-02-01

    GPUs are not only used in graphics but are now widely applied in fields that require large volumes of numerical computation. In the energy industry, nuclear energy is not easily replaced by other sources because of its low carbon emissions, high energy density, long operating life, and other characteristics. Management of core fuel is one of the major concerns in a nuclear power plant and is directly related to the economic benefits and cost of nuclear power. The equations describing a large-scale reactor core are large and complicated, so the calculation of the diffusion equation is crucial in the core fuel management process. In this paper, we use CUDA on a GPU cluster to run the LU-SGS parallel iterative calculation for the reactor diffusion equation. We divide the one-dimensional and two-dimensional meshes into multiple domains, with each domain evenly distributed over the GPU blocks. A parallel collision scheme is put forward in which virtual grid boundaries exchange information and transmit data through continuous collisions. Experiments show that, compared with the serial program, the GPU greatly improves the efficiency of program execution, confirming the growing role of GPUs in the field of numerical calculations.
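
    To make the iteration concrete, here is a minimal serial sketch of a symmetric (forward-then-backward) Gauss-Seidel sweep on the 1-D finite-difference diffusion system -u'' = 1 with zero boundary values; the paper's GPU domain decomposition and collision-based boundary exchange are not reproduced, and all names are illustrative.

```python
# Symmetric Gauss-Seidel sweeps for the tridiagonal system
# -x[i-1] + 2*x[i] - x[i+1] = h^2 (1-D diffusion/Poisson stand-in).
import numpy as np

def sgs_solve(n=50, tol=1e-8, max_sweeps=5000):
    h = 1.0 / (n + 1)
    b = np.full(n, h * h)                 # unit source term
    x = np.zeros(n)
    for _ in range(max_sweeps):
        for i in range(n):                # forward (lower-triangular) sweep
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            x[i] = (b[i] + left + right) / 2.0
        for i in range(n - 1, -1, -1):    # backward (upper-triangular) sweep
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            x[i] = (b[i] + left + right) / 2.0
        r = b - (2 * x - np.roll(x, 1) - np.roll(x, -1))
        r[0] = b[0] - (2 * x[0] - x[1])   # fix the wrap-around entries
        r[-1] = b[-1] - (2 * x[-1] - x[-2])
        if np.linalg.norm(r) < tol:
            break
    return x

x = sgs_solve()   # approximates u(x) = x*(1 - x)/2 on the interior grid
```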

  10. The specificity of learned parallelism in dual-memory retrieval.

    PubMed

    Strobach, Tilo; Schubert, Torsten; Pashler, Harold; Rickard, Timothy

    2014-05-01

    Retrieval of two responses from one visually presented cue occurs sequentially at the outset of dual-retrieval practice. Exclusively for subjects who adopt a mode of grouping (i.e., synchronizing) their response execution, however, reaction times after dual-retrieval practice indicate a shift to learned retrieval parallelism (e.g., Nino & Rickard, in Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 373-388, 2003). In the present study, we investigated how this learned parallelism is achieved and why it appears to occur only for subjects who group their responses. Two main accounts were considered: a task-level versus a cue-level account. The task-level account assumes that learned retrieval parallelism occurs at the level of the task as a whole and is not limited to practiced cues. Grouping response execution may thus promote a general shift to parallel retrieval following practice. The cue-level account states that learned retrieval parallelism is specific to practiced cues. This type of parallelism may result from cue-specific response chunking that occurs uniquely as a consequence of grouped response execution. The results of two experiments favored the second account and were best interpreted in terms of a structural bottleneck model.

  11. TU-AB-BRC-12: Optimized Parallel MonteCarlo Dose Calculations for Secondary MU Checks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    French, S; Nazareth, D; Bellor, M

    Purpose: Secondary MU checks are an important tool used during a physics review of a treatment plan. Commercial software packages offer varying degrees of theoretical dose calculation accuracy, depending on the modality involved. Dose calculations of VMAT plans are especially prone to error due to the large approximations involved. Monte Carlo (MC) methods are not commonly used due to their long run times. We investigated two methods to increase the computational efficiency of MC dose simulations with the BEAMnrc code. Distributed computing resources, along with optimized code compilation, will allow for accurate and efficient VMAT dose calculations. Methods: The BEAMnrc package was installed on a high performance computing cluster accessible to our clinic. MATLAB and PYTHON scripts were developed to convert a clinical VMAT DICOM plan into BEAMnrc input files. The BEAMnrc installation was optimized by running the VMAT simulations through profiling tools which indicated the behavior of the constituent routines in the code, e.g. the bremsstrahlung splitting routine and the specified random number generator. This information aided in determining the most efficient parallel compilation configuration for the specific CPUs available on our cluster, resulting in the fastest VMAT simulation times. Our method was evaluated with calculations involving 10^8–10^9 particle histories, which are sufficient to verify patient dose using VMAT. Results: Parallelization allowed the calculation of patient dose on the order of 10–15 hours with 100 parallel jobs. Due to the compiler optimization process, further speed increases of 23% were achieved when compared with the open-source compiler BEAMnrc packages. Conclusion: Analysis of the BEAMnrc code allowed us to optimize the compiler configuration for VMAT dose calculations. In future work, the optimized MC code, in conjunction with the parallel processing capabilities of BEAMnrc, will be applied to provide accurate and efficient secondary MU checks.
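
    The history-splitting pattern behind such parallel MC runs is simple: each job receives an independent seed, and the per-job tallies are merged at the end. In the sketch below a toy exponential "depth-dose" tally stands in for the actual BEAMnrc transport; every name is hypothetical.

```python
# Embarrassingly parallel Monte Carlo: independent seeds, merged tallies.
import numpy as np
from multiprocessing import Pool

def run_job(args):
    seed, n_histories, n_bins = args
    rng = np.random.default_rng(seed)
    depths = rng.exponential(scale=5.0, size=n_histories)  # toy interaction depths
    tally, _ = np.histogram(depths, bins=n_bins, range=(0.0, 30.0))
    return tally

if __name__ == "__main__":
    n_jobs, per_job, n_bins = 8, 10**6, 60
    jobs = [(seed, per_job, n_bins) for seed in range(n_jobs)]
    with Pool(n_jobs) as pool:
        total = sum(pool.map(run_job, jobs))   # merge per-job tallies
    dose = total / (n_jobs * per_job)          # normalize per history
```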

  12. The Parallel System for Integrating Impact Models and Sectors (pSIMS)

    NASA Technical Reports Server (NTRS)

    Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

    2014-01-01

    We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.

  13. MPgrafic: A parallel MPI version of Grafic-1

    NASA Astrophysics Data System (ADS)

    Prunet, Simon; Pichon, Christophe

    2013-04-01

    MPgrafic is a parallel MPI version of Grafic-1 which can produce large cosmological initial conditions on a cluster without requiring shared memory. The real Fourier transforms are carried out in place using fftw while minimizing the amount of memory used (at the expense of performance), in the spirit of Grafic-1. The writing of the output file is also carried out in parallel. In addition to the technical parallelization, it provides three extensions over Grafic-1: it can produce power spectra with baryon wiggles (D. J. Eisenstein and W. Hu, Ap. J. 496); it has the optional ability to load a lower resolution noise map corresponding to the low frequency component, which will fix the larger scale modes of the simulation (extra flag 0/1 at the end of the input process), in the spirit of Grafic-2; and it can be used in conjunction with constrfield, which generates initial conditions phases from a list of local constraints on density, tidal field, density gradient and velocity.
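
    The core recipe behind such generators is to color white noise with the square root of the target power spectrum in Fourier space. A minimal 2-D sketch follows, with a power-law spectrum standing in for a real cosmological P(k); nothing here reproduces MPgrafic's MPI layout or constrained modes.

```python
# Gaussian random field with prescribed power spectrum via FFT (sketch).
import numpy as np

def gaussian_field(n=128, spectral_index=-2.0, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n, n))              # white-noise map
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = 1.0                                # avoid division by zero
    amplitude = k ** (spectral_index / 2.0)      # sqrt(P(k)) for P(k) ~ k^n
    amplitude[0, 0] = 0.0                        # zero out the mean mode
    return np.fft.ifft2(np.fft.fft2(noise) * amplitude).real

delta = gaussian_field()   # 2-D stand-in for a 3-D initial-conditions cube
```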

  14. Processing large remote sensing image data sets on Beowulf clusters

    USGS Publications Warehouse

    Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

    2003-01-01

    High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.

  15. Distributed parallel computing in stochastic modeling of groundwater systems.

    PubMed

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.

  16. Parallelized implicit propagators for the finite-difference Schrödinger equation

    NASA Astrophysics Data System (ADS)

    Parker, Jonathan; Taylor, K. T.

    1995-08-01

    We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of workstation clusters and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
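
    As an illustration of the mixed direct-iterative idea, the sketch below advances a 1-D wavefunction with a Crank-Nicolson step whose implicit system is solved by point Jacobi iteration (the paper's block variants invert sub-matrices directly instead). The harmonic trap and all parameters are illustrative; the matrix I + i(dt/2)H is strictly diagonally dominant here, so the iteration converges.

```python
# Crank-Nicolson step for the 1-D Schrödinger equation (hbar = m = 1),
# implicit half solved by point Jacobi iteration (illustrative sketch).
import numpy as np

def crank_nicolson_step(psi, V, dx, dt, n_iter=50):
    off = -1j * dt / (4.0 * dx * dx)                  # off-diagonal of I + i*(dt/2)*H
    diag = 1.0 + 1j * (dt / 2.0) * (1.0 / dx**2 + V)  # its diagonal
    lap = np.zeros_like(psi)                          # explicit half: (I - i*(dt/2)*H) psi
    lap[1:-1] = (psi[:-2] - 2.0 * psi[1:-1] + psi[2:]) / dx**2
    rhs = psi - 1j * (dt / 2.0) * (-0.5 * lap + V * psi)
    x = psi.copy()
    for _ in range(n_iter):                           # Jacobi iterations
        nb = np.zeros_like(x)
        nb[1:-1] = x[:-2] + x[2:]
        x = (rhs - off * nb) / diag
    return x

n = 200
grid = np.linspace(-10.0, 10.0, n)
dx = grid[1] - grid[0]
V = 0.5 * grid**2                                     # harmonic trap
psi = np.exp(-(grid - 1.0) ** 2).astype(complex)      # displaced Gaussian
psi /= np.linalg.norm(psi)
for _ in range(100):
    psi = crank_nicolson_step(psi, V, dx, dt=0.005)
```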

  17. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory

    NASA Astrophysics Data System (ADS)

    Challacombe, Matt

    2000-06-01

    A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
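
    A minimal serial sketch of a sparse-blocked multiply is given below: each matrix is a dictionary of dense blocks keyed by (block-row, block-column), and small product blocks are dropped, which is what preserves sparsity when elements decay with separation. The paper's non-blocking MPI overlap, space-filling-curve ordering, and bin-packing load balance are not reproduced.

```python
# Block-sparse matrix-matrix multiply over {(i, j): dense block} dictionaries.
import numpy as np
from collections import defaultdict

def blocked_matmul(A, B, drop_tol=1e-10):
    b_by_row = defaultdict(list)                 # index B's blocks by block-row
    for (k, j), blk in B.items():
        b_by_row[k].append((j, blk))
    C = {}
    for (i, k), a_blk in A.items():              # only matching nonzero blocks meet
        for j, b_blk in b_by_row[k]:
            C[(i, j)] = C.get((i, j), 0) + a_blk @ b_blk
    # Drop product blocks below the tolerance to preserve sparsity.
    return {ij: blk for ij, blk in C.items() if np.linalg.norm(blk) > drop_tol}

rng = np.random.default_rng(0)
A = {(0, 0): rng.normal(size=(4, 4)), (1, 1): rng.normal(size=(4, 4))}
B = {(0, 0): rng.normal(size=(4, 4)), (1, 0): rng.normal(size=(4, 4))}
C = blocked_matmul(A, B)   # nonzero blocks: (0, 0) and (1, 0)
```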

  18. Parallel Rendering of Large Time-Varying Volume Data

    NASA Technical Reports Server (NTRS)

    Garbutt, Alexander E.

    2005-01-01

    Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth, and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphics hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphics hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism System using a time-varying dataset from selected JPL applications.

  19. Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

    PubMed

    Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

    2014-09-09

    A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
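
    For intuition about the second of the two layouts, the sketch below gives the standard owner map of a 2-D block-cyclic distribution on a Pr x Pc process grid; it is a generic illustration, not BERTHA's code.

```python
# Owner of a global matrix entry under a 2-D block-cyclic distribution.
def block_cyclic_owner(i, j, block, p_rows, p_cols):
    """Process grid coordinates owning global entry (i, j)."""
    return (i // block) % p_rows, (j // block) % p_cols

# Entry (1030, 70) with 64x64 blocks on a 4x2 process grid:
print(block_cyclic_owner(1030, 70, 64, 4, 2))   # (0, 1)
```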

  20. Mining algorithm for association rules in big data based on Hadoop

    NASA Astrophysics Data System (ADS)

    Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

    2018-04-01

    In order to solve the problem that traditional association rule mining algorithms can no longer meet the needs of mining large amounts of data in terms of efficiency and scalability, we take FP-Growth as an example and parallelize the algorithm on the Hadoop framework using the MapReduce model. On this basis, it is improved using a transaction-reduction method to further enhance mining efficiency. Experiments on a Hadoop cluster cover verification of the parallel mining results, an efficiency comparison between serial and parallel execution, and the relationships between mining time and node count and between mining time and data volume. The experiments show that the parallelized FP-Growth algorithm can accurately mine frequent item sets with better performance and scalability, and can thus better meet the requirements of big data mining by efficiently extracting frequent item sets and association rules from large datasets.
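
    The map/reduce counting pattern underlying such parallelizations can be sketched in a few lines: mappers emit itemset counts for their shard of transactions, and the reduce step sums the counts and applies the support threshold. Full parallel FP-Growth additionally builds per-group FP-trees, which this illustration omits; all names and data are hypothetical.

```python
# MapReduce-style frequent-itemset counting (illustrative sketch).
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

def map_count(transactions, max_size=2):
    counts = Counter()                           # map phase: emit itemset counts
    for t in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(set(t)), size):
                counts[itemset] += 1
    return counts

def parallel_frequent_itemsets(shards, min_support=2):
    with Pool(len(shards)) as pool:
        partials = pool.map(map_count, shards)   # one shard per worker
    total = sum(partials, Counter())             # reduce phase: sum the counts
    return {s: c for s, c in total.items() if c >= min_support}

if __name__ == "__main__":
    shards = [[["milk", "bread"], ["milk", "eggs"]],
              [["bread", "eggs"], ["milk", "bread", "eggs"]]]
    print(parallel_frequent_itemsets(shards))
```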
