Beluga whale, Delphinapterus leucas, vocalizations from the Churchill River, Manitoba, Canada.
Chmelnitsky, Elly G; Ferguson, Steven H
2012-06-01
Classification of animal vocalizations is often done by a human observer using aural and visual analysis, but more efficient, automated methods have also been utilized to reduce bias and increase reproducibility. Beluga whale, Delphinapterus leucas, calls were described from recordings collected in the summers of 2006-2008 in the Churchill River, Manitoba. Calls (n=706) were classified based on aural and visual analysis, and call characteristics were measured; calls were separated into 453 whistles (64.2%; 22 types), 183 pulsed/noisy calls (25.9%; 15 types), and 70 combined calls (9.9%; seven types). Measured parameters varied within each call type, but less variation existed in pulsed and noisy call types and some combined call types than in whistles. A more efficient and repeatable hierarchical clustering method was applied to 200 randomly chosen whistles using six call characteristics as variables; twelve groups were identified. Call characteristics varied less in cluster analysis groups than in whistle types described by visual and aural analysis, and the results were similar to the whistle contours described. This study provides the first description of beluga calls in Hudson Bay; using two methods yields more robust interpretations and an assessment of appropriate methods for future studies.
NASA Astrophysics Data System (ADS)
Andryani, Diyah Septi; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Clustering aims to classify different patterns into groups called clusters. In this clustering method, we use n-mer frequencies to calculate the distance matrix, which is considered more accurate than using DNA alignment. The clustering results could be used to discover biologically important sub-sections and groups of genes. Many clustering methods have been developed; hard clustering methods are considered less accurate than fuzzy clustering methods, especially on data containing outliers. Among fuzzy clustering methods, fuzzy c-means is one of the best known for its accuracy and simplicity. Fuzzy c-means clustering uses a membership function variable, which refers to how likely the data are to belong to a cluster, and works by minimizing an objective function. The parameter of the membership function is used as a weighting factor, also called the fuzzifier. In this study we implement hybrid clustering using fuzzy c-means and a divisive algorithm, which can improve the accuracy of cluster membership compared to a traditional partitional approach alone. Fuzzy c-means is used in the first step to find a partition; a divisive algorithm then runs in the second step to find sub-clusters and the dendrogram of a phylogenetic tree. The best number of clusters is determined using the minimum value of the Davies-Bouldin Index (DBI) of the cluster results. Our results show that the method introduced in this paper outperforms other partitioning methods. We found 3 clusters with a DBI value of 1.126628 in the first step of clustering. Moreover, the second step of clustering always produces smaller DBI values than the first step alone, indicating that the hybrid approach in this study produces better cluster results in terms of DBI values.
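The alternating updates behind fuzzy c-means, together with DBI-based cluster validation, can be sketched as follows. This is a generic illustration on synthetic data, not the authors' implementation; the function names and test data are ours:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=50):
    # Seed centres with points spread across the data, then alternate the
    # standard FCM updates: memberships from inverse distances (fuzzifier m),
    # centres as membership-weighted means.
    V = X[np.linspace(0, len(X) - 1, c, dtype=int)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=0)                    # memberships sum to 1 per point
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return U, V

def davies_bouldin(X, labels):
    # DBI: mean over clusters of the worst (scatter_i + scatter_j) / separation.
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    S = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                  for i, k in enumerate(ks)])
    M = np.linalg.norm(cents[:, None] - cents[None, :], axis=2)
    R = (S[:, None] + S[None, :]) / (M + np.eye(len(ks)))
    np.fill_diagonal(R, -np.inf)
    return R.max(axis=1).mean()

# Two well-separated synthetic blobs; hard labels come from the max membership.
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (30, 2)),
               np.random.default_rng(2).normal(4, 0.3, (30, 2))])
U, V = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=0)
dbi = davies_bouldin(X, labels)               # low DBI = compact and separated
```

A divisive second stage along the lines of the abstract would recurse this on each cluster and keep splits only while the DBI keeps decreasing.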
Environmental Gradient Analysis, Ordination, and Classification in Environmental Impact Assessments.
1987-09-01
agglomerative clustering algorithms for mainframe computers: (1) the unweighted pair-group method using arithmetic averages (UPGMA), (2) the ... hierarchical agglomerative unweighted pair-group method using arithmetic averages (UPGMA), which is also called average linkage clustering. This method was ... dendrograms produced by weighted clustering (93). Sneath and Sokal (94), Romesburg (84), and Seber (90) also strongly recommend the UPGMA. A dendrogram
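For reference, UPGMA (average linkage) as described in these excerpts is available off the shelf; a minimal SciPy sketch on toy points (variable names are ours):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Six points in two tight groups. 'average' linkage is UPGMA: at each step it
# merges the two clusters with the smallest mean pairwise distance.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Z = linkage(pdist(pts), method='average')        # rows of the dendrogram
groups = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
```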
Application of a Taxonomical Structure for Classifying Goods Procured by the Federal Government
1991-12-01
between all pairs of objects. Also called a "tree" or "phenogram". ... UPGMA Clustering Method (unweighted pair-group method using arithmetic averages) ... clustering arrangement, specifically, the unweighted pair-group method using arithmetic averages (UPGMA), more commonly known as the average linkage method
Multi-Optimisation Consensus Clustering
NASA Astrophysics Data System (ADS)
Li, Jian; Swift, Stephen; Liu, Xiaohui
Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.
2010-01-01
Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is current methods for automated calling of genotypes are based on clustering approaches which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420
Procedural Guide for Designation Surveys of Ocean Dredged Material Disposal Sites. Revision
1990-04-01
data standardization." One of the most frequently used clustering strategies is called UPGMA (unweighted pair-group method using arithmetic averages ... Sneath and Sokal 1973). Romesburg (1984) evaluated many possible methods and concluded that UPGMA is appropriate for most types of cluster
Cluster Correspondence Analysis.
van de Velden, M; D'Enza, A Iodice; Palumbo, F
2017-03-01
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Methods of Conceptual Clustering and their Relation to Numerical Taxonomy.
1985-07-22
the conceptual clustering problem is to first solve the aggregation problem, and then the characterization problem. In machine learning, the ... clusterings by first generating some number of possible clusterings. For each clustering generated, one calls a learning-from-examples subroutine, which ... class 1 from class 2, and vice versa; only the first combination implies a partition over the set of theoretically possible objects. The first
A Web service substitution method based on service cluster nets
NASA Astrophysics Data System (ADS)
Du, YuYue; Gai, JunJing; Zhou, MengChu
2017-11-01
Service substitution is an important research topic in the fields of Web services and service-oriented computing. This work presents a novel method to analyse and substitute Web services. A new concept, called a Service Cluster Net Unit, is proposed based on Web service clusters. A service cluster is converted into a Service Cluster Net Unit. Then it is used to analyse whether the services in the cluster can satisfy some service requests. Meanwhile, the substitution methods of an atomic service and a composite service are proposed. The correctness of the proposed method is proved, and the effectiveness is shown and compared with the state-of-the-art method via an experiment. It can be readily applied to e-commerce service substitution to meet the business automation needs.
Communication: Time-dependent optimized coupled-cluster method for multielectron dynamics
NASA Astrophysics Data System (ADS)
Sato, Takeshi; Pathak, Himadri; Orimo, Yuki; Ishikawa, Kenichi L.
2018-02-01
The time-dependent coupled-cluster method with time-varying orbital functions, called the time-dependent optimized coupled-cluster (TD-OCC) method, is formulated for multielectron dynamics in an intense laser field. We have derived the equations of motion for the CC amplitudes and orthonormal orbital functions based on the real action functional, and implemented the method including double excitations (TD-OCCD) and double and triple excitations (TD-OCCDT) within the optimized active orbitals. The present method is size extensive and gauge invariant, and is a polynomial-cost-scaling alternative to the time-dependent multiconfiguration self-consistent-field method. The first application of the TD-OCC method to intense-laser-driven correlated electron dynamics in the Ar atom is reported.
Clustering high dimensional data using RIA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, Nazrina
2015-05-15
Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can efficiently be retrieved. However, identifying clusters in high-dimensional data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is that some traditional distance functions cannot capture the pattern dissimilarity among objects. In this article, we use an alternative dissimilarity measure called the Robust Influence Angle (RIA) in the partitioning method. RIA is developed using the eigenstructure of the covariance matrix and robust principal component scores. We observe that it can obtain clusters easily and hence avoid the curse of dimensionality. It also manages to cluster large data sets with mixed numeric and categorical values.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
2009-01-01
Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
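The co-membership idea behind such ensembles can be sketched as follows: run k-means for several values of k, average how often each pair of samples lands in the same cluster, and cluster that consensus matrix. This is a generic sketch on synthetic data, not the MULTI-K code:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal((0, 0), 0.4, (25, 2)),
               rng.normal((5, 0), 0.4, (25, 2)),
               rng.normal((0, 5), 0.4, (25, 2))])

# Co-membership matrix: fraction of runs in which two samples co-cluster,
# amalgamated over k-means runs with varying numbers of clusters.
ks = range(2, 7)
co = np.zeros((len(X), len(X)))
for k in ks:
    lab = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    co += lab[:, None] == lab[None, :]
co /= len(ks)

# 1 - co is a dissimilarity; extract the most robust co-memberships by
# average-linkage clustering of the consensus matrix.
final = fcluster(linkage(squareform(1.0 - co, checks=False), method='average'),
                 t=3, criterion='maxclust')
```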
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.
Schulz, Tizian; Stoye, Jens; Doerr, Daniel
2018-05-08
Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables the handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini
2013-01-01
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.
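As a sketch of the DFA step on call features, scikit-learn's linear discriminant analysis gives the a-priori-labelled classification the abstract describes. The numbers below are synthetic stand-ins for measured song characteristics; nothing here reproduces the study's data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three synthetic "species", 20 calls each, two call features apiece
# (think dominant frequency and syllable period, in arbitrary units).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(m, 0.5, (20, 2)) for m in ((4, 1), (6, 1), (5, 3))])
y = np.repeat([0, 1, 2], 20)

lda = LinearDiscriminantAnalysis().fit(X, y)
acc = lda.score(X, y)   # resubstitution accuracy of the discriminant rule
```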
2017-01-01
The quality of samples preserved long term at ultralow temperatures has not been adequately studied. To improve our understanding, we need a strategy to analyze protein degradation and metabolism at subfreezing temperatures. To do this, we obtained liquid chromatography-mass spectrometry (LC/MS) data of calculated protein signal intensities in HEK-293 cells. Our first attempt at directly clustering the values failed, most likely due to the so-called “curse of dimensionality”. The clusters were not reproducible, and the outputs differed with different methods. By utilizing rigid geometry with a prime ideal I-adic (p-adic) metric, however, we rearranged the sample clusters into a meaningful and reproducible order, and the results were the same with each of the different clustering methods tested. Furthermore, we have also succeeded in application of this method to expression array data in similar situations. Thus, we eliminated the “curse of dimensionality” from the data set, at least in clustering methods. It is possible that our approach determines a characteristic value of systems that follow a Boltzmann distribution. PMID:28614363
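The p-adic metric the abstract leans on can be stated in a few lines: two integers are p-adically close when their difference is divisible by a high power of p, and the metric is an ultrametric, which helps make the resulting clusters insensitive to the order of agglomeration. A minimal sketch (function names are ours):

```python
def padic_norm(n, p=2):
    # |n|_p = p**(-v), where p**v is the largest power of p dividing n;
    # by convention |0|_p = 0.
    if n == 0:
        return 0.0
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return float(p) ** -v

def padic_dist(a, b, p=2):
    # The induced metric d(a, b) = |a - b|_p. It satisfies the strong
    # (ultrametric) triangle inequality d(a, c) <= max(d(a, b), d(b, c)).
    return padic_norm(a - b, p)
```

Under the 2-adic metric, 17 is closer to 1 (distance 2**-4 = 0.0625, since 16 divides their difference) than 2 is to 1 (distance 1).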
Marques, J M C; Pais, A A C C; Abreu, P E
2012-02-05
The efficiency of the so-called big-bang method for the optimization of atomic clusters is analysed in detail for Morse pair potentials with different ranges; here, we have used Morse potentials with four different ranges, from long-ranged (ρ = 3) to short-ranged (ρ = 14) interactions. Specifically, we study the efficacy of the method in discovering low-energy structures, including the putative global minimum, as a function of the potential range and the cluster size. A new global minimum structure for the long-ranged (ρ = 3) Morse potential at the cluster size of n = 240 is reported. The present results are useful to assess the maximum cluster size for each type of interaction where the global minimum can be discovered with a limited number of big-bang trials. Copyright © 2011 Wiley Periodicals, Inc.
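The big-bang idea is easy to sketch: compress random atoms into a tiny volume and hand the resulting configuration to a local minimiser of the Morse energy. The following illustrates one such trial for a small cluster and is not the paper's code; the reduced Morse form (pair equilibrium distance 1, pair well depth 1) is assumed:

```python
import numpy as np
from scipy.optimize import minimize

def morse_energy(flat, rho=3.0):
    # Sum of pairwise Morse terms e*(e - 2) with e = exp(rho*(1 - r_ij)),
    # in reduced units (each pair's minimum is -1 at r_ij = 1).
    x = flat.reshape(-1, 3)
    d = np.linalg.norm(x[:, None] - x[None, :], axis=2)
    e = np.exp(rho * (1.0 - d[np.triu_indices(len(x), 1)]))
    return float(np.sum(e * (e - 2.0)))

# "Big-bang" trial: n atoms compressed into a tiny random volume, then
# relaxed by a local minimiser.
rng = np.random.default_rng(0)
n = 5
x0 = rng.uniform(-0.1, 0.1, size=3 * n)   # highly compressed seed
res = minimize(morse_energy, x0, method='L-BFGS-B')
```

Repeating many such trials and keeping the lowest minimum found is the essence of the method; the 10 pairs of a 5-atom cluster bound the energy below by -10.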
Clustering methods for the optimization of atomic cluster structure
NASA Astrophysics Data System (ADS)
Bagattini, Francesco; Schoen, Fabio; Tigli, Luca
2018-04-01
In this paper, we propose a revised global optimization method and apply it to large scale cluster conformation problems. In the 1990s, the so-called clustering methods were considered among the most efficient general purpose global optimization techniques; however, their usage has quickly declined in recent years, mainly due to the inherent difficulties of clustering approaches in large dimensional spaces. Inspired from the machine learning literature, we redesigned clustering methods in order to deal with molecular structures in a reduced feature space. Our aim is to show that by suitably choosing a good set of geometrical features coupled with a very efficient descent method, an effective optimization tool is obtained which is capable of finding, with a very high success rate, all known putative optima for medium size clusters without any prior information, both for Lennard-Jones and Morse potentials. The main result is that, beyond being a reliable approach, the proposed method, based on the idea of starting a computationally expensive deep local search only when it seems worth doing so, is capable of saving a huge amount of searches with respect to an analogous algorithm which does not employ a clustering phase. In this paper, we are not claiming the superiority of the proposed method compared to specific, refined, state-of-the-art procedures, but rather indicating a quite straightforward way to save local searches by means of a clustering scheme working in a reduced variable space, which might prove useful when included in many modern methods.
Spatial location influences vocal interactions in bullfrog choruses
Bates, Mary E.; Cropp, Brett F.; Gonchar, Marina; Knowles, Jeffrey; Simmons, James A.; Simmons, Andrea Megela
2010-01-01
A multiple sensor array was employed to identify the spatial locations of all vocalizing male bullfrogs (Rana catesbeiana) in five natural choruses. Patterns of vocal activity collected with this array were compared with computer simulations of chorus activity. Bullfrogs were not randomly spaced within choruses, but tended to cluster into closely spaced groups of two to five vocalizing males. There were nonrandom, differing patterns of vocal interactions within clusters of closely spaced males and between different clusters. Bullfrogs located within the same cluster tended to overlap or alternate call notes with two or more other males in that cluster. These near-simultaneous calling bouts produced advertisement calls with more pronounced amplitude modulation than occurred in nonoverlapping notes or calls. Bullfrogs located in different clusters more often alternated entire calls or overlapped only small segments of their calls. They also tended to respond sequentially to calls of their farther neighbors compared to their nearer neighbors. Results of computational analyses showed that the observed patterns of vocal interactions were significantly different from those expected based on random activity. The use of a multiple sensor array provides a richer view of the dynamics of choruses than is available from single-microphone techniques. PMID:20370047
Hierarchical clustering using mutual information
NASA Astrophysics Data System (ADS)
Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P.
2005-04-01
We present a conceptually simple method for hierarchical clustering of data, called the mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: the MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use this both in the Shannon (probabilistic) version of information theory and in the Kolmogorov (algorithmic) version. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and to the output of independent components analysis (ICA), as illustrated with the ECG of a pregnant woman.
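The grouping property makes the first agglomeration step concrete: estimate pairwise MI, merge the most similar pair, and treat the merged pair's joint symbols as a single new object. A minimal sketch with discrete toy signals (scikit-learn's MI estimator; the data are ours):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Three discrete "objects": b is a noisy copy of a, c is independent.
rng = np.random.default_rng(0)
a = rng.integers(0, 4, 500)
b = a.copy()
b[:50] = rng.integers(0, 4, 50)
c = rng.integers(0, 4, 500)

# First MIC step: merge the pair with the highest mutual information.
sigs = [a, b, c]
mi = {(i, j): mutual_info_score(sigs[i], sigs[j])
      for i in range(3) for j in range(i + 1, 3)}
i, j = max(mi, key=mi.get)
merged = sigs[i] * 4 + sigs[j]   # the combined object (X, Y) as one symbol
```

Subsequent steps would recompute MI against `merged` and continue until one object remains, yielding the dendrogram.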
Speckle reduction of OCT images using an adaptive cluster-based filtering
NASA Astrophysics Data System (ADS)
Adabi, Saba; Rashedi, Elaheh; Conforto, Silvia; Mehregan, Darius; Xu, Qiuyun; Nasiriavanaki, Mohammadreza
2017-02-01
Optical coherence tomography (OCT) has become a favorable device in the dermatology discipline due to its moderate resolution and penetration depth. OCT images, however, contain a grainy pattern called speckle, due to the broadband source used in the configuration of OCT. So far, a variety of filtering techniques has been introduced to reduce speckle in OCT images. Most of these methods are generic and can be applied to OCT images of different tissues. In this paper, we present a method for speckle reduction of OCT skin images. Considering the architectural structure of skin layers, a skin image can benefit from being segmented into differentiable clusters and filtered separately in each cluster, using a clustering method together with filtering methods such as the Wiener filter. The proposed algorithm was tested on an optical solid phantom with predetermined optical properties. The algorithm was also tested on healthy skin images. The results show that the cluster-based filtering method can reduce the speckle and increase the signal-to-noise ratio and contrast while preserving the edges in the image.
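A toy version of the idea, on a synthetic two-layer image rather than real OCT data: cluster pixels by intensity, then Wiener-filter within each cluster mask so that smoothing does not bleed across layer boundaries. The data and parameter choices here are illustrative assumptions:

```python
import numpy as np
from scipy.signal import wiener
from sklearn.cluster import KMeans

# Synthetic "image": a bright band over a dark background, with
# multiplicative speckle-like (gamma) noise.
rng = np.random.default_rng(0)
img = np.full((64, 64), 0.2)
img[20:40, :] = 0.8
noisy = img * rng.gamma(64.0, 1.0 / 64.0, img.shape)

# Segment pixels into two intensity clusters, then filter per cluster.
labels = KMeans(n_clusters=2, n_init=5, random_state=0) \
    .fit_predict(noisy.reshape(-1, 1)).reshape(img.shape)
out = np.zeros_like(noisy)
for k in range(2):
    mask = labels == k
    # Fill the other cluster with this cluster's mean so the 5x5 Wiener
    # window does not mix intensities across the layer boundary.
    filt = wiener(np.where(mask, noisy, noisy[mask].mean()), 5)
    out[mask] = filt[mask]
```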
Cooperative epidemics on multiplex networks.
Azimi-Tafreshi, N
2016-04-01
The spread of one disease, in some cases, can stimulate the spreading of another infectious disease. Here, we treat analytically a symmetric coinfection model for spreading of two diseases on a two-layer multiplex network. We allow layer overlapping, but we assume that each layer is random and locally loopless. Infection with one of the diseases increases the probability of getting infected with the other. Using the generating function method, we calculate exactly the fraction of individuals infected with both diseases (so-called coinfected clusters) in the stationary state, as well as the epidemic spreading thresholds and the phase diagram of the model. With increasing cooperation, we observe a tricritical point and the type of transition changes from continuous to hybrid. Finally, we compare the coinfected clusters in the case of cooperating diseases with the so-called "viable" clusters in networks with dependencies.
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
An improved method to detect correct protein folds using partial clustering.
Zhou, Jianjun; Wishart, David S
2013-01-16
Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
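The "partial" idea can be sketched independently of the scoring details: rather than assigning every decoy to a cluster, score a handful of randomly chosen pivot decoys by how many neighbours fall within a distance threshold, and return the densest pivot as the representative. This generic sketch uses points and Euclidean distance in place of structures and RMSD:

```python
import numpy as np

def partial_cluster(D, threshold, n_pivots, seed=0):
    # Incomplete clustering: only n_pivots rows of the distance matrix are
    # examined, so most decoys are never assigned to any cluster.
    rng = np.random.default_rng(seed)
    pivots = rng.choice(len(D), size=n_pivots, replace=False)
    counts = {int(p): int((D[p] <= threshold).sum()) for p in pivots}
    rep = max(counts, key=counts.get)   # densest pivot = representative
    return rep, counts

# 40 "decoys": 30 near-native points packed together plus 10 distant decoys.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.5, (30, 3)), rng.normal(8.0, 1.0, (10, 3))])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
rep, counts = partial_cluster(D, threshold=2.0, n_pivots=8)
```

A real implementation would compute each pivot's distance row on demand instead of holding the full matrix.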
A graph-Laplacian-based feature extraction algorithm for neural spike sorting.
Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos
2009-01-01
Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.
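For readers unfamiliar with the graph Laplacian that the GLF objective involves, the sketch below builds the unnormalized Laplacian L = D − W from a Gaussian affinity matrix. It is a generic illustration, not the authors' feature extraction code; the helper names and sample points are invented for the example.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Pairwise affinity matrix W with a Gaussian kernel on squared distance."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W, where D is the diagonal degree matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
L = graph_laplacian(gaussian_affinity(points))
# Every row of a graph Laplacian sums to zero.
print(all(abs(sum(row)) < 1e-9 for row in L))
```

Minimizing x·Lx over feature projections x penalizes assigning distant values to strongly connected (similar) waveforms, which is the intuition behind Laplacian-based feature extraction.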
NASA Astrophysics Data System (ADS)
Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Data clustering can be executed through partition or hierarchical methods for many types of data, including DNA sequences. Both clustering methods can be combined by running a partition algorithm in the first level and a hierarchical algorithm in the second level, called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or Fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for our partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies Bouldin Index (DBI); we choose the number of clusters that minimizes the DBI value. In this work, we conduct the clustering on 1252 HPV DNA sequences from GenBank. Feature extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. In our implementation, we used the hybrid PAM and DIANA approach in the R open source programming tool. We obtained 3 main clusters with an average DBI value of 0.979 using PAM in the first stage. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2 and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for each main cluster respectively. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
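The Davies Bouldin Index used above to pick the number of clusters can be computed directly from its definition (average, over clusters, of the worst-case ratio of summed within-cluster scatter to between-centroid distance). The sketch below is a minimal pure-Python illustration, not the authors' R implementation; the `dbi` helper and the toy point sets are assumptions of the example.

```python
import math

def dbi(clusters):
    """Davies-Bouldin Index for a list of clusters (lists of points).
    Lower values indicate more compact, better-separated clusters."""
    def centroid(pts):
        return tuple(sum(c) / len(pts) for c in zip(*pts))
    cents = [centroid(c) for c in clusters]
    # mean distance of each cluster's points to its centroid (scatter S_i)
    scatter = [sum(math.dist(p, cents[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # worst-case similarity ratio (S_i + S_j) / M_ij over j != i
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
loose = [[(0, 0), (0, 5)], [(6, 6), (6, 11)]]
print(dbi(tight) < dbi(loose))  # True -- the compact clustering scores lower
```

Choosing the cluster count that minimizes this value is exactly the selection rule described in the abstract.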
Local Higher-Order Graph Clustering
Yin, Hao; Benson, Austin R.; Leskovec, Jure; Gleich, David F.
2018-01-01
Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology. PMID:29770258
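Motif conductance for the triangle motif can be computed by brute force on a small graph: count motif instances cut by the node set, divided by the smaller motif volume of the set and its complement. This sketch only illustrates the metric MAPPR optimizes, not the MAPPR algorithm itself; the adjacency structure and function names are invented for the example.

```python
from itertools import combinations

def triangles(adj):
    """All triangles in an undirected graph given as {node: set-of-neighbors}."""
    return [t for t in combinations(sorted(adj), 3)
            if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]]

def motif_conductance(adj, S):
    """Triangle-motif conductance of node set S: triangles with endpoints on
    both sides of the cut, divided by the smaller motif volume (number of
    triangle endpoints) of S and its complement."""
    S = set(S)
    tris = triangles(adj)
    cut = sum(1 for t in tris if 0 < sum(v in S for v in t) < 3)
    vol_s = sum(sum(v in S for v in t) for t in tris)
    vol_c = sum(sum(v not in S for v in t) for t in tris)
    return cut / min(vol_s, vol_c)

# Two triangle communities {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(motif_conductance(adj, {0, 1, 2}))  # 0.0 -- the bridge cuts no triangle
print(motif_conductance(adj, {0, 1}))     # 0.5 -- this cut slices triangle (0,1,2)
```

The good community boundary (the bridge edge) has zero triangle conductance even though it cuts an edge, which is precisely the advantage of motif-aware cuts over edge-based ones.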
Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini
2013-01-01
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6–7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. 
Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification. PMID:24086666
NASA Astrophysics Data System (ADS)
Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi
2018-04-01
Synchronous fluorescence spectra combined with multivariate analysis were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters; a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection PLS (VIP-PLS), selectivity ratio PLS (SR-PLS), and interval PLS (iPLS) models, as well as a full-spectrum PLS model, were investigated and the results were compared. The results showed that CL-PLS gave the best flavonoid prediction using synchronous fluorescence spectra.
Image reconstruction of muon tomographic data using a density-based clustering method
NASA Astrophysics Data System (ADS)
Perry, Kimberly B.
Muons are subatomic particles capable of reaching the Earth's surface before decaying. When these particles collide with an object that has a high atomic number (Z), their path of travel changes substantially. Tracking muon movement through shielded containers can indicate what types of materials lie inside. This thesis proposes using a density-based clustering algorithm called OPTICS to perform image reconstructions using muon tomographic data. The results show that this method is capable of detecting high-Z materials quickly, and can also produce detailed reconstructions with large amounts of data.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions with different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions may no longer be valid. To overcome this weakness, we propose a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, which uses a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that LASS not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but can also identify clusters that are adjacent, overlapping, or under background noise. Finally, we compared LASS with the dissimilarity increments clustering method on a massive computer-user dataset of over two million records containing demographic and behavioral information. The results show that LASS works extremely well on this dataset and can extract more knowledge from it.
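The spherical-cluster assumption attributed to k-means above comes from its assignment rule: plain Euclidean distance to the nearest centroid. A minimal Lloyd's-iteration sketch makes that dependence explicit; this is a generic illustration (not the LASS algorithm), and the helper name, seed, and toy data are invented for the example.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's k-means. The Euclidean nearest-centroid assignment
    implicitly favors spherical, equal-spread clusters -- the assumption
    the LASS-style isolation criteria aim to relax."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        # move each center to the mean of its assigned points
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, groups = kmeans(pts, 2)
print(sorted(len(g) for g in groups))  # [3, 3] -- two spherical blobs recovered
```

On two compact, well-separated blobs this works perfectly; on elongated, overlapping, or varying-density clusters the same assignment rule is what misleads it.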
Nanoscale thin film growth of Au on Si(111)-7 × 7 surface by pulsed laser deposition method
NASA Astrophysics Data System (ADS)
Yokotani, Atsushi; Kameyama, Akihiro; Nakayoshi, Kohei; Matsunaga, Yuta
2017-03-01
To obtain information important for fabricating the atomic-scale Au thin films used in biosensors, we observed the morphology of Au particles adsorbed on a Si(111)-7 × 7 surface, which is thought to represent the initial stage of Au atomistic thin film formation. Au particles were adsorbed on the clean Si surface using the PLD method, and the adsorbed particles were observed with a scanning tunneling microscope. As the number of laser shots in the PLD method was increased, the adsorbed particles became larger. The larger particles appeared to form clusters, aggregations in which each constituent particle remains distinguishable, so we call this type of cluster a film-shaped cluster. In this work, we mainly analyzed this type of cluster. As a result, the film-shaped clusters were found to have a nearly monoatomic-layer structure. The particles in the clusters were packed closely in a roughly 3-fold structure with an interparticle distance of 0.864 nm. We propose a model for the cluster structure by modifying the Au(111) face so that each observed particle consists of three Au atoms.
Nonlinear dimensionality reduction of data lying on the multicluster manifold.
Meng, Deyu; Leung, Yee; Fung, Tung; Xu, Zongben
2008-08-01
A new method, called the decomposition-composition (D-C) method, is proposed for the nonlinear dimensionality reduction (NLDR) of data lying on a multicluster manifold. The main idea is first to decompose a given data set into clusters and independently calculate the low-dimensional embedding of each cluster in the decomposition procedure. Based on the intercluster connections, the embeddings of all clusters are then composed into their proper positions and orientations in the composition procedure. Unlike other NLDR methods for multicluster data, which treat intracluster and intercluster information jointly, the D-C method capitalizes on the separate use of intracluster neighborhood structures and intercluster topologies for effective dimensionality reduction. This, on the one hand, isometrically preserves the rigid-body shapes of the clusters in the embedding process and, on the other hand, guarantees the proper locations and orientations of all clusters. The theoretical arguments are supported by a series of experiments on synthetic and real-life data sets. In addition, the computational complexity of the proposed method is analyzed, and its efficiency is demonstrated both theoretically and experimentally. Related strategies for automatic parameter selection are also examined.
Pandora Cluster Seen by Spitzer
2016-09-28
This image of galaxy cluster Abell 2744, also called Pandora's Cluster, was taken by the Spitzer Space Telescope. The gravity of this galaxy cluster is strong enough that it acts as a lens to magnify images of more distant background galaxies. This technique is called gravitational lensing. The fuzzy blobs in this Spitzer image are the massive galaxies at the core of this cluster, but astronomers will be poring over the images in search of the faint streaks of light created where the cluster magnifies a distant background galaxy. The cluster is also being studied by NASA's Hubble Space Telescope and Chandra X-Ray Observatory in a collaboration called the Frontier Fields project. In this image, light from Spitzer's infrared channels is colored blue at 3.6 microns and green at 4.5 microns. http://photojournal.jpl.nasa.gov/catalog/PIA20920
Modulated Modularity Clustering as an Exploratory Tool for Functional Genomic Inference
Stone, Eric A.; Ayroles, Julien F.
2009-01-01
In recent years, the advent of high-throughput assays, coupled with their diminishing cost, has facilitated a systems approach to biology. As a consequence, massive amounts of data are currently being generated, requiring efficient methodology aimed at the reduction of scale. Whole-genome transcriptional profiling is a standard component of systems-level analyses, and to reduce scale and improve inference clustering genes is common. Since clustering is often the first step toward generating hypotheses, cluster quality is critical. Conversely, because the validation of cluster-driven hypotheses is indirect, it is critical that quality clusters not be obtained by subjective means. In this paper, we present a new objective-based clustering method and demonstrate that it yields high-quality results. Our method, modulated modularity clustering (MMC), seeks community structure in graphical data. MMC modulates the connection strengths of edges in a weighted graph to maximize an objective function (called modularity) that quantifies community structure. The result of this maximization is a clustering through which tightly-connected groups of vertices emerge. Our application is to systems genetics, and we quantitatively compare MMC both to the hierarchical clustering method most commonly employed and to three popular spectral clustering approaches. We further validate MMC through analyses of human and Drosophila melanogaster expression data, demonstrating that the clusters we obtain are biologically meaningful. We show MMC to be effective and suitable to applications of large scale. In light of these features, we advocate MMC as a standard tool for exploration and hypothesis generation. PMID:19424432
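The modularity objective that MMC maximizes can be evaluated directly from its definition, Q = (1/2m) Σ_ij (A_ij − k_i·k_j / 2m) δ(c_i, c_j). The sketch below computes Q for an unweighted toy graph; MMC itself operates on weighted graphs whose edge strengths it modulates, so this is only an illustration of the objective, with invented helper names.

```python
def modularity(adj, communities):
    """Newman modularity Q of a partition of an undirected graph.
    adj: {node: set-of-neighbors}; communities: list of disjoint node sets."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    deg = {v: len(adj[v]) for v in adj}
    q = 0.0
    for comm in communities:
        for i in comm:
            for j in comm:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)

# Two triangle communities joined by a single bridge edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))  # ~0.357, vs 0.0 for one community
```

The community-respecting partition scores well above the trivial one-community partition (whose Q is exactly zero), which is why maximizing Q surfaces tightly connected groups.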
Nearest neighbor-density-based clustering methods for large hyperspectral images
NASA Astrophysics Data System (ADS)
Cariou, Claude; Chehdi, Kacem
2017-10-01
We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and shown their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extend the MR-GWENN scheme in three respects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme in order to stabilize the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM) in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performance of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral imaging sensor.
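The common core of the NN-DB methods above, linking each point toward higher density estimated from its kNN graph, can be sketched as follows. This is a generic mode-seeking illustration in the spirit of ModeSeek, not any of the cited implementations; all names and data are invented, and density is crudely estimated as the inverse distance to the k-th neighbor.

```python
import math

def nn_mode_seek(points, k=2):
    """Each point links to the densest node among itself and its k nearest
    neighbors; following links uphill ends at modes, which label clusters.
    No number of clusters is specified -- it emerges from the data."""
    n = len(points)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        nbrs.append(order[:k])
    # density estimate: inverse distance to the k-th nearest neighbor
    dens = [1.0 / (math.dist(points[i], points[nbrs[i][-1]]) + 1e-12)
            for i in range(n)]
    parent = [max(nbrs[i] + [i], key=lambda j: dens[j]) for i in range(n)]
    def root(i):  # follow uphill links to the mode
        while parent[i] != i:  # (assumes density ties create no cycles)
            i = parent[i]
        return i
    return [root(i) for i in range(n)]

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels = nn_mode_seek(pts, k=2)
print(len(set(labels)))  # 2 -- two modes found without specifying a cluster count
```

The quadratic cost of building the kNN graph here is exactly the bottleneck the multiresolution scheme in the abstract is designed to bypass.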
Sparse subspace clustering for data with missing entries and high-rank matrix completion.
Fan, Jicong; Chow, Tommy W S
2017-09-01
Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Video based object representation and classification using multiple covariance matrices.
Zhang, Yurong; Liu, Quan
2017-01-01
Video-based object recognition and classification has been widely studied in the computer vision and image processing areas. One main issue in this task is developing an effective representation for video, a problem that can generally be formulated as image set representation. In this paper, we present a new method called Multiple Covariance Discriminative Learning (MCDL) for the image set representation and classification problem. The core idea of MCDL is to represent an image set using multiple covariance matrices, with each covariance matrix representing one cluster of images. First, we use the Nonnegative Matrix Factorization (NMF) method for image clustering within each image set, and then apply Covariance Discriminative Learning on each cluster (subset) of images. Finally, we adopt KLDA and a nearest-neighbor classification method for image set classification. Promising experimental results on several datasets show the effectiveness of our MCDL method.
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve more reliable and robust segmentation performance for a humanoid robot. The pixel-wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter and used as inputs to the MFMK-SVM model, providing multiple features per sample for easier implementation and efficient computation of the model. A new clustering method, called the feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed; it integrates a type-2 fuzzy criterion into the clustering optimization process to improve the robustness and reliability of the clustering results through iterative optimization. Furthermore, clustering validity is employed to select the training samples for learning the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to take full advantage of the multiple features of a scene image and the flexibility of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of the proposed method.
GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts.
Zheng, Hai-Tao; Borchert, Charles; Kim, Hong-Gee
2010-02-01
Concurrent with progress in the biomedical sciences, an overwhelming amount of textual knowledge is accumulating in the biomedical literature. PubMed is the most comprehensive database for collecting and managing biomedical literature. To help researchers easily understand collections of PubMed abstracts, numerous clustering methods have been proposed to group similar abstracts based on their shared features. However, most of these methods do not explore the semantic relationships among groupings of documents, which could help better illuminate the groupings of PubMed abstracts. To address this issue, we propose an ontological clustering method called GOClonto for conceptualizing PubMed abstracts. GOClonto uses latent semantic analysis (LSA) and the gene ontology (GO) to identify key gene-related concepts and their relationships, and to allocate PubMed abstracts based on these key gene-related concepts. On two PubMed abstract collections, the experimental results show that GOClonto is able to identify key gene-related concepts and outperforms the STC (suffix tree clustering) algorithm, the Lingo algorithm, the Fuzzy Ants algorithm, and the clustering-based TRS (tolerance rough set) algorithm. Moreover, the two ontologies generated by GOClonto show significant, informative conceptual structures.
Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin
2015-01-01
Clustering analysis methods have been widely applied to identify the functional brain networks of multitask paradigms. However, previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study a novel method called SOM-SAPC, which combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC), is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM is first performed to process the fMRI data, and SAPC is then utilized to cluster the patterns of functional networks. As a result, SOM-SAPC is able to significantly reduce the computational cost of brain network analysis. Simulation and clinical tests involving ME and MI were conducted with SOM-SAPC, and the analysis results indicated that functional brain networks were clearly identified with distinct response patterns and reduced computational cost. In particular, three activation clusters were revealed, comprising parts of the visual, ME and MI functional networks. These findings validate that SOM-SAPC is an effective and robust method for analyzing fMRI data with multiple tasks.
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure
2018-01-01
Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257
InCHlib - interactive cluster heatmap for web applications.
Skuta, Ctibor; Bartůněk, Petr; Svozil, Daniel
2014-12-01
Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap. We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust . The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. 
Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
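The dendrogram that orders a cluster heatmap's rows records the merge history of a hierarchical clustering. A naive single-linkage pass can sketch how that history arises; this is illustrative only (InCHlib delegates the actual clustering to the `inchlib_clust` utility), and the function name and toy data are invented for the example.

```python
import math

def single_linkage(points):
    """Naive O(n^3) single-linkage agglomerative clustering. The returned
    merge history (left members, right members, merge distance) is exactly
    the information a dendrogram draws."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between closest cross-cluster pair
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), round(d, 3)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

pts = [(0, 0), (0, 1), (10, 0), (10, 1.5)]
for left, right, dist in single_linkage(pts):
    print(left, right, dist)
```

Reading the merge list bottom-up gives the row ordering of the heatmap: similar rows (merged at small distances) end up adjacent, dissimilar groups join only near the root.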
Density-cluster NMA: A new protein decomposition technique for coarse-grained normal mode analysis.
Demerdash, Omar N A; Mitchell, Julie C
2012-07-01
Normal mode analysis has emerged as a useful technique for investigating protein motions on long time scales. This is largely due to the advent of coarse-graining techniques, particularly Hooke's Law-based potentials and the rotational-translational blocking (RTB) method for reducing the size of the force-constant matrix, the Hessian. Here we present a new method for domain decomposition for use in RTB that is based on hierarchical clustering of atomic density gradients, which we call Density-Cluster RTB (DCRTB). The method reduces the number of degrees of freedom by 85-90% compared with the standard blocking approaches. We compared the normal modes from DCRTB against standard RTB using 1-4 residues in sequence in a single block, with good agreement between the two methods. We also show that Density-Cluster RTB and standard RTB perform well in capturing the experimentally determined direction of conformational change. Significantly, we report superior correlation of DCRTB with B-factors compared with 1-4 residue per block RTB. Finally, we show significant reduction in computational cost for Density-Cluster RTB that is nearly 100-fold for many examples. Copyright © 2012 Wiley Periodicals, Inc.
Automated tetraploid genotype calling by hierarchical clustering
USDA-ARS's Scientific Manuscript database
SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, however, the relationship between signal intensity and allele dosage must be inferred independently for each marker. We developed an improved computational method to automate this process, ...
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.
Williams, N J; Nasuto, S J; Saddy, J D
2015-07-30
The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
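The final stages of such a pipeline can be sketched as follows: k-means on the (denoised) single trials, then a 2-D PCA projection for visualization. The GA-based centroid initialization of the paper is replaced here by scikit-learn's default k-means++ for brevity, and the "trials" are synthetic.

```python
# Sketch: cluster toy "ERP trials", then project to 2-D with PCA for plotting.
# k-means++ stands in for the paper's GA-based initialization.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
trials = np.vstack([rng.normal(0, 1, (20, 50)),    # two well-separated
                    rng.normal(3, 1, (20, 50))])   # groups of toy trials

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(trials)
proj = PCA(n_components=2).fit_transform(trials)   # coordinates for a scatter plot

print(len(set(labels)), proj.shape)
```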
Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S. Stanley
2016-01-01
Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data is normalized by subtracting the row/column means, it becomes of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability. PMID:27213413
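The split-and-concatenate idea is simple to sketch: the mixed-sign (e.g. centred) matrix is separated into its positive part and the absolute value of its negative part, the two parts are stacked column-wise into a non-negative matrix, and ordinary NMF is applied. The toy data and component count below are illustrative.

```python
# Minimal sketch of the PosNegNMF idea on toy data.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))
Xc = X - X.mean(axis=0)                      # centred -> mixed signs

pos = np.clip(Xc, 0, None)                   # positive part
neg = np.clip(-Xc, 0, None)                  # absolute value of negative part
stacked = np.hstack([pos, neg])              # non-negative, 10 x 12

W = NMF(n_components=3, init="nndsvda", random_state=0,
        max_iter=500).fit_transform(stacked)
clusters = W.argmax(axis=1)                  # cluster = dominant NMF component
print(stacked.min() >= 0, W.shape)
```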
Oluwadare, Oluwatosin; Cheng, Jianlin
2017-11-14
With the development of chromosomal conformation capturing techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with chromatin immunoprecipitation sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available at https://github.com/BDM-Lab/ClusterTAD.
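The core idea can be sketched on a toy contact matrix: treat each genomic bin's row of contacts as its feature vector, cluster the bins, and read contiguous runs with the same label as approximate domains. ClusterTAD's actual feature construction and boundary refinement are more elaborate; this is only an illustration.

```python
# Toy sketch: cluster rows of a synthetic Hi-C contact matrix with two
# block "domains" on the diagonal, then locate the label change points.
import numpy as np
from sklearn.cluster import KMeans

n = 20
contacts = np.full((n, n), 1.0)
contacts[:10, :10] += 5.0          # domain 1: bins 0-9 interact strongly
contacts[10:, 10:] += 5.0          # domain 2: bins 10-19 interact strongly

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(contacts)
boundaries = [i for i in range(1, n) if labels[i] != labels[i - 1]]
print(boundaries)
```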
Identification of lethal cluster of genes in the yeast transcription network
NASA Astrophysics Data System (ADS)
Rho, K.; Jeong, H.; Kahng, B.
2006-05-01
Identification of essential or lethal genes would be one of the ultimate goals in drug design. Here we introduce an in silico method to select the cluster with a high population of lethal genes, called the lethal cluster, through microarray assay. We construct a gene transcription network based on the microarray expression level. Links are added one by one in descending order of the Pearson correlation coefficients between two genes. As the link density p increases, two meaningful link densities, p_m and p_s, are observed. At p_m, which is smaller than the percolation threshold, the number of disconnected clusters is maximum, and the lethal genes are highly concentrated in a certain cluster that needs to be identified. Thus the deletion of all genes in that cluster could efficiently lead to a lethal, inviable mutant. This lethal cluster can be identified by an in silico method. As p increases further beyond the percolation threshold, power-law behavior in the degree distribution of the giant cluster appears at p_s. We measure the degree of each gene at p_s. With the information pertaining to the degrees of each gene at p_s, we return to the point p_m and calculate the mean degree of genes in each cluster. We find that the lethal cluster has the largest mean degree.
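The network construction step can be sketched with a union-find structure: links are added in descending order of pairwise Pearson correlation while the number of disconnected clusters is tracked. The expression data below are random and purely illustrative.

```python
# Sketch: add links between "genes" in descending correlation order and
# track the number of disconnected clusters with union-find (toy data).
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 40))                 # 6 "genes" x 40 "arrays"
corr = np.corrcoef(expr)

pairs = [(corr[i, j], i, j) for i in range(6) for j in range(i + 1, 6)]
pairs.sort(reverse=True)                        # descending correlation

parent = list(range(6))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]           # path compression
        x = parent[x]
    return x

n_clusters_history = []
for _, i, j in pairs:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj                         # merge the two clusters
    n_clusters_history.append(len({find(k) for k in range(6)}))

print(n_clusters_history[-1])
```

Scanning this history for the link density at which the cluster count peaks corresponds to locating p_m.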
Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid
2014-12-30
Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is a highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative, thereby boosting the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination of wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC obviously demonstrates a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce the computational burden, and thus their combination can be applied to real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
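The sorting stage can be sketched as dimensionality reduction followed by spectral clustering of spike waveforms. Two substitutions are made for brevity: PCA stands in for LPP (which is not in scikit-learn), and plain spectral clustering stands in for the landmark-based variant; the synthetic waveforms are illustrative.

```python
# Sketch: reduce toy spike waveforms, then cluster them spectrally.
# PCA stands in for LPP; plain spectral clustering for landmark-based LSC.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
t = np.linspace(0, np.pi, 32)
spikes = np.vstack([np.sin(t) * 5 + rng.normal(0, 0.3, (25, 32)),   # unit A
                    np.cos(t) * 5 + rng.normal(0, 0.3, (25, 32))])  # unit B

feats = PCA(n_components=3).fit_transform(spikes)
labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(feats)
print(len(set(labels)))
```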
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called the divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called the average correlation clustering algorithm (ACCA), which is able to produce a better clustering solution than that produced by some other algorithms. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to execution time. Like DCCA, ACCA uses the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some others in determining a group of genes having more common transcription factors and a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using the C and Visual Basic languages, and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed.
Two word files (included in the zip file) need to be consulted before installation and execution of the software. Copyright 2010 Elsevier Inc. All rights reserved.
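The ACCA assignment criterion described above can be sketched directly: each gene is placed in the cluster with which it has the highest average correlation. This is a toy reimplementation of the core rule on random data, not the full iterative algorithm or its published code.

```python
# Sketch of the ACCA criterion: assign each gene to the cluster with which
# it has the highest average correlation (toy data; core rule only).
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(8, 30))                  # 8 "genes" x 30 conditions
corr = np.corrcoef(expr)

clusters = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}    # some current partition

def best_cluster(gene):
    avg = {c: np.mean([corr[gene, g] for g in members if g != gene])
           for c, members in clusters.items()}
    return max(avg, key=avg.get)                 # cluster with highest average

assignment = [best_cluster(g) for g in range(8)]
print(assignment)
```

In the full algorithm this assignment step would be iterated until the partition stabilizes.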
Using clustering and a modified classification algorithm for automatic text summarization
NASA Astrophysics Data System (ADS)
Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar
2013-01-01
In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text itself. First, we cluster the document sentences to exploit the diversity of topics; then we apply a learning algorithm (here, Naive Bayes) to each cluster, considering it as a class. After obtaining the classification model, we calculate the score of a sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences and extract the first ones as the output summary. We conducted experiments using a corpus of scientific papers and compared our results to another summarization system called UNIS. We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method is interesting and gives good performance, and that the addition of new features (which is simple using this method) can improve the summary's accuracy.
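The scheme can be sketched end to end on a toy corpus: cluster the document's sentences, treat each cluster as a class for a Naive Bayes model trained on the document itself, then score sentences and keep the top ones. The corpus, cluster count, and summary length below are illustrative choices, not the paper's settings.

```python
# Sketch: cluster sentences, train Naive Bayes with clusters as classes,
# score sentences by classifier confidence, keep the top two (toy corpus).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

sentences = [
    "Clustering groups similar sentences together.",
    "K-means is a simple clustering algorithm.",
    "Naive Bayes is a probabilistic classifier.",
    "The classifier scores each sentence for the summary.",
]
X = TfidfVectorizer().fit_transform(sentences)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

clf = MultinomialNB().fit(X, labels)              # clusters act as classes
scores = clf.predict_proba(X).max(axis=1)         # confidence as sentence score
summary = [sentences[i] for i in np.argsort(scores)[::-1][:2]]
print(len(summary))
```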
Conversion events in gene clusters
2011-01-01
Background Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. Results To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. Conclusions These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes. PMID:21798034
A Fast Projection-Based Algorithm for Clustering Big Data.
Wu, Yun; He, Zhiquan; Lin, Hao; Zheng, Yufei; Zhang, Jingfen; Xu, Dong
2018-06-07
With the fast development of various techniques, more and more data have been accumulated, with the unique properties of large size (tall) and high dimension (wide). The era of big data is coming. How to understand and discover new knowledge from these data has attracted more and more scholars' attention and has become the most important task in data mining. As one of the most important techniques in data mining, clustering analysis, a kind of unsupervised learning, can group a set of data objects into clusters that are meaningful, useful, or both. Thus, the technique has played a very important role in knowledge discovery in big data. However, when facing large-sized and high-dimensional data, most current clustering methods exhibit poor computational efficiency and a high requirement for computational resources, which prevents us from clarifying the intrinsic properties of the data and discovering the new knowledge behind them. Based on this consideration, we developed a powerful clustering method, called MUFOLD-CL. The principle of the method is to project the data points onto the centroid, and then to measure the similarity between any two points by comparing their projections on the centroid. The proposed method achieves linear time complexity with respect to the sample size. Comparison with the K-Means method on very large data showed that our method produces better accuracy and requires less computational time, demonstrating that MUFOLD-CL can serve as a valuable tool, or at least play a complementary role to other existing methods, for big data clustering. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was fastest and achieved comparable accuracy. For the convenience of most scholars, a free software package was constructed.
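The projection idea can be sketched in a few lines: every point is projected onto the direction of the global centroid, and points are then compared by their scalar projections, which costs linear time in the sample size. The toy data and the crude mean-threshold split below are illustrative, not MUFOLD-CL's actual grouping rule.

```python
# Sketch of the projection idea: compare points by their scalar projection
# onto the centroid direction (toy two-blob data; illustrative split rule).
import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (50, 20)),
                  rng.normal(4, 0.5, (50, 20))])

centroid = data.mean(axis=0)
unit = centroid / np.linalg.norm(centroid)
proj = data @ unit                    # one scalar per point, O(n) overall

threshold = proj.mean()               # crude 2-way split on the projections
labels = (proj > threshold).astype(int)
print(labels.sum())
```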
Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier
2015-02-22
Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. 
Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection.
Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V
2003-01-01
In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly used idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continuing the alignment process in a bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of progressive alignment with the 1-median (center) of the cluster. The 1-median of a set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments, which has been successfully applied to large-size search problems in general metric spaces. In particular, a clustering algorithm called the Antipole tree and an approximate linear-time 1-median computation are used. Our algorithm, compared with Clustal W, a widely used MSA tool, shows better running-time results with fully comparable alignment quality. A successful biological application showing high amino acid conservation during the evolution of Xenopus laevis SOD2 is also cited.
Effects of weather conditions on emergency ambulance calls for acute coronary syndromes
NASA Astrophysics Data System (ADS)
Vencloviene, Jone; Babarskiene, Ruta; Dobozinskas, Paulius; Siurkaite, Viktorija
2015-08-01
The aim of this study was to evaluate the relationship between weather conditions and daily emergency ambulance calls for acute coronary syndromes (ACS). The study included data on 3631 patients who called the ambulance for chest pain and were admitted to the department of cardiology as patients with ACS. We investigated the effect of daily air temperature (T), barometric pressure (BP), relative humidity, and wind speed (WS) to detect the risk areas for low and high daily volume (DV) of emergency calls. We used the classification and regression tree method as well as cluster analysis. The clusters were created by applying the k-means cluster algorithm using the standardized daily weather variables. The analysis was performed separately during cold (October-April) and warm (May-September) seasons. During the cold period, the greatest DV was observed on days of low T during the 3-day sequence, on cold and windy days, and on days of low BP and high WS during the 3-day sequence; low DV was associated with high BP and decreased WS on the previous day. During June-September, a lower DV was associated with low BP, windless days, and high BP and low WS during the 3-day sequence. During the warm period, the greatest DV was associated with increased BP and changing WS during the 3-day sequence. These results suggest that daily T, BP, and WS on the day of the ambulance call and on the two previous days may be prognostic variables for the risk of ACS.
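The clustering step described above, k-means on standardized daily weather variables, can be sketched as follows. The synthetic data, variable distributions, and cluster count are illustrative assumptions, not the study's data.

```python
# Sketch: standardize daily weather variables, then apply k-means,
# as in the study (toy data; distributions are illustrative).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# columns: temperature, barometric pressure, relative humidity, wind speed
weather = np.column_stack([rng.normal(5, 10, 200),
                           rng.normal(1010, 8, 200),
                           rng.uniform(40, 100, 200),
                           rng.gamma(2, 2, 200)])

z = StandardScaler().fit_transform(weather)       # standardized variables
days = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(z)
print(np.bincount(days))
```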
Catherine, Faget-Agius; Aurélie, Vincenti; Eric, Guedj; Pierre, Michel; Raphaëlle, Richieri; Marine, Alessandrini; Pascal, Auquier; Christophe, Lançon; Laurent, Boyer
2017-12-30
This study aims to define functioning levels of patients with schizophrenia by using a method of interpretable clustering based on a specific functioning scale, the Functional Remission Of General Schizophrenia (FROGS) scale, and to test their validity regarding clinical and neuroimaging characterization. In this observational study, patients with schizophrenia were classified using a hierarchical top-down method called clustering using unsupervised binary trees (CUBT). Socio-demographic, clinical, and neuroimaging SPECT perfusion data were compared between the different clusters to ensure their clinical relevance. A total of 242 patients were analyzed. A four-group functioning level structure was identified: 54 patients were classified as "minimal", 81 as "low", 64 as "moderate", and 43 as "high". The clustering shows satisfactory statistical properties, including reproducibility and discriminative ability. The 4 clusters consistently differentiate patients. Patients with a "high" functioning level reported the lowest scores on the PANSS and the CDSS, and the highest scores on the GAF, the MARS and the S-QoL 18, with significant differences. Functioning levels were significantly associated with cerebral perfusion of two relevant areas: the left inferior parietal cortex and the anterior cingulate. Our study provides relevant functioning levels in schizophrenia and may enhance the use of functioning scales. Copyright © 2017 Elsevier B.V. All rights reserved.
Warden, Craig R
2008-01-01
Background With limited resources available, injury prevention efforts need to be targeted both geographically and to specific populations. As part of a pediatric injury prevention project, data was obtained on all pediatric medical and injury incidents in a fire district to evaluate geographical clustering of pediatric injuries. This will be the first step in attempting to prevent these injuries with specific interventions depending on locations and mechanisms. Results There were a total of 4803 incidents involving patients less than 15 years of age that the fire district responded to during 2001–2005 of which 1997 were categorized as injuries and 2806 as medical calls. The two cohorts (injured versus medical) differed in age distribution (7.7 ± 4.4 years versus 5.4 ± 4.8 years, p < 0.001) and location type of incident (school or church 12% versus 15%, multifamily residence 22% versus 13%, single family residence 51% versus 28%, sport, park or recreational facility 3% versus 8%, public building 8% versus 7%, and street or road 3% versus 30%, respectively, p < 0.001). Using the medical incident locations as controls, there was no significant clustering for environmental or assault injuries using the Bernoulli method while there were four significant clusters for all injury mechanisms combined, 13 clusters for motor vehicle collisions, one for falls, and two for pedestrian or bicycle injuries. Using the Poisson cluster method on incidence rates by census tract identified four clusters for all injuries, three for motor vehicle collisions, four for fall injuries, and one each for environmental and assault injuries. The two detection methods shared a minority of overlapping geographical clusters. Conclusion Significant clustering occurs overall for all injury mechanisms combined and for each mechanism depending on the cluster detection method used. There was some overlap in geographic clusters identified by both methods. 
The Bernoulli method allows more focused cluster mapping and evaluation since it directly uses location data. Once clusters are found, interventions can be targeted to specific geographic locations, location types, ages of victims, and mechanisms of injury. PMID:18808720
A segmentation/clustering model for the analysis of array CGH data.
Picard, F; Robin, S; Lebarbier, E; Daudin, J-J
2007-09-01
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
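The segmentation half of the model can be sketched with exact dynamic programming: split a 1-D profile into K segments minimizing within-segment squared error. This is only the DP step of DP-EM; the mixture/EM step that assigns a biological status to each segment is omitted, and the profile is synthetic.

```python
# Sketch: exact DP segmentation of a 1-D profile into K segments minimizing
# within-segment squared error (the DP step of DP-EM; EM step omitted).
import numpy as np

def segment(y, K):
    n = len(y)
    # cost[i][j]: squared error of fitting one mean to y[i:j]
    cost = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(i + 1, n + 1):
            seg = y[i:j]
            cost[i][j] = ((seg - seg.mean()) ** 2).sum()
    best = np.full((K + 1, n + 1), np.inf)
    back = np.zeros((K + 1, n + 1), dtype=int)
    best[0][0] = 0.0
    for k in range(1, K + 1):           # k segments covering y[:j]
        for j in range(1, n + 1):
            for i in range(j):
                c = best[k - 1][i] + cost[i][j]
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i
    cuts, j = [], n                     # recover breakpoints
    for k in range(K, 0, -1):
        j = back[k][j]
        cuts.append(j)
    return sorted(cuts)[1:]             # drop the leading 0

y = np.array([0.0] * 10 + [3.0] * 10 + [0.5] * 10)
print(segment(y, 3))
```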
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-05-21
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space by utilizing an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.
VarBin, a novel method for classifying true and false positive variants in NGS data
2013-01-01
Background Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction. Methods VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the iGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4). Results To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4. 
Conclusions These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2. PMID:24266885
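The PLRD quantity and the Bin rule lend themselves to a compact sketch. This is a minimal illustration, not the paper's implementation: the Phred-scaled likelihood inputs, the separation margin, and the cutoffs below are hypothetical.

```python
def plrd(wildtype_phred, variant_phred, depth):
    """Phred-scaled likelihood ratio by depth (PLRD): the variant-to-wild-type
    genotype likelihood ratio at a site, divided by read depth."""
    return (wildtype_phred - variant_phred) / depth

def varbin_class(proband_plrd, background_plrds, margin=5.0):
    """Illustrative Bin assignment: the further the proband's PLRD lies above
    the background (non-variant) PLRD cluster, the more likely the call is a
    true variant. The margin and cutoffs are hypothetical, not the paper's."""
    separation = proband_plrd - max(background_plrds)
    if separation > 2 * margin:
        return 1  # most likely true
    if separation > margin:
        return 2  # uncertain
    if separation > 0:
        return 3
    return 4      # most likely false positive
```

With wild-type calls clustering near -3 PLRD and true variant calls above 10 PLRD, as reported, a typical true call lands in Bin 1 under this toy rule.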
Complete characterization of the stability of cluster synchronization in complex dynamical networks.
Sorrentino, Francesco; Pecora, Louis M; Hagerstrom, Aaron M; Murphy, Thomas E; Roy, Rajarshi
2016-04-01
Synchronization is an important and prevalent phenomenon in natural and engineered systems. In many dynamical networks, the coupling is balanced or adjusted to admit global synchronization, a condition called Laplacian coupling. Many networks exhibit incomplete synchronization, where two or more clusters of synchronization persist, and computational group theory has recently proved to be valuable in discovering these cluster states based on the topology of the network. In the important case of Laplacian coupling, additional synchronization patterns can exist that would not be predicted from the group theory analysis alone. Understanding how and when clusters form, merge, and persist is essential for understanding collective dynamics, synchronization, and failure mechanisms of complex networks such as electric power grids, distributed control networks, and autonomous swarming vehicles. We describe a method to find and analyze all of the possible cluster synchronization patterns in a Laplacian-coupled network, by applying methods of computational group theory to dynamically equivalent networks. We present a general technique to evaluate the stability of each of the dynamically valid cluster synchronization patterns. Our results are validated in an optoelectronic experiment on a five-node network that confirms the synchronization patterns predicted by the theory.
Preparation of Gelatin Layer Film with Gold Clusters in Using Photographic Film
NASA Astrophysics Data System (ADS)
Kuge, Ken'ichi; Arisawa, Michiko; Aoki, Naokazu; Hasegawa, Akira
2000-12-01
A gelatin layer film with gold clusters is produced by taking advantage of the photosensitivity of silver halide photography. Through exposure, silver specks called latent-image specks, composed of several reduced silver atoms, are formed on the surface of silver halide grains in the photographic film. Because the latent-image specks act as a catalyst for the redox reaction, reduced gold atoms are deposited on them when the exposed film is immersed in a gold(I) thiocyanate complex solution for 5-20 days. Subsequently, when the silver halide grains are dissolved and removed, the gelatin layer film with gold clusters remains. The film produced by this method is purple and shows an absorption spectrum with a maximum at approximately 560 nm as a result of plasmon absorption. The clusters continued to grow with immersion time, and the growth rate increased as the concentration of the gold complex solution was increased; the cluster diameter ranged from 20 nm to 100 nm. By this method it is possible to produce a gelatin film of large area with evenly dispersed gold clusters, and since clusters form only in the exposed area, pattern forming is also possible.
A Deterministic Annealing Approach to Clustering AIRS Data
NASA Technical Reports Server (NTRS)
Guillaume, Alexandre; Braverman, Amy; Ruzmaikin, Alexander
2012-01-01
We will examine the validity of means and standard deviations as a basis for climate data products. We will explore the conditions under which these two simple statistics are inadequate summaries of the underlying empirical probability distributions by contrasting them with a nonparametric clustering method called deterministic annealing.
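Deterministic annealing itself is a standard clustering technique and can be sketched; the toy data, temperatures, and cooling schedule below are illustrative assumptions and not the study's configuration.

```python
import math

def da_cluster(points, t0=50.0, tmin=0.05, cool=0.8, iters=10):
    """Deterministic annealing with two cluster centers: soft (Gibbs)
    assignments at temperature T, center re-estimation, then cooling.
    At high T assignments are diffuse; structure hardens as T decreases."""
    dim = len(points[0])
    centers = [list(points[0]), list(points[-1])]  # deterministic init for this sketch
    t = t0
    while t > tmin:
        for _ in range(iters):
            probs = []
            for p in points:
                d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
                m = min(d2)  # shift exponents to avoid underflow
                w = [math.exp(-(d - m) / t) for d in d2]
                z = sum(w)
                probs.append([x / z for x in w])  # p(j|x) proportional to exp(-dist2/T)
            for j in range(2):
                tot = sum(pr[j] for pr in probs)
                centers[j] = [sum(pr[j] * p[d] for pr, p in zip(probs, points)) / tot
                              for d in range(dim)]
        t *= cool
    return centers
```

On two well-separated blobs, the centers settle on the blob means as the temperature is lowered.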
Kamali, Tahereh; Stashuk, Daniel
2016-10-01
Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters to be known in advance, which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without any prior information or assumption about data distributions. Here, a new density-based clustering algorithm called neighborhood distance entropy consistency (NDEC) is proposed, which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state-of-the-art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and the other employed clustering algorithms was evaluated using the dice ratio as an external evaluation criterion and the density-based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity.
Copyright © 2016 Elsevier B.V. All rights reserved.
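NDEC's own implementation is not given in the abstract, but the DBSCAN baseline it is compared against can be sketched in pure Python for orientation; the parameters and data below are toy choices.

```python
import math

def dbscan(points, eps, min_pts):
    """Toy DBSCAN: density-based clustering that, like NDEC, does not need
    the number of clusters in advance. Returns labels[i] = cluster id, -1 = noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # not dense enough: noise (may become a border point later)
            continue
        labels[i] = cid
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid  # noise reachable from a core point: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is a core point: expand the cluster
                seeds.extend(jn)
        cid += 1
    return labels
```

Two dense groups and an isolated point yield two clusters and one noise label, with no cluster count supplied.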
NASA Astrophysics Data System (ADS)
Qi, Xingqin; Song, Huimin; Wu, Jianliang; Fuller, Edgar; Luo, Rong; Zhang, Cun-Quan
2017-09-01
Clustering algorithms for unsigned social networks, which have only positive edges, have been studied intensively. However, when a network has like/dislike, love/hate, respect/disrespect, or trust/distrust relationships, unsigned social networks with only positive edges are inadequate. Thus we model such networks as signed networks, which can have both negative and positive edges. Detecting the cluster structure of signed networks is much harder than for unsigned networks, because it requires not only that positive edges within clusters be as many as possible, but also that negative edges between clusters be as many as possible. Currently, few clustering algorithms exist for signed networks, and most of them require the number of final clusters as an input, which is hard to predict beforehand. In this paper, we propose a novel clustering algorithm called Eb&D for signed networks, where both the betweenness of edges and the density of subgraphs are used to detect cluster structures. A hierarchically nested system is constructed to illustrate the inclusion relationships of clusters. To show the validity and efficiency of Eb&D, we test it on several classical social networks and hundreds of synthetic data sets, obtaining better results than other methods in all cases. The biggest advantage of Eb&D over other methods is that the number of clusters does not need to be known in advance.
eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes.
Chang, Zheng; Wang, Zhenjia; Ashby, Cody; Zhou, Chuan; Li, Guojun; Zhang, Shuzhong; Huang, Xiuzhen
2014-01-01
Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise to provide specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checker-board patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) is proposed; however, it still suffers several problems when applied to cancer gene expression data analysis. In this study, we developed many effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient to identify cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements in comparison with MBI, in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than another widely used matrix factorization method called nonnegative matrix factorization (NMF) and the method of hierarchical clustering, which is often the first choice of clinical analysts in practice. PMID:25374455
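eMBI itself is not reproduced here, but the hierarchical-clustering baseline the abstract mentions can be sketched in a few lines (single linkage on toy expression vectors):

```python
import math

def single_linkage(points, k):
    """Toy agglomerative clustering: repeatedly merge the two closest clusters
    (single linkage) until k clusters remain. Returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    def linkage(a, b):  # cluster distance = closest pair of members
        return min(math.dist(points[i], points[j]) for i in a for j in b)
    while len(clusters) > k:
        ai, bi = min(((i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))),
                     key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[ai] += clusters.pop(bi)
    return clusters
```

In a real workflow the points would be patients' expression profiles and k the candidate number of subtypes.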
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kang, Hyeonggon; Attota, Ravikiran, E-mail: ravikiran.attota@nist.gov; Tondare, Vipin
We present a method that uses conventional optical microscopes to determine the number of nanoparticles in a cluster, which is typically not possible using traditional image-based optical methods due to the diffraction limit. The method, called through-focus scanning optical microscopy (TSOM), uses a series of optical images taken at varying focus levels to achieve this. The optical images cannot directly resolve the individual nanoparticles, but contain information related to the number of particles. The TSOM method makes use of this information to determine the number of nanoparticles in a cluster. Initial good agreement between the simulations and the measurements is also presented. The TSOM method can be applied to fluorescent and non-fluorescent as well as metallic and non-metallic nano-scale materials, including soft materials, making it attractive for tag-less, high-speed, optical analysis of nanoparticles down to 45 nm diameter.
Xiao, Yan-Hong; Wang, Lei; Hoyt, Joseph R; Jiang, Ting-Lei; Lin, Ai-Qing; Feng, Jiang
2018-03-18
Echolocating bats have developed advanced auditory perception systems, predominantly using acoustic signaling to communicate with each other. They can emit a diverse range of social calls in complex behavioral contexts. This study examined the vocal repertoire of five pregnant big-footed myotis bats (Myotis macrodactylus). In the process of clustering, the last individual to return to the colony (LI) emitted social calls that correlated with behavior, as recorded on a PC-based digital recorder. These last individuals could emit 10 simple monosyllabic and 27 complex multisyllabic types of calls, constituting four types of syllables. The social calls were composed of highly stereotyped syllables, hierarchically organized by a common set of syllables. However, intra-specific variation was also found in the number of syllables, syllable order and patterns of syllable repetition across call renditions. Data were obtained to characterize the significant individual differences that existed in the maximum frequency and duration of calls. Time taken to return to the roost was negatively associated with the diversity of social calls. Our findings indicate that variability in social calls may be an effective strategy taken by individuals during reintegration into clusters of female M. macrodactylus.
New mechanisms of cluster diffusion on metal fcc(100) surfaces
NASA Astrophysics Data System (ADS)
Trushin, Oleg; Salo, Petri; Alatalo, Matti; Ala-Nissila, Tapio
2001-03-01
We have studied atomic mechanisms of the diffusion of small clusters on fcc(100) metal surfaces using semi-empirical and ab-initio molecular statics calculations. The primary goal of these studies was to investigate possible many-body mechanisms of cluster motion that can contribute to low-temperature crystal growth. We used embedded-atom and Glue potentials in semi-empirical simulations of Cu and Al. A combination of the Nudged Elastic Band and Eigenvector Following methods allowed us to find all the possible transition paths for cluster movements on a flat terrace. In the case of Cu(001) we have found several new mechanisms for the diffusion of clusters, including mechanisms called row-shearing and dimer-rotating, in which a whole row inside an island moves in a concerted jump and a dimer rotates at the periphery of an island, respectively. In some cases these mechanisms yield a lower energy barrier than the standard mechanisms.
Determination of the masses of globular clusters using proper motions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ninkovich, S.
1984-09-01
Published proper motions of stars in the fields of the globular clusters M 15, M 92, and M 13 (Cudworth, 1976; Cudworth and Monet, 1979) are compiled in tables and used to estimate the masses of the clusters by the method of Naumova and Ogorodnikov (1973). Masses of the order of 10 to the 8th solar mass are calculated, as compared to an M 13 mass of about 10 to the 6th solar mass determined by the virial theorem. The higher masses are considered indicative of the actual cluster masses despite the distortion introduced by the presence in the field of stars not belonging to the clusters. It is suggested that the difference between these estimates and the smaller masses proposed by previous authors may represent unobservable peripheral dwarf stars or some invisible mass (like the so-called missing mass of the Galaxy).
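For scale, the virial estimate the abstract invokes can be written out explicitly; the prefactor of 5 and the sample inputs are illustrative assumptions, not the paper's numbers.

```python
def virial_mass(sigma_kms, radius_pc, eta=5.0):
    """Order-of-magnitude virial mass estimate M ~ eta * sigma^2 * R / G,
    with sigma the velocity dispersion (km/s), R a characteristic radius (pc),
    and eta a structure-dependent prefactor (assumed ~5 here)."""
    G = 4.301e-3  # gravitational constant in pc * (km/s)^2 / Msun
    return eta * sigma_kms ** 2 * radius_pc / G
```

A dispersion of ~7 km/s over ~10 pc gives several times 10^5 solar masses, the 10^6 order quoted for the M 13 virial estimate; the proper-motion estimates discussed above exceed this by roughly two orders of magnitude.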
Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls
Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.
2013-01-01
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango's statistic, to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half the computation time of the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950
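Tango-type statistics are quadratic forms in observed-minus-expected counts with a distance-decay kernel. The sketch below is a generic toy version, not the authors' Kernel Distance statistic: the exponential kernel and decay scale are assumptions.

```python
import math

def kernel_distance_stat(case_counts, ctrl_counts, positions, tau=5.0):
    """Toy clustering statistic in the spirit of Tango: a kernel-weighted
    quadratic form of case-minus-expected variant counts along the sequence."""
    n_case = sum(case_counts)
    total = [c + t for c, t in zip(case_counts, ctrl_counts)]
    frac = n_case / (n_case + sum(ctrl_counts))
    resid = [c - frac * t for c, t in zip(case_counts, total)]  # observed - expected
    stat = 0.0
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            w = math.exp(-abs(pi - pj) / tau)  # nearby excesses reinforce each other
            stat += resid[i] * w * resid[j]
    return stat
```

The kernel makes a case excess concentrated in one region score higher than the same excess dispersed along the gene.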
Improving real-time efficiency of case-based reasoning for medical diagnosis.
Park, Yoon-Joo
2014-01-01
Conventional case-based reasoning (CBR) does not perform efficiently on high-volume datasets because of case-retrieval time. Some previous studies overcome this problem by clustering the case base into several small groups and retrieving neighbors for a target case only within its corresponding group. However, this approach generally produces less accurate predictions than conventional CBR. This paper suggests a new case-based reasoning method called Clustering-Merging CBR (CM-CBR), which produces a similar level of predictive performance to conventional CBR at a significantly lower computational cost.
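The cluster-then-retrieve idea can be sketched as follows; the routing rule, distance metric, and majority vote are generic assumptions rather than the CM-CBR algorithm's specifics.

```python
import math

def cluster_cbr_predict(cases, labels, centroids, target, k=3):
    """Sketch of clustered case retrieval: route the target to its nearest
    cluster and search k nearest neighbors only within that cluster,
    avoiding a scan of the whole case base."""
    def nearest_centroid(p):
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
    c = nearest_centroid(target)
    members = [i for i, case in enumerate(cases) if nearest_centroid(case) == c]
    members.sort(key=lambda i: math.dist(target, cases[i]))
    votes = [labels[i] for i in members[:k]]
    return max(set(votes), key=votes.count)  # majority label of retrieved cases
```

Retrieval cost scales with the cluster size rather than the case-base size, which is the efficiency gain such methods trade against accuracy.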
Clustangles: An Open Library for Clustering Angular Data.
Sargsyan, Karen; Hua, Yun Hao; Lim, Carmay
2015-08-24
Dihedral angles are good descriptors of the numerous conformations visited by large, flexible systems, but their analysis requires directional statistics. A single package including the various multivariate statistical methods for angular data that accounts for the distinct topology of such data does not exist. Here, we present a lightweight standalone, operating-system independent package called Clustangles to fill this gap. Clustangles will be useful in analyzing the ever-increasing number of structures in the Protein Data Bank and clustering the copious conformations from increasingly long molecular dynamics simulations.
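The need for directional statistics comes from wrap-around: 1 degree and 359 degrees are 2 degrees apart, not 358. Two primitives such a library rests on can be sketched; this is generic circular statistics, not Clustangles' API.

```python
import math

def circular_mean(angles):
    """Mean direction of angles (radians): average the unit vectors, then
    take the angle. Averaging 359 and 1 degrees gives ~0, not 180."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def angular_dist(a, b):
    """Shortest arc between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)
```

A clustering method for dihedral angles would use `angular_dist` in place of Euclidean distance and `circular_mean` in place of the arithmetic mean.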
NASA Astrophysics Data System (ADS)
Intarasothonchun, Silada; Thipchaksurat, Sakchai; Varakulsiripunth, Ruttikorn; Onozato, Yoshikuni
In this paper, we propose a modified scheme of MSODB and PMS, called Predictive User Mobility Behavior (PUMB) to improve performance of resource reservation and call admission control for cellular networks. This algorithm is proposed in which bandwidth is allocated more efficiently to neighboring cells by key mobility parameters in order to provide QoS guarantees for transferring traffic. The probability is used to form a cluster of cells and the shadow cluster, where a mobile unit is likely to visit. When a mobile unit may change the direction and migrate to the cell that does not belong to its shadow cluster, we can support it by making efficient use of predicted nonconforming call. Concomitantly, to ensure continuity of on-going calls with better utilization of resources, bandwidth is borrowed from predicted nonconforming calls and existing adaptive calls without affecting the minimum QoS guarantees. The performance of the PUMB is demonstrated by simulation results in terms of new call blocking probability, handoff call dropping probability, bandwidth utilization, call successful probability, and overhead message transmission when arrival rate and speed of mobile units are varied. Our results show that PUMB provides the better performances comparing with those of MSODB and PMS under different traffic conditions.
NoFold: RNA structure clustering without folding or alignment.
Middleton, Sarah A; Kim, Junhyong
2014-11-01
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function-for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, this results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Biclustering of gene expression data using reactive greedy randomized adaptive search procedure.
Dharan, Smitha; Nair, Achuthsankar S
2009-01-30
Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, which has become one of the most popular measures used to search for biclusters. In this paper, we review basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), its construction and local search phases, and propose a new method, a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP), to detect significant biclusters from large microarray datasets. The method has two major steps. First, high-quality bicluster seeds are generated by means of k-means clustering. Second, these seeds are grown using Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted depending on the quality of the solutions found previously. We performed statistical and biological validations of the biclusters obtained and evaluated the method against basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms both the basic GRASP algorithm and the Cheng and Church approach. The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts.
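The Cheng and Church score is concrete enough to state directly: the mean squared residue H(I, J) of a submatrix, which is 0 for a perfectly additive ("checkerboard") bicluster. A minimal sketch:

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng & Church mean squared residue of the submatrix (rows, cols):
    H = mean of (a_ij - rowmean_i - colmean_j + overall_mean)^2."""
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    overall = sum(row_mean[i] for i in rows) / len(rows)
    return sum((matrix[i][j] - row_mean[i] - col_mean[j] + overall) ** 2
               for i in rows for j in cols) / (len(rows) * len(cols))
```

Search procedures such as GRASP grow a seed while keeping this score below a threshold.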
Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects
Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer
2017-01-01
An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called “cluster randomization”). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies. PMID:28584874
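The sample-size inflation mentioned has a standard first-order form via the design effect DEFF = 1 + (m - 1) * ICC; the numbers below are an illustrative example, not from the article.

```python
import math

def cluster_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size by the design effect
    DEFF = 1 + (m - 1) * ICC, where m is the (average) cluster size and
    ICC the intra-cluster correlation coefficient."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_individual * deff)
```

For example, 128 individually randomized participants with clusters (classes) of 20 and an assumed ICC of 0.05 inflate to 250 participants, nearly double.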
Graph-based analysis of kinetics on multidimensional potential-energy surfaces.
Okushima, T; Niiyama, T; Ikeda, K S; Shimizu, Y
2009-09-01
The aim of this paper is twofold: one is to give a detailed description of an alternative graph-based analysis method, which we call saddle connectivity graph, for analyzing the global topography and the dynamical properties of many-dimensional potential-energy landscapes and the other is to give examples of applications of this method in the analysis of the kinetics of realistic systems. A Dijkstra-type shortest path algorithm is proposed to extract dynamically dominant transition pathways by kinetically defining transition costs. The applicability of this approach is first confirmed by an illustrative example of a low-dimensional random potential. We then show that a coarse-graining procedure tailored for saddle connectivity graphs can be used to obtain the kinetic properties of 13- and 38-atom Lennard-Jones clusters. The coarse-graining method not only reduces the complexity of the graphs, but also, with iterative use, reveals a self-similar hierarchical structure in these clusters. We also propose that the self-similarity is common to many-atom Lennard-Jones clusters.
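The Dijkstra-type extraction of dominant pathways can be sketched generically: minima are nodes and each edge carries a kinetically defined transition cost. The graph and costs below are toy assumptions.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict {node: [(neighbor, cost), ...]}.
    With costs defined kinetically, the result is the dynamically
    dominant transition pathway between two minima."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

A direct but expensive edge loses to a multi-step path whose total cost is lower, which is exactly how an indirect sequence of saddles can dominate the kinetics.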
GEsture: an online hand-drawing tool for gene expression pattern search.
Wang, Chunyan; Xu, Yiqing; Wang, Xuelin; Zhang, Li; Wei, Suyun; Ye, Qiaolin; Zhu, Youxiang; Yin, Hengfu; Nainwal, Manoj; Tanon-Reyes, Luis; Cheng, Feng; Yin, Tongming; Ye, Ning
2018-01-01
Gene expression profiling data provide useful information for the investigation of biological function and process. However, identifying a specific expression pattern in extensive time-series gene expression data is not an easy task. Clustering is often used to group genes with similar expression; however, genes with a 'desirable' or 'user-defined' pattern cannot be efficiently detected by clustering methods. To address these limitations, we developed an online tool called GEsture. Users can draw a curve using a mouse instead of inputting abstract parameters of clustering methods. GEsture explores genes showing similar, opposite and time-delayed expression patterns in time-series datasets, taking a gene expression curve as input. We present three examples that illustrate the capacity of GEsture for gene hunting while following users' requirements. GEsture also provides visualization tools (such as expression pattern figures, heat maps and correlation networks) to display the search results. The outputs may provide useful information for researchers to understand the targets, function and biological processes of the genes involved.
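The core matching step behind such a tool can be sketched as correlation ranking against the drawn curve; the scoring rule and the toy profiles are assumptions, not GEsture's actual algorithm.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pattern_search(query_curve, profiles, top=3):
    """Rank gene expression profiles by similarity to a user-drawn curve.
    High positive correlation = similar pattern; strongly negative =
    opposite pattern (both search modes the abstract describes)."""
    ranked = sorted(profiles.items(),
                    key=lambda kv: pearson(query_curve, kv[1]), reverse=True)
    return ranked[:top]
```

Time-delayed patterns could be handled the same way by correlating against shifted copies of the query curve.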
Tracing the Arms of our Milky Way Galaxy
2015-06-03
Astronomers using data from NASA's Wide-field Infrared Survey Explorer, or WISE, are helping to trace the shape of our Milky Way galaxy's spiral arms. This illustration shows where WISE data revealed clusters of young stars shrouded in dust, called embedded clusters, which are known to reside in spiral arms. The bars represent uncertainties in the data. The nearly 100 clusters shown here were found in the arms called Perseus, Sagittarius-Carina, and Outer -- three of the galaxy's four proposed primary arms. Our sun resides in a spur to an arm, or a minor arm, called Orion Cygnus. http://photojournal.jpl.nasa.gov/catalog/PIA19341
Alomari, Yazan M.; MdZin, Reena Rahayu
2015-01-01
Analysis of whole-slide tissue for digital pathology images has been clinically approved to provide a second opinion to pathologists. Localization of focus points from Ki-67-stained histopathology whole-slide tissue microscopic images is considered the first step in the process of proliferation rate estimation. Pathologists use eye-pooling or eagle-view techniques to localize the highly stained, cell-concentrated regions, called focus-point regions, from the whole slide under the microscope. This procedure leads to high interobserver variability and time-consuming, tedious work, and can cause inaccurate findings. The localization of focus-point regions can be addressed as a clustering problem. This paper aims to automate the localization of focus-point regions from whole-slide images using the random patch probabilistic density (RPPD) method. Unlike other clustering methods, the random patch probabilistic density method can adaptively localize focus-point regions without predetermining the number of clusters. The proposed method was compared with the k-means and fuzzy c-means clustering methods. Our proposed method achieves good performance when evaluated by three expert pathologists, with an average false-positive rate of 0.84% for the focus-point region localization error. Moreover, when RPPD was used to localize tissue from whole-slide images, 228 whole-slide images were tested and 97.3% localization accuracy was achieved. PMID:25793010
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
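As an illustrative sketch of the Laplacian-construction step described above (not the authors' RPI code), the fragment below builds a symmetric k-nearest-neighbour graph over a handful of subsampled state centers and forms the unnormalized graph Laplacian L = D - W; in the full method, the low-order eigenvectors of this matrix (computed with any linear-algebra routine, omitted here) would serve as basis functions for value function approximation. The tiny `centers` list is a made-up example.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour adjacency (0/1 weights) over points."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        for j in order[1:k + 1]:   # index 0 is the point itself
            W[i][j] = W[j][i] = 1.0
    return W

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W; its low-order eigenvectors
    give smooth basis functions for value-function approximation."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical subsampled cluster centers from a continuous state space.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
L = laplacian(knn_graph(centers, 2))
```

Every row of L sums to zero and L is symmetric, which is what makes its eigenvectors well-behaved basis functions.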
CMOS: efficient clustered data monitoring in sensor networks.
Min, Jun-Ki
2013-01-01
Tiny and smart sensors enable applications that access a network of hundreds or thousands of sensors. Thus, recently, many researchers have paid attention to wireless sensor networks (WSNs). The limitation of energy is critical since most sensors are battery-powered and it is very difficult to replace batteries where sensor networks are deployed outdoors. Data transmission between sensor nodes requires more energy than computation within a sensor node. In order to reduce the energy consumption of sensors, we present an approximate data-gathering technique, called CMOS, based on the Kalman filter. The goal of CMOS is to efficiently obtain the sensor readings within a certain error bound. In our approach, spatially close sensors are grouped into a cluster. Since a cluster header generates approximate readings of its member nodes, a user query can be answered efficiently using the cluster headers. In addition, we suggest an energy-efficient clustering method to distribute the energy consumption among cluster headers. Our simulation results with synthetic data demonstrate the efficiency and accuracy of the proposed technique.
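The prediction-based suppression idea behind CMOS can be sketched with a scalar Kalman filter: the member node and its cluster header run identical filter copies, and the member transmits only when the shared prediction misses the true reading by more than the error bound. This is a minimal illustration under a 1-D random-walk model with made-up noise parameters, not the paper's actual protocol.

```python
class ScalarKalman:
    """1-D random-walk Kalman filter: both the member node and the cluster
    header run an identical copy, so the header can predict readings."""
    def __init__(self, q=0.1, r=0.5):
        self.x, self.p = 0.0, 1.0   # state estimate and its variance
        self.q, self.r = q, r       # process and measurement noise (assumed)

    def predict(self):
        self.p += self.q
        return self.x

    def update(self, z):
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1 - k)

def gather(readings, eps):
    """Transmit a reading only when the shared filter's prediction misses
    it by more than eps; otherwise the header's estimate stands in."""
    node, header = ScalarKalman(), ScalarKalman()
    sent, estimates = 0, []
    for z in readings:
        pred = node.predict()
        header.predict()
        if abs(pred - z) > eps:
            sent += 1
            node.update(z)
            header.update(z)    # both filters see the transmitted reading
        estimates.append(header.x)
    return sent, estimates
```

For a steady signal, only the first few readings are transmitted; afterwards the header answers queries from its own estimate.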
A Stationary Wavelet Entropy-Based Clustering Approach Accurately Predicts Gene Expression
Nguyen, Nha; Vo, An; Choi, Inchan
2015-01-01
Studying epigenetic landscapes is important to understand the conditions for gene regulation. Clustering is a useful approach to studying epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches, which often use a representative value of the signals in a fixed-sized window, do not fully use the information written in the epigenetic landscapes. Clustering approaches that maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of the stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in an assessment using gene expression, achieving a correlation coefficient above 0.9 without any training procedure. Our results show that the changes of the epigenetic signals are useful for studying gene regulation. PMID:25383910
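The core quantity, the entropy of stationary wavelet coefficients, can be illustrated with a one-level undecimated Haar transform in plain Python. This is a hedged sketch of the idea rather than Dewer's implementation; a circular pairwise-difference detail filter and the Shannon entropy of normalized coefficient energies are the only ingredients.

```python
import math

def haar_swt_detail(signal):
    """One level of an undecimated (stationary) Haar transform:
    detail coefficient at i pairs sample i with its circular neighbour."""
    n = len(signal)
    return [(signal[i] - signal[(i + 1) % n]) / math.sqrt(2) for i in range(n)]

def wavelet_entropy(signal):
    """Shannon entropy of the normalized energies of the detail
    coefficients; evenly spread detail energy gives high entropy."""
    d = haar_swt_detail(signal)
    energies = [c * c for c in d]
    total = sum(energies)
    if total == 0:
        return 0.0              # constant signal: no detail energy
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

A constant signal has zero detail energy, a single step concentrates the energy in two coefficients (low entropy), and an alternating signal spreads it evenly, reaching the maximum log n.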
An Optimal Method for Detecting Internal and External Intrusion in MANET
NASA Astrophysics Data System (ADS)
Rafsanjani, Marjan Kuchaki; Aliahmadipour, Laya; Javidi, Mohammad M.
Mobile Ad hoc Networks (MANETs) are formed by a set of mobile hosts that communicate among themselves through radio waves. The hosts establish an infrastructure and cooperate to forward data in a multi-hop fashion without central administration. Due to their communication type and resource constraints, MANETs are vulnerable to diverse types of attacks and intrusions. In this paper, we propose a method for preventing internal intrusions and detecting external intrusions in mobile ad hoc networks by using game theory. One optimal solution for reducing the resource consumption of external intrusion detection is to elect a leader for each cluster to provide the intrusion detection service to the other nodes in its cluster; we call this moderate mode. Moderate mode is only suitable when the probability of attack is low. Once the probability of attack is high, victim nodes should launch their own IDS to detect and thwart intrusions; we call this robust mode. The leader should not be a malicious or selfish node and must detect external intrusions in its cluster with minimum cost. Our proposed method has three steps: the first step builds trust relationships between nodes and estimates a trust value for each node to prevent internal intrusion; in the second step, we propose an optimal method for leader election using the trust values; and in the third step, we find the threshold value for notifying the victim node to launch its IDS once the probability of attack exceeds that value. In the first and third steps we apply Bayesian game theory. By using game theory, trust values, and an honest leader, our method can effectively improve network security and performance and reduce resource consumption.
ERIC Educational Resources Information Center
Sy, Angela; Glanz, Karen
2008-01-01
Background: The effectiveness of school-based tobacco use prevention programs depends on proper implementation. This study examined factors associated with teachers' implementation of a smoking prevention curriculum in a cluster randomized trial called Project SPLASH (Smoking Prevention Launch Among Students in Hawaii). Methods: A process…
m-BIRCH: an online clustering approach for computer vision applications
NASA Astrophysics Data System (ADS)
Madan, Siddharth K.; Dana, Kristin J.
2015-03-01
We adapt a classic online clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering and updates its clustering decisions as new data arrives. The modifications made in m-BIRCH enable data-driven parameter selection and effectively handle varying-density regions in the feature space. Data-driven parameter selection automatically controls the level of coarseness of the data summarization. Effective handling of varying-density regions is necessary to represent the different density regions of the data well in the summarization. We use m-BIRCH to cluster 840K color SIFT descriptors and 60K outlier-corrupted grayscale patches, as well as datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides a useful clustering tool and is made publicly available.
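BIRCH's central trick, summarizing each cluster by a constant-size cluster feature (count, linear sum, sum of squares), can be sketched in a few lines. This toy one-pass clusterer is not the published m-BIRCH code; its fixed distance threshold plays the role of the coarseness parameter that m-BIRCH selects from the data.

```python
import math

class CF:
    """BIRCH-style cluster feature: count, linear sum, sum of squares.
    Enough to give the centroid (and radius) without storing the points."""
    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

def incremental_cluster(points, threshold):
    """One-pass clustering: absorb each point into the nearest CF if its
    centroid is within `threshold`, otherwise open a new CF."""
    cfs = []
    for p in points:
        best = min(cfs, key=lambda c: math.dist(c.centroid(), p), default=None)
        if best is not None and math.dist(best.centroid(), p) <= threshold:
            best.add(p)
        else:
            cfs.append(CF(p))
    return cfs
```

Memory stays constant per cluster regardless of how many points each CF absorbs, which is what makes the approach viable for streams of descriptors.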
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.
Sim, Mikang; Kim, Jaebum
2015-02-01
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be revealed by investigating metagenomes. However, the complexity of metagenomes, such as the mixture of multiple microbes and differing species abundances, makes metagenome assembly challenging. In this paper, we developed a new metagenome assembly method that utilizes protein sequences in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. Using simulated NGS read sequences from real microbial genome sequences, we evaluated our method against four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences as a guide for assembly is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been proposed, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify highly varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space by utilizing an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.
Rieder, Vera; Schork, Karin U; Kerschke, Laura; Blank-Landeshammer, Bernhard; Sickmann, Albert; Rahnenführer, Jörg
2017-11-03
In proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
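The connected-components variant compared above can be sketched directly: treat each spectrum as a sparse map from m/z bin to intensity, join two spectra with an edge when their cosine similarity exceeds a threshold, and report the components as clusters. This is a minimal illustration, not any of the evaluated tools.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse spectra (bin -> intensity)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_spectra(spectra, threshold):
    """Connected components of the graph whose edges join spectrum pairs
    with cosine similarity at or above `threshold`."""
    n = len(spectra)
    adj = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(spectra[i], spectra[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:                 # depth-first walk of one component
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u])
        clusters.append(sorted(comp))
    return clusters
```

Raising the threshold makes clusters purer but more fragmented, which is the trade-off the purity-based validation measures quantify.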
Multiscale Embedded Gene Co-expression Network Analysis
Song, Won-Min; Zhang, Bin
2015-01-01
Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques for constructing co-expression networks require critical prior information, such as a predefined number of clusters or numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution or small-worldness. Previously, a graph filtering technique called the Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets, such as financial stock prices and gene expression, to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as its high computational complexity O(|V|³), the presence of false positives due to the maximal planarity constraint, and the inadequacy of its clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data sets and to the gene expression data for breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma. PMID:26618778
Machine-learned cluster identification in high-dimensional data.
Ultsch, Alfred; Lötsch, Jörn
2017-02-01
High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward, and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid", and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real-world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high-dimensional biomedical data. The present analyses emphasize that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model
Ellefsen, Karl J.; Smith, David
2016-01-01
Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.
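The recursive partitioning skeleton can be sketched as follows. As a stand-in for the paper's two-component Bayesian mixture fitted by Hamiltonian Monte Carlo, the split here uses a plain 2-means pass, and the minimum-cluster-size stopping rule is a made-up substitute for the authors' manual, geology-informed checks.

```python
import math
import random

def two_means(points, iters=20, seed=0):
    """Split points into two groups with a plain 2-means pass
    (a stand-in for the paper's two-component Bayesian mixture)."""
    rng = random.Random(seed)
    a, b = rng.sample(points, 2)
    for _ in range(iters):
        g1 = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        g2 = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not g1 or not g2:
            break
        a = tuple(sum(x) / len(g1) for x in zip(*g1))
        b = tuple(sum(x) / len(g2) for x in zip(*g2))
    return g1, g2

def hierarchy(points, min_size=3, label="0"):
    """Recursively bisect the samples, labelling each level, to mimic
    the manual cluster/subcluster hierarchy described above."""
    if len(points) < 2 * min_size:
        return {label: points}
    g1, g2 = two_means(points)
    if len(g1) < min_size or len(g2) < min_size:
        return {label: points}      # stop: a subcluster would be too small
    out = {}
    out.update(hierarchy(g1, min_size, label + ".1"))
    out.update(hierarchy(g2, min_size, label + ".2"))
    return out
```

The dotted labels ("0.1.2", etc.) record the level of the hierarchy at which each group of field samples was split off.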
NASA Astrophysics Data System (ADS)
Verkhovtseva, É. T.; Gospodarev, I. A.; Grishaev, A. V.; Kovalenko, S. I.; Solnyshkin, D. D.; Syrkin, E. S.; Feodos'ev, S. B.
2003-05-01
The dependence of the rms amplitudes of atoms in free clusters of solidified inert gases on the cluster size is investigated theoretically and experimentally. Free clusters are produced by homogeneous nucleation in an adiabatically expanding supersonic stream. Electron diffraction is used to measure the rms amplitudes of the atoms; the Jacobi-matrix method is used for the theoretical calculations. A series of distinguishing features of the atomic dynamics of microclusters was found, which was necessary to determine the character of the formation and the stability conditions of the crystal structure. It was shown that for clusters consisting of fewer than N ≈ 10³ atoms, as the cluster size decreases, the rms amplitudes grow much more rapidly than expected from the increase in the specific contribution of the surface. It is also established that an fcc structure of a free cluster, as a rule, contains twinning defects (nuclei of an hcp phase). One reason for the appearance of such defects is the so-called vertex instability (anomalously large oscillation amplitudes) of the atoms in coordination spheres.
Lopez, Xavier Moles; Debeir, Olivier; Maris, Calliope; Rorive, Sandrine; Roland, Isabelle; Saerens, Marco; Salmon, Isabelle; Decaestecker, Christine
2012-09-01
Whole-slide scanners allow the digitization of an entire histological slide at very high resolution. This new acquisition technique opens a wide range of possibilities for addressing challenging image analysis problems, including the identification of tissue-based biomarkers. In this study, we use whole-slide scanner technology for imaging the proliferating activity patterns in tumor slides based on Ki67 immunohistochemistry. Faced with large images, pathologists require tools that can help them identify tumor regions that exhibit high proliferating activity, called "hot-spots" (HSs). Pathologists need tools that can quantitatively characterize these HS patterns. To respond to this clinical need, the present study investigates various clustering methods with the aim of identifying Ki67 HSs in whole tumor slide images. This task requires a method capable of identifying an unknown number of clusters, which may be highly variable in terms of shape, size, and density. We developed a hybrid clustering method, referred to as Seedlink. Compared to manual HS selections by three pathologists, we show that Seedlink provides an efficient way of detecting Ki67 HSs and improves the agreement among pathologists when identifying HSs. Copyright © 2012 International Society for Advancement of Cytometry.
Tweets clustering using latent semantic analysis
NASA Astrophysics Data System (ADS)
Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul
2017-04-01
Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike on other social media, Twitter users broadcast short messages called "tweets". In this study, we extract tweets related to MH370 over a certain period of time. In this paper, we present an overview of our approach to tweet clustering to analyze users' responses toward the tragedy of MH370. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets in response to the MH370 tragedy: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of the initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima, and they help speed up the clustering process by converging on a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms, including the Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms, mimic swarming behavior, allowing them to cooperatively steer towards an optimal objective within a reasonable time. These so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with the K-means clustering mechanism to enhance its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard metrics for evaluating clustering quality, the extended K-means algorithms empowered by nature-inspired optimization methods are applied to image segmentation as a case study of an application scenario.
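As a baseline that illustrates the initial-centroid sensitivity these hybrids target, the sketch below runs plain Lloyd's k-means from several random initializations and keeps the run with the lowest within-cluster sum of squared errors; a nature-inspired optimizer would search centroid space more systematically than independent restarts. All data and parameters here are made up.

```python
import math
import random

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm from one random initialization."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: math.dist(p, centers[c]))].append(p)
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    sse = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse

def best_of_restarts(points, k, restarts=10, seed=0):
    """Keep the run with the lowest SSE; a crude stand-in for the global
    search that the nature-inspired hybrids perform."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)),
               key=lambda run: run[1])
```

On two well-separated blobs, the best restart recovers the globally optimal centroids even when an individual run starts from a poor initialization.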
NASA Astrophysics Data System (ADS)
Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie
2014-05-01
Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting; it performs well in prediction and can give explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones, yet the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods, trained on imbalanced samples, that forecast well for this small proportion of failed enterprises while remaining accurate overall. Commonly used methods built on the assumption of balanced samples do not predict the minority class well on imbalanced samples consisting of the minority (failed) enterprises and the majority (non-failed) ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both the minority and the majority in CBR. In CBCBR, case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating the information of the cases in each clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking the similarities between the target case and each clustered class centre. Then, the nearest neighbours of the target case within the determined class are retrieved. Finally, the labels of the nearest experienced cases are used in prediction. In an empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with classical CBR, a support vector machine, logistic regression, and multivariate discriminant analysis. The results show that CBCBR performed significantly better than the other four methods in terms of sensitivity for identifying the minority samples while maintaining high total accuracy. The proposed approach makes CBR useful in imbalanced forecasting.
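The retrieval pipeline of CBCBR described above can be sketched in a few lines, with the clustered case classes taken as given (the hierarchical clustering step is omitted) and similarity reduced to Euclidean distance; the labels and features below are made-up stand-ins.

```python
import math
from collections import Counter

def class_center(cases):
    """Mean feature vector of the (features, label) cases in one class."""
    feats = [f for f, _ in cases]
    return tuple(sum(x) / len(feats) for x in zip(*feats))

def cbcbr_predict(case_classes, target, k=3):
    """Retrieve the clustered case class whose centre is nearest the
    target, then take a majority vote among its k nearest cases."""
    nearest = min(case_classes,
                  key=lambda cls: math.dist(class_center(cls), target))
    neighbours = sorted(nearest, key=lambda c: math.dist(c[0], target))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Restricting the neighbour search to one clustered class is what lets the minority (failed) cases dominate retrieval when the target falls near their class centre.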
NASA Astrophysics Data System (ADS)
Chmelnitsky, Elly Golda
The investigation of a species' repertoire and the contexts in which different calls are used is central to understanding vocal communication among animals. Beluga whale, Delphinapterus leucas, calls were classified and described in association with behaviours, from recordings collected in the Churchill River, Manitoba, during the summers of 2006-2008. Calls were subjectively classified based on sound and visual analysis into whistles (64.2% of total calls; 22 call types), pulsed or noisy calls (25.9%; 15 call types), and combined calls (9.9%; seven types). A hierarchical cluster analysis, using six call measurements as variables, separated whistles into 12 groups and results were compared to subjective classification. Beluga calls associated with social interactions, travelling, feeding, and interactions with the boat were described. Call type percentages, relative proportions of different whistle contours (shapes), average frequency, and call duration varied with behaviour. Generally, higher percentages of whistles, more broadband pulsed and noisy calls, and shorter calls (<0.49s) were produced during behaviours associated with higher levels of activity and/or apparent arousal. Information on call types, call characteristics, and behavioural context of calls can be used for automated detection and classification methods and in future studies on call meaning and function.
Constructing the L2-Graph for Robust Subspace Learning and Subspace Clustering.
Peng, Xi; Yu, Zhiding; Yi, Zhang; Tang, Huajin
2017-04-01
Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between data points from the same subspace (i.e., intrasubspace data points). Recent works achieve good performance by modeling errors in their objective functions to remove them from the inputs. However, these approaches face the limitations that the structure of the errors must be known a priori and that a complex convex problem must be solved. In this paper, we present a novel method that eliminates the effects of the errors from the projection space (representation) rather than from the input space. We first prove that ℓ1-, ℓ2-, ℓ∞-, and nuclear-norm-based linear projection spaces share the property of intrasubspace projection dominance, i.e., the coefficients over intrasubspace data points are larger than those over intersubspace data points. Based on this property, we introduce a method to construct a sparse similarity graph, called the L2-graph. Subspace clustering and subspace learning algorithms are developed upon the L2-graph. We conduct comprehensive experiments on subspace learning, image clustering, and motion segmentation and consider several quantitative benchmarks: classification/clustering accuracy, normalized mutual information, and running time. Results show that the L2-graph outperforms many state-of-the-art methods in our experiments, including the L1-graph, low-rank representation (LRR), latent LRR, least squares regression, sparse subspace clustering, and locally linear representation.
Kim, Hyeyoung; Kim, Bora; Kim, Se Hyun; Park, C Hyung Keun; Kim, Eun Young; Ahn, Yong Min
2018-08-01
It is essential to understand the latent structure of the population of suicide attempters for effective suicide prevention. The aim of this study was to identify subgroups among Korean suicide attempters in terms of the details of the suicide attempt. A total of 888 people who attempted suicide and were subsequently treated in the emergency rooms of 17 medical centers between May and November of 2013 were included in the analysis. The variables assessed included demographic characteristics, clinical information, and details of the suicide attempt assessed by the Suicide Intent Scale (SIS) and Columbia-Suicide Severity Rating Scale (C-SSRS). Cluster analysis was performed using the Ward method. Of the participants, 85.4% (n = 758) fell into a cluster characterized by less planning, low lethality methods, and ambivalence towards death ("impulsive"). The other cluster (n = 130) involved a more severe and well-planned attempt, used highly lethal methods, and took more precautions to avoid being interrupted ("planned"). The first cluster was dominated by women, while the second cluster was associated more with men, older age, and physical illness. We only included participants who visited the emergency department after their suicide attempt and had no missing values for SIS or C-SSRS. Cluster analysis extracted two distinct subgroups of Korean suicide attempters showing different patterns of suicidal behaviors. Understanding that a significant portion of suicide attempts occur impulsively calls for new prevention strategies tailored to differing subgroup profiles. Copyright © 2018 Elsevier B.V. All rights reserved.
Biclustering of gene expression data using reactive greedy randomized adaptive search procedure
Dharan, Smitha; Nair, Achuthsankar S
2009-01-01
Background Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, which has become one of the most popular measures for searching for biclusters. In this paper, we review basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), namely its construction and local search phases, and propose a new method, a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP), to detect significant biclusters from large microarray datasets. The method has two major steps. First, high-quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. Results We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms both the basic GRASP algorithm and the Cheng and Church approach. Conclusion The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts. PMID:19208127
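Cheng and Church's mean squared residue, the bicluster quality score that the GRASP search minimizes, can be computed directly. The toy expression matrix below is invented; a perfectly additive bicluster scores 0:

```python
# Mean squared residue (MSR) of a bicluster (rows x cols) of a matrix:
# MSR = mean over the submatrix of (a_ij - rowmean_i - colmean_j + mean)^2.
def mean_squared_residue(matrix, rows, cols):
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    total = sum(map(sum, sub)) / (n_r * n_c)
    return sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + total) ** 2
        for i in range(n_r) for j in range(n_c)
    ) / (n_r * n_c)

# A perfectly additive (coherent) bicluster has MSR = 0.
M = [[1, 2, 3],
     [2, 3, 4],
     [5, 6, 7]]
score = mean_squared_residue(M, [0, 1, 2], [0, 1, 2])
```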
Clustering Tree-structured Data on Manifold
Lu, Na; Miao, Hongyu
2016-01-01
Tree-structured data usually contain both topological and geometrical information, and must be considered on a manifold rather than in Euclidean space for appropriate data parameterization and analysis. In this study, we propose a novel tree-structured data parameterization, called the Topology-Attribute matrix (T-A matrix), so the data clustering task can be conducted on a matrix manifold. We incorporate the structure constraints embedded in the data into the non-negative matrix factorization method to determine meta-trees from the T-A matrix, and the signature vector of each single tree can then be extracted by meta-tree decomposition. The meta-tree space turns out to be a cone space, in which we explore the distance metric and implement the clustering algorithm based on concepts such as the Fréchet mean. Finally, the T-A matrix based clustering (TAMBAC) framework is evaluated and compared using both simulated data and real retinal images to illustrate its efficiency and accuracy. PMID:26660696
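The meta-tree extraction rests on non-negative matrix factorization of the T-A matrix. A generic multiplicative-update NMF sketch (not the authors' structure-constrained variant, and with an invented toy matrix) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def nmf(V, k, iters=500, eps=1e-9):
    """Multiplicative-update NMF: factor V ~ W @ H with W, H >= 0."""
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, keep nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, keep nonnegative
    return W, H

# Toy nonnegative "topology-attribute" matrix (invented values):
# rows = trees, columns = topology/attribute descriptors.
V = rng.random((8, 6))
W, H = nmf(V, 3)                     # 3 "meta-trees"
err = np.linalg.norm(V - W @ H)      # reconstruction error
```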
A revised moving cluster distance to the Pleiades open cluster
NASA Astrophysics Data System (ADS)
Galli, P. A. B.; Moraux, E.; Bouy, H.; Bouvier, J.; Olivares, J.; Teixeira, R.
2017-02-01
Context. The distance to the Pleiades open cluster has been extensively debated in the literature over several decades. Although different methods point to a discrepancy in the trigonometric parallaxes produced by the Hipparcos mission, the number of individual stars with known distances is still too small, compared to the number of cluster members, to resolve this problem. Aims: We provide a new distance estimate for the Pleiades based on the moving cluster method, which will be useful for further discussion of the so-called Pleiades distance controversy and for comparison with the very precise parallaxes from the Gaia space mission. Methods: We apply a refurbished implementation of the convergent point search method to an updated census of Pleiades stars to calculate the convergent point position of the cluster from stellar proper motions. Then, we derive individual parallaxes for 64 cluster members using radial velocities compiled from the literature, and approximate parallaxes for another 1146 stars based on the spatial velocity of the cluster. This represents the largest sample of Pleiades stars with individual distances to date. Results: The parallaxes derived in this work are in good agreement with previous results obtained in different studies (excluding Hipparcos) for individual stars in the cluster. We report a mean parallax of 7.44 ± 0.08 mas and a corresponding distance that is consistent with the weighted mean of 135.0 ± 0.6 pc obtained from the non-Hipparcos results in the literature. Conclusions: Our result for the distance to the Pleiades open cluster is not consistent with the Hipparcos catalog, but favors the recent and more precise distance determination of 136.2 ± 1.2 pc obtained from Very Long Baseline Interferometry observations. It is also in good agreement with the mean distance of 133 ± 5 pc obtained from the first trigonometric parallaxes delivered by the Gaia satellite for the brightest cluster members in common with our sample.
Full Table B.2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A48
Halbedel, Sven; Prager, Rita; Fuchs, Stephan; Trost, Eva; Werner, Guido; Flieger, Antje
2018-06-01
Listeria monocytogenes causes foodborne outbreaks with high mortality. For improvement of outbreak cluster detection, the German consiliary laboratory for listeriosis implemented whole-genome sequencing (WGS) in 2015. A total of 424 human L. monocytogenes isolates collected in 2007 to 2017 were subjected to WGS and core-genome multilocus sequence typing (cgMLST). cgMLST grouped the isolates into 38 complexes, reflecting 4 known and 34 unknown disease clusters. Most of these complexes were confirmed by single nucleotide polymorphism (SNP) calling, but some were further differentiated. Interestingly, several cgMLST cluster types were further subtyped by pulsed-field gel electrophoresis, partly due to phage insertions in the accessory genome. Our results highlight the usefulness of cgMLST for routine cluster detection but also show that cgMLST complexes require validation by methods providing higher typing resolution. Twelve cgMLST clusters included recent cases, suggesting activity of the source. Therefore, the cgMLST nomenclature data presented here may support future public health actions. Copyright © 2018 American Society for Microbiology.
NASA Astrophysics Data System (ADS)
Tosida, E. T.; Maryana, S.; Thaheer, H.; Hardiani
2017-01-01
Information and communication technology (telematics) is one of the most rapidly developing business sectors in Indonesia. It holds a strategic position through its contribution to the planning and implementation of development, economic, social, political, and defence strategies in business, communication, and education. Aid absorption among the national telematics SMEs is relatively low; improvement therefore requires an analysis of business support clusters based on business type. In this study, the business support cluster analysis is applied specifically to Indonesian telematics services. The business data are obtained from the National Census of Economics (Susenas 2006). The cluster model is developed with an Artificial Neural Network (ANN) approach, the Self-Organizing Map (SOM) algorithm. Based on the Davies-Bouldin Index (DBI), the accuracy of the cluster model is 0.37, which can be categorized as good. The cluster model is developed to identify telematics business clusters that influence the national economy, so that it is easier for the government to supervise telematics businesses.
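The Davies-Bouldin index used to score the cluster model can be computed as follows; the toy points are invented, and well-separated, compact clusters yield a small (good) DBI:

```python
# Davies-Bouldin index: for each cluster, take the worst-case ratio
# (S_i + S_j) / M_ij of within-cluster scatters to centroid separation,
# then average over clusters. Lower is better.
import math

def davies_bouldin(points, labels):
    clusters = sorted(set(labels))

    def members(c):
        return [p for p, l in zip(points, labels) if l == c]

    def centroid(pts):
        return [sum(x) / len(pts) for x in zip(*pts)]

    mus = {c: centroid(members(c)) for c in clusters}
    S = {c: sum(math.dist(p, mus[c]) for p in members(c)) / len(members(c))
         for c in clusters}
    total = 0.0
    for i in clusters:
        total += max((S[i] + S[j]) / math.dist(mus[i], mus[j])
                     for j in clusters if j != i)
    return total / len(clusters)

# Two well-separated toy clusters -> small DBI.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
dbi = davies_bouldin(pts, [0, 0, 0, 1, 1, 1])
```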
Sethi, Suresh; Linden, Daniel; Wenburg, John; Lewis, Cara; Lemons, Patrick R.; Fuller, Angela K.; Hare, Matthew P.
2016-01-01
Error-tolerant likelihood-based match calling presents a promising technique to accurately identify recapture events in genetic mark–recapture studies by combining probabilities of latent genotypes and probabilities of observed genotypes, which may contain genotyping errors. Combined with clustering algorithms to group samples into sets of recaptures based upon pairwise match calls, these tools can be used to reconstruct accurate capture histories for mark–recapture modelling. Here, we assess the performance of a recently introduced error-tolerant likelihood-based match-calling model and sample clustering algorithm for genetic mark–recapture studies. We assessed both biallelic (i.e. single nucleotide polymorphisms; SNP) and multiallelic (i.e. microsatellite; MSAT) markers using a combination of simulation analyses and case study data on Pacific walrus (Odobenus rosmarus divergens) and fishers (Pekania pennanti). A novel two-stage clustering approach is demonstrated for genetic mark–recapture applications. First, repeat captures within a sampling occasion are identified. Subsequently, recaptures across sampling occasions are identified. The likelihood-based matching protocol performed well in simulation trials, demonstrating utility for use in a wide range of genetic mark–recapture studies. Moderately sized SNP (64+) and MSAT (10–15) panels produced accurate match calls for recaptures and accurate non-match calls for samples from closely related individuals in the face of low to moderate genotyping error. Furthermore, matching performance remained stable or increased as the number of genetic markers increased, genotyping error notwithstanding.
NASA Astrophysics Data System (ADS)
Munandar, T. A.; Azhari; Mushdholifah, A.; Arsyad, L.
2017-03-01
Disparities in regional development are commonly identified using the Klassen Typology and the Location Quotient. Both methods typically use data on the gross regional domestic product (GRDP) sectors of a particular region. The Klassen approach identifies regional disparities by classifying the GRDP sector data into four classes, namely Quadrants I, II, III, and IV; each quadrant indicates a certain level of regional disparity based on the GRDP sector values of the region. Meanwhile, the Location Quotient (LQ) is usually used to identify potential sectors in a particular region, determining which sectors have potential and which do not. LQ classifies each sector into three classes: the basic sector, the non-basic sector with a competitive advantage, and the non-basic sector that can only meet its own needs. Neither the Klassen Typology nor LQ can clearly visualize the relationships among the development achievements of individual regions and sectors. This research aimed to develop a new approach to the identification of disparities in regional development in the form of hierarchical clustering. The method of Hierarchical Agglomerative Clustering (HAC) was employed as the basis of the hierarchical clustering model, with modifications based on the Klassen Typology and LQ: HAC modified with the Klassen Typology is called MHACK, while HAC modified with LQ is called MACLoQ. The two algorithms identify regional disparities (MHACK) and potential sectors (MACLoQ), respectively, in the form of hierarchical clusters. Applying MHACK to 31 regencies in Central Java Province identified 3 regencies (Demak, Jepara, and Magelang City) as developed and rapidly growing regions, while the other 28 regencies fall into the category of developed but depressed regions.
Results of the MACLoQ implementation suggest that only 1 regency (Banyumas) falls into the basic-sector category, while the other regencies fall into the non-basic, non-competitive sector category.
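The Location Quotient step can be sketched directly; the sector figures below are hypothetical, not Susenas data:

```python
# Location Quotient: LQ_i = (regional share of sector i) /
#                           (national share of sector i).
# LQ > 1 marks a basic (export-oriented) sector.
def location_quotient(regional, national):
    r_total = sum(regional.values())
    n_total = sum(national.values())
    return {s: (regional[s] / r_total) / (national[s] / n_total)
            for s in regional}

# Hypothetical GRDP sector values for one regency vs. the nation.
region = {"agriculture": 40, "manufacturing": 25, "services": 35}
nation = {"agriculture": 20, "manufacturing": 40, "services": 40}

lq = location_quotient(region, nation)
basic = [s for s, v in lq.items() if v > 1]   # basic sectors of the regency
```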
NASA Astrophysics Data System (ADS)
Di Pasquale, Nicodemo; Davie, Stuart J.; Popelier, Paul L. A.
2018-06-01
Using the machine learning method kriging, we predict the energies of atoms in ion-water clusters, consisting of either Cl- or Na+ surrounded by a number of water molecules (i.e., without Na+Cl- interaction). These atomic energies are calculated following the topological energy partitioning method called Interacting Quantum Atoms (IQA). Kriging predicts atomic properties (in this case IQA energies) by a model that has been trained over a small set of geometries with known property values. The results presented here are part of the development of an advanced type of force field, called FFLUX, which offers quantum mechanical information to molecular dynamics simulations without the limiting computational cost of ab initio calculations. The results reported for the prediction of the IQA components of the energy in the test set exhibit an accuracy of a few kJ/mol, corresponding to an average error of less than 5%, even when a large cluster of water molecules surrounding an ion is considered. Ions represent an important chemical system, and this work shows that they can be correctly taken into account in the framework of the FFLUX force field.
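A minimal sketch of kriging in this sense, i.e., zero-mean Gaussian-process regression with an RBF kernel; the descriptors, surrogate "energy" function, and hyperparameters are invented stand-ins for the IQA training data:

```python
import numpy as np

rng = np.random.default_rng(2)

def kriging_predict(X_train, y_train, X_test, length=1.0, noise=1e-8):
    """Simple kriging: y* = k(X*, X) @ (K + noise*I)^{-1} @ y,
    with an RBF (squared-exponential) covariance."""
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * length ** 2))

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    weights = np.linalg.solve(K, y_train)
    return rbf(X_test, X_train) @ weights

# Invented geometric descriptors and a smooth surrogate atomic-energy target.
X = rng.uniform(-2, 2, size=(30, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

pred = kriging_predict(X, y, X[:5])   # near-interpolation at training points
```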
NASA Astrophysics Data System (ADS)
Ouillon, G.; Ducorbier, C.; Sornette, D.
2008-01-01
We propose a new pattern recognition method that is able to reconstruct the three-dimensional structure of the active part of a fault network using the spatial location of earthquakes. The method is a generalization of the so-called dynamic clustering (or k-means) method, which partitions a set of data points into clusters using a global minimization criterion on the variance of the hypocenter locations about their centers of mass. The new method improves on the original k-means method by taking into account the full spatial covariance tensor of each cluster in order to partition the data set into fault-like, anisotropic clusters. Given a catalog of seismic events, the output is the optimal set of plane segments that fits the spatial structure of the data. Each plane segment is fully characterized by its location, size, and orientation. The main tunable parameter is the accuracy of the earthquake locations, which fixes the resolution, i.e., the residual variance of the fit. The resolution determines the number of fault segments needed to describe the earthquake catalog: the better the resolution, the finer the structure of the reconstructed fault segments. The algorithm successfully reconstructs the fault segments of synthetic earthquake catalogs. Applied to a real catalog consisting of a subset of the aftershock sequence of the 28 June 1992 Landers earthquake in southern California, the reconstructed plane segments fully agree with faults already known on geological maps or with blind faults that appear quite obvious in longer-term catalogs. Future improvements of the method are discussed, as well as its potential use in the multiscale study of the inner structure of fault zones.
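The covariance-aware generalization of k-means can be sketched as follows: each cluster is summarized by its barycenter and the smallest-eigenvalue eigenvector of its covariance (the plane normal), and points are reassigned to the nearest candidate plane. This is a simplified illustration with synthetic "faults", not the authors' full algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

def plane_kmeans(pts, k, iters=30):
    """Anisotropic k-means variant: clusters are fit as planes, and each
    point is assigned to the plane with the smallest orthogonal distance."""
    labels = rng.integers(k, size=len(pts))
    for _ in range(iters):
        centers, normals = [], []
        for c in range(k):
            sub = pts[labels == c]
            if len(sub) < 3:   # degenerate cluster: reseed from random points
                sub = pts[rng.choice(len(pts), 3, replace=False)]
            mu = sub.mean(0)
            # smallest-eigenvalue eigenvector of the covariance = plane normal
            _, v = np.linalg.eigh(np.cov(sub.T))
            centers.append(mu)
            normals.append(v[:, 0])
        # orthogonal distance of every hypocenter to every candidate plane
        d = np.abs(np.stack([(pts - mu) @ n
                             for mu, n in zip(centers, normals)], axis=1))
        labels = d.argmin(1)
    return labels

# Two synthetic "faults": noisy planes z ~ 0 and x ~ 0 (toy catalog).
f1 = np.c_[rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200), rng.normal(0, .01, 200)]
f2 = np.c_[rng.normal(0, .01, 200), rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)]
labels = plane_kmeans(np.vstack([f1, f2]), 2)
```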
NASA Astrophysics Data System (ADS)
Albirri, E. R.; Sugeng, K. A.; Aldila, D.
2018-04-01
Nowadays, as technology and human civilization progress, almost all cities in the world are connected, and the various places of the world have become easier to visit. This is an impact of transportation technology and highway construction. Cities that are connected can be represented by a graph. Graph clustering is one way to answer problems represented by graphs, and several graph clustering methods exist to solve such problems specifically. One of them is the Highly Connected Subgraphs (HCS) method. HCS identifies a cluster as a subgraph G whose edge connectivity satisfies k(G) > n/2, where n is the total number of vertices in G; such a subgraph is called highly connected and constitutes a cluster. This research used a literature review complemented by a program simulation. We modified the HCS algorithm to work on weighted graphs; the modification is located in the Process Phase, which cuts the connected graph G into two subgraphs H and H̄. We implemented the program using the software Octave 4.0.1 and applied it to the flight-route map of one of the airlines in Indonesia.
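The HCS recursion can be sketched as follows. The minimum cut here is found by toy exhaustive enumeration (feasible only for very small graphs), and the six-node example graph is invented:

```python
# Highly Connected Subgraphs (HCS), unweighted sketch: recursively split G
# along its minimum edge cut until edge connectivity k(G) > n/2, then
# report G as a cluster.
from itertools import combinations

def min_cut(nodes, edges):
    """Toy exhaustive minimum edge cut (fine for tiny graphs only)."""
    nodes = list(nodes)
    best = None
    for r in range(1, len(nodes)):
        for part in combinations(nodes[1:], r):    # fix nodes[0] on one side
            H = set(part)
            cut = [(u, v) for u, v in edges if (u in H) != (v in H)]
            if best is None or len(cut) < len(best[0]):
                best = (cut, H, set(nodes) - H)
    return best

def hcs(nodes, edges):
    if len(nodes) <= 1:
        return [set(nodes)]
    cut, H, Hbar = min_cut(nodes, edges)
    if len(cut) > len(nodes) / 2:                  # highly connected: k(G) > n/2
        return [set(nodes)]
    sub = lambda S: [(u, v) for u, v in edges if u in S and v in S]
    return hcs(H, sub(H)) + hcs(Hbar, sub(Hbar))

# Two triangles joined by a single bridge edge (a toy "route map").
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = hcs({0, 1, 2, 3, 4, 5}, edges)          # splits along the bridge
```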
Structural evolution in the crystallization of rapid cooling silver melt
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Z.A., E-mail: ze.tian@gmail.com; Laboratory for Simulation and Modelling of Particulate Systems School of Materials Science and Engineering, University of New South Wales, Sydney, NSW 2052; Dong, K.J.
2015-03-15
The structural evolution in a rapid cooling process of silver melt has been investigated at different scales by adopting several analysis methods. The results confirm Ostwald's rule of stages and Frank's conjecture on the icosahedron, with many specific details. In particular, cluster-scale analysis by a recently developed method called LSCA (the Largest Standard Cluster Analysis) clarified the complex structural evolution occurring in crystallization: different kinds of local clusters (such as ico-like (ico: icosahedron), ico-bcc-like (bcc: body-centred cubic), bcc, and bcc-like structures) in turn reach their maximal numbers as temperature decreases. Over a rather wide temperature range, the icosahedral short-range order (ISRO) exhibits a saturated stage (where the amount of ico-like structures remains stable) that breeds metastable bcc clusters. As the precursor of crystallization, bcc clusters finally decrease after reaching their maximal number, so that the final solid is a mixture composed mainly of fcc/hcp (face-centred cubic and hexagonal close-packed) clusters and, to a lesser degree, bcc clusters. This detailed geometric picture of the crystallization of a liquid metal should improve the fundamental understanding of the liquid-solid phase transition. Highlights: • A comprehensive structural analysis is conducted focusing on crystallization. • More than 90% of the atoms are included in the analysis for all samples concerned. • A series of distinct intermediate states are found in the crystallization of silver melt. • A novel icosahedron-saturated state breeds the metastable bcc state.
Thermalization as an invisibility cloak for fragile quantum superpositions
NASA Astrophysics Data System (ADS)
Hahn, Walter; Fine, Boris V.
2017-07-01
We propose a method for protecting fragile quantum superpositions in many-particle systems from dephasing by external classical noise. We call superpositions "fragile" if dephasing occurs particularly fast, because the noise couples very differently to the superposed states. The method consists of letting a quantum superposition evolve under the internal thermalization dynamics of the system, followed by a time-reversal manipulation known as Loschmidt echo. The thermalization dynamics makes the superposed states almost indistinguishable during most of the above procedure. We validate the method by applying it to a cluster of spins ½.
NASA Astrophysics Data System (ADS)
Lu, Siqi; Wang, Xiaorong; Wu, Junyong
2018-01-01
The paper presents a data-driven method, based on the K-means clustering algorithm, to generate planning scenarios for the location and size planning of distributed photovoltaic (PV) units in a network. Taking the power losses of the network, the installation and maintenance costs of distributed PV, the profit of distributed PV, and the voltage offset as objectives, and the locations and sizes of distributed PV units as decision variables, the Pareto optimal front is obtained through a self-adaptive genetic algorithm (GA), and solutions are ranked by a method called the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS). Finally, the planning schemes at the top of the ranking list are selected according to different planning priorities after detailed analysis. The proposed method is applied to a 10-kV distribution network in Gansu Province, China, and the results are discussed.
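The TOPSIS ranking step works as follows; the alternative matrix, weights, and criterion directions below are hypothetical, not the Gansu case data:

```python
# TOPSIS: normalize and weight the decision matrix, find the ideal and
# anti-ideal solutions, then score each alternative by relative closeness
# to the ideal (higher score = better rank).
import math

def topsis(matrix, weights, benefit):
    """benefit[j] is True when criterion j should be maximized."""
    n, m = len(matrix), len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    R = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    ideal = [max(R[i][j] for i in range(n)) if benefit[j]
             else min(R[i][j] for i in range(n)) for j in range(m)]
    worst = [min(R[i][j] for i in range(n)) if benefit[j]
             else max(R[i][j] for i in range(n)) for j in range(m)]
    scores = []
    for row in R:
        d_best, d_worst = math.dist(row, ideal), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical Pareto solutions: [losses, cost, profit, |voltage offset|].
alts = [[120, 300, 80, 0.02],
        [100, 350, 90, 0.03],
        [140, 280, 70, 0.01]]
scores = topsis(alts, [0.3, 0.3, 0.2, 0.2], [False, False, True, False])
best = scores.index(max(scores))
```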
Accelerating Information Retrieval from Profile Hidden Markov Model Databases.
Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem
2016-01-01
Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is increasing interest in improving the efficiency of searching profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have focused on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for batch query searching, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41% and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C; Morris, John H; Ferrin, Thomas E; Poole, Leslie B; Fetrow, Jacquelyn S
2017-02-01
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny, and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, adding sequences that contain similar functional site features, and divisive, splitting groups when functional site features suggest distinct functionally relevant clusters. Superfamily members need not be identified initially: MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than using a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily.
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.
Zhu, Lin; Chung, Fu-Lai; Wang, Shitong
2009-06-01
The fuzziness index m has an important influence on the clustering results of fuzzy clustering algorithms, and it should not be forced to the usual fixed value m = 2. In view of its distinctive features in applications and its limitation of allowing only m = 2, a recent advance in fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM is proposed for more effective clustering. By introducing a novel membership constraint function, a new objective function is constructed, from which GIFP-FCM clustering is derived. Meanwhile, from the viewpoints of the Lp-norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results, including an application to noisy image texture segmentation, are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.
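For reference, the classical FCM baseline that IFP-FCM and GIFP-FCM generalize alternates membership and centroid updates. This is a sketch of plain FCM with m = 2 on invented 2-D points, not the proposed GIFP-FCM:

```python
import math
import random

random.seed(4)

def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Plain FCM: minimize sum_{i,k} u_ik^m ||x_k - v_i||^2 by alternating
    centroid and membership updates (m is the fuzzifier / fuzziness index)."""
    n, dim = len(points), len(points[0])
    U = [[random.random() for _ in range(c)] for _ in range(n)]
    U = [[u / sum(row) for u in row] for row in U]       # rows sum to 1
    centers = [[0.0] * dim for _ in range(c)]
    for _ in range(iters):
        for i in range(c):                               # centroid update
            w = [U[k][i] ** m for k in range(n)]
            centers[i] = [sum(w[k] * points[k][d] for k in range(n)) / sum(w)
                          for d in range(dim)]
        for k in range(n):                               # membership update
            dist = [max(math.dist(points[k], v), 1e-12) for v in centers]
            for i in range(c):
                U[k][i] = 1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return U, centers

# Two well-separated toy clusters.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
U, centers = fuzzy_c_means(pts, 2)
```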
Zeldovich pancakes in observational data are cold
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brinckmann, Thejs; Lindholmer, Mikkel; Hansen, Steen
The present-day universe consists of galaxies, galaxy clusters, one-dimensional filaments, and two-dimensional sheets or pancakes, all of which combine to form the cosmic web. The so-called "Zeldovich pancakes" are very difficult to observe, because their overdensity is only slightly greater than the average density of the universe. Falco et al. [1] presented a method to identify Zeldovich pancakes in observational data, and these were used as a tool for estimating the mass of galaxy clusters. Here we expand and refine that observational detection method. We study two pancakes on scales of 10 Mpc, identified from spectroscopically observed galaxies near the Coma cluster, and compare them with twenty numerical pancakes. We find that the observed structures have velocity dispersions of about 100 km/s, which is relatively low compared to typical groups and filaments. These velocity dispersions are consistent with those found for the numerical pancakes. We also confirm that the identified structures are in fact two-dimensional. Finally, we estimate the stellar-to-total mass ratio of the observational pancakes to be 2 × 10⁻⁴, within one order of magnitude, which is smaller than that of clusters of galaxies.
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
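The greedy centroid-based clustering strategy that several of these tools share can be sketched as follows. The identity measure and reads are drastically simplified (position-wise matching instead of alignment), and the 97% threshold is the conventional OTU cutoff:

```python
# Greedy centroid-based OTU clustering sketch: each read joins the first
# existing centroid it matches at >= 97% identity, else it seeds a new OTU.
def identity(a, b):
    """Fraction of matching positions: a toy stand-in for alignment identity."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def greedy_otu(reads, threshold=0.97):
    centroids, otus = [], []
    for read in reads:
        for i, c in enumerate(centroids):
            if identity(read, c) >= threshold:
                otus[i].append(read)       # absorb into an existing OTU
                break
        else:
            centroids.append(read)         # read seeds a new OTU
            otus.append([read])
    return otus

r1 = "ACGT" * 12 + "AC"    # 50-nt toy read
r2 = "ACGT" * 12 + "AT"    # one mismatch vs r1 -> 98% identity, same OTU
r3 = "T" * 50              # unrelated read -> new OTU
otus = greedy_otu([r1, r2, r3])
```

Note that greedy clustering of this kind is input-order dependent, one reason the benchmarked tools differ in the OTUs they report.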
2015-01-01
Background Cellular processes are known to be modular and are realized by groups of proteins implicated in common biological functions. Such groups of proteins are called functional modules, and many community detection methods have been devised for their discovery from protein interaction network (PIN) data. In current agglomerative clustering approaches, vertices with just a few neighbors are often classified as separate clusters, which does not make sense biologically. Also, a major limitation of agglomerative techniques is that their computational efficiency does not scale well to large PINs. Finally, PIN data obtained from large-scale experiments generally contain many false positives, and this makes it hard for agglomerative clustering methods to find the correct clusters, since they are known to be sensitive to noisy data. Results We propose a local similarity premetric, the relative vertex clustering value, as a new criterion for deciding when a node can be added to a given node's cluster; it addresses the above three issues. Based on this criterion, we introduce a novel and very fast agglomerative clustering technique, FAC-PIN, for discovering functional modules and protein complexes from PIN data. Conclusions Our proposed FAC-PIN algorithm is applied to nine PIN datasets from eight different species including the yeast PIN, and the identified functional modules are validated using Gene Ontology (GO) annotations from DAVID Bioinformatics Resources. Identified protein complexes are also validated using experimentally verified complexes. Computational results show that FAC-PIN can discover functional modules or protein complexes from PINs more accurately and more efficiently than HC-PIN and CNM, the current state-of-the-art approaches for clustering PINs in an agglomerative manner. PMID:25734691
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Self-Assembly of Octopus Nanoparticles into Pre-Programmed Finite Clusters
NASA Astrophysics Data System (ADS)
Halverson, Jonathan; Tkachenko, Alexei
2012-02-01
The precise control of the spatial arrangement of nanoparticles (NPs) is often required to take full advantage of their novel optical and electronic properties. NPs have been shown to self-assemble into crystalline structures using either patchy surface regions or complementary DNA strands to direct the assembly. Due to a lack of specificity of the interactions, these methods lead to only a limited number of structures. An emerging approach is to bind ssDNA at specific sites on the particle surface, making so-called octopus NPs. Using octopus NPs we investigate the inverse problem of the self-assembly of finite clusters. That is, for a given target cluster (e.g., arranging the NPs on the vertices of a dodecahedron), what is the minimum number of complementary DNA strands needed for the robust self-assembly of the cluster from an initially homogeneous NP solution? Based on the results of Brownian dynamics simulations we have compiled a set of design rules for various target clusters including cubes, pyramids, dodecahedrons and truncated icosahedrons. Our approach leads to control over the kinetic pathway and has demonstrated nearly perfect yield of the target.
Using Fuzzy Clustering for Real-time Space Flight Safety
NASA Technical Reports Server (NTRS)
Lee, Charles; Haskell, Richard E.; Hanna, Darrin; Alena, Richard L.
2004-01-01
To ensure space flight safety, it is necessary to monitor myriad sensor readings on the ground and in flight. Since a space shuttle has many sensors, monitoring data and drawing conclusions from information contained within the data in real time is challenging. The nature of the information can be critical to the success of the mission and safety of the crew and therefore, must be processed with minimal data-processing time. Data analysis algorithms could be used to synthesize sensor readings and compare data associated with normal operation against data containing fault patterns to draw conclusions. Detecting abnormal operation during early stages in the transition from safe to unsafe operation requires a large amount of historical data that can be categorized into different classes (non-risk, risk). Even though 40 years of the shuttle flight program have accumulated volumes of historical data, these data don't comprehensively represent all possible fault patterns since fault patterns are usually unknown before the fault occurs. This paper presents a method that uses a similarity measure between fuzzy clusters to detect possible faults in real time. A clustering technique based on a fuzzy equivalence relation is used to characterize temporal data. Data collected during an initial time period are separated into clusters. These clusters are characterized by their centroids. Clusters formed during subsequent time periods are either merged with an existing cluster or added to the cluster list. The resulting list of cluster centroids, called a cluster group, characterizes the behavior of a particular set of temporal data. The degree to which new clusters formed in a subsequent time period are similar to the cluster group is characterized by a similarity measure, q. This method is applied to downlink data from Columbia flights. The results show that this technique can detect an unexpected fault that has not been present in the training data set.
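The centroid-based monitoring idea can be sketched compactly: each time window is summarized by cluster centroids, and a new cluster is compared against the stored cluster group with a similarity measure q in [0, 1]. The Gaussian kernel used here is an assumption for illustration; the paper derives q from a fuzzy equivalence relation, and the centroids and thresholds below are invented.

```python
import math

# Sketch of cluster-group monitoring: q measures how similar a new cluster
# centroid is to the closest stored centroid. The Gaussian kernel is an
# illustrative choice, not the paper's fuzzy-equivalence-based measure.

def similarity(centroid, group, sigma=1.0):
    """q in [0, 1]: 1 means identical to a known centroid, 0 means novel."""
    d = min(math.dist(centroid, g) for g in group)
    return math.exp(-(d * d) / (2 * sigma * sigma))

def update_group(group, centroid, merge_q=0.8):
    """Merge into the group if similar enough, else append as new behavior."""
    if similarity(centroid, group) >= merge_q:
        return group           # behaves like known (safe) data
    return group + [centroid]  # novel cluster: candidate fault pattern

# Hypothetical cluster group learned from an initial time window.
group = [(0.0, 0.0), (1.0, 1.0)]
print(similarity((0.1, 0.0), group) > 0.9)   # True: near a known centroid
print(similarity((5.0, 5.0), group) < 0.01)  # True: far away, possible fault
```

A low q for a newly formed cluster is the trigger for flagging a possible fault, since the new behavior matches nothing seen during normal operation.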
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used distance based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…
Bahlmann, Claus; Burkhardt, Hans
2004-03-01
In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
The DAFT/FADA Survey status and latest results
NASA Astrophysics Data System (ADS)
Guennou, L.
2011-12-01
We present here the latest results obtained from the American French collaboration called the Dark energy American French Team/French American DArk energy Team (DAFT/FADA). The goal of the DAFT/FADA collaboration is to carry out a weak lensing tomography survey of z = 0.4-0.9 rich clusters of galaxies. Unlike supernovae or other methods such as cluster of galaxy counts, weak lensing tomography is purely based on geometry and does not depend on knowledge of the physics of the objects used as distance indicators. In addition, the reason for analyzing observations in the direction of clusters is that the shear signal is enhanced by about a factor of 10 relative to the field. Our work will eventually contain results obtained on 91 rich clusters from the HST archive combined with ground-based work to obtain photo-zs. This combination of photo-z and weak lensing tomography will enable us to constrain the equation of state of dark energy. We present here the latest results obtained so far in this study.
Fung, David C Y; Wilkins, Marc R; Hart, David; Hong, Seok-Hee
2010-07-01
The force-directed layout is commonly used in computer-generated visualizations of protein-protein interaction networks. While it is good for providing a visual outline of the protein complexes and their interactions, it has two limitations when used as a visual analysis method. The first is poor reproducibility. Repeated running of the algorithm does not necessarily generate the same layout, therefore, demanding cognitive readaptation on the investigator's part. The second limitation is that it does not explicitly display complementary biological information, e.g. Gene Ontology, other than the protein names or gene symbols. Here, we present an alternative layout called the clustered circular layout. Using the human DNA replication protein-protein interaction network as a case study, we compared the two network layouts for their merits and limitations in supporting visual analysis.
Overlapping Community Detection based on Network Decomposition
NASA Astrophysics Data System (ADS)
Ding, Zhuanlian; Zhang, Xingyi; Sun, Dengdi; Luo, Bin
2016-04-01
Community detection in complex networks has become a vital step to understand the structure and dynamics of networks in various fields. However, traditional node clustering and relatively new proposed link clustering methods have inherent drawbacks to discover overlapping communities. Node clustering is inadequate to capture the pervasive overlaps, while link clustering is often criticized due to the high computational cost and ambiguous definition of communities. Overlapping community detection thus remains a formidable challenge. In this work, we propose a new overlapping community detection algorithm based on network decomposition, called NDOCD. Specifically, NDOCD iteratively splits the network by removing all links in derived link communities, which are identified by utilizing node clustering technique. The network decomposition contributes to reducing the computation time, and the elimination of noise links improves the quality of the obtained communities. Besides, we employ node clustering technique rather than link similarity measure to discover link communities, thus NDOCD avoids an ambiguous definition of community and becomes less time-consuming. We test our approach on both synthetic and real-world networks. Results demonstrate the superior performance of our approach both in computation time and accuracy compared to state-of-the-art algorithms.
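The decomposition loop at the heart of this approach can be sketched schematically: repeatedly extract one link community and delete its links, so a node whose links fall into several communities naturally appears in each of them (an overlap). The densest-seed heuristic below (the edge with the most common neighbors, plus those neighbors) is a naive placeholder for the paper's node clustering technique; the toy graph is invented.

```python
# Schematic NDOCD-style loop: find a link community, record its node set,
# remove its links, repeat until no links remain. Nodes can re-appear in
# later communities, which is how overlaps emerge.

def ndocd_sketch(edge_list):
    edges = {frozenset(e) for e in edge_list}
    communities = []
    while edges:
        adj = {}
        for e in edges:
            u, v = sorted(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        def score(e):
            u, v = sorted(e)
            # edges whose endpoints share many neighbors seed dense groups;
            # sorted(e) is only a deterministic tie-break
            return (len(adj[u] & adj[v]), sorted(e))
        u, v = sorted(max(edges, key=score))
        cluster = {u, v} | (adj[u] & adj[v])
        comm = {e for e in edges if set(e) <= cluster}  # the link community
        communities.append({x for e in comm for x in e})
        edges -= comm  # decomposition step: delete the community's links
    return communities

# Two triangles sharing node "c": "c" belongs to both communities.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
comms = ndocd_sketch(edges)
print(len(comms), sum("c" in c for c in comms))  # 2 2
```

The shared node lands in both recovered communities, which is exactly the overlap that plain node clustering cannot express.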
Cluster assembly in nitrogenase.
Sickerman, Nathaniel S; Rettberg, Lee A; Lee, Chi Chung; Hu, Yilin; Ribbe, Markus W
2017-05-09
The versatile enzyme system nitrogenase accomplishes the challenging reduction of N2 and other substrates through the use of two main metalloclusters. For molybdenum nitrogenase, the catalytic component NifDK contains the [Fe8S7]-core P-cluster and a [MoFe7S9C-homocitrate] cofactor called the M-cluster. These chemically unprecedented metalloclusters play a critical role in the reduction of N2, and both originate from [Fe4S4] clusters produced by the actions of NifS and NifU. Maturation of P-cluster begins with a pair of these [Fe4S4] clusters on NifDK called the P*-cluster. An accessory protein NifZ aids in P-cluster fusion, and reductive coupling is facilitated by NifH in a stepwise manner to form P-cluster on each half of NifDK. For M-cluster biosynthesis, two [Fe4S4] clusters on NifB are coupled with a carbon atom in a radical-SAM dependent process, and concomitant addition of a 'ninth' sulfur atom generates the [Fe8S9C]-core L-cluster. On the scaffold protein NifEN, L-cluster is matured to M-cluster by the addition of Mo and homocitrate provided by NifH. Finally, matured M-cluster in NifEN is directly transferred to NifDK, where a conformational change locks the cofactor in place. Mechanistic insights into these fascinating biosynthetic processes are detailed in this chapter. © 2017 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences with the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement, dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
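The fuzzy c-means step used in the algorithm's final stage is a standard, well-defined procedure and can be sketched compactly: membership weights u_ik tie every point softly to every cluster, and a fuzzifier m > 1 controls how soft the partition is. The feature values, cluster count, and initialization below are illustrative, not taken from the paper.

```python
# Compact fuzzy c-means sketch. Alternating updates:
#   membership: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
#   centers:    weighted mean of points with weights u_ik^m

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    # simplistic deterministic init: c points evenly spaced through the data
    centers = [points[i * (len(points) - 1) // (c - 1)] for i in range(c)]
    u = []
    for _ in range(iters):
        u = []
        for p in points:
            dists = [max(dist(p, ck), 1e-12) for ck in centers]
            u.append([1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in dists)
                      for dk in dists])
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                for d in range(len(points[0]))))
    return centers, u

# Two well-separated blobs in a 3-D feature space (illustrative data).
pts = [(0.1, 0.2, 0.0), (0.0, 0.1, 0.1), (5.0, 5.1, 5.0), (5.2, 4.9, 5.1)]
centers, u = fuzzy_c_means(pts)
print(sorted(round(cx[0]) for cx in centers))  # [0, 5]: one center per blob
```

In the paper's setting, a degree of "physicalness" per cluster falls out of the memberships u rather than from a hard assignment, which is exactly what the soft partition provides.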
Babbitt, Patricia C.; Ferrin, Thomas E.
2017-01-01
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially—MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method’s novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences. PMID:28187133
Ke, Tracy; Fan, Jianqing; Wu, Yichao
2014-01-01
This paper explores the homogeneity of coefficients in high-dimensional regression, which extends the sparsity concept and is more general and suitable for many applications. Homogeneity arises when regression coefficients corresponding to neighboring geographical regions or a similar cluster of covariates are expected to be approximately the same. Sparsity corresponds to a special case of homogeneity with a large cluster of known atom zero. In this article, we propose a new method called clustering algorithm in regression via data-driven segmentation (CARDS) to explore homogeneity. New mathematics are provided on the gain that can be achieved by exploring homogeneity. Statistical properties of two versions of CARDS are analyzed. In particular, the asymptotic normality of our proposed CARDS estimator is established, which reveals better estimation accuracy for homogeneous parameters than that without homogeneity exploration. When our methods are combined with sparsity exploration, further efficiency can be achieved beyond the exploration of sparsity alone. This provides additional insights into the power of exploring low-dimensional structures in high-dimensional regression: homogeneity and sparsity. Our results also shed light on the properties of the fused Lasso. The newly developed method is further illustrated by simulation studies and applications to real data. Supplementary materials for this article are available online. PMID:26085701
Metabolic network visualization eliminating node redundance and preserving metabolic pathways
Bourqui, Romain; Cottret, Ludovic; Lacroix, Vincent; Auber, David; Mary, Patrick; Sagot, Marie-France; Jourdan, Fabien
2007-01-01
Background The tools that are available to draw and to manipulate the representations of metabolism are usually restricted to metabolic pathways. This limitation becomes problematic when studying processes that span several pathways. The various attempts that have been made to draw genome-scale metabolic networks are confronted with two shortcomings: (1) they do not use contextual information, which leads to dense, hard-to-interpret drawings; (2) they impose very constrained standards that require, in particular, duplicating nodes, making topological analysis considerably more difficult. Results We propose a method, called MetaViz, which enables one to draw a genome-scale metabolic network while also taking into account its organization into pathways. This method consists of two steps: a clustering step, which addresses the pathway overlapping problem, and a drawing step, which consists in drawing the clustered graph and each cluster. Conclusion The method we propose is original and addresses new drawing issues arising from the no-duplication constraint. We do not propose a single drawing but rather several alternative ways of presenting metabolism depending on the pathway on which one wishes to focus. We believe that this provides a valuable tool to explore the pathway structure of metabolism. PMID:17608928
Evolution of redback radio pulsars in globular clusters
NASA Astrophysics Data System (ADS)
Benvenuto, O. G.; De Vito, M. A.; Horvath, J. E.
2017-02-01
Context. We study the evolution of close binary systems composed of a normal, intermediate mass star and a neutron star considering a chemical composition typical of that present in globular clusters (Z = 0.001). Aims: We look for similarities and differences with respect to solar composition donor stars, which we have extensively studied in the past. As a definite example, we perform an application on one of the redbacks located in a globular cluster. Methods: We performed a detailed grid of models in order to find systems that represent the so-called redback binary radio pulsar systems with donor star masses between 0.6 and 2.0 solar masses and orbital periods in the range 0.2-0.9 d. Results: We find that the evolution of these binary systems is rather similar to that of solar composition objects, allowing us to account for the occurrence of redbacks in globular clusters, as the main physical ingredient is the irradiation feedback. Redback systems are in the quasi-RLOF state, that is, almost filling their corresponding Roche lobes. During the irradiation cycle the system alternates between semi-detached and detached states. While detached, the system appears as a binary millisecond pulsar, called a redback. Circumstellar material, as seen in redbacks, is left behind after the previous semi-detached phase. Conclusions: The evolution of binary radio pulsar systems considering irradiation successfully accounts for the occurrence of redback pulsars in low-metallicity environments such as globular clusters. This is the case despite possible effects of the low metal content of the donor star that could drive systems away from the redback configuration.
What Feeds the Beast in a Galaxy Cluster?
2015-09-10
A massive cluster of galaxies, called SpARCS1049+56, can be seen in this multi-wavelength view from NASA Hubble and Spitzer space telescopes. At the middle of the picture is the largest, central member of the family of galaxies (upper right red dot of central pair). Unlike other central galaxies in clusters, this one is bursting with the birth of new stars. Scientists say this star birth was triggered by a collision between a smaller galaxy and the giant, central galaxy. The smaller galaxy's wispy, shredded parts, called a tidal tail, can be seen coming out below the larger galaxy. Throughout this region are features called "beads on a string," which are areas where gas has clumped to form new stars. This type of "feeding" mechanism for galaxy clusters -- where gas from the merging of galaxies is converted to new stars -- is rare. The Hubble data in this image show infrared light with a wavelength of 1 micron in blue, and 1.6 microns in green. The Spitzer data show infrared light of 3.6 microns in red. http://photojournal.jpl.nasa.gov/catalog/PIA19837
SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies
Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe
2012-01-01
Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and, using the gap statistic, estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, namely data from the HapMap and Pan-Asian SNP consortia. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the most consistent with the population labels or those produced by the Admixture program. 
The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising solution to infer fine-scale genetic patterns. PMID:23077494
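The gap statistic SHIPS uses to pick the number of sub-populations compares the log within-cluster dispersion of the data with its expectation under a uniform reference distribution, choosing the k where the gap is largest. The sketch below illustrates the statistic itself; the tiny 1-D k-means stands in for the genetic clustering step, and the data are invented.

```python
import math
import random

# Gap-statistic sketch: Gap(k) = E[log W_k(reference)] - log W_k(data),
# where W_k is the within-cluster sum of squared deviations.

def kmeans_1d(xs, k, iters=20):
    """Naive 1-D k-means; centers start evenly spread over the data range."""
    if k == 1:
        centers = [sum(xs) / len(xs)]
    else:
        centers = [min(xs) + (max(xs) - min(xs)) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return groups

def log_dispersion(groups):
    w = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups if g)
    return math.log(max(w, 1e-12))

def gap(xs, k, n_ref=20, seed=0):
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    ref = [log_dispersion(kmeans_1d([rng.uniform(lo, hi) for _ in xs], k))
           for _ in range(n_ref)]
    return sum(ref) / n_ref - log_dispersion(kmeans_1d(xs, k))

xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # two obvious clusters
print(gap(xs, 2) > gap(xs, 1))       # True: k = 2 wins
```

On structured data, splitting into the "true" number of clusters shrinks the dispersion far more than it does for the uniform reference, which is why the gap peaks at the correct k.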
2011-01-01
Background The drug/metabolite transporter superfamily comprises a diversity of protein domain families with multiple functions including transport of nucleotide sugars. Drug/metabolite transporter domains are contained in both solute carrier families 30, 35 and 39 proteins as well as in acyl-malonyl condensing enzyme proteins. In this paper, we present an evolutionary analysis of nucleotide sugar transporters in relation to the entire superfamily of drug/metabolite transporters that considers crucial intra-protein duplication events that have shaped the transporters. We use a method that combines the strengths of hidden Markov models and maximum likelihood to find relationships between drug/metabolite transporter families, and branches within families. Results We present evidence that the triose-phosphate transporters, domain unknown function 914, uracil-diphosphate glucose-N-acetylglucosamine, and nucleotide sugar transporter families have evolved from a domain duplication event before the radiation of Viridiplantae in the EamA family (previously called domain unknown function 6). We identify previously unknown branches in the solute carrier 30, 35 and 39 protein families that emerged simultaneously as key physiological developments after the radiation of Viridiplantae, including the "35C/E" branch of EamA, which formed in the lineage of T. adhaerens (Animalia). We identify a second cluster of DMTs, called the domain unknown function 1632 cluster, which has non-cytosolic N- and C-termini, and thus appears to have been formed from a different domain duplication event. We identify a previously uncharacterized motif, G-X(6)-G, which is overrepresented in the fifth transmembrane helix of C-terminal domains. We present evidence that the family called fatty acid elongases is homologous to transporters, not enzymes as had previously been thought. 
Conclusions The nucleotide sugar transporter families were formed through differentiation of the gene cluster EamA (domain unknown function 6) before Viridiplantae, showing for the first time the significance of EamA. PMID:21569384
Bhat, Somanath; Polanowski, Andrea M; Double, Mike C; Jarman, Simon N; Emslie, Kerry R
2012-01-01
Recent advances in nanofluidic technologies have enabled the use of Integrated Fluidic Circuits (IFCs) for high-throughput Single Nucleotide Polymorphism (SNP) genotyping (GT). In this study, we implemented and validated a relatively low-cost nanofluidic system for SNP-GT with and without Specific Target Amplification (STA). As proof of principle, we first validated the effect of input DNA copy number on genotype call rate using well characterised, digital PCR (dPCR) quantified human genomic DNA samples and then implemented the validated method to genotype 45 SNPs in the humpback whale, Megaptera novaeangliae, nuclear genome. When STA was not incorporated, for a homozygous human DNA sample, reaction chambers containing, on average, 9 to 97 copies showed 100% call rate and accuracy. Below 9 copies, the call rate decreased, and at one copy it was 40%. For a heterozygous human DNA sample, the call rate decreased from 100% to 21% when predicted copies per reaction chamber decreased from 38 copies to one copy. The tightness of genotype clusters on a scatter plot also decreased. In contrast, when the same samples were subjected to STA prior to genotyping, a call rate and a call accuracy of 100% were achieved. Our results demonstrate that low input DNA copy number affects the quality of data generated, in particular for a heterozygous sample. Similar to human genomic DNA, a call rate and a call accuracy of 100% was achieved with whale genomic DNA samples following multiplex STA using either 15 or 45 SNP-GT assays. These calls were 100% concordant with their true genotypes determined by an independent method, suggesting that the nanofluidic system is a reliable platform for generating genotype calls with high accuracy and concordance from genomic sequences derived from biological tissue.
Ventricular beat classifier using fractal number clustering.
Bakardjian, H
1992-09-01
A two-stage ventricular beat 'associative' classification procedure is described. The first stage separates typical beats from extrasystoles on the basis of area and polarity rules. At the second stage, the extrasystoles are classified in self-organised cluster formations of adjacent shape parameter values. This approach avoids the use of threshold values for discrimination between ectopic beats of different shapes, which could be critical in borderline cases. A pattern shape feature conventionally called a 'fractal number', in combination with a polarity attribute, was found to be a good criterion for waveform evaluation. An additional advantage of this pattern classification method is its good computational efficiency, which affords the opportunity to implement it in real-time systems.
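The two-stage flow of such a classifier can be sketched directly: stage 1 separates normal beats from extrasystoles with area/polarity rules, and stage 2 groups extrasystoles whose shape parameter ("fractal number") values are adjacent, so no fixed discrimination threshold between ectopic shapes is needed. All numeric limits and field names below are invented for the sketch.

```python
# Illustrative two-stage beat classifier. Stage 1: rule-based split into
# normal beats and extrasystoles. Stage 2: sort extrasystoles by shape
# parameter and split wherever adjacent values jump by more than `gap`,
# forming self-organised clusters instead of using fixed shape thresholds.

def classify_beats(beats, area_limits=(0.8, 1.2), gap=0.1):
    normal, ectopic = [], []
    for b in beats:
        if area_limits[0] <= b["area"] <= area_limits[1] and b["polarity"] > 0:
            normal.append(b)
        else:
            ectopic.append(b)
    ectopic.sort(key=lambda b: b["fractal"])
    clusters = []
    for b in ectopic:
        if clusters and b["fractal"] - clusters[-1][-1]["fractal"] <= gap:
            clusters[-1].append(b)  # adjacent shape value: same cluster
        else:
            clusters.append([b])    # large jump: start a new shape cluster
    return normal, clusters

beats = [{"area": 1.0, "polarity": 1,  "fractal": 1.30},
         {"area": 2.0, "polarity": 1,  "fractal": 1.31},
         {"area": 2.1, "polarity": -1, "fractal": 1.33},
         {"area": 1.9, "polarity": -1, "fractal": 1.70}]
normal, clusters = classify_beats(beats)
print(len(normal), len(clusters))  # 1 2
```

Because cluster boundaries fall wherever the sorted shape values naturally separate, borderline beats are grouped with their nearest neighbors rather than forced across a hard cutoff.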
Rigid-Cluster Models of Conformational Transitions in Macromolecular Machines and Assemblies
Kim, Moon K.; Jernigan, Robert L.; Chirikjian, Gregory S.
2005-01-01
We present a rigid-body-based technique (called rigid-cluster elastic network interpolation) to generate feasible transition pathways between two distinct conformations of a macromolecular assembly. Many biological molecules and assemblies consist of domains which act more or less as rigid bodies during large conformational changes. These collective motions are thought to be strongly related to the functions of a system. This fact encourages us to model a macromolecule or assembly simply as a set of rigid bodies interconnected by distance constraints. In previous articles, we developed coarse-grained elastic network interpolation (ENI) in which, for example, only Cα atoms are selected as representatives of each residue of a protein. In ENI we interpolate the distance differences between two conformations using a simple quadratic cost function, and feasible conformations are generated without steric conflicts. Rigid-cluster interpolation is an extension of the ENI method with rigid clusters replacing point masses. The intermediate conformations in an anharmonic pathway can now be determined by the translational and rotational displacements of large clusters in such a way that distance constraints are observed. We present the derivation of the rigid-cluster model and apply it to a variety of macromolecular assemblies. Rigid-cluster ENI is then modified for a hybrid model represented by a mixture of rigid clusters and point masses. Simulation results show that both rigid-cluster and hybrid ENI methods generate sterically feasible pathways of large systems in a very short time. For example, the HK97 virus capsid is an icosahedrally symmetric assembly composed of 60 identical asymmetric units. Its Hessian matrix for a Cα coarse-grained model has size greater than (300,000)², but this reduces to (84)² when we apply the rigid-cluster model with icosahedral symmetry constraints.
The computational cost of the interpolation no longer scales heavily with the size of structures; instead, it depends strongly on the minimal number of rigid clusters into which the system can be decomposed. PMID:15833998
Improving cluster-based missing value estimation of DNA microarray data.
Brás, Lígia P; Menezes, José C
2007-06-01
We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
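The iterative reuse of estimates described above can be sketched in a few lines. This is a minimal illustration of the IKNN idea (initial mean fill, then repeated re-estimation from the k nearest rows), not the published IKNNimpute code; the Euclidean row distance and simple averaging are assumptions.

```python
import math

def iknn_impute(data, k=2, n_iter=5):
    """Sketch of iterative KNN imputation: missing entries (None) are
    first filled with column means, then repeatedly re-estimated from
    the k nearest rows using the current, partly imputed matrix."""
    n, m = len(data), len(data[0])
    missing = [(i, j) for i in range(n) for j in range(m) if data[i][j] is None]
    # initial fill: column means over observed values
    filled = [row[:] for row in data]
    for j in range(m):
        obs = [data[i][j] for i in range(n) if data[i][j] is not None]
        mean = sum(obs) / len(obs)
        for i in range(n):
            if filled[i][j] is None:
                filled[i][j] = mean
    for _ in range(n_iter):
        for i, j in missing:
            # neighbours ranked by distance over all current values,
            # so previously imputed entries are reused immediately
            dists = sorted(
                (math.dist(filled[i], filled[r]), r)
                for r in range(n) if r != i
            )
            neighbours = [filled[r][j] for _, r in dists[:k]]
            filled[i][j] = sum(neighbours) / k
    return filled
```

Because each pass reuses the latest estimates, the imputed values are refined across iterations rather than fixed by the initial fill.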
Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina
2016-07-01
Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. 
In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.
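The gradual neighbourhood expansion from seed nodes on a weighted graph can be sketched as follows. This is a generic greedy expansion under an assumed weighted-density score, not the authors' GENA evaluation function; running it from several seeds naturally yields overlapping clusters, matching the multifunctional-protein assumption.

```python
def expand_seed(graph, seed, min_gain=0.0):
    """Greedily expand a cluster from a seed node on a weighted graph
    (adjacency dict: node -> {neighbour: weight}).  A candidate joins
    while it improves a simple weighted-density score (illustrative)."""
    def score(cluster):
        inside = sum(graph[u].get(v, 0.0)
                     for u in cluster for v in cluster if u < v)
        return inside / max(len(cluster), 1)

    cluster = {seed}
    improved = True
    while improved:
        improved = False
        # candidates: nodes adjacent to the current cluster
        candidates = {v for u in cluster for v in graph[u]} - cluster
        for v in sorted(candidates):
            if score(cluster | {v}) > score(cluster) + min_gain:
                cluster.add(v)
                improved = True
    return cluster
```

Seeds are typically chosen as highly informative nodes (e.g. high weighted degree); because clusters from different seeds may share nodes, proteins can appear in more than one predicted complex.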
A Novel Hybrid Intelligent Indoor Location Method for Mobile Devices by Zones Using Wi-Fi Signals
Castañón–Puga, Manuel; Salazar, Abby Stephanie; Aguilar, Leocundo; Gaxiola-Pacheco, Carelia; Licea, Guillermo
2015-01-01
The increasing use of mobile devices in indoor spaces brings challenges to location methods. This work presents a hybrid intelligent method based on data mining and Type-2 fuzzy logic to locate mobile devices in an indoor space by zones using Wi-Fi signals from selected access points (APs). This approach takes advantage of wireless local area networks (WLANs) over other types of architectures and implements the complete method in a mobile application using the developed tools. Besides, the proposed approach is validated by experimental data obtained from case studies and the cross-validation technique. For the purpose of generating the fuzzy rules that conform to the Takagi–Sugeno fuzzy system structure, a semi-supervised data mining technique called subtractive clustering is used. This algorithm finds centers of clusters from the radius map given by the collected signals from APs. Measurements of Wi-Fi signals can be noisy due to several factors mentioned in this work, so this method proposed the use of Type-2 fuzzy logic for modeling and dealing with such uncertain information. PMID:26633417
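The subtractive clustering step used to seed the fuzzy rules can be sketched in its generic textbook form: every point receives a "potential" from the density of its neighbours, the highest-potential point becomes a centre, its influence is subtracted, and the process repeats. The neighbourhood radii and stopping ratio below are illustrative defaults, not the values used in the paper.

```python
import math

def subtractive_clustering(points, ra=1.0, stop_ratio=0.15):
    """Generic subtractive clustering sketch: returns cluster centres
    chosen from the data points themselves."""
    alpha = 4.0 / ra ** 2            # influence radius for potentials
    beta = 4.0 / (1.5 * ra) ** 2     # wider radius for subtraction
    pot = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points)
           for p in points]
    centres = []
    first = max(pot)
    while True:
        i = max(range(len(points)), key=pot.__getitem__)
        if pot[i] < stop_ratio * first:
            break
        centres.append(points[i])
        # subtract the chosen centre's influence from all potentials
        pot = [pot[j] - pot[i] * math.exp(-beta * math.dist(points[j], points[i]) ** 2)
               for j in range(len(points))]
    return centres
```

Each returned centre would then anchor one Takagi-Sugeno rule, with Type-2 memberships layered on top to absorb the Wi-Fi signal noise.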
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.
Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue
2018-05-02
Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These properties make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
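The pipeline (k-mers, complex encoding, stationary wavelet transform, numeric feature vector) can be sketched end to end. The unit-circle k-mer encoding and the single-level Haar transform below are stand-ins for SSAW's actual mapping and wavelet choice; note that a stationary (undecimated) transform keeps the output the same length as the input.

```python
import cmath

def kmers_to_signal(seq, k=3):
    """Map each k-mer to a point on the unit circle: a toy stand-in
    for SSAW's k-mer -> complex-number encoding."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    vocab = {km: n for n, km in enumerate(sorted(set(kmers)))}
    return [cmath.exp(2j * cmath.pi * vocab[km] / len(vocab)) for km in kmers]

def haar_swt(signal):
    """One level of a stationary (undecimated) Haar transform:
    no downsampling, so output length equals input length."""
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / 2 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / 2 for i in range(n)]
    return approx, detail

def feature_vector(seq, k=3):
    """Magnitudes of the wavelet coefficients serve as the numeric
    feature vector used downstream for clustering/classification."""
    a, d = haar_swt(kmers_to_signal(seq, k))
    return [abs(x) for x in a] + [abs(x) for x in d]
```

The resulting fixed-meaning numeric vectors can be fed to any standard clustering or classification routine without sequence alignment.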
Moradi, Saleh; Nima, Ali A.; Rapp Ricciardi, Max; Archer, Trevor; Garcia, Danilo
2014-01-01
Background: Performance monitoring might have an adverse influence on call center agents' well-being. We investigate how performance, over a 6-month period, is related to agents' perceptions of their learning climate, character strengths, well-being (subjective and psychological), and physical activity. Method: Agents (N = 135) self-reported perception of the learning climate (Learning Climate Questionnaire), character strengths (Values In Action Inventory Short Version), well-being (Positive Affect Negative Affect Schedule, Satisfaction With Life Scale, Psychological Well-Being Scales Short Version), and how often/intensively they engaged in physical activity. Performance, “time on the phone,” was monitored for 6 consecutive months by the same system handling the calls. Results: Performance was positively related to having opportunities to develop, the character strengths clusters of Wisdom and Knowledge (e.g., curiosity for learning, perspective) and Temperance (e.g., having self-control, being prudent, humble, and modest), and exercise frequency. Performance was negatively related to the sense of autonomy and responsibility, contentedness, the character strengths clusters of Humanity and Love (e.g., helping others, cooperation) and Justice (e.g., affiliation, fairness, leadership), positive affect, life satisfaction, and exercise intensity. Conclusion: Call centers may need to create opportunities to develop in order to increase agents' performance, and to focus on individual differences in the recruitment and selection of agents to prevent future shortcomings or worker dissatisfaction. Nevertheless, performance measurement in call centers may need to include other aspects that are more attuned to different character strengths. After all, allowing individuals to put their strengths to work should empower the individual and, in the end, the organization itself. Finally, physical activity enhancement programs might offer considerable positive work outcomes. PMID:25002853
Comparison of memory thresholds for planar qudit geometries
NASA Astrophysics Data System (ADS)
Marks, Jacob; Jochym-O'Connor, Tomas; Gheorghiu, Vlad
2017-11-01
We introduce and analyze a new type of decoding algorithm called general color clustering, based on renormalization group methods, to be used in qudit color codes. The performance of this decoder is analyzed under a generalized bit-flip error model and is used to obtain the first memory threshold estimates for qudit 6-6-6 color codes. The proposed decoder is compared with similar decoding schemes for qudit surface codes as well as the current leading qubit decoders for both sets of codes. We find that, as with surface codes, clustering performs sub-optimally for qubit color codes, giving a threshold of 5.6% compared to the 8.0% obtained through surface projection decoding methods. However, the threshold rate increases by up to 112% for large qudit dimensions, plateauing around 11.9%. All the analysis is performed using QTop, a new open-source software package for simulating and visualizing topological quantum error-correcting codes.
Kalyanaraman, Ananth; Cannon, William R; Latt, Benjamin; Baxter, Douglas J
2011-11-01
A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster, using spectral datasets from environmental microbial communities as inputs. The source code, along with user documentation, is available at http://compbio.eecs.wsu.edu/MR-MSPolygraph. Contact: ananth@eecs.wsu.edu; william.cannon@pnnl.gov. Supplementary data are available at Bioinformatics online.
Improving Electronic Sensor Reliability by Robust Outlier Screening
Moreno-Lizaranzu, Manuel J.; Cuesta, Federico
2013-01-01
Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs. PMID:24113682
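The idea behind a robust part-averaging test can be sketched with median/MAD screening limits: a few latent outliers in the recent history cannot inflate the screening window the way a mean/sigma rule would. This is a generic robust-statistics illustration, not Freescale's RDPAT algorithm; the window width k and the MAD scaling are assumptions.

```python
import statistics

def robust_limits(history, k=6.0):
    """Screening limits from the median and MAD of recent measurements;
    MAD is scaled by 1.4826 so it estimates sigma under normality."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    spread = 1.4826 * mad
    return med - k * spread, med + k * spread

def screen(history, value, k=6.0):
    """True if the new measurement falls inside the robust window."""
    lo, hi = robust_limits(history, k)
    return lo <= value <= hi
```

A "dynamic" variant would recompute the limits on a sliding window as new dice are tested, so the window tracks slow process drift while still rejecting isolated outliers.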
Functional feature embedded space mapping of fMRI data.
Hu, Jin; Tian, Jie; Yang, Lei
2006-01-01
We have proposed a new method for fMRI data analysis which is called Functional Feature Embedded Space Mapping (FFESM). Our work mainly focuses on the experimental design with periodic stimuli which can be described by a number of Fourier coefficients in the frequency domain. A nonlinear dimension reduction technique Isomap is applied to the high dimensional features obtained from frequency domain of the fMRI data for the first time. Finally, the presence of activated time series is identified by the clustering method in which the information theoretic criterion of minimum description length (MDL) is used to estimate the number of clusters. The feasibility of our algorithm is demonstrated by real human experiments. Although we focus on analyzing periodic fMRI data, the approach can be extended to analyze non-periodic fMRI data (event-related fMRI) by replacing the Fourier analysis with a wavelet analysis.
Clusterization in Ternary Fission
NASA Astrophysics Data System (ADS)
Kamanin, D. V.; Pyatkov, Y. V.
These lecture notes are devoted to a new kind of ternary decay of low-excited heavy nuclei that we call "collinear cluster tri-partition" (CCT), after the features of the observed effect: the decay partners fly away almost collinearly, and at least one of them has a magic nucleon composition. At the early stage of our work, the process of "true ternary fission" (fission of the nucleus into three fragments of comparable masses) was considered undiscovered for low-excited heavy nuclei. Another possible prototype, three-body cluster radioactivity, was also unknown. The closest known phenomenon to CCT, at least kinematically, is the so-called "polar emission", but there only very light ions (up to isotopes of Be) have been observed so far.
Immune Centroids Over-Sampling Method for Multi-Class Classification
2015-05-22
...recognize specific antigens. The response of a receptor to an antigen can activate its hosting B-cell; the activated B-cell then proliferates and... modifying N. K. Jerne's theory. The theory states that in a pre-existing group of lymphocytes (specifically B cells), a specific antigen only... the clusters of each small class, which have high data density, called global immune centroids over-sampling (denoted Global-IC).
Solving the scalability issue in quantum-based refinement: Q|R#1.
Zheng, Min; Moriarty, Nigel W; Xu, Yanting; Reimers, Jeffrey R; Afonine, Pavel V; Waller, Mark P
2017-12-01
Accurately refining biomacromolecules using a quantum-chemical method is challenging because the cost of a quantum-chemical calculation scales approximately as nᵐ, where n is the number of atoms and m (≥3) depends on the quantum method of choice. This fundamental problem means that quantum-chemical calculations become intractable when the size of the system requires more computational resources than are available. In the development of the software package called Q|R, this issue is referred to as Q|R#1. A divide-and-conquer approach has been developed that fragments the atomic model into small, manageable pieces in order to solve Q|R#1. First, the atomic model of a crystal structure is analyzed to detect noncovalent interactions between residues, and the results of the analysis are represented as an interaction graph. Second, a graph-clustering algorithm is used to partition the interaction graph into a set of clusters in such a way as to minimize disruption to the noncovalent interaction network. Third, the environment surrounding each individual cluster is analyzed, and any residue that interacts with a particular cluster is assigned to the buffer region of that cluster. A fragment is defined as a cluster plus its buffer region. The gradients for all atoms in each fragment are computed, and only the gradients from each cluster are combined to create the total gradients. A quantum-based refinement is carried out using the total gradients as chemical restraints. In order to validate this interaction graph-based fragmentation approach in Q|R, the entire atomic model of an amyloid cross-β spine crystal structure (PDB entry 2oNA) was refined.
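The third step, attaching a buffer region to each cluster, can be sketched directly from the interaction graph. This is a minimal illustration of the cluster-plus-buffer fragment definition; the dict layout and residue labels are assumptions, and the actual clustering and gradient bookkeeping in Q|R are more involved.

```python
def build_fragments(interaction_graph, clusters):
    """Given residue clusters and a residue interaction graph
    (adjacency dict: residue -> list of interacting residues), attach
    to each cluster a buffer of every outside residue interacting with
    it.  A fragment is the cluster plus its buffer; downstream, only
    the cluster atoms' gradients would be kept."""
    fragments = []
    for cluster in clusters:
        buffer = set()
        for res in cluster:
            buffer |= set(interaction_graph.get(res, ())) - set(cluster)
        fragments.append({"cluster": set(cluster), "buffer": buffer})
    return fragments
```

Each fragment can then be sent to an independent quantum-chemical gradient calculation, with the buffer residues providing a realistic chemical environment that is discarded when the cluster gradients are combined.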
Cerebellar Functional Parcellation Using Sparse Dictionary Learning Clustering.
Wang, Changqing; Kipping, Judy; Bao, Chenglong; Ji, Hui; Qiu, Anqi
2016-01-01
The human cerebellum has recently been discovered to contribute to cognition and emotion beyond the planning and execution of movement, suggesting its functional heterogeneity. We aimed to identify the functional parcellation of the cerebellum using information from resting-state functional magnetic resonance imaging (rs-fMRI). For this, we introduced a new data-driven decomposition-based functional parcellation algorithm, called Sparse Dictionary Learning Clustering (SDLC). SDLC integrates dictionary learning, sparse representation of rs-fMRI, and k-means clustering into one optimization problem. The dictionary is comprised of an over-complete set of time course signals, with which a sparse representation of rs-fMRI signals can be constructed. Cerebellar functional regions were then identified using k-means clustering based on the sparse representation of rs-fMRI signals. We solved SDLC using a multi-block hybrid proximal alternating method that guarantees strong convergence. We evaluated the reliability of SDLC and benchmarked its classification accuracy against other clustering techniques using simulated data. We then demonstrated that SDLC can identify biologically reasonable functional regions of the cerebellum as estimated by their cerebello-cortical functional connectivity. We further provided new insights into the cerebello-cortical functional organization in children.
NASA Astrophysics Data System (ADS)
Guo, Jingyu; Tian, Dehua; McKinney, Brett A.; Hartman, John L.
2010-06-01
Interactions between genetic and/or environmental factors are ubiquitous, affecting the phenotypes of organisms in complex ways. Knowledge about such interactions is becoming rate-limiting for our understanding of human disease and other biological phenomena. Phenomics refers to the integrative analysis of how all genes contribute to phenotype variation, entailing genome and organism level information. A systems biology view of gene interactions is critical for phenomics. Unfortunately the problem is intractable in humans; however, it can be addressed in simpler genetic model systems. Our research group has focused on the concept of genetic buffering of phenotypic variation, in studies employing the single-cell eukaryotic organism, S. cerevisiae. We have developed a methodology, quantitative high throughput cellular phenotyping (Q-HTCP), for high-resolution measurements of gene-gene and gene-environment interactions on a genome-wide scale. Q-HTCP is being applied to the complete set of S. cerevisiae gene deletion strains, a unique resource for systematically mapping gene interactions. Genetic buffering is the idea that comprehensive and quantitative knowledge about how genes interact with respect to phenotypes will lead to an appreciation of how genes and pathways are functionally connected at a systems level to maintain homeostasis. However, extracting biologically useful information from Q-HTCP data is challenging, due to the multidimensional and nonlinear nature of gene interactions, together with a relative lack of prior biological information. Here we describe a new approach for mining quantitative genetic interaction data called recursive expectation-maximization clustering (REMc). We developed REMc to help discover phenomic modules, defined as sets of genes with similar patterns of interaction across a series of genetic or environmental perturbations. 
Such modules are reflective of buffering mechanisms, i.e., genes that play a related role in the maintenance of physiological homeostasis. To develop the method, 297 gene deletion strains were selected based on gene-drug interactions with hydroxyurea, an inhibitor of ribonucleotide reductase enzyme activity, which is critical for DNA synthesis. To partition the gene functions, these 297 deletion strains were challenged with growth inhibitory drugs known to target different genes and cellular pathways. Q-HTCP-derived growth curves were used to quantify all gene interactions, and the data were used to test the performance of REMc. Fundamental advantages of REMc include objective assessment of total number of clusters and assignment to each cluster a log-likelihood value, which can be considered an indicator of statistical quality of clusters. To assess the biological quality of clusters, we developed a method called gene ontology information divergence z-score (GOid_z). GOid_z summarizes total enrichment of GO attributes within individual clusters. Using these and other criteria, we compared the performance of REMc to hierarchical and K-means clustering. The main conclusion is that REMc provides distinct efficiencies for mining Q-HTCP data. It facilitates identification of phenomic modules, which contribute to buffering mechanisms that underlie cellular homeostasis and the regulation of phenotypic expression.
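The recursive flavour of REMc, splitting a cluster only while the split improves a likelihood-based quality score, can be sketched as follows. This is an illustrative analogue only: it uses 1-D data, plain k-means as a stand-in for the EM step, and a fixed log-likelihood gain threshold in place of REMc's model-selection criterion.

```python
import math, random

def kmeans(points, k, n_iter=20, seed=0):
    """Plain 1-D k-means used as the inner step of a recursive splitter."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: abs(p - centres[c]))].append(p)
        centres = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return [g for g in groups if g]

def log_likelihood(cluster):
    """Gaussian log-likelihood of one cluster (variance floored)."""
    mu = sum(cluster) / len(cluster)
    var = max(sum((x - mu) ** 2 for x in cluster) / len(cluster), 1e-6)
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x in cluster)

def recursive_split(points, min_gain=2.0):
    """Keep splitting a cluster in two while the split improves the
    total log-likelihood by more than a fixed penalty."""
    if len(points) < 4:
        return [points]
    parts = kmeans(points, 2)
    if len(parts) < 2:
        return [points]
    gain = sum(log_likelihood(p) for p in parts) - log_likelihood(points)
    if gain <= min_gain:
        return [points]
    return [c for p in parts for c in recursive_split(p, min_gain)]
```

As in REMc, each final cluster carries a log-likelihood value that can serve as a statistical quality indicator, and the recursion depth determines the number of clusters objectively rather than by a preset k.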
Chubachi, Shotaro; Sato, Minako; Kameyama, Naofumi; Tsutsumi, Akihiro; Sasaki, Mamoru; Tateno, Hiroki; Nakamura, Hidetoshi; Asano, Koichiro; Betsuyaku, Tomoko
2016-08-01
Patients with chronic obstructive pulmonary disease (COPD) frequently suffer from various comorbidities. Recently, cluster analysis has been proposed to examine the phenotypic heterogeneity of COPD. In order to comprehensively understand the comorbidities of COPD in Japan, we conducted a multicenter, longitudinal cohort study called the Keio COPD Comorbidity Research (K-CCR). In this cohort, comorbid diagnoses were established by both objective examination and review of clinical records, in addition to self-report. We aimed to investigate the clustering of nineteen clinically relevant comorbidities and the meaningful outcomes of the clusters over a two-year follow-up period. The present study analyzed data from COPD patients whose comorbidity data were complete (n = 311). Cluster analysis was performed using Ward's minimum-variance method. Five comorbidity clusters were identified: less comorbidity; malignancy; metabolic and cardiovascular; gastroesophageal reflux disease (GERD) and psychological; and underweight and anemic. FEV1 did not differ among the clusters. The GERD and psychological cluster had worse COPD Assessment Test (CAT) and St George's Respiratory Questionnaire (SGRQ) scores at baseline compared to the other clusters (CAT: p = 0.0003; SGRQ: p = 0.00046). The rate of change in these scores did not differ within 2 years. The underweight and anemic cluster included subjects with a lower baseline ratio of predicted diffusing capacity (DLco/VA) than the malignancy cluster (p = 0.036). Five clusters of comorbidities were identified in Japanese COPD patients. The clinical characteristics and health-related quality of life differed among these clusters over the two-year follow-up. Copyright © 2016 Elsevier Ltd. All rights reserved.
Testing light-traces-mass in Hubble Frontier Fields Cluster MACS-J0416.1-2403
Sebesta, Kevin; Williams, Liliya L. R.; Mohammed, Irshad; ...
2016-06-17
Here, we reconstruct the projected mass distribution of the massive merging Hubble Frontier Fields cluster MACSJ0416 using the genetic algorithm based free-form technique called Grale. The reconstructions are constrained by 149 lensed images identified by Jauzac et al. using HFF data. No information about cluster galaxies or light is used, which makes our reconstruction unique in this regard. Using visual inspection of the maps, as well as galaxy-mass correlation functions, we conclude that overall light does follow mass. Furthermore, the fact that brighter galaxies are more strongly clustered with mass is an important confirmation of the standard biasing scenario in galaxy clusters. On the smallest scales, less than approximately a few arcseconds, the resolution afforded by 149 images is still not sufficient to confirm or rule out galaxy-mass offsets of the kind observed in ACO 3827. We also compare the mass maps of MACSJ0416 obtained by three different groups: Grale, and two parametric Lenstool reconstructions from the CATS and Sharon/Johnson teams. Overall, the three agree well; one interesting discrepancy between the Grale and Lenstool galaxy-mass correlation functions occurs on scales of tens of kpc and may suggest that cluster galaxies are more biased tracers of mass than parametric methods generally assume.
Missing link in the evolution of Hox clusters.
Ogishima, Soichi; Tanaka, Hiroshi
2007-01-31
The Hox cluster has key roles in regulating the patterning of the antero-posterior axis in a metazoan embryo. It consists of anterior, central, and posterior genes; the central genes have been identified only in bilaterians, not in cnidarians, and are responsible for achieving morphological complexity in bilaterian development. However, their evolutionary history has not been revealed; that is, there has been a "missing link". Here we show the evolutionary history of the Hox clusters of 18 bilaterians and 2 cnidarians by using a new method, "motif-based reconstruction", which examines the gain/loss processes of evolutionarily conserved sequences, "motifs", outside the homeodomain. We successfully identified the missing link in the evolution of Hox clusters between the cnidarian-bilaterian ancestor and the bilaterians as the ancestor of the central genes, which we call the proto-central gene. Searching for the gene corresponding to the proto-central gene, we found that one of the acoel Hox genes has the same motif repertory as the proto-central gene. This finding suggests that the acoel Hox cluster corresponds to the missing link in the evolution of the Hox cluster between the cnidarian-bilaterian ancestor and the bilaterians. Our findings suggest that motif gains/diversifications led to the explosive diversity of the bilaterian body plan.
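A single parsimony step of motif-based reconstruction can be sketched with plain set operations: motifs shared by all descendant genes are assigned to the ancestor, and everything else is a later gain (or an earlier loss in the other lineages). This is a toy one-node illustration; the gene and motif names are hypothetical, and the actual method reconstructs gains/losses over a full phylogeny.

```python
def infer_ancestor_motifs(descendants):
    """Assign to a common ancestor the motifs shared by every
    descendant gene (dict: gene name -> set of motif names), and
    report each gene's private motifs as candidate gains."""
    repertories = [set(r) for r in descendants.values()]
    ancestor = set.intersection(*repertories)
    gains = {name: set(r) - ancestor for name, r in descendants.items()}
    return ancestor, gains
```

A gene whose repertory equals the inferred ancestral set, as reported for one acoel Hox gene relative to the proto-central gene, is then a natural candidate for the "missing link".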
Plasma protein induced clustering of red blood cells in micro capillaries
NASA Astrophysics Data System (ADS)
Wagner, Christian; Brust, Mathias; Aouane, Othmane; Flormann, Daniel; Thiebaud, Marine; Verdier, Claude; Coupier, Gwennou; Podgorski, Thomas; Misbah, Chaouqi; Selmi, Hassib
2013-11-01
The plasma protein fibrinogen induces aggregation of RBCs into clusters, the so-called rouleaux. Higher shear rates in bulk flow can break them up, which results in the pronounced shear thinning of blood. This led to the assumption that rouleaux formation does not take place in the microcapillaries of the vascular network, where high shear rates are present. However, the question is of high medical relevance: cardiovascular disorders are still the main cause of death in the western world, and cardiac patients often have elevated fibrinogen levels. We performed AFM-based single-cell force spectroscopy to determine the work of separation. Measurements at low hematocrit in a microfluidic channel show that the number and size of clusters are determined by the adhesion strength, and we found that cluster formation is strongly enhanced by fibrinogen at physiological concentrations, even at shear rates as high as 1000 1/s. Numerical simulations based on a boundary integral method confirm our findings; the clustering transition takes place at the same interaction energies in both the experiments and the simulations. In vivo measurements with intravital fluorescence microscopy in a dorsal skinfold chamber in a mouse reveal that RBCs indeed form clusters in microcapillary flow. This work was supported by the German Science Foundation research initiative SFB1027.
Mass Function of Galaxy Clusters in Relativistic Inhomogeneous Cosmology
NASA Astrophysics Data System (ADS)
Ostrowski, Jan J.; Buchert, Thomas; Roukema, Boudewijn F.
The current cosmological model (ΛCDM) with the underlying FLRW metric relies on the assumption of local isotropy, hence homogeneity of the Universe. Difficulties arise when one attempts to justify this model as an average description of the Universe from first principles of general relativity, since in general, the Einstein tensor built from the averaged metric is not equal to the averaged stress-energy tensor. In this context, the discrepancy between these quantities is called "cosmological backreaction" and has been the subject of scientific debate among cosmologists and relativists for more than 20 years. Here we present one of the methods to tackle this problem, i.e. averaging the scalar parts of the Einstein equations, together with its application, the cosmological mass function of galaxy clusters.
NASA Astrophysics Data System (ADS)
Zagouras, Athanassios; Argiriou, Athanassios A.; Flocas, Helena A.; Economou, George; Fotopoulos, Spiros
2012-11-01
The classification of weather maps at various isobaric levels has been used for many years as a methodological tool in meteorology, climatology, atmospheric pollution studies and other fields. Initially the classification was performed manually; the criteria used by the person performing the classification are features of isobars or isopleths of geopotential height, depending on the type of maps to be classified. Although manual classifications integrate the perceptual experience and other unquantifiable qualities of the meteorology specialists involved, they are typically subjective and time consuming. In recent years, automated, so-called objective approaches to atmospheric circulation classification have been proposed. In this paper a new method for the classification of atmospheric circulation from isobaric maps is presented. The method is based on graph theory: it starts with an intelligent prototype selection using an over-partitioning mode of the fuzzy c-means (FCM) algorithm, proceeds to a graph formulation for the entire dataset, and produces the clusters using the dominant sets clustering method. Graph theory offers a mathematical framework that allows a more efficient representation of spatially correlated data than the classical Euclidean space representations used in conventional classification methods. The method has been applied to the classification of 850 hPa atmospheric circulation over the Eastern Mediterranean. The automated methods are evaluated using statistical indexes; results indicate that the classification is adequately comparable with other state-of-the-art automated map classification methods, for a variable number of clusters.
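The over-partitioning prototype selection above relies on the standard fuzzy c-means iteration. A minimal NumPy sketch of FCM (an illustration of the general algorithm, not the authors' implementation; the data and parameters are invented) looks like:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means: returns cluster centers and the membership
    matrix U, where U[i, j] is how strongly point i belongs to cluster j."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m                            # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))     # update rule minimizing the objective
        U /= U.sum(axis=1, keepdims=True)
    return centers, U
```

Running the algorithm with a deliberately large c yields the over-partitioned prototypes from which a graph over the full dataset can then be built.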
Fournet, Michelle E; Szabo, Andy; Mellinger, David K
2015-01-01
On low-latitude breeding grounds, humpback whales produce complex and highly stereotyped songs as well as a range of non-song sounds associated with breeding behaviors. On their Southeast Alaskan foraging grounds, humpback whales produce a range of previously unclassified non-song vocalizations. This study investigates the vocal repertoire of Southeast Alaskan humpback whales from a sample of 299 non-song vocalizations collected over a 3-month period on foraging grounds in Frederick Sound, Southeast Alaska. Three classification systems were used to describe and classify vocalizations: aural spectrogram analysis, statistical cluster analysis, and discriminant function analysis. A hierarchical acoustic structure was identified; vocalizations were classified into 16 individual call types nested within four vocal classes. The combined classification method shows promise for identifying variability in call stereotypy between vocal groupings and is recommended for future classification of broad vocal repertoires.
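A statistical cluster analysis of measured call parameters, of the general kind used here, can be sketched with SciPy's agglomerative (hierarchical) clustering. The feature values below are invented for illustration and are not the study's data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# invented per-call measurements: duration (s), peak frequency (Hz), bandwidth (Hz)
rng = np.random.default_rng(0)
calls = np.vstack([rng.normal([0.5, 200.0, 50.0], [0.05, 10.0, 5.0], (30, 3)),
                   rng.normal([2.0, 800.0, 300.0], [0.2, 40.0, 30.0], (30, 3))])
feats = zscore(calls, axis=0)                    # put parameters on a common scale
Z = linkage(feats, method="ward")                # agglomerative tree over calls
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two call classes
```

The resulting tree can be cut at different heights to expose the nested, hierarchical structure of the repertoire.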
Scalable Static and Dynamic Community Detection Using Grappolo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman
Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency's (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clusterings on large-scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high-quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
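The serial Louvain template that Grappolo parallelizes is widely available in standard graph libraries. As a rough illustration of what the kernel computes (using NetworkX rather than Grappolo itself, on a toy graph), Louvain recovers two dense communities joined by a single bridge edge:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# toy network: two 10-node cliques joined by one bridge edge
G = nx.disjoint_union(nx.complete_graph(10), nx.complete_graph(10))
G.add_edge(0, 10)

# Louvain greedily moves nodes between communities to maximize modularity
communities = louvain_communities(G, seed=42)
```

Each clique is densely connected internally and sparsely connected to the rest of the network, so modularity maximization separates them cleanly.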
2014-05-07
This composite image shows one of the clusters, NGC 2024, which is found in the center of the so-called Flame Nebula, about 1,400 light-years from Earth. Astronomers have studied two star clusters using NASA's Chandra X-ray Observatory and infrared telescopes.
Celestial Cities and the Roads That Connect Them
2008-01-25
Observations from NASA's Spitzer Space Telescope show that filamentary galaxies form stars at twice the rate of their densely clustered counterparts. This is a representation of galaxies in and surrounding a galaxy cluster called Abell 1763.
Reuse of imputed data in microarray analysis increases imputation efficiency
Kim, Ki-Yeol; Kim, Byoung-Jin; Yi, Gwan-Su
2004-01-01
Background: The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the imputed values had not been fully checked. Results: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes missing values sequentially, starting from the gene with the fewest missing values, and uses the imputed values for later imputations. Although it reuses imputed values, the new method greatly improves on the accuracy and computational complexity of the conventional KNN-based method and of other methods based on maximum likelihood estimation. The performance of SKNN was particularly strong for data with high missing rates and large numbers of experiments. Applying Expectation Maximization (EM) to the SKNN method improved the accuracy, but increased computational time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed accuracy similar to that of SKNN, with slightly higher dependency on the type of data set. Conclusions: Sequential reuse of imputed data in KNN-based imputation greatly increases the efficiency of imputation. The SKNN method should be practically useful for salvaging microarray experiments with high proportions of missing entries, and it generates reliable imputed values which can be used for further cluster-based analysis of microarray data. PMID:15504240
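The sequential idea, as described, can be sketched in a few lines of NumPy: fill rows in order of increasing missingness, and let each newly completed row serve as a donor for the rows that follow. This is an illustrative simplification of the SKNN idea, not the authors' code:

```python
import numpy as np

def sknn_impute(X, k=2):
    """Sequential KNN imputation sketch: rows are genes, columns are
    experiments. Rows are completed in order of increasing missingness;
    each completed row then joins the pool of candidate donors."""
    X = np.asarray(X, dtype=float).copy()
    order = np.argsort(np.isnan(X).sum(axis=1))
    donors = [i for i in order if not np.isnan(X[i]).any()]
    for i in order:
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        # Euclidean distance to each donor over the observed columns only
        dist = sorted((np.linalg.norm(X[i, obs] - X[j, obs]), j) for j in donors)
        nbrs = [j for _, j in dist[:k]]
        X[i, miss] = X[nbrs][:, miss].mean(axis=0)  # average of k nearest donors
        donors.append(i)                            # imputed row becomes a donor
    return X
```

A weighted average of the neighbors (as in the original KNN imputation literature) would be a natural refinement of the unweighted mean used here.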
Pili-taxis: Clustering of Neisseria gonorrhoeae bacteria
NASA Astrophysics Data System (ADS)
Taktikos, Johannes; Zaburdaev, Vasily; Biais, Nicolas; Stark, Holger; Weitz, David A.
2012-02-01
The first step in colonization by Neisseria gonorrhoeae bacteria, the etiological agent of gonorrhea, is attachment to human epithelial cells. The attachment of N. gonorrhoeae bacteria to surfaces or other cells is primarily mediated by filamentous appendages called type IV pili (Tfp). Cycles of elongation and retraction of Tfp are responsible for a common form of bacterial motility called twitching motility, which allows the bacteria to crawl over surfaces. Experimentally, N. gonorrhoeae cells initially dispersed over a surface agglomerate into round microcolonies within hours. It is not yet known whether this clustering is driven entirely by the Tfp dynamics or whether chemotactic interactions are needed. We therefore investigate whether the agglomeration may stem solely from the pili-mediated attraction between cells. By developing a statistical model for pili-taxis, we aim to explain the experimental measurements of the time evolution of the mean cluster size, the number of clusters, and the area fraction covered by the cells.
Glass Effect in Inbreeding-Avoidance Systems: Minimum Viable Population for Outbreeders
NASA Astrophysics Data System (ADS)
Tainaka, Kei-ichi; Itoh, Yoshiaki
1996-10-01
Many animals, birds and plants have evolved mechanisms to avoid inbreeding between close relatives. Such mating systems may have developed several methods for restricting mate choice. If fragmentation of habitats becomes serious, these methods may lead to a lack of acceptable mates. We call this the “glass effect”, which is a generalization of the so-called Allee effect. We present two inbreeding-avoidance (outbreeding) models. Both models show that outbreeders have a high risk in fragmented environments. We thus obtain the minimum viable population (MVP). It is found that the value of the MVP ranges from several hundred to several thousand individuals. While this value is much larger than those obtained by previous demographic theories, it is consistent with recent empirical estimations. Moreover, we find that the glass effect is caused by dynamically induced clusters of relatives. This suggests that genetic variation will be decreased by outbreeding in a highly fragmented environment.
Functional video-based analysis of 3D cardiac structures generated from human embryonic stem cells.
Nitsch, Scarlett; Braun, Florian; Ritter, Sylvia; Scholz, Michael; Schroeder, Insa S
2018-05-01
Human embryonic stem cells (hESCs) differentiated into cardiomyocytes (CMs) often develop into complex 3D structures composed of various cardiac cell types. Conventional methods to study the electrophysiology of cardiac cells are patch clamp and microelectrode array (MEA) analyses. However, these methods are not suitable for investigating the contractile features of 3D cardiac clusters that detach from the surface of the culture dishes during differentiation. To overcome this problem, we developed a video-based motion detection software relying on Farnebäck's optical flow algorithm, which we call cBRA (cardiac beat rate analyzer). The beating characteristics of the differentiated cardiac clusters are calculated from the local displacement between two subsequent images. Two differentiation protocols, which differ profoundly in the morphology of the cardiac clusters generated and in the expression of cardiac markers, were used and the resulting CMs were characterized. Despite these differences, beat rates and beating variabilities could be reliably determined using cBRA. Likewise, stimulation of β-adrenoreceptors by isoproterenol could easily be identified in the hESC-derived CMs. Since even subtle changes in the beating features are detectable, this method is suitable for high-throughput cardiotoxicity screenings. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.
Wang, Haizhou; Song, Mingzhou
2011-12-01
The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
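The dynamic program behind optimal 1D clustering can be sketched as follows. This is an O(k·n²) teaching version in Python; the actual Ckmeans.1d.dp package is an R implementation with a faster algorithm, so treat this only as an illustration of the recurrence:

```python
import numpy as np

def ckmeans_1d(x, k):
    """Optimal 1D k-means via dynamic programming over sorted data."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])      # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])  # prefix sums of squares

    def sse(i, j):
        """Within-cluster sum of squared errors of x[i..j], inclusive."""
        m = j - i + 1
        s = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - s * s / m

    # D[m, i] = minimal SSE of partitioning the first i points into m clusters
    D = np.full((k + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    B = np.zeros((k + 1, n + 1), dtype=int)         # backtracking table
    for m in range(1, k + 1):
        for i in range(m, n + 1):
            for j in range(m, i + 1):               # last cluster = x[j-1 .. i-1]
                cost = D[m - 1, j - 1] + sse(j - 1, i - 1)
                if cost < D[m, i]:
                    D[m, i], B[m, i] = cost, j - 1
    starts, i = [], n
    for m in range(k, 0, -1):                       # recover cluster start indices
        starts.append(B[m, i])
        i = B[m, i]
    starts = starts[::-1]
    clusters = [x[starts[t]:(starts[t + 1] if t + 1 < k else n)] for t in range(k)]
    return clusters, D[k, n]
```

Because clusters in one dimension are contiguous runs of the sorted data, the DP explores all contiguous partitions and is guaranteed optimal, unlike Lloyd-style iteration.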
Isbrucker, R; Daas, A; Wagner, L; Costanzo, A
2016-01-01
Current regulations for acellular pertussis (aP) vaccines require that they be tested for residual or reversion-derived pertussis toxin (PTx) activity using the mouse histamine sensitisation test (HIST). Although a CHO cell clustering assay can be used by manufacturers to verify in-process that sufficient inactivation of the substance has occurred, this assay cannot currently be used for the final product because the aluminium adjuvants present interfere with mammalian cell cultures. Recently, 2 modified CHO cell clustering assays which accommodate the adjuvant effects have been proposed as alternatives to the HIST. These modified assays eliminate the adjuvant-induced cytotoxicity either through dilution of the vaccine (the Direct Method) or by introducing a porous barrier between the adjuvant and the cells (the Indirect Method). The transferability and suitability of these methods for testing products on the European market were investigated during a collaborative study organised by the European Directorate for the Quality of Medicines & HealthCare (EDQM). Thirteen laboratories participated in this study, which included 4 aP-containing vaccines spiked by addition of PTx. The study also assessed the transferability of a standardised CHO cell clustering assay protocol for use with non-adjuvanted PTx preparations. Results showed that the majority of laboratories were able to detect the PTx spike in all 4 vaccines at concentrations of 4 IU/mL or lower using the Indirect Method. This sensitivity is in the range of the theoretical sensitivity of the HIST. The Direct Method, however, did not show the expected results and would need additional development work.
On characterizing population commonalities and subject variations in brain networks.
Ghanbari, Yasser; Bloy, Luke; Tunc, Birkan; Shankar, Varsha; Roberts, Timothy P L; Edgar, J Christopher; Schultz, Robert T; Verma, Ragini
2017-05-01
Brain networks based on resting state connectivity as well as inter-regional anatomical pathways obtained using diffusion imaging have provided insight into pathology and development. Such work has underscored the need for methods that can extract sub-networks that can accurately capture the connectivity patterns of the underlying population while simultaneously describing the variation of sub-networks at the subject level. We have designed a multi-layer graph clustering method that extracts clusters of nodes, called 'network hubs', which display higher levels of connectivity within the cluster than to the rest of the brain. The method determines an atlas of network hubs that describes the population, as well as weights that characterize subject-wise variation in terms of within- and between-hub connectivity. This lowers the dimensionality of brain networks, thereby providing a representation amenable to statistical analyses. The applicability of the proposed technique is demonstrated by extracting an atlas of network hubs for a population of typically developing controls (TDCs) as well as children with autism spectrum disorder (ASD), and using the structural and functional networks of a population to determine the subject-level variation of these hubs and their inter-connectivity. These hubs are then used to compare ASD and TDCs. Our method is generalizable to any population whose connectivity (structural or functional) can be captured via non-negative network graphs. Copyright © 2015 Elsevier B.V. All rights reserved.
A path-based measurement for human miRNA functional similarities using miRNA-disease associations
NASA Astrophysics Data System (ADS)
Ding, Pingjian; Luo, Jiawei; Xiao, Qiu; Chen, Xiangtao
2016-09-01
Compared with sequence and expression similarity, miRNA functional similarity is especially important for biological research and for applications such as miRNA clustering, miRNA function prediction, miRNA synergism identification and disease miRNA prioritization. However, existing methods typically calculate miRNA functional similarity from predicted miRNA targets, which suffer from high false positive and false negative rates, and it is difficult to achieve high reliability of miRNA functional similarity from miRNA-disease associations. There is therefore a growing need to improve the measurement of miRNA functional similarity. In this study, we develop a novel path-based calculation method of miRNA functional similarity based on miRNA-disease associations, called MFSP. Compared with other methods, our method obtains higher average functional similarity within the selected intra-family and intra-cluster groups, and lower average functional similarity for inter-family and inter-cluster miRNA pairs. In addition, smaller p-values are achieved when applying the Wilcoxon rank-sum test and the Kruskal-Wallis test to the different miRNA groups. The relationship between miRNA functional similarity and other information sources is exhibited. Furthermore, the miRNA functional network constructed from MFSP is a scale-free and small-world network. Moreover, the higher AUC for miRNA-disease prediction indicates the ability of MFSP to uncover miRNA functional similarity.
Zhang, Xiaohua Douglas; Yang, Xiting Cindy; Chung, Namjin; Gates, Adam; Stec, Erica; Kunapuli, Priya; Holder, Dan J; Ferrer, Marc; Espeseth, Amy S
2006-04-01
RNA interference (RNAi) high-throughput screening (HTS) experiments carried out using large (>5000 short interfering [si]RNA) libraries generate a huge amount of data. In order to use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address the questions in hit selection of RNAi HTS, we proposed a quartile-based method which is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean +/- k standard deviations (SD) and median +/- k median absolute deviations (MAD). The results suggested that the quartile-based method selected more hits than mean +/- k SD under the same preset error rate. The number of hits selected by median +/- k MAD was close to that of the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power in detecting true hits, especially weak or moderate true hits. Our investigation also suggested that platewise analysis (determining effective siRNAs on a plate-by-plate basis) can adjust for systematic errors across plates, whereas experimentwise analysis, in which effective siRNAs are identified in an analysis of the entire experiment, cannot. However, experimentwise analysis may detect a cluster of true positive hits placed together in one or several plates, while platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot. We thus suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median +/- k MAD, for identifying effective siRNAs. Second, perform the chosen method experimentwise on transformed/normalized data, such as percentage inhibition, to check for possible hit clusters. If a cluster of selected hits is observed, repeat the analysis on untransformed data to determine whether the cluster is due to an artifact in the data. If no clusters of hits are observed, select hits by performing platewise analysis on transformed data. Third, use the plate-well series plot to visualize both the data and the hit selection results, as well as to check for artifacts.
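The median +/- k MAD rule mentioned above is easy to state concretely. A sketch (not the authors' code) that flags values lying more than k scaled MADs from the median:

```python
import numpy as np

def mad_hits(values, k=3.0):
    """Flag values more than k MADs from the median (two-sided rule).
    The 1.4826 factor makes the MAD a consistent estimator of the SD
    for normally distributed data."""
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))
    return np.abs(v - med) > k * mad
```

Because the median and MAD are insensitive to the extreme values themselves, a strong true hit does not inflate the threshold the way a mean +/- k SD rule would.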
Genotyping in the cloud with Crossbow.
Gurtowski, James; Schatz, Michael C; Langmead, Ben
2012-09-01
Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations, respectively; within Crossbow they have demonstrated the capacity to analyze approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit demonstrates the use of Crossbow for identifying variants in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
Callings in Career: A Typological Approach to Essential and Optional Components
ERIC Educational Resources Information Center
Hirschi, Andreas
2011-01-01
A sense of calling in career is supposed to have positive implications for individuals and organizations but current theoretical development is plagued with incongruent conceptualizations of what does or does not constitute a calling. The present study used cluster analysis to identify essential and optional components of a presence of calling…
Gager, Yann; Tarland, Emilia; Lieckfeldt, Dietmar; Ménage, Matthieu; Botero-Castro, Fidel; Rossiter, Stephen J.; Kraus, Robert H. S.; Ludwig, Arne; Dechmann, Dina K. N.
2016-01-01
A fundamental condition for any work with free-ranging animals is correct species identification. However, in the case of bats, information on local species assemblages is frequently limited, especially in regions with high biodiversity such as the Neotropics. The bat genus Molossus is a typical example of this, with morphologically similar species often occurring in sympatry. We used a multi-method approach based on molecular, morphometric and acoustic information collected from 962 individuals of Molossus bondae, M. coibensis, and M. molossus captured in Panama. We distinguished M. bondae based on size and pelage coloration. We identified two robust species clusters composed of M. molossus and M. coibensis based on 18 microsatellite markers, but also on a more stringently determined set of four markers. Phylogenetic reconstructions using the mitochondrial gene co1 (DNA barcode) were used to diagnose these microsatellite clusters as M. molossus and M. coibensis. To differentiate species, morphological information was only reliable when forearm length and body mass were combined in a linear discriminant function (95.9% correctly identified individuals). Looking in more detail at M. molossus and M. coibensis, only four out of 13 wing parameters were informative for species differentiation, with M. coibensis showing lower values for hand wing area and hand wing length and higher values for wing loading. Acoustic recordings after release required categorization of calls into types, yielding only two informative subsets: approach calls and two-toned search calls. Our data emphasize the importance of combining morphological traits and independent genetic data to inform the best choice and combination of discriminatory information used in the field. Because parameters can vary geographically, the multi-method approach may need to be adjusted to local species assemblages and populations to be fully informative. PMID:26943355
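A linear discriminant function combining two morphometric variables, as used in this study, can be illustrated with Fisher's classic two-class discriminant. The measurements below are simulated and the means, spreads and resulting accuracy are invented, not the study's values:

```python
import numpy as np

# simulated (forearm length in mm, body mass in g) for two hypothetical species
rng = np.random.default_rng(0)
A = rng.normal([38.0, 13.0], [1.0, 1.2], (100, 2))
B = rng.normal([41.0, 16.0], [1.0, 1.2], (100, 2))

# Fisher discriminant direction: w = Sw^{-1} (mu_B - mu_A)
Sw = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)  # pooled scatter
w = np.linalg.solve(Sw, B.mean(axis=0) - A.mean(axis=0))
threshold = 0.5 * (A.mean(axis=0) + B.mean(axis=0)) @ w  # midpoint of projected means

correct = np.concatenate([A @ w < threshold, B @ w >= threshold])
accuracy = correct.mean()
```

Projecting both variables onto a single discriminant axis is what lets two individually overlapping measurements jointly separate the species.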
Advertisement call and genetic structure conservatism: good news for an endangered Neotropical frog
Costa, William P.; Martins, Lucas B.; Nunes-de-Almeida, Carlos H. L.; Toledo, Luís Felipe
2016-01-01
Background: Many amphibian species are negatively affected by habitat change due to anthropogenic activities. Populations distributed over modified landscapes may be subject to local extinction or may be relegated to the remaining—likely isolated and possibly degraded—patches of available habitat. Isolation without gene flow could lead to variability in phenotypic traits owing to differences in local selective pressures such as environmental structure, microclimate, or site-specific species assemblages. Methods: Here, we tested the microevolution hypothesis by evaluating the acoustic parameters of 349 advertisement calls from 15 males from six populations of the endangered amphibian species Proceratophrys moratoi. In addition, we analyzed the genetic distances among populations and the genetic diversity with a haplotype network analysis. We performed cluster analysis on acoustic data based on the Bray-Curtis index of similarity, using the UPGMA method. We correlated acoustic dissimilarities (calculated by Euclidean distance) with geographical and genetic distances among populations. Results: Spectral traits of the advertisement call of P. moratoi presented lower coefficients of variation than did temporal traits, both within and among males. Cluster analyses placed individuals without congruence in population or geographical distance, but recovered the species topology in relation to sister species. The genetic distance among populations was low; it did not exceed 0.4% for the most distant populations, and was not correlated with acoustic distance. Discussion: Both acoustic features and genetic sequences are highly conserved, suggesting that populations could be connected by recent migrations, and that they are subject to stabilizing selective forces. 
Although further studies are required, these findings add to a growing body of literature suggesting that this species would be a good candidate for a reintroduction program without negative effects on communication or genetic impact. PMID:27190717
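The acoustic cluster analysis described (Bray-Curtis similarity with UPGMA) corresponds to average-linkage clustering on Bray-Curtis dissimilarities. A SciPy sketch on invented call parameters, not the study's measurements:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# invented per-male call parameters: dominant frequency (Hz), duration (s), pulse rate
calls = np.array([[450.0, 0.30, 20.0],
                  [460.0, 0.31, 21.0],
                  [455.0, 0.29, 19.0],
                  [900.0, 0.60, 45.0],
                  [910.0, 0.62, 44.0]])
d = pdist(calls, metric="braycurtis")  # Bray-Curtis dissimilarity (1 - similarity)
Z = linkage(d, method="average")       # UPGMA = average linkage
groups = fcluster(Z, t=2, criterion="maxclust")
```

Correlating the resulting acoustic distances with geographic and genetic distances, as done in the study, is then a matter of comparing the corresponding distance matrices.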
2010-07-16
This image shows the famous Pleiades cluster of stars as seen through the eyes of NASA's Wide-field Infrared Survey Explorer. The Pleiades are what astronomers call an open cluster: stars loosely bound to each other that will eventually go their separate ways.
Formation of Compact Ellipticals in the merging star cluster scenario
NASA Astrophysics Data System (ADS)
Urrutia Zapata, Fernanda Cecilia; Theory and star formation group
2018-01-01
In recent years, extended old stellar clusters have been observed. They are like globular clusters (GCs) but with larger sizes (a limit of Re = 10 pc is currently seen as reasonable). These extended objects (EOs) cover a huge range of mass. Objects at the low-mass end, with masses comparable to normal globular clusters, are called extended clusters or faint fuzzies (Larsen & Brodie 2000), and objects at the high-mass end are called ultra-compact dwarf galaxies (UCDs). UCDs are compact objects with luminosities above those of the brightest known GCs; they are more compact than typical dwarf galaxies but have comparable luminosities. Usually, a lower mass limit of 2 × 10^6 solar masses is applied. Fellhauer & Kroupa (2002a,b) demonstrated that objects like ECs, FFs and UCDs can be the remnants of the merger of star cluster complexes; this scenario is called the merging star cluster scenario. A more concise study was performed by Bruens et al. (2009, 2011). Our work tries to explain the formation of compact ellipticals (cEs). These objects are a comparatively rare class of spheroidal galaxies, possessing very small Re and high central surface brightnesses (Faber 1973). cEs have the same parameters as extended objects but are slightly larger than 100 pc, with luminosities in the range of -11 to -12 mag. The standard formation scenario for these systems proposes a galactic origin: cEs are the result of tidal stripping and truncation of nucleated larger systems. Alternatively, they could be a natural extension of the class of elliptical galaxies to lower luminosities and smaller sizes. We propose a completely new formation scenario for cEs: we model cEs in a similar way to UCDs, using the merging star cluster scenario extended to much higher masses and sizes. We think that the early Universe might have produced star bursts sufficiently strong to form cluster complexes which merge into cEs.
So far it is observationally unknown whether cEs are dark-matter-dominated objects. If our scenario is true, they would be dark-matter-free, very extended and massive "star clusters".
Directional virtual backbone based data aggregation scheme for Wireless Visual Sensor Networks.
Zhang, Jing; Liu, Shi-Jian; Tsai, Pei-Wei; Zou, Fu-Min; Ji, Xiao-Rong
2018-01-01
Data gathering is a fundamental task in Wireless Visual Sensor Networks (WVSNs). The features of directional antennas and of visual data make WVSNs more complex than conventional Wireless Sensor Networks (WSNs). The virtual backbone is a technique capable of constructing clusters; the version associated with the aggregation operation is also referred to as the virtual backbone tree. Most of the existing literature focuses on the efficiency brought by the construction of clusters and generally neglects local-balance problems. To fill this gap, a Directional Virtual Backbone based Data Aggregation Scheme (DVBDAS) for WVSNs is proposed in this paper. In addition, a measurement called the energy consumption density is proposed for evaluating the adequacy of results in cluster-based construction problems. Moreover, the directional virtual backbone construction scheme is designed with the local-balance factor in mind, and the associated network coding mechanism is utilized to construct DVBDAS. Finally, both a theoretical analysis of the proposed DVBDAS and simulations are given to evaluate its performance. The experimental results show that the proposed DVBDAS achieves higher performance, in terms of both energy preservation and network lifetime extension, than the existing methods.
Bringing Clouds into Our Lab! - The Influence of Turbulence on the Early Stage Rain Droplets
NASA Astrophysics Data System (ADS)
Yavuz, Mehmet Altug; Kunnen, Rudie; Heijst, Gertjan; Clercx, Herman
2015-11-01
We are investigating a droplet-laden flow in an air-filled turbulence chamber, forced by speaker-driven air jets. The speakers run in a random manner, yet they allow us to control and define the statistics of the turbulence. We study the motion of droplets with tunable size (Stokes numbers ~ 0.13 - 9) in a turbulent flow, mimicking the early stages of raindrop formation. 3D Particle Tracking Velocimetry (PTV) together with Laser Induced Fluorescence (LIF) is used to track the droplets and collect data for statistical analysis. This makes it possible to study the spatial distribution of the droplets in turbulence using the so-called Radial Distribution Function (RDF), a statistical measure that quantifies the clustering of particles. Additionally, the 3D-PTV technique allows us to measure velocity statistics of the droplets and the influence of the turbulence on droplet trajectories, both individually and collectively. In this contribution, we will present the clustering probability quantified by the RDF for different Stokes numbers, and we will explain the physics underlying the influence of turbulence on droplet cluster behavior. This study is supported by FOM/NWO Netherlands.
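The radial distribution function used to quantify clustering can be estimated by binning pairwise distances and normalizing by the expectation for a uniformly distributed set of particles. A minimal 2D periodic-box sketch (illustrative only, not the experimental pipeline, which works with 3D droplet tracks):

```python
import numpy as np

def rdf_2d(pos, box, dr, r_max):
    """Estimate the radial distribution function g(r) for points in a
    2D periodic box; g(r) > 1 at small r signals clustering."""
    n = len(pos)
    density = n / box**2
    edges = np.arange(0.0, r_max + dr, dr)
    counts = np.zeros(len(edges) - 1)
    for i in range(n - 1):
        delta = pos[i + 1:] - pos[i]
        delta -= box * np.round(delta / box)      # minimum-image convention
        r = np.hypot(delta[:, 0], delta[:, 1])
        counts += np.histogram(r, bins=edges)[0]
    r_mid = 0.5 * (edges[:-1] + edges[1:])
    shell_area = 2.0 * np.pi * r_mid * dr         # area of each annulus
    # each pair is counted once, hence the factor of 2 in the normalization
    return r_mid, 2.0 * counts / (n * density * shell_area)
```

For uniformly scattered particles g(r) is close to 1 at all separations; inertial clustering of droplets shows up as g(r) rising above 1 at small r.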
A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization.
He, Sheng; Samara, Petros; Burgers, Jan; Schomaker, Lambert
2016-11-01
It is of essential importance for historians to know the date and place of origin of the documents they study. It would be a great advance for historical scholars if it were possible to automatically estimate the geographical and temporal provenance of a handwritten document by inferring them from its handwriting style. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, called the histogram of orientations of handwritten strokes, is proposed to extract and describe the visual elements; it is built on a scale-invariant polar-feature space. In addition, the multi-label self-organizing map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. Our proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also be considered a pre-structured clustering method to build a codebook, which contains more discriminative information on date and geography. Experimental results on the Medieval Paleographic Scale data set demonstrate that our method achieves state-of-the-art results.
A Look at the Law, Public Safety, Corrections & Security Cluster
ERIC Educational Resources Information Center
Coffee, Joseph N.
2008-01-01
A month after the 9/11 terrorist attack in 2001, an advisory group met in Little Rock, Arkansas, to begin the development of the Law, Public Safety, Corrections and Security (LPSCS) career cluster. At that time there were five pathways of what was then called the Law and Public Safety cluster--fire and emergency services, law enforcement,…
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, so the final number of clusters may differ from the number k supplied as input. The algorithm is described later in this paper. ISOCLUS can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration of the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
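The split-and-merge behaviour that distinguishes ISOCLUS/ISODATA from plain k-means can be sketched in one dimension. This is a toy version with invented thresholds and data; it follows neither the published ISOCLUS parameter conventions nor the kd-tree filtering acceleration described above.

```python
import random
import statistics

def isodata_1d(xs, k_init, min_size=3, split_sd=2.0, merge_dist=1.0, iters=15):
    """Toy ISODATA-style clustering on 1-D data: a k-means assignment step,
    plus heuristic split (large spread) and merge (close centers) steps."""
    centers = random.sample(xs, k_init)
    for _ in range(iters):
        # assign each point to its nearest center (the k-means step)
        clusters = {i: [] for i in range(len(centers))}
        for x in xs:
            i = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            clusters[i].append(x)
        # drop tiny clusters and recompute means
        groups = [v for v in clusters.values() if len(v) >= min_size]
        centers = [statistics.fmean(g) for g in groups]
        # split any cluster whose standard deviation is too large
        new_centers = []
        for g, c in zip(groups, centers):
            sd = statistics.pstdev(g)
            new_centers += [c - sd / 2, c + sd / 2] if sd > split_sd else [c]
        # merge centers that lie closer than merge_dist
        new_centers.sort()
        centers = []
        for c in new_centers:
            if centers and c - centers[-1] < merge_dist:
                centers[-1] = (centers[-1] + c) / 2
            else:
                centers.append(c)
    return centers

random.seed(0)
data = ([random.gauss(0, 0.3) for _ in range(60)]
        + [random.gauss(5, 0.3) for _ in range(60)])
centers = isodata_1d(data, k_init=4)
```

Started with k_init = 4 on two well-separated blobs, the merge step collapses redundant centers, so the final number of clusters differs from the initial k, which is the point of the ISODATA family.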
Multi-scale clustering by building a robust and self correcting ultrametric topology on data points.
Fushing, Hsieh; Wang, Hui; Vanderwaal, Kimberly; McCowan, Brenda; Koehl, Patrice
2013-01-01
The advent of high-throughput technologies and the concurrent advances in information sciences have led to an explosion in size and complexity of the data sets collected in biological sciences. The biggest challenge today is to assimilate this wealth of information into a conceptual framework that will help us decipher biological functions. A large and complex collection of data, usually called a data cloud, naturally embeds multi-scale characteristics and features, generically termed geometry. Understanding this geometry is the foundation for extracting knowledge from data. We have developed a new methodology, called data cloud geometry-tree (DCG-tree), to resolve this challenge. This new procedure has two main features that are keys to its success. Firstly, it derives from the empirical similarity measurements a hierarchy of clustering configurations that captures the geometric structure of the data. This hierarchy is then transformed into an ultrametric space, which is then represented via an ultrametric tree or a Parisi matrix. Secondly, it has a built-in mechanism for self-correcting clustering membership across different tree levels. We have compared the trees generated with this new algorithm to equivalent trees derived with the standard Hierarchical Clustering method on simulated as well as real data clouds from fMRI brain connectivity studies, cancer genomics, giraffe social networks, and Lewis Carroll's Doublets network. In each of these cases, we have shown that the DCG trees are more robust and less sensitive to measurement errors, and that they provide a better quantification of the multi-scale geometric structures of the data. As such, DCG-tree is an effective tool for analyzing complex biological data sets.
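The ultrametric idea at the heart of DCG-tree can be illustrated with ordinary single-linkage clustering, whose cophenetic (merge-height) distances already form an ultrametric. The sketch below is that standard construction, not the authors' self-correcting DCG-tree algorithm; names and data are illustrative.

```python
import math
from itertools import combinations

def cophenetic_ultrametric(points):
    """Single-linkage cophenetic distances: an ultrametric on the points."""
    n = len(points)
    d = {(i, j): math.dist(points[i], points[j]) for i, j in combinations(range(n), 2)}
    clusters = [{i} for i in range(n)]  # each point starts as its own cluster
    ultra = {}
    while len(clusters) > 1:
        # find the closest pair of clusters (single-linkage gap)
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            gap = min(d[tuple(sorted((i, j)))] for i in clusters[a] for j in clusters[b])
            if best is None or gap < best[0]:
                best = (gap, a, b)
        gap, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                ultra[tuple(sorted((i, j)))] = gap  # merge height = cophenetic distance
        clusters[a] |= clusters[b]
        del clusters[b]
    return ultra

pts = [(0, 0), (0, 1), (5, 0), (5, 1.2), (10, 0)]
u = cophenetic_ultrametric(pts)
```

Every triple of points then satisfies the strong (ultrametric) triangle inequality d(a, c) ≤ max(d(a, b), d(b, c)), which is exactly what makes a tree representation of the data cloud possible.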
Yonehara, Takehiro; Takatsuka, Kazuo
2012-12-14
We develop a theory, and the method of its application, for chemical dynamics in systems in which the adiabatic potential energy hyper-surfaces (PES) are densely quasi-degenerate with each other over a wide range of molecular geometry. Such adiabatic electronic states tend to couple with each other through strong nonadiabatic interactions. Technically, therefore, it is often extremely hard to accurately single out the individual PES in these systems. Moreover, because the mutual nonadiabatic couplings may spread widely in space, and because of the energy-time uncertainty relation, the notion of an isolated and well-defined potential energy surface loses its meaning. On the other hand, such dense electronic states should offer a very interesting molecular field in which chemical reactions proceed in characteristic manners. However, for treating these systems, the standard theoretical framework of chemical reaction dynamics, which starts from the Born-Oppenheimer approximation and ends with quantum nuclear wavepacket dynamics, is not very useful. We here explore this problem with our nonadiabatic electron wavepacket theory, which we call the phase-space averaging and natural branching (PSANB) method [T. Yonehara and K. Takatsuka, J. Chem. Phys. 129, 134109 (2008)], or branching-path representation, in which the packets are propagated in time along the non-Born-Oppenheimer branching paths. In this paper, after outlining the basic theory, we examine, using a one-dimensional model, how well the PSANB method works with such densely quasi-degenerate nonadiabatic systems. To do so, we compare the performance of PSANB with full quantum mechanical results and with those given by the fewest switches surface hopping (FSSH) method, which is known to be one of the most reliable and flexible methods to date. It turns out that the PSANB electron wavepacket approach actually yields very good results with far fewer initial sampling paths. 
Then we apply the electron wavepacket dynamics in path-branching representation and the so-called semiclassical Ehrenfest theory to a hydrogen molecule embedded in a twelve-membered boron cluster (B(12)) in excited states, which are densely quasi-degenerate due to the vacancy in the 2p orbitals of the boron atom [1s(2)2s(2)2p(1)]. Bond dissociation of the hydrogen molecule quickly takes place in the cluster and the resultant hydrogen atoms are squeezed out to the surface of the cluster. We further study collision dynamics between H(2) and B(12), which also exhibits interesting phenomena. The present study suggests an interesting functionality of boron clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Z.; Bessa, M. A.; Liu, W.K.
A predictive computational theory is presented for modeling complex, hierarchical materials ranging from metal alloys to polymer nanocomposites. The theory can capture complex mechanisms such as plasticity and failure that span multiple length scales. This general multiscale material modeling theory relies on sound principles of mathematics and mechanics, and a cutting-edge reduced order modeling method named self-consistent clustering analysis (SCA) [Zeliang Liu, M.A. Bessa, Wing Kam Liu, “Self-consistent clustering analysis: An efficient multi-scale scheme for inelastic heterogeneous materials,” Comput. Methods Appl. Mech. Engrg. 306 (2016) 319–341]. SCA reduces by several orders of magnitude the computational cost of micromechanical and concurrent multiscale simulations, while retaining the microstructure information. This remarkable increase in efficiency is achieved with a data-driven clustering method. Computationally expensive operations are performed in the so-called offline stage, where degrees of freedom (DOFs) are agglomerated into clusters and the interaction tensor of these clusters is computed. In the online or predictive stage, the Lippmann-Schwinger integral equation is solved cluster-wise using a self-consistent scheme to ensure solution accuracy and avoid path dependence. To construct a concurrent multiscale model, this scheme is applied at each material point in a macroscale structure, replacing a conventional constitutive model with the average response computed from the microscale model using just the SCA online stage. A regularized damage theory is incorporated in the microscale that avoids the mesh and RVE size dependence that commonly plagues microscale damage calculations. The SCA method is illustrated with two cases: a carbon fiber reinforced polymer (CFRP) structure with the concurrent multiscale model and an application to fatigue prediction for additively manufactured metals. 
For the CFRP problem, a speedup estimated at about 43,000 is achieved by using the SCA method as opposed to FE2, enabling the solution of an otherwise computationally intractable problem. The second example uses a crystal plasticity constitutive law and computes the fatigue potency of extrinsic microscale features such as voids; it shows that local stress and strain are captured sufficiently well by SCA. This model has been incorporated in a process-structure-properties prediction framework for process design in additive manufacturing.
Potential motivational information encoded within humpback whale non-song vocal sounds.
Dunlop, Rebecca A
2017-03-01
Acoustic signals in terrestrial animals follow motivational-structural rules to inform receivers of the signaler's motivational state, valence and level of arousal. Low-frequency "harsh" signals are produced in aggressive contexts, whereas high-frequency tonal sounds are produced in fearful/appeasement contexts. Using the non-song social call catalogue of humpback whales (Megaptera novaeangliae), this study tested for potential motivational-structural rules within the call catalogue of a baleen whale species. A total of 32 groups within different social contexts (ranging from stable, low arousal groups, such as a female with her calf, to affiliating, higher arousal, groups containing multiple males competing for access to the central female) were visually and acoustically tracked as they migrated southwards along the eastern coast of Australia. Social calls separated into four main cluster types, with signal structures in two categories consistent with "aggressive" signals and, "fearful/appeasement" signals in terrestrial animals. The group's use of signals within these clusters matched their context in that presumed low arousal non-affiliating groups almost exclusively used "low-arousal" signals (a cluster of low frequency unmodulated or upsweep sounds). Affiliating groups used a higher proportion of an intermediate cluster of signal types deemed "higher arousal" signals and groups containing three or more adults used a higher proportion of "aggressive" signal types.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in recent years because of the fundamental interest in exploring a completely new area that can bridge the gap between atomic and solid-state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we present our GSAM algorithm, based on a DFT exploration of the PES, to find the low-lying isomers of such compounds. This algorithm includes the generation of an initial set of structures from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the non-physically-reasonable configurations, has been implemented to reduce the computational cost of this algorithm. Structural properties of Ga(n)As(m) clusters will be presented as an illustration of the method.
Engels, Michael F M; Gibbs, Alan C; Jaeger, Edward P; Verbinnen, Danny; Lobanov, Victor S; Agrafiotis, Dimitris K
2006-01-01
We report on the structural comparison of the corporate collections of Johnson & Johnson Pharmaceutical Research & Development (JNJPRD) and 3-Dimensional Pharmaceuticals (3DP), performed in the context of the recent acquisition of 3DP by JNJPRD. The main objective of the study was to assess the druglikeness of the 3DP library and the extent to which it enriched the chemical diversity of the JNJPRD corporate collection. The two databases, at the time of acquisition, collectively contained more than 1.1 million compounds with a clearly defined structural description. The analysis was based on a clustering approach and aimed at providing an intuitive quantitative estimate and visual representation of this enrichment. A novel hierarchical clustering algorithm called divisive k-means was employed in combination with Kelley's cluster-level selection method to partition the combined data set into clusters, and the diversity contribution of each library was evaluated as a function of the relative occupancy of these clusters. Typical 3DP chemotypes enriching the diversity of the JNJPRD collection were catalogued and visualized using a modified maximum common substructure algorithm. The joint collection of JNJPRD and 3DP compounds was also compared to other databases of known medicinally active or druglike compounds. The potential of the methodology for the analysis of very large chemical databases is discussed.
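Divisive (top-down) k-means can be sketched as repeated bisection: keep splitting the cluster with the largest within-cluster scatter until the desired number of clusters is reached. The 1-D sketch below is a generic bisecting k-means with invented data; it does not reproduce the paper's descriptor space or Kelley's cluster-level selection.

```python
import random

def two_means(xs, iters=20):
    """Split a 1-D cluster in two with Lloyd's algorithm (k = 2)."""
    c = [min(xs), max(xs)]  # deterministic extreme-point initialization
    for _ in range(iters):
        left = [x for x in xs if abs(x - c[0]) <= abs(x - c[1])]
        right = [x for x in xs if abs(x - c[0]) > abs(x - c[1])]
        if not left or not right:
            break
        c = [sum(left) / len(left), sum(right) / len(right)]
    return left, right

def sse(xs):
    """Within-cluster sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def divisive_kmeans(xs, k):
    """Top-down clustering: repeatedly bisect the highest-SSE cluster."""
    clusters = [xs]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        a, b = two_means(clusters.pop(worst))
        clusters += [a, b]
    return clusters

random.seed(2)
data = ([random.gauss(0, 0.2) for _ in range(30)]
        + [random.gauss(3, 0.2) for _ in range(30)]
        + [random.gauss(8, 0.2) for _ in range(30)])
parts = divisive_kmeans(data, 3)
```

Unlike agglomerative clustering, the bisection order yields a hierarchy from the top down, which is what makes cluster-level selection criteria such as Kelley's applicable.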
NASA Astrophysics Data System (ADS)
Shrestha, R. R.; Rode, M.
2008-12-01
Concentrations of reactive chemicals have different chemical signatures in baseflow and surface runoff. Previous studies on nitrate export from a catchment indicate that the transport processes are driven by subsurface flow. The nitrate signature can therefore be used to understand event and pre-event contributions to streamflow and surface-subsurface flow interactions. This study uses flow and nitrate concentration time series data to understand the relationship between these two variables. An unsupervised artificial neural network learning method called the self-organizing map is used to identify clusters in the datasets. Based on the cluster results, five different patterns are identified in the datasets, corresponding to (i) baseflow, (ii) subsurface flow increase, (iii) surface runoff increase, (iv) surface runoff recession, and (v) subsurface flow decrease regions. The cluster results, in combination with a hydrologic model, are used for discharge separation. For this purpose, the multi-objective optimization tool NSGA-II is used, with violation of the cluster results as one of the objective functions. The results show that using the cluster results as supplementary information for calibrating a hydrologic model gives a plausible simulation of subsurface flow as well as of total runoff at the catchment outlet. The study is undertaken using data from the Weida catchment in north-eastern Germany, a sub-catchment of the Weisse Elster river in the Elbe river basin.
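A self-organizing map in its smallest form: each training sample pulls its best-matching unit, and that unit's grid neighbours with decaying strength, toward itself. The sketch below uses a 1-D grid of units on invented 2-D data; names and parameters are illustrative assumptions, not the flow/nitrate application.

```python
import math
import random

def train_som(data, n_units=4, epochs=30, lr0=0.5, sigma0=1.5):
    """Minimal self-organizing map on a 1-D grid of units."""
    # spread the initial unit weights across the data set
    w = [list(data[i * len(data) // n_units]) for i in range(n_units)]
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                   # decaying learning rate
        sigma = max(0.5, sigma0 * (1 - t / epochs))   # shrinking neighbourhood
        for x in data:
            win = min(range(n_units), key=lambda u: math.dist(w[u], x))
            for u in range(n_units):
                # Gaussian neighbourhood in grid (not data) space
                h = math.exp(-((u - win) ** 2) / (2 * sigma ** 2))
                w[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w[u], x)]
    return w

random.seed(3)
data = ([(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(40)]
        + [(random.gauss(2, 0.1), random.gauss(2, 0.1)) for _ in range(40)])
units = train_som(data)
```

After training, distinct regions of the data (here two blobs) map to distinct units, which is how SOM cluster labels such as the five flow patterns above are read off the trained map.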
WINGS-SPE Spectroscopy in the WIde-field Nearby Galaxy-cluster Survey
NASA Astrophysics Data System (ADS)
Cava, A.; Bettoni, D.; Poggianti, B. M.; Couch, W. J.; Moles, M.; Varela, J.; Biviano, A.; D'Onofrio, M.; Dressler, A.; Fasano, G.; Fritz, J.; Kjærgaard, P.; Ramella, M.; Valentinuzzi, T.
2009-03-01
Aims: We present the results from a comprehensive spectroscopic survey of the WINGS (WIde-field Nearby Galaxy-cluster Survey) clusters, a program called WINGS-SPE. The WINGS-SPE sample consists of 48 clusters, 22 of which are in the southern sky and 26 in the north. The main goals of this spectroscopic survey are: (1) to study the dynamics and kinematics of the WINGS clusters and their constituent galaxies, (2) to explore the link between the spectral properties and the morphological evolution in different density environments and across a wide range of cluster X-ray luminosities and optical properties. Methods: Using multi-object fiber-fed spectrographs, we observed our sample of WINGS cluster galaxies at an intermediate resolution of 6-9 Å and, using a cross-correlation technique, we measured redshifts with a mean accuracy of ~45 km s-1. Results: We present redshift measurements for 6137 galaxies and their first analyses. Details of the spectroscopic observations are reported. The WINGS-SPE has ~30% overlap with previously published data sets, allowing us both to perform a complete comparison with the literature and to extend the catalogs. Conclusions: Using our redshifts, we calculate the velocity dispersion for all the clusters in the WINGS-SPE sample. We almost triple the number of member galaxies known in each cluster with respect to previous works. We also investigate the X-ray luminosity vs. velocity dispersion relation for our WINGS-SPE clusters, and find it to be consistent with the form Lx ∝ σ_v^4. Table 4, containing the complete redshift catalog, is only available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/495/707
Pravinkumar, S J; Edwards, G; Lindsay, D; Redmond, S; Stirling, J; House, R; Kerr, J; Anderson, E; Breen, D; Blatchford, O; McDonald, E; Brown, A
2010-02-25
Three cases of Legionnaires disease caused by Legionella longbeachae Sg 1 associated with potting compost have been reported in Scotland between 2008 and 2009. The exact method of transmission is still not fully understood as Legionnaires disease is thought to be acquired by droplet inhalation. The linked cases associated with compost exposure call for an introduction of compost labelling, as is already in place in other countries where L. longbeachae outbreaks have been reported.
2014-02-01
idle waiting for the wavefront to reach it. To overcome this, Reeve et al. (2001) developed a scheme in analogy to the red-black Gauss-Seidel iterative … understandable procedure calls. Parallelization of the SIMPLE iterative scheme with SIP used a red-black scheme similar to the red-black Gauss-Seidel … scheme, the SIMPLE method, for pressure-velocity coupling. The result is a slowing convergence of the outer iterations. The red-black scheme excites a 2
Simulating Electrophoresis with Discrete Charge and Drag
NASA Astrophysics Data System (ADS)
Mowitz, Aaron J.; Witten, Thomas A.
A charged asymmetric rigid cluster of colloidal particles in saline solution can respond in exotic ways to an electric field: it may spin or move transversely. These distinctive motions arise from the drag force of the neutralizing countercharge surrounding the cluster. Because of this drag, calculating the motion of arbitrary asymmetric objects with nonuniform charge is impractical by conventional methods. Here we present a new method of simulating electrophoresis, in which we replace the continuous object and the surrounding countercharge with discrete point-draggers, called Stokeslets. The balance of forces imposes a linear, self-consistent relation among the drag and Coulomb forces on the Stokeslets, which allows us to easily determine the object's motion via matrix inversion. By explicitly enforcing charge+countercharge neutrality, the simulation recovers the distinctive features of electrophoretic motion to few-percent accuracy using as few as 1000 Stokeslets. In particular, for uniformly charged objects, we observe the characteristic Smoluchowski independence of mobility on object size and shape. We then discuss electrophoretic motion of asymmetric objects, where our simulation method is particularly advantageous. This work is supported by a Grant from the US-Israel Binational Science Foundation.
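The core linear-algebra step, relating Stokeslet forces and velocities through a mobility matrix and solving by matrix inversion, can be sketched with the plain Oseen tensor. All parameters below are invented for illustration, and the neutralizing countercharge central to the paper's physics is omitted; this is only the force-balance solve, not the authors' simulation.

```python
import math

ETA, A = 1.0, 0.1  # fluid viscosity and bead radius (illustrative units)

def oseen_block(ri, rj):
    """3x3 Oseen tensor coupling two Stokeslets at positions ri, rj."""
    d = [ri[k] - rj[k] for k in range(3)]
    r = math.sqrt(sum(x * x for x in d))
    pre = 1.0 / (8 * math.pi * ETA * r)
    return [[pre * ((1 if a == b else 0) + d[a] * d[b] / r**2) for b in range(3)]
            for a in range(3)]

def mobility_matrix(pos):
    """Block mobility matrix: local Stokes drag on the diagonal, Oseen off it."""
    n = len(pos)
    self_mob = 1.0 / (6 * math.pi * ETA * A)
    m = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                blk = [[self_mob if a == b else 0.0 for b in range(3)] for a in range(3)]
            else:
                blk = oseen_block(pos[i], pos[j])
            for a in range(3):
                for b in range(3):
                    m[3 * i + a][3 * j + b] = blk[a][b]
    return m

def solve(m, v):
    """Gaussian elimination with partial pivoting (solves m x = v)."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

# rigid square of four beads translating at unit speed along x
pos = [(0.5, 0.5, 0.0), (0.5, -0.5, 0.0), (-0.5, 0.5, 0.0), (-0.5, -0.5, 0.0)]
M = mobility_matrix(pos)
v = [1.0, 0.0, 0.0] * 4
f = solve(M, v)
```

Summing the x-components of f gives the drag on the rigidly translating square; by symmetry all four beads carry the same x-force, a sanity check that generalizes poorly to the asymmetric clusters where the full method is needed.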
Palermo, Andrew; Solovyov, Andrew; Ertler, Daniel
2017-01-01
A closed Ir4 carbonyl cluster, 1, comprising a tetrahedral metal frame and three sterically bulky tert-butyl-calix[4]arene(OPr)3(OCH2PPh2) (Ph = phenyl; Pr = propyl) ligands at the basal plane, was characterized with variable-temperature 13C NMR spectroscopy, which shows the absence of scrambling of the CO ligands at temperatures up to 313 K. This demonstration of distinct sites for the CO ligands was found to extend to the reactivity and catalytic properties, as shown by selective decarbonylation in a reaction with trimethylamine N-oxide (TMAO) as an oxidant, which, in the presence of ethylene, leads to the selective bonding of an ethyl ligand at the apical Ir site. These clusters were supported intact on porous silica and found to catalyze ethylene hydrogenation, and a comparison of the kinetics of the single-hydrogenation reaction and steady-state hydrogenation catalysis demonstrates a unique single-site catalyst, with each site having the same catalytic activity. Reaction orders in the catalytic ethylene hydrogenation reaction of approximately 1/2 and 0 for H2 and C2H4, respectively, nearly match those for conventional noble-metal catalysts. In contrast to oxidative decarbonylation, thermal desorption of CO from silica-supported cluster 1 occurred exclusively at the basal plane, giving rise to sites that do not react with ethylene and are catalytically inactive for ethylene hydrogenation. The evidence of distinctive sites on the cluster catalyst leads to a model that links to hydrogen-transfer catalysis on metals, involving some surface sites that bond to both hydrocarbon and hydrogen and are catalytically engaged (so-called “*” sites) and others, at the basal plane, which bond hydrogen and CO but not hydrocarbon and are reservoir sites (so-called “S” sites). PMID:28959418
Suchard, Jeffrey R.
2003-01-01
Background: A cluster of incidents in which non-toothpaste products were used to brush teeth prompted a review of all calls to one Poison Control Center (PCC) regarding exposures to dental and oral-care products, to determine if any resulted in significant toxicity. Methods: Retrospective review of 65,849 calls to one PCC during one calendar year. All inquiries about exposures to substances used as dental or oral-care products were analyzed by a single reviewer for reported adverse effects, including hospital admission or PCC referral for emergent medical evaluation. Results: 798 calls involved exposure to dental or oral-care products, comprising 1.21% of all calls received. Toothbrushing incidents with non-toothpaste products (122 cases) did not result in any significant recognized toxicity. Twenty-four patients were either referred for emergent medical evaluation (14) or admitted to the hospital (10). In 23 of these patients (96%), the toxic agent was either an over-the-counter analgesic or a local anesthetic used to treat dental pain. Conclusions: Among PCC calls regarding dental and oral-care products, over-the-counter analgesics and local anesthetics used for dental pain most frequently resulted in the need for emergent medical evaluation or hospital admission. PMID:20852712
Chang, Larry W; Kagaayi, Joseph; Arem, Hannah; Nakigozi, Gertrude; Ssempijja, Victor; Serwadda, David; Quinn, Thomas C; Gray, Ronald H; Bollinger, Robert C; Reynolds, Steven J
2011-11-01
Mobile phone access in low and middle-income countries is rapidly expanding and offers an opportunity to leverage limited human resources for health. We conducted a mixed methods evaluation of a cluster-randomized trial exploratory substudy on the impact of a mHealth (mobile phone) support intervention used by community-based peer health workers (PHW) on AIDS care in rural Uganda. 29 PHWs at 10 clinics were randomized by clinic to receive the intervention or not. PHWs used phones to call and text higher level providers with patient-specific clinical information. 970 patients cared for by the PHWs were followed over a 26 month period. No significant differences were found in patients' risk of virologic failure. Qualitative analyses found improvements in patient care and logistics and broad support for the mHealth intervention among patients, clinic staff, and PHWs. Key challenges identified included variable patient phone access, privacy concerns, and phone maintenance.
Development and application of QM/MM methods to study the solvation effects and surfaces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dibya, Pooja Arora
2010-01-01
Quantum mechanical (QM) calculations have the advantage of attaining high-level accuracy; however, QM calculations become computationally inefficient as the size of the system grows. Solving complex molecular problems on large systems and ensembles using quantum mechanics alone still poses a challenge in terms of computational cost. Methods based on classical mechanics are an inexpensive alternative, but they lack accuracy. A good trade-off between accuracy and efficiency is achieved by combining QM methods with molecular mechanics (MM) methods, using the robustness of the QM methods in terms of accuracy and the MM methods to minimize the computational cost. Two types of QM combined with MM (QM/MM) methods are the main focus of the present dissertation: the application and development of QM/MM methods for solvation studies and for surfaces. The solvation studies were performed using a discrete solvation model largely based on first principles, called the effective fragment potential (EFP) method. The main idea of combining the EFP method with quantum mechanics is to accurately treat the solute-solvent and solvent-solvent interactions, such as electrostatics, polarization, dispersion and charge transfer, that are important for correctly calculating solvent effects on systems of interest. A second QM/MM method, called SIMOMM (surface integrated molecular orbital molecular mechanics), is a hybrid QM/MM embedded cluster model that mimics the real surface. This method was employed to calculate the potential energy surfaces for reactions of atomic O on the Si(100) surface. The hybrid QM/MM method is a computationally inexpensive approach for studying reactions on larger surfaces in a reasonably accurate and efficient manner. 
This thesis comprises six chapters: Chapter 1 gives the general overview and motivation of the dissertation and a broad background of the computational methods employed in this work. Chapter 2 illustrates the methodology of interfacing the EFP method with the configuration interaction with single excitations (CIS) method to study solvent effects in excited states. Chapter 3 discusses the study of the adiabatic electron affinity of the hydroxyl radical in aqueous solution and in micro-solvated clusters using a QM/EFP method. Chapter 4 describes the study of etching and diffusion of an oxygen atom on a reconstructed Si(100)-2 x 1 surface using a hybrid QM/MM embedded cluster model (SIMOMM). Chapter 5 elucidates the application of the EFP method toward understanding the aqueous ionization potential of the Na atom. Finally, a general conclusion of this dissertation work and prospective future directions are presented in Chapter 6.
HGDP and HapMap Analysis by Ancestry Mapper Reveals Local and Global Population Relationships
Magalhães, Tiago R.; Casey, Jillian P.; Conroy, Judith; Regan, Regina; Fitzpatrick, Darren J.; Shah, Naisha; Sobral, João; Ennis, Sean
2012-01-01
Knowledge of human origins, migrations, and expansions is greatly enhanced by the availability of large datasets of genetic information from different populations and by the development of bioinformatic tools used to analyze the data. We present Ancestry Mapper, which we believe improves on existing methods for the assignment of genetic ancestry to an individual and for studying the relationships between local and global populations. The principal function of the method, named Ancestry Mapper, is to give each individual analyzed a genetic identifier, made up of just 51 genetic coordinates, that corresponds to its relationship to the HGDP reference populations. As a consequence, the Ancestry Mapper Id (AMid) has intrinsic biological meaning and provides a tool to measure similarity between world populations. We applied Ancestry Mapper to a dataset comprising the HGDP and HapMap data. The results show distinctions at the continental level, while simultaneously giving details at the population level. We clustered AMids of HGDP/HapMap and observe a recapitulation of human migrations: for a small number of clusters, individuals are grouped according to continental origins; for a larger number of clusters, regional and population distinctions are evident. Calculating distances between AMids allows us to infer ancestry. The number of coordinates is expandable, increasing the power of Ancestry Mapper. An R package called Ancestry Mapper is available to apply this method to any high density genomic data set. PMID:23189146
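The coordinate idea can be sketched generically: one number per reference population, here a Euclidean distance from a genotype vector to a population centroid. The toy populations, centroids, and genotype below are invented for illustration; Ancestry Mapper itself uses 51 HGDP reference populations and its own distance definition.

```python
import math

# toy allele-dosage centroids for three hypothetical reference populations
REFS = {
    "popA": [0.1, 0.9, 0.2, 0.8, 0.1],
    "popB": [0.9, 0.1, 0.8, 0.2, 0.9],
    "popC": [0.5, 0.5, 0.5, 0.5, 0.5],
}

def ancestry_coordinates(genotype):
    """One coordinate per reference population: the Euclidean distance from an
    individual's genotype vector to that population's centroid."""
    return {name: math.dist(genotype, centroid) for name, centroid in REFS.items()}

def nearest_reference(genotype):
    """Infer ancestry as the reference population with the smallest coordinate."""
    coords = ancestry_coordinates(genotype)
    return min(coords, key=coords.get)

individual = [0.2, 0.8, 0.1, 0.9, 0.0]
coords = ancestry_coordinates(individual)
```

The full coordinate vector, not just its minimum, is the identifier: admixed individuals show intermediate distances to several references, which is what makes distances between such identifiers informative.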
Lund, Travis J.; Pilarz, Matthew; Velasco, Jonathan B.; Chakraverty, Devasmita; Rosploch, Kaitlyn; Undersander, Molly; Stains, Marilyne
2015-01-01
Researchers, university administrators, and faculty members are increasingly interested in measuring and describing instructional practices provided in science, technology, engineering, and mathematics (STEM) courses at the college level. Specifically, there is keen interest in comparing instructional practices between courses, monitoring changes over time, and mapping observed practices to research-based teaching. While increasingly common observation protocols (Reformed Teaching Observation Protocol [RTOP] and Classroom Observation Protocol in Undergraduate STEM [COPUS]) at the postsecondary level help achieve some of these goals, they also suffer from weaknesses that limit their applicability. In this study, we leverage the strengths of these protocols to provide an easy method that enables the reliable and valid characterization of instructional practices. This method was developed empirically via a cluster analysis using observations of 269 individual class periods, corresponding to 73 different faculty members, 28 different research-intensive institutions, and various STEM disciplines. Ten clusters, called COPUS profiles, emerged from this analysis; they represent the most common types of instructional practices enacted in the classrooms observed for this study. RTOP scores were used to validate the alignment of the 10 COPUS profiles with reformed teaching. Herein, we present a detailed description of the cluster analysis method, the COPUS profiles, and the distribution of the COPUS profiles across various STEM courses at research-intensive universities. PMID:25976654
HGDP and HapMap analysis by Ancestry Mapper reveals local and global population relationships.
Magalhães, Tiago R; Casey, Jillian P; Conroy, Judith; Regan, Regina; Fitzpatrick, Darren J; Shah, Naisha; Sobral, João; Ennis, Sean
2012-01-01
Knowledge of human origins, migrations, and expansions is greatly enhanced by the availability of large datasets of genetic information from different populations and by the development of bioinformatic tools used to analyze the data. We present Ancestry Mapper, which we believe improves on existing methods, for assigning genetic ancestry to an individual and for studying the relationships between local and global populations. The principal function of the method, named Ancestry Mapper, is to give each individual analyzed a genetic identifier, made up of just 51 genetic coordinates, that corresponds to its relationship to the HGDP reference population. As a consequence, the Ancestry Mapper Id (AMid) has intrinsic biological meaning and provides a tool to measure similarity between world populations. We applied Ancestry Mapper to a dataset comprising the HGDP and HapMap data. The results show distinctions at the continental level, while simultaneously giving details at the population level. We clustered AMids of HGDP/HapMap and observed a recapitulation of human migrations: for a small number of clusters, individuals are grouped according to continental origins; for a larger number of clusters, regional and population distinctions are evident. Calculating distances between AMids allows us to infer ancestry. The number of coordinates is expandable, increasing the power of Ancestry Mapper. An R package called Ancestry Mapper is available to apply this method to any high-density genomic data set.
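The abstract above infers ancestry by computing distances between AMids, each a vector of 51 genetic coordinates. As a rough illustration only (the Euclidean metric and the toy three-coordinate values below are assumptions for this sketch, not taken from the Ancestry Mapper R package), such a distance might be computed as:

```python
import math

def amid_distance(amid_a, amid_b):
    """Euclidean distance between two Ancestry Mapper Ids (AMids).

    The choice of Euclidean distance is an illustrative assumption;
    the published package may use a different metric.
    """
    if len(amid_a) != len(amid_b):
        raise ValueError("AMids must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(amid_a, amid_b)))

# Hypothetical 3-coordinate toy AMids (real AMids have 51 coordinates).
ind1 = [0.9, 0.1, 0.0]
ind2 = [0.8, 0.2, 0.0]
ind3 = [0.0, 0.1, 0.9]

# ind1 is closer to ind2 than to ind3, suggesting shared ancestry.
closer = amid_distance(ind1, ind2) < amid_distance(ind1, ind3)
```

Smaller distances between identifiers then indicate more closely related populations.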
A Bayesian, generalized frailty model for comet assays.
Ghebretinsae, Aklilu Habteab; Faes, Christel; Molenberghs, Geert; De Boeck, Marlies; Geys, Helena
2013-05-01
This paper proposes a flexible modeling approach for so-called comet assay data regularly encountered in preclinical research. While such data consist of non-Gaussian outcomes in a multilevel hierarchical structure, traditional analyses typically completely or partly ignore this hierarchical nature by summarizing measurements within a cluster. Non-Gaussian outcomes are often modeled using exponential family models. This is true not only for binary and count data, but also, for example, for time-to-event outcomes. Two important reasons for extending this family are (1) the possible occurrence of overdispersion, meaning that the variability in the data may not be adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of a hierarchical structure in the data, owing to clustering in the data. The first issue is dealt with through so-called overdispersion models. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. In the case of time-to-event data, one encounters, for example, the gamma frailty model (Duchateau and Janssen, 2007). While both of these issues may occur simultaneously, models combining both are uncommon. Molenberghs et al. (2010) proposed a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. Here, we use this method to model data from a comet assay with a three-level hierarchical structure. Although a conjugate gamma random effect is used for the overdispersion random effect, both gamma and normal random effects are considered for the hierarchical random effect. Apart from model formulation, we place emphasis on Bayesian estimation.
Our proposed method improves on the traditional analysis in that it (1) uses the appropriate distribution stipulated in the literature; (2) deals with the complete hierarchical nature of the data; and (3) uses all information instead of summary measures. The fit of the model to the comet assay is compared against the background of more conventional model fits. Results indicate the toxicity of 1,2-dimethylhydrazine dihydrochloride at different dose levels (low, medium, and high).
Metal-poor Type II Cepheids with Periods Less Than Three Days
NASA Astrophysics Data System (ADS)
Kovtyukh, V.; Wallerstein, G.; Yegorova, I.; Andrievsky, S.; Korotin, S.; Saviane, I.; Belik, S.; Davis, C. E.; Farrell, E. M.
2018-05-01
We have analyzed 10 high-resolution spectra of Type II Cepheids with periods less than 3 days. We find that they clearly separate into two groups: those with near or slightly below solar metallicities, and those with [Fe/H] between ‑1.5 and ‑2.0. While the former are usually called BL Her stars, we suggest that the latter be called UY Eri stars. The UY Eri subclass appears to be similar to the short period variables in globular clusters of the Galactic Halo. Globular clusters with [Fe/H] > ‑1.0 almost never have Type II Cepheids.
NASA Astrophysics Data System (ADS)
Quitadamo, Ian Joseph
Many higher education faculty perceive a deficiency in students' ability to reason, evaluate, and make informed judgments, skills that are deemed necessary for academic and job success in science and math. These skills, often collected within a domain called critical thinking (CT), have been studied and are thought to be influenced by teaching styles (the combination of beliefs, behavior, and attitudes used when teaching) and small group collaborative learning (SGCL). However, no existing studies show teaching styles and SGCL cause changes in student CT performance. This study determined how combinations of teaching styles called clusters and peer-facilitated SGCL (a specific form of SGCL) affect changes in undergraduate student CT performance using a quasi-experimental pre-test/post-test research design and valid and reliable CT performance indicators. Quantitative analyses of three teaching style cluster models (Grasha's cluster model, a weighted cluster model, and a student-centered/teacher-centered cluster model) and peer-facilitated SGCL were performed to evaluate their ability to cause measurable changes in student CT skills. Based on results that indicated weighted teaching style clusters and peer-facilitated SGCL are associated with significant changes in student CT, we conclude that teaching styles and peer-facilitated SGCL influence the development of undergraduate CT in higher education science and math.
Integrated management of thesis using clustering method
NASA Astrophysics Data System (ADS)
Astuti, Indah Fitri; Cahyadi, Dedy
2017-02-01
Thesis is one of the major requirements for students pursuing their bachelor degree. In fact, finishing the thesis involves a long process including consultation, writing the manuscript, conducting the chosen method, seminar scheduling, searching for references, and an appraisal process by the board of mentors and examiners. Unfortunately, most students find it hard to match all the lecturers' free time to sit together in a seminar room in order to examine the thesis. Therefore, the seminar scheduling process should be the top priority to be solved. A manual mechanism for this task no longer fulfills the need. People on campus, including students, staff, and lecturers, demand a system in which all the stakeholders can interact with each other and manage the thesis process without conflicting timetables. A branch of computer science named Management Information Systems (MIS) could be a breakthrough in dealing with thesis management. This research applies a method called clustering to distinguish certain categories using mathematical formulas. A system was then developed along with the method to create a well-managed tool providing some main facilities such as seminar scheduling, consultation and review process, thesis approval, assessment process, and also a reliable database of theses. The database plays an important role for present and future purposes.
ClueNet: Clustering a temporal network based on topological similarity rather than denseness.
Crawford, Joseph; Milenković, Tijana
2018-01-01
Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data-their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.
NASA Astrophysics Data System (ADS)
Miura, Yasunari; Sugiyama, Yuki
2017-12-01
We present a general method for analyzing macroscopic collective phenomena observed in many-body systems. For this purpose, we employ diffusion maps, a dimensionality-reduction technique, and systematically define a few relevant coarse-grained variables for describing macroscopic phenomena. The time evolution of macroscopic behavior is described as a trajectory in the low-dimensional space constructed by these coarse variables. We apply this method to the analysis of a traffic model, called the optimal velocity model, and reveal a bifurcation structure, which features a transition to the emergence of a moving cluster as a traffic jam.
A local search for a graph clustering problem
NASA Astrophysics Data System (ADS)
Navrotskaya, Anna; Il'ev, Victor
2016-10-01
In clustering problems one has to partition a given set of objects (a data set) into subsets (called clusters) taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering, that is, grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph, whose vertices are objects and whose edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters plus the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial-time (2k-1)-approximation algorithm for graph k-clustering. We then apply a local search procedure to the feasible solution found by this algorithm and conduct an experimental study of the resulting heuristics.
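The objective described above, inter-cluster edges plus missing intra-cluster edges, and a vertex-move local search can be sketched as follows. This toy implementation is an illustrative assumption, not the authors' (2k-1)-approximation algorithm:

```python
from itertools import combinations

def clustering_cost(edges, clusters):
    """Cost = edges between clusters + missing edges within clusters."""
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, cl in enumerate(clusters) for v in cl}
    cost = 0
    for u, v in combinations(sorted(label), 2):
        has_edge = frozenset((u, v)) in edge_set
        same_cluster = label[u] == label[v]
        if has_edge != same_cluster:   # crossing edge, or missing intra edge
            cost += 1
    return cost

def local_search(edges, clusters):
    """Move single vertices between clusters while the cost improves."""
    clusters = [set(c) for c in clusters]
    best = clustering_cost(edges, clusters)
    improved = True
    while improved:
        improved = False
        for i in range(len(clusters)):
            for v in list(clusters[i]):
                for j in range(len(clusters)):
                    if i == j:
                        continue
                    clusters[i].remove(v)
                    clusters[j].add(v)
                    cost = clustering_cost(edges, clusters)
                    if cost < best:
                        best, improved = cost, True
                        break          # keep the move, try the next vertex
                    clusters[j].remove(v)
                    clusters[i].add(v)
    return [c for c in clusters if c]

# Two triangles joined by one edge; start from a deliberately bad split.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
result = local_search(edges, [{0, 1, 2, 3}, {4, 5}])
```

On this toy graph the search recovers the two triangles, whose cost of 1 comes from the single bridging edge.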
The Productivity Analysis of Chennai Automotive Industry Cluster
NASA Astrophysics Data System (ADS)
Bhaskaran, E.
2014-07-01
Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in the Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called the Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining the technical efficiency, peer weights, and input and output slacks of 100 auto component industries in the three estates. The methodology adopted is Data Envelopment Analysis with the output-oriented Banker Charnes Cooper model, taking net worth, fixed assets and employment as inputs and gross output as output. Non-zero values represent the peer weights of efficient clusters. Higher slacks reveal excess net worth, fixed assets and employment, and a shortage in gross output. To conclude, the variables are highly correlated, and the inefficient industries should increase their gross output or decrease their fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export markets.
Time-reversibility and particle sedimentation
NASA Technical Reports Server (NTRS)
Golubitsky, Martin; Krupa, Martin; Lim, Chjan
1991-01-01
This paper studies an ODE model, called the Stokeslet model, which describes sedimentation of small clusters of particles in a highly viscous fluid. This model has a trivial solution in which the n particles arrange themselves at the vertices of a regular n-sided polygon. When n = 3, Hocking and Caflisch et al. (1988) proved the existence of periodic motion (in the frame moving with the center of gravity of the cluster) in which the particles form an isosceles triangle. Here, the study of periodic and quasi-periodic solutions of the Stokeslet model is continued, with emphasis on the spatial and time-reversal symmetries of the model. For three particles, the existence of a second family of periodic solutions and a family of quasi-periodic solutions is proved. It is also indicated how the methods generalize to the case of n particles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla
2014-03-01
Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS) where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
Hao, Dapeng; Ren, Cong; Li, Chuanxing
2012-05-01
A central idea in biology is the hierarchical organization of cellular processes. A commonly used method to identify the hierarchical modular organization of a network relies on detecting a global signature known as the variation of the clustering coefficient (so-called modularity scaling). Although several studies have suggested other possible origins of this signature, it is still widely used nowadays to identify hierarchical modularity, especially in the analysis of biological networks. Therefore, a further and systematic investigation of this signature for different types of biological networks is necessary. We analyzed a variety of biological networks and found that the commonly used signature of hierarchical modularity is actually the reflection of spoke-like topology, suggesting a different view of network architecture. We proved that the existence of super-hubs is the reason that the clustering coefficient of a node follows a particular scaling law with degree k in metabolic networks. To study the modularity of biological networks, we systematically investigated the relationship between the repulsion of hubs and the variation of the clustering coefficient. We provided direct evidence that repulsion between hubs is the underlying origin of the variation of the clustering coefficient, and found that for biological networks having no anti-correlation between hubs, such as the gene co-expression network, the clustering coefficient does not depend on degree. We have thus shown that the variation of the clustering coefficient is neither sufficient nor exclusive for a network to be hierarchical. Our results suggest the existence of spoke-like modules as opposed to the "deterministic model" of hierarchical modularity, and suggest the need to reconsider the organizational principles of biological hierarchy.
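The degree-dependent clustering coefficient at the center of this analysis is straightforward to compute. In the minimal sketch below (the graph is an invented toy example, not data from the study), a hub joining two otherwise disconnected triangles shows the spoke-like effect: the high-degree hub has a lower clustering coefficient than its low-degree peripheral nodes.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Toy spoke-like graph: hub 0 attached to two triangles (1,2) and (3,4).
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
```

Here the hub (degree 4) scores 1/3 while each peripheral node (degree 2) scores 1, the kind of decrease of C(k) with k that the abstract attributes to spoke-like topology rather than to hierarchy.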
Mutual information estimation reveals global associations between stimuli and biological processes
Suzuki, Taiji; Sugiyama, Masashi; Kanamori, Takafumi; Sese, Jun
2009-01-01
Background Although microarray gene expression analysis has become popular, it remains difficult to interpret the biological changes caused by stimuli or variation of conditions. Clustering genes and associating each group with biological functions are often used methods. However, such methods only detect partial changes within cell processes. Herein, we propose a method for discovering global changes within a cell by associating observed conditions of gene expression with gene functions. Results To elucidate the associations, we introduce a novel feature selection method called Least-Squares Mutual Information (LSMI), which computes mutual information without density estimation, and can therefore detect nonlinear associations within a cell. We demonstrate the effectiveness of LSMI through comparison with existing methods. The results of the application to yeast microarray datasets reveal that non-natural stimuli affect various biological processes, whereas others have no significant relation to specific cell processes. Furthermore, we discover that biological processes can be categorized into four types according to the responses to various stimuli: DNA/RNA metabolism, gene expression, protein metabolism, and protein localization. Conclusion We proposed a novel feature selection method called LSMI, and applied LSMI to mining the associations between conditions of yeast and biological processes through microarray datasets. In fact, LSMI allows us to elucidate the global organization of cellular process control. PMID:19208155
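LSMI itself estimates mutual information by least-squares density-ratio fitting, precisely to avoid density estimation. For contrast, the classic plug-in estimator that such methods improve upon can be sketched for discrete samples (this is the baseline technique, not the authors' LSMI; the toy data are invented):

```python
import math
from collections import Counter

def plugin_mutual_information(xs, ys):
    """Plug-in (histogram) mutual information estimate in nats.

    This is the density-estimation-based baseline that LSMI is designed
    to avoid; it is shown only to illustrate what MI measures.
    """
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # (c/n) * log( p(x,y) / (p(x) p(y)) ), rearranged to avoid tiny ratios
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

xs = [0, 0, 1, 1] * 25       # a binary "stimulus" variable, 100 samples
ys_dep = xs[:]               # perfectly dependent response: MI = log 2
ys_ind = [0, 1, 0, 1] * 25   # balanced independent response: MI = 0
```

A perfectly dependent pair carries log 2 nats of information, while an independent pair carries none; LSMI targets the same quantity but via a least-squares fit of the density ratio p(x,y)/(p(x)p(y)).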
Blind source computer device identification from recorded VoIP calls for forensic investigation.
Jahanirad, Mehdi; Anuar, Nor Badrul; Wahab, Ainuddin Wahid Abdul
2017-03-01
The VoIP services provide fertile ground for criminal activity, thus identifying the transmitting computer devices from a recorded VoIP call may help the forensic investigator to reveal useful information. It also proves the authenticity of the call recording submitted to the court as evidence. This paper extends a previous study on the use of recorded VoIP calls for blind source computer device identification. Although initial results were promising, theoretical reasoning for them is yet to be found. The study suggested computing the entropy of mel-frequency cepstrum coefficients (entropy-MFCC) from near-silent segments as an intrinsic feature set that captures the device response function due to the tolerances in the electronic components of individual computer devices. By applying the supervised learning techniques of naïve Bayesian, linear logistic regression, neural networks and support vector machines to the entropy-MFCC features, state-of-the-art identification accuracy of near 99.9% has been achieved on different sets of computer devices for both call recording and microphone recording scenarios. Furthermore, unsupervised learning techniques, including simple k-means, expectation-maximization and density-based spatial clustering of applications with noise (DBSCAN), provided promising results for the call recording dataset by assigning the majority of instances to their correct clusters. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
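Of the unsupervised techniques mentioned, DBSCAN is compact enough to sketch. The minimal pure-Python version below (not the study's entropy-MFCC pipeline; the 1-D toy points are invented) grows clusters from dense core points and flags isolated points as noise:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns labels[i] = cluster id, or -1 for noise."""
    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        labels[i] = cluster              # i is a core point: start a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:     # j is also a core point: expand
                queue.extend(nbrs)
        cluster += 1
    return labels

# Two dense groups and one isolated outlier.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (10.0,)]
labels = dbscan(points, eps=0.5, min_pts=2)
```

Unlike k-means, the number of clusters is not fixed in advance, and the outlier is reported as noise rather than forced into a cluster.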
Cammi, R
2009-10-28
We present a general formulation of the coupled-cluster (CC) theory for a molecular solute described within the framework of the polarizable continuum model (PCM). The PCM-CC theory is derived in its complete form, called PTDE scheme, in which the correlated electronic density is used to have a self-consistent reaction field, and in an approximate form, called PTE scheme, in which the PCM-CC equations are solved assuming the fixed Hartree-Fock solvent reaction field. Explicit forms for the PCM-CC-PTDE equations are derived at the single and double (CCSD) excitation level of the cluster operator. At the same level, explicit equations for the analytical first derivatives of the PCM basic energy functional are presented, and analytical second derivatives are also discussed. The corresponding PCM-CCSD-PTE equations are given as a special case of the full theory.
NASA Astrophysics Data System (ADS)
Jee, Myungkook James
2006-06-01
Clusters of galaxies, the largest gravitationally bound objects in the Universe, are useful tracers of cosmic evolution, and particularly detailed studies of still-forming clusters at high redshifts can considerably enhance our understanding of structure formation. We use two powerful methods that have recently become available for the study of these distant clusters: space-based gravitational weak lensing and high-resolution X-ray observations. Detailed analyses of five high-redshift (0.8 < z < 1.3) clusters are presented based on deep Advanced Camera for Surveys (ACS) and Chandra X-ray images. We show that, when the instrumental characteristics are properly understood, the newly installed ACS on the Hubble Space Telescope (HST) can detect subtle shape distortions of background galaxies down to the limiting magnitudes of the observations, which enables mapping of the cluster dark matter in unprecedentedly high resolution. The cluster masses derived from this HST/ACS weak-lensing study have been compared with those from re-analyses of the archival Chandra X-ray data. We find that there are interesting offsets between the cluster galaxy, intracluster medium (ICM), and dark matter centroids, and possible scenarios are discussed. If the offset is confirmed to be ubiquitous in other clusters, the explanation may necessitate major refinements in our current understanding of the nature of dark matter, as well as of cluster galaxy dynamics. CL0848+4452, the highest-redshift (z = 1.27) cluster yet detected in weak lensing, has a significant discrepancy between the weak-lensing and X-ray masses.
If this trend is found to be severe and common for other X-ray weak clusters at redshifts beyond unity, the conventional X-ray determination of cluster mass functions, often inferred from immediate X-ray properties such as the X-ray luminosity and temperature via the so-called mass-luminosity (M-L) and mass-temperature (M-T) relations, will become highly unstable in this redshift regime. Therefore, the relatively unbiased weak-lensing measurements of cluster mass properties can be used to adequately calibrate the scaling relations in future high-redshift cluster investigations.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering
Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Background Gravitation field algorithm (GFA) is a new optimization algorithm based on an imitation of natural phenomena. GFA performs well both in searching for a global minimum and in searching for multiple minima in computational biology. However, GFA needs to be improved to increase efficiency, and modified to apply to some discrete data problems in systems biology. Method An improved GFA called IGFA was proposed in this paper. Two parts were improved in IGFA. The first is the rule of random division, which is a reasonable strategy that shortens running time. The other is the rotation factor, which can improve the accuracy of IGFA. To apply IGFA to hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA, and IGFA was applied to hierarchical clustering. The global-minimum experiment compared IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing); the multi-minima experiment compared IGFA and GFA. The results of the two experiments were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For hierarchical clustering, IGFA is used to optimize the smallest distance of gene pairs, and the results were compared with GA and SA, single-linkage clustering, and UPGMA. The efficiency of IGFA is thereby demonstrated. PMID:23173043
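Single-linkage clustering, one of the baselines this abstract compares against, repeatedly merges the two clusters whose closest members are nearest. A minimal sketch on invented 1-D toy data (illustrative only, not the IGFA optimizer itself):

```python
def single_linkage(points, k):
    """Agglomerative single-linkage clustering of 1-D points into k clusters."""
    clusters = [[p] for p in points]

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > k:
        # Merge the pair of clusters at the smallest single-linkage distance.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))
    return clusters

data = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0]
groups = sorted(single_linkage(data, 3), key=min)
```

Recording the order and distances of the merges yields the dendrogram; IGFA instead searches for the gene-pair distances to optimize.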
Mitchell-Foster, Kendra; Ayala, Efraín Beltrán; Breilh, Jaime; Spiegel, Jerry; Wilches, Ana Arichabala; Leon, Tania Ordóñez; Delgado, Jefferson Adrian
2015-01-01
Background This project investigates the effectiveness and feasibility of scaling up an eco-bio-social approach for implementing an integrated community-based approach for dengue prevention in comparison with existing insecticide-based and emerging biolarvicide-based programs in an endemic setting in Machala, Ecuador. Methods An integrated intervention strategy (IIS) for dengue prevention (an elementary school-based dengue education program, and a clean patio and safe container program) was implemented in 10 intervention clusters from November 2012 to November 2013 using a randomized controlled cluster trial design (20 clusters: 10 intervention, 10 control; 100 households per cluster with 1986 total households). Current existing dengue prevention programs served as the control treatment in comparison clusters. The pupae per person index (PPI) is used as the main outcome measure. Particular attention was paid to social mobilization and empowerment with IIS. Results Overall, IIS was successful in reducing PPI levels in intervention communities versus control clusters, with intervention clusters in the six paired clusters that followed the study design experiencing a greater reduction of PPI compared to controls (2.2 OR, 95% CI: 1.2 to 4.7). Analysis of individual cases demonstrates that contextualizing programs and strategies to local neighborhoods can be very effective in reducing PPI for dengue transmission risk reduction. Conclusions In the rapidly evolving political climate for dengue control in Ecuador, integration of successful social mobilization and empowerment strategies with existing and emerging biolarvicide-based government dengue prevention and control programs is promising for reducing PPI and dengue transmission risk in southern coastal communities like Machala. However, a more profound analysis of the social determination of health is called for to assess sustainability prospects. PMID:25604763
Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris
2011-10-20
Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. 
Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
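The core alignment step in CluPA, shifting a target segment so that its peaks match the reference, can be illustrated with a direct cross-correlation search. CluPA computes this via FFT cross-correlation for speed; the O(n * max_shift) version below and its toy spectra are assumptions for illustration only:

```python
def best_shift(reference, target, max_shift):
    """Integer shift of `target` that maximizes correlation with `reference`."""
    n = len(reference)
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Correlate reference with the target shifted right by s points.
        score = sum(reference[i] * target[i - s]
                    for i in range(n) if 0 <= i - s < n)
        if score > best_score:
            best, best_score = s, score
    return best

def apply_shift(spectrum, s):
    """Shift a spectrum right by s points, zero-filling the vacated bins."""
    n = len(spectrum)
    out = [0.0] * n
    for i, v in enumerate(spectrum):
        if 0 <= i + s < n:
            out[i + s] = v
    return out

reference = [0, 0, 0, 5, 1, 0, 0, 0]
target    = [0, 5, 1, 0, 0, 0, 0, 0]   # same peak pattern, offset by 2 bins
s = best_shift(reference, target, max_shift=3)
```

In CluPA this matching is applied hierarchically: segments between the most distant clusters of the peak tree are aligned recursively rather than the whole spectrum at once.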
Understanding Teacher Users of a Digital Library Service: A Clustering Approach
ERIC Educational Resources Information Center
Xu, Beijie
2011-01-01
This research examined teachers' online behaviors while using a digital library service--the Instructional Architect (IA)--through three consecutive studies. In the first two studies, a statistical model called latent class analysis (LCA) was applied to cluster different groups of IA teachers according to their diverse online behaviors. The third…
Bhavnani, Suresh K.; Chen, Tianlong; Ayyaswamy, Archana; Visweswaran, Shyam; Bellala, Gowtham; Divekar, Rohit; Bassler, Kevin E.
2017-01-01
A primary goal of precision medicine is to identify patient subgroups based on their characteristics (e.g., comorbidities or genes) with the goal of designing more targeted interventions. While network visualization methods such as Fruchterman-Reingold have been used to successfully identify such patient subgroups in small to medium sized data sets, they often fail to reveal comprehensible visual patterns in large and dense networks despite having significant clustering. We therefore developed an algorithm called ExplodeLayout, which exploits the existence of significant clusters in bipartite networks to automatically “explode” a traditional network layout with the goal of separating overlapping clusters, while at the same time preserving key network topological properties that are critical for the comprehension of patient subgroups. We demonstrate the utility of ExplodeLayout by visualizing a large dataset extracted from Medicare consisting of readmitted hip-fracture patients and their comorbidities, demonstrate its statistically significant improvement over a traditional layout algorithm, and discuss how the resulting network visualization enabled clinicians to infer mechanisms precipitating hospital readmission in specific patient subgroups. PMID:28815099
Wang, Shichen; Wong, Debbie; Forrest, Kerrie; Allen, Alexandra; Chao, Shiaoman; Huang, Bevan E; Maccaferri, Marco; Salvi, Silvio; Milner, Sara G; Cattivelli, Luigi; Mastrangelo, Anna M; Whan, Alex; Stephen, Stuart; Barker, Gary; Wieseke, Ralf; Plieske, Joerg; International Wheat Genome Sequencing Consortium; Lillemo, Morten; Mather, Diane; Appels, Rudi; Dolferus, Rudy; Brown-Guedira, Gina; Korol, Abraham; Akhunova, Alina R; Feuillet, Catherine; Salse, Jerome; Morgante, Michele; Pozniak, Curtis; Luo, Ming-Cheng; Dvorak, Jan; Morell, Matthew; Dubcovsky, Jorge; Ganal, Martin; Tuberosa, Roberto; Lawley, Cindy; Mikoulitch, Ivan; Cavanagh, Colin; Edwards, Keith J; Hayden, Matthew; Akhunov, Eduard
2014-01-01
High-density single nucleotide polymorphism (SNP) genotyping arrays are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships between individuals in populations and studying marker–trait associations in mapping experiments. We developed a genotyping array including about 90 000 gene-associated SNPs and used it to characterize genetic variation in allohexaploid and allotetraploid wheat populations. The array includes a significant fraction of common genome-wide distributed SNPs that are represented in populations of diverse geographical origin. We used density-based spatial clustering algorithms to enable high-throughput genotype calling in complex data sets obtained for polyploid wheat. We show that these model-free clustering algorithms provide accurate genotype calling in the presence of multiple clusters including clusters with low signal intensity resulting from significant sequence divergence at the target SNP site or gene deletions. Assays that detect low-intensity clusters can provide insight into the distribution of presence–absence variation (PAV) in wheat populations. A total of 46 977 SNPs from the wheat 90K array were genetically mapped using a combination of eight mapping populations. The developed array and cluster identification algorithms provide an opportunity to infer detailed haplotype structure in polyploid wheat and will serve as an invaluable resource for diversity studies and investigating the genetic basis of trait variation in wheat. PMID:24646323
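Model-free, density-based genotype calling can be illustrated with a minimal DBSCAN-style grouping. This is a sketch only; the two-channel intensities and the `eps`/`min_pts` values below are invented and do not reproduce the 90K array's actual calling pipeline.

```python
# Minimal DBSCAN-style grouping as an illustration of density-based,
# model-free genotype calling on two-channel signal intensities
# (invented values; not the array's actual pipeline).

def density_clusters(points, eps=0.15, min_pts=3):
    """Label points with cluster ids; -1 marks low-density noise."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seed = neighbors(i)
        if len(seed) < min_pts:
            labels[i] = -1              # noise / failed call
            continue
        labels[i] = cid
        queue = list(seed)
        while queue:                    # grow the cluster from its seeds
            j = queue.pop()
            if labels[j] is None:
                if len(neighbors(j)) >= min_pts:
                    queue.extend(neighbors(j))
                labels[j] = cid
            elif labels[j] == -1:       # border point previously marked noise
                labels[j] = cid
        cid += 1
    return labels

# two tight genotype clouds plus one stray low-intensity point
pts = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
       (0.9, 0.9), (0.88, 0.9), (0.9, 0.88),
       (0.5, 0.05)]
calls = density_clusters(pts)
```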
Quantitative estimation of time-variable earthquake hazard by using fuzzy set theory
NASA Astrophysics Data System (ADS)
Deyi, Feng; Ichikawa, M.
1989-11-01
In this paper, the various methods of fuzzy set theory, called fuzzy mathematics, have been applied to the quantitative estimation of the time-variable earthquake hazard. The results obtained consist of the following. (1) Quantitative estimation of the earthquake hazard on the basis of seismicity data. By using some methods of fuzzy mathematics, seismicity patterns before large earthquakes can be studied more clearly and more quantitatively; highly active periods in a given region and quiet periods of seismic activity before large earthquakes can be recognized; similarities in the temporal variation of seismic activity and seismic gaps can be examined; and, on the other hand, the time-variable earthquake hazard can be assessed directly on the basis of a series of statistical indices of seismicity. Two methods of fuzzy clustering analysis, the method of fuzzy similarity, and the direct method of fuzzy pattern recognition have been studied in particular. One method of fuzzy clustering analysis is based on fuzzy netting, and another is based on the fuzzy equivalence relation. (2) Quantitative estimation of the earthquake hazard on the basis of observational data for different precursors. The direct method of fuzzy pattern recognition has been applied to research on earthquake precursors of different kinds. On the basis of the temporal and spatial characteristics of recognized precursors, earthquake hazards over different terms can be estimated. This paper mainly deals with medium- and short-term precursors observed in Japan and China.
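The fuzzy-equivalence-relation route mentioned above can be sketched briefly: build a fuzzy similarity matrix, take its max-min transitive closure (yielding a fuzzy equivalence relation), then cut at a level lambda to obtain crisp clusters. The similarity values below are toy numbers, not seismicity data.

```python
# Toy sketch of fuzzy clustering via a fuzzy equivalence relation:
# max-min transitive closure of a similarity matrix, then a lambda-cut
# (illustrative values only).

def transitive_closure(R):
    """Iterate R := R o R (max-min composition) until stable."""
    n = len(R)
    R = [row[:] for row in R]
    while True:
        S = [[max(min(R[i][k], R[k][j]) for k in range(n))
              for j in range(n)] for i in range(n)]
        if S == R:
            return R
        R = S

def lambda_cut_clusters(R, lam):
    """Group indices whose closure value meets the cut level lam."""
    labels = [None] * len(R)
    cid = 0
    for i in range(len(R)):
        if labels[i] is None:
            for j in range(len(R)):
                if R[i][j] >= lam:
                    labels[j] = cid
            cid += 1
    return labels

sim = [[1.0, 0.8, 0.2],
       [0.8, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
eq = transitive_closure(sim)
groups = lambda_cut_clusters(eq, lam=0.5)
```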
Emanuele, Vincent A; Panicker, Gitika; Gurbaxani, Brian M; Lin, Jin-Mann S; Unger, Elizabeth R
2012-01-01
The SELDI-TOF mass spectrometer's compact size and automated, high-throughput design have been attractive to clinical researchers, and the platform has seen steady use in biomarker studies. Despite new algorithms and preprocessing pipelines that have been developed to address reproducibility issues, visual inspection of the results of SELDI spectra preprocessing by the best algorithms still shows miscalled peaks and systematic sources of error. This suggests that there continue to be problems with SELDI preprocessing. In this work, we study the preprocessing of SELDI in detail and introduce improvements. While many algorithms, including the vendor-supplied software, can identify peak clusters of specific mass (or m/z) in groups of spectra with high specificity and low false discovery rate (FDR), the algorithms tend to underperform when estimating the exact prevalence and intensity of peaks in those clusters. Thus group differences that at first appear very strong are shown, after careful and laborious hand inspection of the spectra, to be less than significant. Here we introduce a wavelet/neural network based algorithm which mimics what a team of expert human users would call as peaks in each of several hundred spectra in a typical SELDI clinical study. The wavelet denoising part of the algorithm optimally smoothes the signal in each spectrum according to an improved suite of signal processing algorithms previously reported (the LibSELDI toolbox under development). The neural network part of the algorithm combines those results with the raw signal and a training dataset of expertly called peaks, to call peaks in a test set of spectra with approximately 95% accuracy. The new method was applied to data collected from a study of cervical mucus for the early detection of cervical cancer in HPV-infected women. The method shows promise in addressing the ongoing SELDI reproducibility issues.
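The peak-calling stage can be illustrated with a toy sketch in which simple moving-average smoothing stands in for the wavelet denoising described above (the actual LibSELDI/neural-network pipeline is not reproduced here); the window and threshold values are invented.

```python
# Toy peak caller: moving-average smoothing (a stand-in for the wavelet
# denoising stage; not the LibSELDI pipeline) followed by local-maximum
# detection above a noise threshold. Illustrative parameters only.

def call_peaks(spectrum, window=3, threshold=0.5):
    half = window // 2
    smooth = [sum(spectrum[max(0, i - half):i + half + 1])
              / len(spectrum[max(0, i - half):i + half + 1])
              for i in range(len(spectrum))]
    peaks = [i for i in range(1, len(smooth) - 1)
             if smooth[i - 1] < smooth[i] >= smooth[i + 1]
             and smooth[i] > threshold]
    return peaks, smooth

signal = [0.1, 0.2, 3.0, 0.2, 0.1, 0.15, 2.5, 0.2, 0.1]
peaks, _ = call_peaks(signal)
```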
Applications of conformal field theory to problems in 2D percolation
NASA Astrophysics Data System (ADS)
Simmons, Jacob Joseph Harris
This thesis explores critical two-dimensional percolation in bounded regions in the continuum limit. The main method which we employ is conformal field theory (CFT). Our specific results follow from the null-vector structure of the c = 0 CFT that applies to critical two-dimensional percolation. We also make use of the duality symmetry obeyed at the percolation point, and the fact that percolation may be understood as the q-state Potts model in the limit q → 1. Our first results describe the correlations between points in the bulk and boundary intervals or points, i.e. the probability that the various points or intervals are in the same percolation cluster. These quantities correspond to order-parameter profiles under the given conditions, or cluster connection probabilities. We consider two specific cases: an anchoring interval, and two anchoring points. We derive results for these and related geometries using the CFT null-vectors for the corresponding boundary condition changing (bcc) operators. In addition, we exhibit several exact relationships between these probabilities. These relations between the various bulk-boundary connection probabilities involve parameters of the CFT called operator product expansion (OPE) coefficients. We then compute several of these OPE coefficients, including those arising in our new probability relations. Beginning with the familiar CFT operator φ1,2, which corresponds to a free-fixed spin boundary change in the q-state Potts model, we then develop physical interpretations of the bcc operators. We argue that, when properly normalized, higher-order bcc operators correspond to successive fusions of multiple φ1,2 operators. Finally, by identifying the derivative of φ1,2 with the operator φ1,4, we derive several new quantities called first crossing densities.
These new results are then combined and integrated to obtain the three previously known crossing quantities in a rectangle: the probability of a horizontal crossing cluster, the probability of a cluster crossing both horizontally and vertically, and the expected number of horizontal crossing clusters. These three results were known to be solutions to a certain fifth-order differential equation, but until now no physically meaningful explanation had appeared. This differential equation arises naturally in our derivation.
Determining the size dependence of structural properties of clusters
NASA Astrophysics Data System (ADS)
Dong, Yi; Springborg, Michael
2012-12-01
Problems related to the determination of the structure of the global total-energy minimum for clusters are discussed through three examples. For isolated gold clusters it is shown that low-symmetry structures result due to covalent bonding. Subsequently, SiNGeN and (HAlO)N clusters are treated, for which the occurrence of so-called homotops leads to additional computational complexity. For the former it is found that the structures are not directly related to those of the pure monatomic clusters, and for the latter the results are shown to be in agreement with available experimental information on nanostructured HAlO. In order to illustrate and analyze the results, various descriptors are introduced and applied.
Structural and Functional Analyses of the Proteins Involved in the Iron-Sulfur Cluster Biosynthesis
NASA Astrophysics Data System (ADS)
Wada, Kei
Iron-sulfur (Fe-S) clusters are ubiquitous prosthetic groups that are required to maintain such fundamental life processes as the respiratory chain, photosynthesis, and the regulation of gene expression. Assembly of intracellular Fe-S clusters requires sophisticated biosynthetic systems called the ISC and SUF machineries. To shed light on the molecular mechanism of Fe-S cluster assembly mediated by the SUF machinery, several structures of the SUF components and their sub-complexes were determined. These structural findings, together with biochemical characterization of the core complex (the SufB-SufC-SufD complex), have led me to propose a working model for cluster biosynthesis in the SUF machinery.
NASA Astrophysics Data System (ADS)
Pasquato, Mario; Chung, Chul
2016-05-01
Context. Machine learning (ML) solves problems by learning patterns from data with limited or no human guidance. In astronomy, ML is mainly applied to large observational datasets, e.g. for morphological galaxy classification. Aims: We apply ML to gravitational N-body simulations of star clusters that are either formed by merging two progenitors or evolved in isolation, planning to later identify globular clusters (GCs) that may have a history of merging from observational data. Methods: We create mock observations from simulated GCs, from which we measure a set of parameters (also called features in the machine-learning field). After carrying out dimensionality reduction on the feature space, the resulting datapoints are fed into various classification algorithms. Using repeated random subsampling validation, we check whether the groups identified by the algorithms correspond to the underlying physical distinction between mergers and monolithically evolved simulations. Results: The three algorithms we considered (C5.0 trees, k-nearest neighbour, and support-vector machines) all achieve a test misclassification rate of about 10% without parameter tuning, with support-vector machines slightly outperforming the others. The first principal component of feature space correlates with cluster concentration. If we exclude it from the regression, the performance of the algorithms is only slightly reduced.
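The classification step can be sketched with a small k-nearest-neighbour vote in a toy two-dimensional feature space. The feature values and labels below are invented for illustration; the paper's actual features are measured from mock observations of simulated clusters.

```python
# Toy k-nearest-neighbour vote in an invented 2-D feature space
# (illustration only; not the paper's measured features).
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label); returns the majority label."""
    nearest = sorted(train, key=lambda fl: math.dist(fl[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

train = [((0.10, 0.20), "merger"), ((0.20, 0.10), "merger"),
         ((0.15, 0.25), "merger"),
         ((0.90, 0.80), "monolithic"), ((0.80, 0.90), "monolithic"),
         ((0.85, 0.95), "monolithic")]
label = knn_predict(train, (0.2, 0.2), k=3)
```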
Abdullah, Fauziah; Su, Tin Tin
2013-01-01
The objective of this study was to evaluate the effect of a call-recall approach in enhancing Pap smear practice by changes of motivation stage among non-compliant women. A cluster randomized controlled trial with parallel and un-blinded design was conducted between January and November 2010 in 40 public secondary schools in Malaysia among 403 female teachers who never or infrequently attended for a Pap test. A cluster randomization was applied in assigning schools to both groups. An intervention group received an invitation and reminder (call-recall program) for a Pap test (20 schools with 201 participants), while the control group received usual care from the existing cervical screening program (20 schools with 202 participants). Multivariate logistic regression was performed to determine the effect of the intervention program on the action stage (Pap smear uptake) at 24 weeks. In both groups, pre-contemplation stage was found as the highest proportion of changes in stages. At 24 weeks, an intervention group showed two times more in the action stage than control group (adjusted odds ratio 2.44, 95% CI 1.29-4.62). The positive effect of a call-recall approach in motivating women to change the behavior of screening practice should be appreciated by policy makers and health care providers in developing countries as an intervention to enhance Pap smear uptake. Copyright © 2013 Elsevier Inc. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Recent Meta-analysis of quantitative trait loci (QTL) in tetraploid cotton (Gossypium spp.) has identified regions of the genome with high concentrations of various trait QTL called clusters, and specific trait QTL called hotspots. The Meta-analysis included all population types of Gossypium mixing ...
Müller; Sarkar; Shah; Bögge; Schmidtmann; Kögerler; Hauptfleisch; Trautwein; Schünemann
1999-11-02
Pythagorean harmony can be found in the spherical polyoxometalate clusters described here (see illustration for an example of a structure), since there are interesting relationships between the so-called magic numbers (12, 32, 42, 72, 132) relevant for spherical viruses and the number of the building blocks in the cluster. The size of these Keplerate clusters can be tailored by varying the type of connections between the pentagons by means of different spacers.
Unified method of knowledge representation in the evolutionary artificial intelligence systems
NASA Astrophysics Data System (ADS)
Bykov, Nickolay M.; Bykova, Katherina N.
2003-03-01
The evolution of artificial intelligence systems, driven by the growing complexity of their application domains and by scientific progress, has led to a diversification of the methods and algorithms for representing and using knowledge in such systems. For this reason it is often very difficult to design effective methods of knowledge discovery and manipulation for them. In the present work the authors propose a method for the unified representation of a system's knowledge about objects of the external world through a rank transformation of their descriptions, made in different feature spaces: deterministic, probabilistic, fuzzy, and others. A proof is presented that information about the rank configuration of the object states in the feature space is sufficient for decision making. It is shown that geometrical and combinatorial models of the set of rank configurations can be introduced via a system of incidence, which allows the information about them to be stored in a compact form. A method for describing rank configurations by a DRP code (distance-rank-preserving code) is proposed, and the questions of its completeness, information capacity, noise immunity, and privacy are reviewed. It is shown that the capacity of a transmission channel under such a representation exceeds unity, since the code words contain information both about the object states and about the distance ranks between them. An efficient data-clustering algorithm for identifying object states, based on this code, is described. Representing knowledge with rank configurations makes it possible to unify and simplify decision-making algorithms by performing logical operations on DRP code words. Examples of the proposed clustering technique operating on a given sample set, the rank configurations of the resulting clusters, and their DRP codes are presented.
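The central rank-transformation idea can be illustrated compactly: replacing raw dissimilarities by their ranks makes descriptions from incommensurable feature spaces (deterministic, probabilistic, fuzzy) directly comparable. The two dissimilarity vectors below are invented; they disagree numerically but share one rank configuration.

```python
# Sketch of the rank transformation: two invented dissimilarity vectors,
# measured in incommensurable spaces, reduce to the same rank
# configuration and hence to the same decision-relevant information.

def rank_transform(values):
    """Map each value to its rank (0 = smallest), ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

euclidean = [2.5, 0.7, 1.9]     # distances in a deterministic space
fuzzy_dissim = [0.9, 0.1, 0.6]  # dissimilarities in a fuzzy space
r1 = rank_transform(euclidean)
r2 = rank_transform(fuzzy_dissim)
```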
ERIC Educational Resources Information Center
Rizvi, Meher; Nagy, Philip
2016-01-01
This paper presents and evaluates a teacher training approach called the cluster-based mentoring programme (CBMP) for the professional development of government primary school teachers in Pakistan. The study sought to find differences in the teaching practices between districts where the CBMP was used (intervention) and control districts where it…
Understanding Teacher Users of a Digital Library Service: A Clustering Approach
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi
2011-01-01
This article describes the Knowledge Discovery and Data Mining (KDD) process and its application in the field of educational data mining (EDM) in the context of a digital library service called the Instructional Architect (IA.usu.edu). In particular, the study reported in this article investigated a certain type of data mining problem, clustering,…
USDA-ARS?s Scientific Manuscript database
White lupin (Lupinus albus L.) is considered a model system for understanding plant acclimation to nutrient deficiency. It acclimates to phosphorus (P) and iron (Fe) deficiency by the development of short, densely clustered lateral roots called proteoid (or cluster) roots; proteoid-root development ...
Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.
Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J
2017-12-01
Spatial analysis of epiphytotics is essential for developing and testing hypotheses about pathogen ecology and disease dynamics, and for optimizing plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed t test compares the observed mean inter-cluster distance with the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected.
Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the limitations, trade-offs, and considerations for the sensitivities of variables and the biological interpretations of results. The Cluster app is available as a free download for Apple computers at iTunes, with a link to a user guide website.
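The randomization test underlying Cluster can be sketched as follows. This is a hedged re-implementation for intuition, not the app's source: it substitutes a two-sided empirical p-value for the app's t test, and the frame size and centroids are invented.

```python
# Hedged sketch of a centroid randomization test (not the Cluster app's
# source): compare the observed mean inter-centroid distance against
# the distribution for centroids placed uniformly at random in the frame.
import math
import random

def mean_pair_dist(points):
    pairs = [(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def randomization_test(centroids, width, height, n_sim=1000, seed=0):
    rng = random.Random(seed)
    observed = mean_pair_dist(centroids)
    sims = [mean_pair_dist([(rng.uniform(0, width), rng.uniform(0, height))
                            for _ in centroids])
            for _ in range(n_sim)]
    mean_sim = sum(sims) / n_sim
    # two-sided empirical p-value
    extreme = sum(abs(s - mean_sim) >= abs(observed - mean_sim)
                  for s in sims)
    return observed, extreme / n_sim

# four tightly packed centroids in a 100 x 100 frame
obs, p = randomization_test([(10, 10), (11, 10), (10, 11), (11, 11)],
                            100, 100)
```

A small empirical p-value, as here, rejects the null hypothesis that the clusters are randomly arranged within the frame.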
Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. 
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
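The soft-routing idea can be sketched with a depth-two toy tree in which each internal node sends a context to both children with sigmoid membership degrees, so each leaf receives the product of memberships along its path. The tree structure, thresholds, and gains below are invented, not the paper's trained model.

```python
# Toy depth-two soft decision tree (invented parameters): each internal
# node routes a context to *both* children with sigmoid memberships;
# a leaf's weight is the product of memberships along its path.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_tree_leaf_weights(x, levels):
    """levels: per tree level, a list of (feature, threshold, gain)."""
    weights = [1.0]
    for nodes in levels:
        nxt = []
        for w, (f, t, g) in zip(weights, nodes):
            m = sigmoid(g * (x[f] - t))   # soft "go right" degree
            nxt.extend([w * (1.0 - m), w * m])
        weights = nxt
    return weights

# root splits on feature 0; both children split on feature 1
tree = [[(0, 0.5, 8.0)],
        [(1, 0.3, 8.0), (1, 0.7, 8.0)]]
w = soft_tree_leaf_weights([0.5, 0.3], tree)
```

The leaf weights always sum to one, so every context contributes softly to several leaves rather than exactly one, which is the source of the improved generalization described above.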
Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef
2012-10-01
The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been continuously measured since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from the measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number concentration, particle length concentration, particle surface concentration and particle mass concentration. The data set was complemented by integral aerosol variables measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. Such a large number of measured variables cannot be used in health effect analyses simultaneously. The aim of this study is therefore to pre-screen and select the key variables that will be used as input in forthcoming epidemiological studies. We present two methods of parameter selection and apply them to data from the two-year period from 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables. In total, we selected 15 key variables from 9 clusters which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix; twelve key variables were selected using this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize the possible particle sources. Correlations between the variables and PMF factors were used to interpret the meaning of the cluster and the heatmap analyses.
Copyright © 2012 Elsevier B.V. All rights reserved.
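The variable-selection idea, grouping highly correlated variables and keeping one representative per group, can be sketched with a toy single-linkage merge on an absolute-correlation matrix. The variable names and correlation values below are invented, not the Augsburg measurements.

```python
# Toy variable-selection sketch (invented names and correlations):
# single-linkage merging of variables whose absolute correlation meets
# a cutoff, then one representative variable kept per group.

def correlated_groups(corr, cutoff=0.8):
    groups = [{i} for i in range(len(corr))]
    merged = True
    while merged:
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if any(corr[i][j] >= cutoff
                       for i in groups[a] for j in groups[b]):
                    groups[a] |= groups.pop(b)   # single-linkage merge
                    merged = True
                    break
            if merged:
                break
    return groups

names = ["N_ultrafine", "N_accum", "BC", "sulfate"]
corr = [[1.0, 0.9, 0.3, 0.2],
        [0.9, 1.0, 0.4, 0.3],
        [0.3, 0.4, 1.0, 0.85],
        [0.2, 0.3, 0.85, 1.0]]
groups = correlated_groups(corr)
key_vars = [names[min(g)] for g in groups]   # one representative each
```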
Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong
2016-01-01
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.
Perche-Letuvée, Phanélie; Kathirvelu, Velavan; Berggren, Gustav; Clemancey, Martin; Latour, Jean-Marc; Maurel, Vincent; Douki, Thierry; Armengaud, Jean; Mulliez, Etienne; Fontecave, Marc; Garcia-Serres, Ricardo; Gambarelli, Serge; Atta, Mohamed
2012-01-01
Wybutosine and its derivatives are found in position 37 of tRNA encoding Phe in eukaryotes and archaea. They are believed to play a key role in the decoding function of the ribosome. The second step in the biosynthesis of wybutosine is catalyzed by TYW1 protein, which is a member of the well established class of metalloenzymes called “Radical-SAM.” These enzymes use a [4Fe-4S] cluster, chelated by three cysteines in a CX3CX2C motif, and S-adenosyl-l-methionine (SAM) to generate a 5′-deoxyadenosyl radical that initiates various chemically challenging reactions. Sequence analysis of TYW1 proteins revealed, in the N-terminal half of the enzyme beside the Radical-SAM cysteine triad, an additional highly conserved cysteine motif. In this study we show by combining analytical and spectroscopic methods including UV-visible absorption, Mössbauer, EPR, and HYSCORE spectroscopies that these additional cysteines are involved in the coordination of a second [4Fe-4S] cluster displaying a free coordination site that interacts with pyruvate, the second substrate of the reaction. The presence of two distinct iron-sulfur clusters on TYW1 is reminiscent of MiaB, another tRNA-modifying metalloenzyme whose active form was shown to bind two iron-sulfur clusters. A possible role for the second [4Fe-4S] cluster in the enzyme activity is discussed. PMID:23043105
Spatial event cluster detection using an approximate normal distribution.
Torabi, Mahmoud; Rosychuk, Rhonda J
2008-12-12
In geographic surveillance of disease, areas with large numbers of disease cases are to be identified so that investigations of the causes of high disease rates can be pursued. Areas with high rates are called disease clusters, and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. Typically, cluster detection tests are applied to incident or prevalent cases of disease, but surveillance of disease-related events, where an individual may have multiple events, may also be of interest. Previously, a compound Poisson approach that detects clusters of events by testing individual areas that may be combined with their neighbours has been proposed. However, the relevant probabilities from the compound Poisson distribution are obtained from a recursion relation that can be cumbersome if the number of events is large or analyses by strata are performed. We propose a simpler approach that uses an approximate normal distribution. This method is very easy to implement and is applicable to situations where the population sizes are large and the population distribution by important strata may differ by area. We demonstrate the approach on pediatric self-inflicted injury presentations to emergency departments and compare the results for probabilities based on the recursion and the normal approach. We also implement a Monte Carlo simulation to study the performance of the proposed approach. In a self-inflicted injury data example, the normal approach identifies twelve out of thirteen of the same clusters as the compound Poisson approach, noting that the compound Poisson method detects twelve significant clusters in total. Through simulation studies, the normal approach well approximates the compound Poisson approach for a variety of different population sizes and case and event thresholds.
A drawback of the compound Poisson approach is that the relevant probabilities must be determined through a recursion relation and such calculations can be computationally intensive if the cluster size is relatively large or if analyses are conducted with strata variables. On the other hand, the normal approach is very flexible, easily implemented, and hence, more appealing for users. Moreover, the concepts may be more easily conveyed to non-statisticians interested in understanding the methodology associated with cluster detection test results.
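The normal approximation can be sketched directly from compound Poisson moments: if cases in an area arrive with Poisson mean lam and each case contributes events with mean mu and second moment m2, the total event count has mean lam*mu and variance lam*m2. The sketch below is not the authors' code; the continuity correction is a common convention and the numbers are invented.

```python
# Hedged sketch of the normal approximation for an area's total event
# count (compound Poisson: mean lam*mu, variance lam*m2), referring the
# observed total to a normal upper-tail probability.
import math

def normal_tail_p(observed, lam, mu, m2):
    mean = lam * mu
    sd = math.sqrt(lam * m2)
    z = (observed - 0.5 - mean) / sd          # continuity-corrected
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability

# e.g. 40 events observed where 10 cases are expected, each case
# producing 2 events on average (second moment 5):
p = normal_tail_p(40, lam=10, mu=2, m2=5)
```

A small tail probability flags the area (possibly combined with its neighbours) as a candidate event cluster, with no recursion required.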
Network community structure and loop coefficient method
NASA Astrophysics Data System (ADS)
Vragović, I.; Louis, E.
2006-07-01
A modular structure, in which groups of tightly connected nodes can be resolved as separate entities, is a property found in many complex networks. In this paper, we propose an algorithm for identifying communities in networks. It is based on a local measure, the so-called loop coefficient, which is a generalization of the clustering coefficient. Nodes with a large loop coefficient tend to be core inner community nodes, while other vertices are usually peripheral sites at the borders of communities. Our method gives satisfactory results for both artificial and real-world graphs, provided they have a relatively pronounced modular structure. This type of algorithm could open a way of interpreting the role of nodes in communities in terms of the local loop coefficient, and could be used as a complement to other methods.
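A simplified stand-in for the loop coefficient can make the idea concrete. The definition below is our simplified reading, not the paper's exact measure: for node v, the fraction of neighbour pairs joined by a path of at most max_len hops that avoids v; with max_len = 1 it reduces to the ordinary clustering coefficient.

```python
# Simplified stand-in for the loop coefficient (our reading, not the
# paper's exact definition): fraction of node v's neighbour pairs
# joined by a path of at most max_len hops avoiding v.
from itertools import combinations

def loop_coefficient(adj, v, max_len=2):
    nbrs = sorted(adj[v])
    if len(nbrs) < 2:
        return 0.0

    def connected_avoiding(a, b):
        # breadth-first search from a, with v removed, up to max_len hops
        frontier, seen = {a}, {a, v}
        for _ in range(max_len):
            if b in frontier:
                return True
            frontier = {w for u in frontier for w in adj[u]
                        if w not in seen}
            seen |= frontier
        return b in frontier

    pairs = list(combinations(nbrs, 2))
    return sum(connected_avoiding(a, b) for a, b in pairs) / len(pairs)

# a 4-cycle 0-1-2-3-0: node 0's neighbours 1 and 3 are joined only via 2
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
cc = loop_coefficient(square, 0, max_len=1)  # ordinary clustering coeff.
lc = loop_coefficient(square, 0, max_len=2)
```

On the square, the clustering coefficient of node 0 is 0 (no triangles), while the length-2 variant is 1, crediting the longer loop through node 2.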
NASA Astrophysics Data System (ADS)
Shao, Renping; Li, Jing; Hu, Wentao; Dong, Feifei
2013-02-01
Higher-order cumulants (HOC) provide a relatively new theory and technique for modern signal analysis. Spectrum entropy clustering (SEC) is a statistical data-mining method for extracting useful characteristics from large volumes of nonlinear, non-stationary data. Following a discussion of the characteristics of HOC theory and the SEC method, this paper introduces signal-processing techniques and the unique merits of nonlinear coupling characteristic analysis for processing random, non-stationary signals. A new clustering analysis and diagnosis method is then proposed for detecting multiple kinds of gear damage by combining HOC and SEC for damage detection and diagnosis in gear systems. Noise is suppressed by HOC, and coupling features are extracted to separate the characteristic signals at different speeds and frequency bands. Under these conditions, weak signal characteristics are emphasized and multi-fault characteristics are extracted. The SEC data-mining method is then applied to analyze and diagnose six signal conditions (no fault, short crack fault in the tooth root, long crack fault in the tooth root, short crack fault at the pitch circle, long crack fault at the pitch circle, and tooth wear) at running speeds of 300 r/min, 900 r/min, 1200 r/min, and 1500 r/min. The research shows that this combined detection and diagnosis method can also identify the degree of damage for some faults. On this basis, a virtual instrument for gear-system damage detection and fault diagnosis was developed by combining the advantages of MATLAB and VC++, employing Component Object Model (COM) technology, adopting mixed programming methods, and calling the program converted from an *.m file under VC++.
This software system provides functions for acquiring gear vibration signals, analyzing and processing signals, extracting features, visualizing graphics, detecting and diagnosing faults, and monitoring. Finally, testing and verification show that the developed system can effectively detect and diagnose faults in an actual operating gear transmission system.
Development of EnergyPlus Utility to Batch Simulate Building Energy Performance on a National Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valencia, Jayson F.; Dirks, James A.
2008-08-29
EnergyPlus is a simulation program that requires a large number of details to fully define and model a building. Hundreds or even thousands of lines in a text file are needed to run the EnergyPlus simulation depending on the size of the building. To manually create these files is a time consuming process that would not be practical when trying to create input files for thousands of buildings needed to simulate national building energy performance. To streamline the process needed to create the input files for EnergyPlus, two methods were created to work in conjunction with the National Renewable Energy Laboratory (NREL) Preprocessor; this reduced the hundreds of inputs needed to define a building in EnergyPlus to a small set of high-level parameters. The first method uses Java routines to perform all of the preprocessing on a Windows machine while the second method carries out all of the preprocessing on the Linux cluster by using an in-house built utility called Generalized Parametrics (GPARM). A comma delimited (CSV) input file is created to define the high-level parameters for any number of buildings. Each method then takes this CSV file and uses the data entered for each parameter to populate an extensible markup language (XML) file used by the NREL Preprocessor to automatically prepare EnergyPlus input data files (idf) using automatic building routines and macro templates. Using a Linux utility called “make”, the idf files can then be automatically run through the Linux cluster and the desired data from each building can be aggregated into one table to be analyzed. Creating a large number of EnergyPlus input files results in the ability to batch simulate building energy performance and scale the result to national energy consumption estimates.
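The CSV-to-input-file expansion step can be sketched as below. The column names, the stub format, and the helper are invented for illustration; the real NREL Preprocessor emits complete EnergyPlus idf files from macro templates, not the toy stubs shown here.

```python
import csv
import io

# Hypothetical high-level parameter file: one building per row.
csv_text = """building_id,floor_area_m2,num_floors,climate_zone
b001,2500,2,4A
b002,12000,8,5B
"""

def rows_to_idf_stubs(csv_file):
    """Expand each CSV row of high-level parameters into a drastically
    simplified idf-like text stub, mimicking the batch preprocessing
    step described above."""
    stubs = {}
    for row in csv.DictReader(csv_file):
        stubs[row["building_id"]] = (
            "Building,\n"
            f"  {row['building_id']},\n"
            f"  !- floor area {row['floor_area_m2']} m2, "
            f"{row['num_floors']} floors, climate {row['climate_zone']};\n")
    return stubs

stubs = rows_to_idf_stubs(io.StringIO(csv_text))
```

A build tool such as `make` can then treat each generated file as a target, which is how the abstract describes fanning the runs out across the Linux cluster.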
Han, Ruisong; Yang, Wei; Wang, Yipeng; You, Kaiming
2017-05-01
Clustering is an effective technique for reducing energy consumption and extending the lifetime of wireless sensor networks (WSNs). The energy heterogeneity of WSNs should be considered when designing clustering protocols. We propose and evaluate a novel distributed energy-efficient clustering protocol for heterogeneous wireless sensor networks, called DCE, based on a Double-phase Cluster-head Election scheme. In DCE, cluster-head election proceeds in two phases. In the first phase, tentative cluster heads are elected with probabilities decided by the relative levels of initial and residual energy. In the second phase, a tentative cluster head is replaced by one of its cluster members if that member has more residual energy, forming the final set of cluster heads. Employing two phases for cluster-head election ensures that nodes with more energy have a higher chance of becoming cluster heads. Energy consumption is well distributed in the proposed protocol, and simulation results show that DCE achieves longer stability periods than other typical clustering protocols in heterogeneous scenarios.
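The two-phase election can be sketched as follows. The node fields, the 1-D positions, the energy-weighted election probability, and the nearest-head cluster assignment are all illustrative assumptions standing in for the protocol's actual equations and radio model.

```python
import random

def dce_elect(nodes, p_opt=0.3, seed=7):
    """Sketch of a double-phase cluster-head election in the spirit
    of DCE. Phase 1: nodes volunteer as tentative heads with a
    probability weighted by residual energy. Phase 2: each tentative
    head yields to the member with the most residual energy in its
    cluster (clusters formed by nearest tentative head)."""
    rng = random.Random(seed)
    avg_init = sum(n["init"] for n in nodes) / len(nodes)
    tentative = [n for n in nodes
                 if rng.random() < p_opt * n["residual"] / avg_init] \
        or [max(nodes, key=lambda n: n["residual"])]
    final = set()
    for head in tentative:
        members = [n for n in nodes
                   if min(tentative,
                          key=lambda t: abs(n["pos"] - t["pos"])) is head]
        if members:
            final.add(max(members, key=lambda n: n["residual"])["id"])
    return final

nodes = [{"id": i, "pos": float(i), "init": 1.0,
          "residual": 0.2 + 0.1 * (i % 5)} for i in range(20)]
heads = dce_elect(nodes)
```

The phase-2 swap is what gives high-energy nodes a second chance to lead even if the random phase-1 draw missed them.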
ClueNet: Clustering a temporal network based on topological similarity rather than denseness
Milenković, Tijana
2018-01-01
Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of “topologically related” nodes, where the resulting topology-based clusters are expected to “correlate” well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data—their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance. PMID:29738568
ERIC Educational Resources Information Center
Mills, Kenneth V.
2015-01-01
The College of the Holy Cross offers a universal first-year program called Montserrat, in which first-year students participate in a living-learning experience anchored by a yearlong seminar course. The seminar courses are part of a thematic cluster of four to eight courses; students in the cluster live together in a common dormitory and…
Gene diversity in some Muslim populations of North India.
Aarzoo, S Shabana; Afzal, Mohammad
2005-06-01
North Indian Muslim populations have historical, linguistic, and socioreligious significance to the Indian subcontinent. Although sociocultural and political dimensions of their demography are well documented, no detailed genetic structure of the populations is available. We have undertaken a survey of the gene frequencies of the ABO, Rh, PTC taste ability, sickling, and G6PD systems for different endogamous groups: Sheikh, Syed, Pathan, Ansari, Saifi, and Hindu Bania. All the groups at most loci showed statistically nonsignificant differences, except for ABO and PTC traits, for which interpopulational differences were seen. Heterozygosity ranged from 0.048 to 0.617 among the Sheikh, 0.149 to 0.599 among the Pathan, 0.105 to 0.585 among the Ansari, 0.25 to 0.869 among the Syed, 0.107 to 0.565 among the Saifi, and 0.100 to 0.492 among the Hindu Bania. The average D(ST) and G(ST) values for the five marker loci were 0.0625 +/- 0.098 and 0.1072 +/- 0.041, respectively. A dendrogram was constructed using the UPGMA clustering method. Our results revealed that the Pathan and the Sheikh form one cluster, the Syed and the Hindu Bania form another cluster, and the two clusters join together (the so-called higher caste); also, the Saifi and the Ansari form a separate cluster (lower caste). The results of the genetic distance analysis are useful for understanding the pattern of genetic relationships between different endogamous groups of Muslims.
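A UPGMA dendrogram like the one described can be built by repeated average-linkage merging of a distance matrix. The toy distances below are invented purely to reproduce the reported cluster topology (Pathan with Sheikh, Syed with Hindu Bania); they are not the study's D(ST) values, and real analyses would use a phylogenetics package.

```python
def upgma(dmat, labels):
    """Minimal UPGMA: repeatedly merge the closest pair of clusters,
    recomputing distances as leaf-count-weighted averages. Returns a
    nested-tuple dendrogram. Illustrative sketch only."""
    clusters = [(lab, 1) for lab in labels]   # (subtree, leaf count)
    d = [row[:] for row in dmat]
    while len(clusters) > 1:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: d[p[0]][p[1]])
        (ta, na), (tb, nb) = clusters[i], clusters[j]
        keep = [k for k in range(len(clusters)) if k not in (i, j)]
        # average-linkage distance from the merged cluster to the rest
        new_row = [(na * d[i][k] + nb * d[j][k]) / (na + nb) for k in keep]
        clusters = [clusters[k] for k in keep] + [((ta, tb), na + nb)]
        d = [[d[a][b] for b in keep] for a in keep]
        for r, v in zip(d, new_row):
            r.append(v)
        d.append(new_row + [0.0])
    return clusters[0][0]

labels = ["Pathan", "Sheikh", "Syed", "Bania"]
dmat = [[0, 1, 4, 4],
        [1, 0, 4, 4],
        [4, 4, 0, 2],
        [4, 4, 2, 0]]
tree = upgma(dmat, labels)
```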
Automatic generation of efficient orderings of events for scheduling applications
NASA Technical Reports Server (NTRS)
Morris, Robert A.
1994-01-01
In scheduling a set of tasks, it is often not known with certainty how long a given event will take. We call this duration uncertainty. Duration uncertainty is a primary obstacle to the successful completion of a schedule. If a duration of one task is longer than expected, the remaining tasks are delayed. The delay may result in the abandonment of the schedule itself, a phenomenon known as schedule breakage. One response to schedule breakage is on-line, dynamic rescheduling. A more recent alternative is called proactive rescheduling. This method uses statistical data about the durations of events in order to anticipate the locations in the schedule where breakage is likely prior to the execution of the schedule. It generates alternative schedules at such sensitive points, which can be then applied by the scheduler at execution time, without the delay incurred by dynamic rescheduling. This paper proposes a technique for making proactive error management more effective. The technique is based on applying a similarity-based method of clustering to the problem of identifying similar events in a set of events.
Topology control algorithm for wireless sensor networks based on Link forwarding
NASA Astrophysics Data System (ADS)
Pucuo, Cairen; Qi, Ai-qin
2018-03-01
Research on topology control can effectively save energy and increase the service life of wireless-sensor-based networks. In this paper, an algorithm called LTHC (link transmit hybrid clustering), based on link forwarding, is proposed. It reduces energy expenditure by changing the way cluster nodes communicate: when a cluster is formed, a link is established between the cluster and the SINK node, where the link nodes must be non-cluster nodes. Through this link, the cluster sends its information to the SINK node. By distributing energy consumption uniformly across the network, the scheme prolongs network survival time and improves communication, cutting energy expenditure particularly for clusters far away from the SINK node. Experiments show that, in terms of both traffic and network survival time, LTHC is far superior to the traditional LEACH protocol.
Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri
2007-01-01
Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC) was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825
Hou, Bin; Wang, Yunhong; Liu, Qingjie
2016-01-01
Characterizing up-to-date information about the Earth's surface is an important application that provides insights for urban planning, resources monitoring and environmental studies. A large number of change detection (CD) methods have been developed for this task utilizing remote sensing (RS) images. The advent of high resolution (HR) remote sensing images further poses challenges to traditional CD methods and opportunities for object-based CD methods. While several kinds of geospatial objects are recognized, this manuscript mainly focuses on buildings. Specifically, we propose a novel automatic approach combining pixel-based strategies with object-based ones for detecting building changes with HR remote sensing images. A multiresolution contextual morphological transformation called extended morphological attribute profiles (EMAPs) allows the extraction of geometrical features related to the structures within the scene at different scales. Pixel-based post-classification is executed on EMAPs using hierarchical fuzzy clustering. Subsequently, the hierarchical fuzzy frequency vector histograms are formed based on the image-objects acquired by simple linear iterative clustering (SLIC) segmentation. Then, saliency and morphological building index (MBI) extracted on difference images are used to generate a pseudo training set. Ultimately, object-based semi-supervised classification is implemented on this training set by applying random forest (RF). Most of the important changes are detected by the proposed method in our experiments. This study was checked for effectiveness using visual evaluation and numerical evaluation. PMID:27618903
Sanfilippo, Antonio [Richland, WA; Calapristi, Augustin J [West Richland, WA; Crow, Vernon L [Richland, WA; Hetzler, Elizabeth G [Kennewick, WA; Turner, Alan E [Kennewick, WA
2009-12-22
Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
Mining Co-Location Patterns with Clustering Items from Spatial Data Sets
NASA Astrophysics Data System (ADS)
Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.
2018-05-01
The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by the two features together; that is, where A and B appear together, C often appears. We note that this pattern differs from the traditional co-location pattern. This paper therefore presents a new concept called clustering items, and the corresponding pattern is called a co-location pattern with clustering items. Because the traditional algorithms cannot mine this pattern, we introduce the related concepts in detail and propose a novel algorithm, which extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
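The "A and B together imply C" pattern can be sketched on point data as below. The brute-force pair scan, the Euclidean neighbourhood test, and the toy coordinates are illustrative assumptions; the paper's join-based algorithm is considerably more elaborate.

```python
def colocated(instances, trio, radius):
    """Count neighbourhoods where features A and B appear together,
    and how often feature C joins them -- a toy version of the
    'co-location pattern with clustering items' idea."""
    A, B, C = trio
    near = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2
    ab_sites, abc_sites = 0, 0
    for pa in instances[A]:
        for pb in instances[B]:
            if near(pa, pb):
                ab_sites += 1
                if any(near(pa, pc) and near(pb, pc)
                       for pc in instances[C]):
                    abc_sites += 1
    return abc_sites, ab_sites

pts = {"A": [(0, 0), (10, 10)],
       "B": [(1, 0), (11, 10)],
       "C": [(0, 1)]}
# two A-B co-locations; only the first has a nearby C
hits, pairs = colocated(pts, ("A", "B", "C"), radius=2.0)
```

The ratio hits/pairs plays the role of a conditional participation measure: how often C accompanies an A-B co-location.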
Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C
2014-01-01
Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
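The key difference from K-means, partial membership, can be shown with a minimal 1-D fuzzy c-means. The data, initialization, and iteration count are illustrative; this is a sketch of the standard FCM updates (fuzzifier m, membership rows summing to 1), not a production implementation.

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=60):
    """Minimal 1-D fuzzy c-means. Unlike K-means' hard labels, every
    point carries a membership degree in every cluster; each row of
    U sums to 1."""
    centers = [min(points), max(points)] if c == 2 else points[:c]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = []
        for x in points:
            d = [abs(x - ck) or 1e-12 for ck in centers]
            U.append([1.0 / sum((d[i] / dj) ** (2.0 / (m - 1.0))
                                for dj in d) for i in range(c)])
        # Center update: mean weighted by memberships raised to m
        centers = [
            sum(U[k][i] ** m * points[k] for k in range(len(points)))
            / sum(U[k][i] ** m for k in range(len(points)))
            for i in range(c)]
    return centers, U

data = [0.8, 1.0, 1.2, 4.7, 5.0, 5.3]
centers, U = fuzzy_c_means(data)
```

Points near a center get memberships close to 1, while a point midway between the two groups would split its membership, which is exactly the overlap behaviour the abstract argues K-means cannot express.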
Support vector machine multiuser receiver for DS-CDMA signals in multipath channels.
Chen, S; Samingan, A K; Hanzo, L
2001-01-01
The problem of constructing an adaptive multiuser detector (MUD) is considered for direct sequence code division multiple access (DS-CDMA) signals transmitted through multipath channels. The emerging learning technique, called support vector machines (SVM), is proposed as a method of obtaining a nonlinear MUD from a relatively small training data block. Computer simulation is used to study this SVM MUD, and the results show that it can closely match the performance of the optimal Bayesian one-shot detector. Comparisons with an adaptive radial basis function (RBF) MUD trained by an unsupervised clustering algorithm are discussed.
Production of genome-edited pluripotent stem cells and mice by CRISPR/Cas.
Horii, Takuro; Hatada, Izuho
2016-01-01
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) nucleases, so-called CRISPR/Cas, were recently developed as an epoch-making genome engineering technology. This system only requires Cas9 nuclease and single-guide RNA complementary to a target locus. CRISPR/Cas enables the generation of knockout cells and animals in a single step. This system can also be used to generate multiple mutations and knockin in a single step, which is not possible using other methods. In this review, we provide an overview of genome editing by CRISPR/Cas in pluripotent stem cells and mice.
Evaluation of Potential LSST Spatial Indexing Strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nikolaev, S; Abdulla, G; Matzke, R
2006-10-13
The LSST requirement for producing alerts in near real-time, and the fact that generating an alert depends on knowing the history of light variations for a given sky position, both imply that the clustering information for all detections is available at any time during the survey. Therefore, any data structure describing clustering of detections in LSST needs to be continuously updated, even as new detections are arriving from the pipeline. We call this use case ''incremental clustering'', to reflect this continuous updating of clustering information. This document describes the evaluation results for several potential LSST incremental clustering strategies, using: (1) a Neighbors table and zone optimization to store spatial clusters (a.k.a. Jim Gray's, or SDSS, algorithm); (2) the MySQL built-in R-tree implementation; (3) an external spatial index library which supports a query interface.
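The zones idea behind strategy (1) can be sketched as below: points are bucketed by a declination-like band so a radius query scans only a few bands instead of the whole table. Flat 2-D coordinates and the function names are simplifying assumptions (the real algorithm works in spherical coordinates inside the database).

```python
def build_zone_index(points, zone_height):
    """Bucket (x, y) points into horizontal zones of fixed height --
    a toy, planar version of the SDSS-style zones index."""
    zones = {}
    for p in points:
        zones.setdefault(int(p[1] // zone_height), []).append(p)
    return zones

def radius_query(zones, zone_height, center, r):
    """Scan only the zones that can intersect the query circle."""
    cx, cy = center
    lo = int((cy - r) // zone_height)
    hi = int((cy + r) // zone_height)
    return [p for z in range(lo, hi + 1) for p in zones.get(z, [])
            if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r]

pts = [(0.0, 0.1), (0.2, 0.15), (5.0, 3.0), (0.1, 2.9)]
idx = build_zone_index(pts, zone_height=0.5)
near = radius_query(idx, 0.5, (0.0, 0.0), 0.3)
```

Because new detections only append to their zone's bucket, this layout updates incrementally, which is the property the evaluation above is after.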
Shyamalamma, S; Chandra, S B C; Hegde, M; Naryanswamy, P
2008-07-22
Artocarpus heterophyllus Lam., commonly called jackfruit, is a medium-sized evergreen tree that bears high yields of the largest known edible fruit. Yet, it has been little explored commercially due to wide variation in fruit quality. The genetic diversity and genetic relatedness of 50 jackfruit accessions were studied using amplified fragment length polymorphism markers. Of 16 primer pairs evaluated, eight were selected for screening of genotypes based on the number and quality of polymorphic fragments produced. These primer combinations produced 5976 bands, 1267 (22%) of which were polymorphic. Among the jackfruit accessions, the similarity coefficient ranged from 0.137 to 0.978; the accessions also shared a large number of monomorphic fragments (78%). Cluster analysis and principal component analysis grouped all jackfruit genotypes into three major clusters. Cluster I included the genotypes grown in a jackfruit region of Karnataka, called Tamaka, with very dry conditions; cluster II contained the genotypes collected from locations having medium to heavy rainfall in Karnataka; cluster III grouped the genotypes in distant locations with different environmental conditions. Strong coincidence of these amplified fragment length polymorphism-based groupings with geographical localities as well as morphological characters was observed. We found moderate genetic diversity in these jackfruit accessions. This information should be useful for tree breeding programs, as part of our effort to popularize jackfruit as a commercial crop.
Calling patterns in human communication dynamics
Jiang, Zhi-Qiang; Xie, Wen-Jie; Li, Ming-Xia; Podobnik, Boris; Zhou, Wei-Xing; Stanley, H. Eugene
2013-01-01
Modern technologies not only provide a variety of communication modes (e.g., texting, cell phone conversation, and online instant messaging), but also detailed electronic traces of these communications between individuals. These electronic traces indicate that the interactions occur in temporal bursts. Here, we study intercall duration of communications of the 100,000 most active cell phone users of a Chinese mobile phone operator. We confirm that the intercall durations follow a power-law distribution with an exponential cutoff at the population level but find differences when focusing on individual users. We apply statistical tests at the individual level and find that the intercall durations follow a power-law distribution for only 3,460 individuals (3.46%). The intercall durations for the majority (73.34%) follow a Weibull distribution. We quantify individual users using three measures: out-degree, percentage of outgoing calls, and communication diversity. We find that the cell phone users with a power-law duration distribution fall into three anomalous clusters: robot-based callers, telecom fraud, and telephone sales. This information is of interest to both academics and practitioners, mobile telecom operators in particular. In contrast, the individual users with a Weibull duration distribution form the fourth cluster of ordinary cell phone users. We also discover more information about the calling patterns of these four clusters (e.g., the probability that a user will call the c(r)-th most contacted person and the probability distribution of burst sizes). Our findings may enable a more detailed analysis of the huge body of data contained in the logs of massive users. PMID:23319645
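Fitting a Weibull to duration data, as done per-user above, can be sketched with the classic linearization log(-log(1-F)) = k·log(x) - k·log(λ) regressed on plotting positions. This least-squares shortcut and the synthetic data are illustrative assumptions; the paper applies proper statistical tests, not this quick fit.

```python
import math
import random

def weibull_fit_lsq(samples):
    """Estimate Weibull shape k and scale lam by least squares on the
    linearized CDF, using median-rank plotting positions to avoid
    F = 0 and F = 1. Quick sketch, not a maximum-likelihood fit."""
    xs = sorted(samples)
    n = len(xs)
    pts = [(math.log(x),
            math.log(-math.log(1.0 - (i + 0.7) / (n + 0.4))))
           for i, x in enumerate(xs)]
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    k = (sum((p[0] - mx) * (p[1] - my) for p in pts)
         / sum((p[0] - mx) ** 2 for p in pts))
    lam = math.exp(mx - my / k)   # intercept = -k * log(lam)
    return k, lam

# Synthetic inter-call durations from Weibull(k=1.5, lam=10),
# drawn by inverse-CDF sampling
rng = random.Random(42)
data = [10.0 * (-math.log(1.0 - rng.random())) ** (1.0 / 1.5)
        for _ in range(2000)]
k_hat, lam_hat = weibull_fit_lsq(data)
```

A pure power law would instead appear as a straight line on a log-log survival plot; the curvature of the Weibull's transformed CDF is what distinguishes the ordinary-user cluster from the anomalous power-law clusters.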
Advertisement call and genetic structure conservatism: good news for an endangered Neotropical frog.
Forti, Lucas R; Costa, William P; Martins, Lucas B; Nunes-de-Almeida, Carlos H L; Toledo, Luís Felipe
2016-01-01
Many amphibian species are negatively affected by habitat change due to anthropogenic activities. Populations distributed over modified landscapes may be subject to local extinction or may be relegated to the remaining-likely isolated and possibly degraded-patches of available habitat. Isolation without gene flow could lead to variability in phenotypic traits owing to differences in local selective pressures such as environmental structure, microclimate, or site-specific species assemblages. Here, we tested the microevolution hypothesis by evaluating the acoustic parameters of 349 advertisement calls from 15 males from six populations of the endangered amphibian species Proceratophrys moratoi. In addition, we analyzed the genetic distances among populations and the genetic diversity with a haplotype network analysis. We performed cluster analysis on acoustic data based on the Bray-Curtis index of similarity, using the UPGMA method. We correlated acoustic dissimilarities (calculated by Euclidean distance) with geographical and genetic distances among populations. Spectral traits of the advertisement call of P. moratoi presented lower coefficients of variation than did temporal traits, both within and among males. Cluster analyses placed individuals without congruence in population or geographical distance, but recovered the species topology in relation to sister species. The genetic distance among populations was low; it did not exceed 0.4% for the most distant populations, and was not correlated with acoustic distance. Both acoustic features and genetic sequences are highly conserved, suggesting that populations could be connected by recent migrations, and that they are subject to stabilizing selective forces. Although further studies are required, these findings add to a growing body of literature suggesting that this species would be a good candidate for a reintroduction program without negative effects on communication or genetic impact.
Membership determination of open clusters based on a spectral clustering method
NASA Astrophysics Data System (ADS)
Gao, Xin-Hua
2018-06-01
We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.
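The membership-segregation idea above can be sketched in miniature: build a similarity matrix, take eigenvectors of its graph Laplacian, and split on the Fiedler vector. This is a generic spectral-clustering sketch on synthetic 2-D points standing in for multi-dimensional astrometric data, not the paper's exact SC pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# two synthetic groups (stand-ins for cluster members vs. field stars)
a = rng.normal([0.0, 0.0], 0.3, size=(30, 2))
b = rng.normal([3.0, 3.0], 0.3, size=(30, 2))
X = np.vstack([a, b])

# Gaussian similarity matrix and unnormalised graph Laplacian
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
L = np.diag(W.sum(1)) - W

# eigenvector of the second-smallest eigenvalue (Fiedler vector) splits the groups
vals, vecs = np.linalg.eigh(L)          # eigh returns ascending eigenvalues
labels = (vecs[:, 1] > 0).astype(int)
print(labels)
```

No prior knowledge of the group shapes is used, which mirrors the non-parametric character of the SC method described above.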
Feature selection for examining behavior by pathology laboratories.
Hawkins, S; Williams, G; Baxter, R
2001-08-01
Australia has a universal health insurance scheme called Medicare, which is managed by Australia's Health Insurance Commission. Medicare payments for pathology services generate voluminous transaction data on patients, doctors and pathology laboratories. The Health Insurance Commission (HIC) currently uses predictive models to monitor compliance with regulatory requirements. The HIC commissioned a project to investigate the generation of new features from the data. Feature generation has not appeared as an important step in the knowledge discovery in databases (KDD) literature. New interesting features for use in predictive modeling are generated. These features were summarized, visualized and used as inputs for clustering and outlier detection methods. Data organization and data transformation methods are described for the efficient access and manipulation of these new features.
Neural-network-assisted genetic algorithm applied to silicon clusters
NASA Astrophysics Data System (ADS)
Marim, L. R.; Lemes, M. R.; dal Pino, A.
2003-03-01
Recently, a new optimization procedure that combines the power of artificial neural-networks with the versatility of the genetic algorithm (GA) was introduced. This method, called neural-network-assisted genetic algorithm (NAGA), uses a neural network to restrict the search space and it is expected to speed up the solution of global optimization problems if some previous information is available. In this paper, we have tested NAGA to determine the ground-state geometry of Si_n (10⩽n⩽15) according to a tight-binding total-energy method. Our results indicate that NAGA was able to find the desired global minimum of the potential energy for all the test cases and it was at least ten times faster than pure genetic algorithm.
Particle-like structure of coaxial Lie algebras
NASA Astrophysics Data System (ADS)
Vinogradov, A. M.
2018-01-01
This paper is a natural continuation of Vinogradov [J. Math. Phys. 58, 071703 (2017)] where we proved that any Lie algebra over an algebraically closed field or over R can be assembled in a number of steps from two elementary constituents, called dyons and triadons. Here we consider the problems of the construction and classification of those Lie algebras which can be assembled in one step from base dyons and triadons, called coaxial Lie algebras. The base dyons and triadons are Lie algebra structures that have only one non-trivial structure constant in a given basis, while coaxial Lie algebras are linear combinations of pairwise compatible base dyons and triadons. We describe the maximal families of pairwise compatible base dyons and triadons called clusters, and, as a consequence, we give a complete description of the coaxial Lie algebras. The remarkable fact is that dyons and triadons in clusters are self-organised in structural groups which are surrounded by casings and linked by connectives. We discuss generalisations and applications to the theory of deformations of Lie algebras.
Functional MRI of the vocalization-processing network in the macaque brain
Ortiz-Rios, Michael; Kuśmierek, Paweł; DeWitt, Iain; Archakov, Denis; Azevedo, Frederico A. C.; Sams, Mikko; Jääskeläinen, Iiro P.; Keliris, Georgios A.; Rauschecker, Josef P.
2015-01-01
Using functional magnetic resonance imaging in awake behaving monkeys we investigated how species-specific vocalizations are represented in auditory and auditory-related regions of the macaque brain. We found clusters of active voxels along the ascending auditory pathway that responded to various types of complex sounds: inferior colliculus (IC), medial geniculate nucleus (MGN), auditory core, belt, and parabelt cortex, and other parts of the superior temporal gyrus (STG) and sulcus (STS). Regions sensitive to monkey calls were most prevalent in the anterior STG, but some clusters were also found in frontal and parietal cortex on the basis of comparisons between responses to calls and environmental sounds. Surprisingly, we found that spectrotemporal control sounds derived from the monkey calls (“scrambled calls”) also activated the parietal and frontal regions. Taken together, our results demonstrate that species-specific vocalizations in rhesus monkeys activate preferentially the auditory ventral stream, and in particular areas of the antero-lateral belt and parabelt. PMID:25883546
Netzker, Tina; Fischer, Juliane; Weber, Jakob; Mattern, Derek J.; König, Claudia C.; Valiante, Vito; Schroeckh, Volker; Brakhage, Axel A.
2015-01-01
Microorganisms form diverse multispecies communities in various ecosystems. The high abundance of fungal and bacterial species in these consortia results in specific communication between the microorganisms. A key role in this communication is played by secondary metabolites (SMs), which are also called natural products. Recently, it was shown that interspecies “talk” between microorganisms represents a physiological trigger to activate silent gene clusters leading to the formation of novel SMs by the involved species. This review focuses on mixed microbial cultivation, mainly between bacteria and fungi, with a special emphasis on the induced formation of fungal SMs in co-cultures. In addition, the role of chromatin remodeling in the induction is examined, and methodical perspectives for the analysis of natural products are presented. As an example for an intermicrobial interaction elucidated at the molecular level, we discuss the specific interaction between the filamentous fungi Aspergillus nidulans and Aspergillus fumigatus with the soil bacterium Streptomyces rapamycinicus, which provides an excellent model system to enlighten molecular concepts behind regulatory mechanisms and will pave the way to a novel avenue of drug discovery through targeted activation of silent SM gene clusters through co-cultivations of microorganisms. PMID:25941517
Relationship between Procedural Tactical Knowledge and Specific Motor Skills in Young Soccer Players
Aquino, Rodrigo; Marques, Renato Francisco R.; Petiot, Grégory Hallé; Gonçalves, Luiz Guilherme C.; Moraes, Camila; Santiago, Paulo Roberto P.; Puggina, Enrico Fuini
2016-01-01
The purpose of this study was to investigate the association between offensive tactical knowledge and soccer-specific motor skills performance. Fifteen participants were submitted to two evaluation tests, one to assess tactical performance and the other technical skills. The motor skills performance was measured through four tests of technical soccer skills: ball control, shooting, passing and dribbling. The tactical performance was based on a tactical assessment system called FUT-SAT (Analyses of Procedural Tactical Knowledge in Soccer). Afterwards, technical and tactical evaluation scores were ranked with and without the use of the cluster method. A positive, weak correlation was found in both analyses (rho = 0.39, p = 0.14, not significant, with cluster analysis; rho = 0.35, p = 0.20, not significant, without cluster analysis). We can conclude that there was a weak association between technical skills and offensive tactical knowledge. This shows the need to reflect on the use of such tests to assess technical skills in team sports since they do not take into account the variability and unpredictability of game actions and disregard the inherent needs to assess such skill performance in the game. PMID:29910300
Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen
2017-09-25
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
Cluster states and container picture in light nuclei, and triple-alpha reaction rate
NASA Astrophysics Data System (ADS)
Funaki, Yasuro
2015-04-01
The excited states in 12C are investigated by using an extended version of the so-called Tohsaki-Horiuchi-Schuck-Röpke (THSR) wave function, where both the 3α condensate and 8Be + α cluster asymptotic configurations are included. We focus on the structures of the “Hoyle band” states, the 2+2 and 4+2 states, which were recently observed above the Hoyle state, and of the 0+3 and 0+4 states, which were also quite recently identified in experiment. We show that the Hoyle band is not simply considered to be the 8Be(0+) + α rotation suggested by previous cluster model calculations, nor a rotation of a rigid-body triangle-shaped object composed of the 3α particles. We also discuss the rate of the triple-alpha radiative capture reaction, applying the imaginary-time method. Results for the triple-alpha reaction rate are consistent with the NACRE rate for both high (≈ 10^9 K) and low (≈ 10^7 K) temperatures. We show that the rate of the imaginary-time calculation in the coupled-channels approach has a large enhancement for low temperatures if we truncate the number of channels.
Co-clustering directed graphs to discover asymmetries and directional communities
Rohe, Karl; Qin, Tai; Yu, Bin
2016-01-01
In directed graphs, relationships are asymmetric and these asymmetries contain essential structural information about the graph. Directed relationships lead to a new type of clustering that is not feasible in undirected graphs. We propose a spectral co-clustering algorithm called di-sim for asymmetry discovery and directional clustering. A Stochastic co-Blockmodel is introduced to show favorable properties of di-sim. To account for the sparse and highly heterogeneous nature of directed networks, di-sim uses the regularized graph Laplacian and projects the rows of the eigenvector matrix onto the sphere. A nodewise asymmetry score and di-sim are used to analyze the clustering asymmetries in the networks of Enron emails, political blogs, and the Caenorhabditis elegans chemical connectome. In each example, a subset of nodes have clustering asymmetries; these nodes send edges to one cluster, but receive edges from another cluster. Such nodes yield insightful information (e.g., communication bottlenecks) about directed networks, but are missed if the analysis ignores edge direction. PMID:27791058
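The mechanics described above, regularize a directed adjacency matrix, decompose it, and project rows onto the sphere, can be sketched as follows. This is a simplified illustration of the di-sim idea on a synthetic block-structured directed graph (two sender groups with different edge densities), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
# directed adjacency: rows 0-19 send to columns 20-39, rows 20-39 to columns 0-19
A = np.zeros((n, n))
A[:20, 20:] = rng.random((20, 20)) < 0.9
A[20:, :20] = rng.random((20, 20)) < 0.5

tau = A.sum() / n                              # regularizer (average degree)
Dr = np.diag(1.0 / np.sqrt(A.sum(1) + tau))
Dc = np.diag(1.0 / np.sqrt(A.sum(0) + tau))
L = Dr @ A @ Dc                                # regularized graph Laplacian

U, s, Vt = np.linalg.svd(L)
Xu = U[:, :2]
Xu = Xu / np.linalg.norm(Xu, axis=1, keepdims=True)  # rows onto the sphere

# left singular rows encode "sending" behaviour; receiving would use rows of Vt.T
labels = np.argmax(np.abs(Xu), axis=1)
print(labels)
```

Because sending and receiving are decomposed separately (left vs. right singular vectors), a node can sit in one sending cluster and a different receiving cluster, which is exactly the asymmetry the abstract highlights.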
Genetic structure of Plasmodium falciparum populations across the Honduras-Nicaragua border
2013-01-01
Background The Caribbean coast of Central America remains an area of malaria transmission caused by Plasmodium falciparum despite the fact that morbidity has been reduced in recent years. Parasite populations in that region show interesting characteristics such as chloroquine susceptibility and low mortality rates. Genetic structure and diversity of P. falciparum populations in the Honduras-Nicaragua border were analysed in this study. Methods Seven neutral microsatellite loci were analysed in 110 P. falciparum isolates from endemic areas of Honduras (n = 77) and Nicaragua (n = 33), mostly from the border region called the Moskitia. Several analyses concerning the genetic diversity, linkage disequilibrium, population structure, molecular variance, and haplotype clustering were conducted. Results There was a low level of genetic diversity in P. falciparum populations from Honduras and Nicaragua. Expected heterozygosity (He) results were similarly low for both populations. A moderate differentiation was revealed by the FST index between both populations, and two putative clusters were defined through a structure analysis. The main cluster grouped most of the samples from Honduras and Nicaragua, while the second cluster was smaller and included all the samples from the Siuna community in Nicaragua. This result could partially explain the stronger linkage disequilibrium (LD) in the parasite population from that country. These findings are congruent with the decreasing rates of malaria endemicity in Central America. PMID:24093629
Is the cluster environment quenching the Seyfert activity in elliptical and spiral galaxies?
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Dantas, M. L. L.; Krone-Martins, A.; Cameron, E.; Coelho, P.; Hattab, M. W.; de Val-Borro, M.; Hilbe, J. M.; Elliott, J.; Hagen, A.; COIN Collaboration
2016-09-01
We developed a hierarchical Bayesian model (HBM) to investigate how the presence of Seyfert activity relates to the host galaxy's environment, herein represented by the galaxy cluster mass, M200, and the normalized cluster centric distance, r/r200. We achieved this by constructing an unbiased sample of galaxies from the Sloan Digital Sky Survey, with morphological classifications provided by the Galaxy Zoo Project. A propensity score matching approach is introduced to control the effects of confounding variables: stellar mass, galaxy colour, and star formation rate. The connection between Seyfert activity and environmental properties in the de-biased sample is modelled within an HBM framework using the so-called logistic regression technique, suitable for the analysis of binary data (e.g. whether or not a galaxy hosts an AGN). Unlike standard ordinary least square fitting methods, our methodology naturally allows modelling the probability of Seyfert-AGN activity in galaxies on its natural scale, i.e. as a binary variable. Furthermore, we demonstrate how an HBM can incorporate information about each particular galaxy morphological type in a unified framework. In elliptical galaxies our analysis indicates a strong correlation of Seyfert-AGN activity with r/r200, and a weaker correlation with the mass of the host cluster. In spiral galaxies these trends do not appear, suggesting that the link between Seyfert activity and the properties of spiral galaxies is independent of the environment.
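The logistic-regression core of the analysis above, modelling a binary outcome (AGN host or not) as a function of a covariate such as r/r200, can be sketched with plain maximum-likelihood fitting. This uses synthetic data and gradient ascent, not the paper's hierarchical Bayesian machinery.

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic: probability of hosting an AGN rises with cluster-centric distance r/r200
r = rng.uniform(0.0, 2.0, 500)
p_true = 1.0 / (1.0 + np.exp(-(1.5 * r - 1.0)))
y = (rng.random(500) < p_true).astype(float)

# logistic regression by gradient ascent on the Bernoulli log-likelihood
b0, b1 = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * r)))
    b0 += 0.5 * np.mean(y - p)          # gradient w.r.t. intercept
    b1 += 0.5 * np.mean((y - p) * r)    # gradient w.r.t. slope
print(b0, b1)
```

An HBM adds priors over (b0, b1) and partial pooling across morphological types; the likelihood term is the same Bernoulli model shown here.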
Metsalu, Tauno; Vilo, Jaak
2015-01-01
The Principal Component Analysis (PCA) is a widely used method of reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on the scatterplot. Although widely used, the method lacks an easy-to-use web interface that scientists with little programming skills could use to make plots of their own data. The same applies to creating heatmaps: it is possible to add conditional formatting for Excel cells to show colored heatmaps, but for more advanced features such as clustering and experimental annotations, more sophisticated analysis tools have to be used. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots by using drop-down menus, text boxes, sliders etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As an output, users can download the PCA plot and heatmap in one of the preferred file formats. This web server is freely available at http://biit.cs.ut.ee/clustvis/. PMID:25969447
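The PCA-plus-scatterplot workflow that ClustVis wraps can be sketched directly: center the data matrix, take an SVD, and keep the first two components as plot coordinates. Synthetic data below; this is the generic computation, not ClustVis internals.

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic samples-by-features matrix with one injected dominant direction
X = rng.normal(size=(50, 20))
X[:, 0] += np.linspace(-8.0, 8.0, 50)

Xc = X - X.mean(axis=0)                      # PCA: center columns, then SVD
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]                    # 2-D coordinates for the scatterplot
explained = s**2 / (s**2).sum()              # variance ratio per component
print(scores.shape, explained[:2])
```

The `scores` array is what gets plotted; `explained` supplies the "PC1 (x%)" axis labels typical of such tools.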
User’s guide for GcClust—An R package for clustering of regional geochemical data
Ellefsen, Karl J.; Smith, David B.
2016-04-08
GcClust is a software package developed by the U.S. Geological Survey for statistical clustering of regional geochemical data, and similar data such as regional mineralogical data. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of the user’s guide are bundled together in R’s unit of sharable code, which is called a “package.” The user’s guide includes step-by-step instructions showing how the functions are used to cluster data and to evaluate the clustering results. These functions are demonstrated in this report using test data, which are included in the package.
Kimberlites of the Man craton, West Africa
NASA Astrophysics Data System (ADS)
Skinner, E. M. W.; Apter, D. B.; Morelli, C.; Smithson, N. K.
2004-09-01
The Man craton in West Africa is an Archaean craton formerly joined to the Guyana craton (South America) that was rifted apart in the Mesozoic. Kimberlites of the Man craton include three Jurassic-aged clusters in Guinea, two Jurassic-aged clusters in Sierra Leone, and in Liberia two clusters of unknown age and one Neoproterozoic cluster recently dated at ∼800 Ma. All of the kimberlites irrespective of age occur as small pipes and prolific dykes. Some of the Banankoro cluster pipes in Guinea, the Koidu pipes in Sierra Leone and small pipes in the Weasua cluster in Liberia contain hypabyssal-facies kimberlite and remnants of the so-called transitional-facies and diatreme-facies kimberlite. Most of the Man craton kimberlites are mineralogically classified as phlogopite kimberlites, although potassium contents are relatively low. They are chemically similar to mica-poor Group 1A Southern African examples. The Jurassic kimberlites are considered to represent one province of kimberlites that track from older bodies in Guinea (Droujba 153 Ma) to progressively younger kimberlites in Sierra Leone (Koidu, 146 Ma and Tongo, 140 Ma). The scarcity of diatreme-facies kimberlites relative to hypabyssal-facies kimberlites and the presence of the so-called transitional-facies indicate that the pipes have been eroded down to the interface between the root and diatreme zones. From this observation, it is concluded that extensive erosion (1-2 km) has occurred since the Jurassic. In addition to erosion, the presence of abundant early crystallizing phlogopite is considered to have had an effect on the relatively small sizes of the Man craton kimberlites.
Niepielko, Matthew G; Eagle, Whitby V I; Gavis, Elizabeth R
2018-06-18
The formation of ribonucleoprotein assemblies called germ granules is a conserved feature of germline development. In Drosophila, germ granules form at the posterior of the oocyte in a specialized cytoplasm called the germ plasm, which specifies germline fate during embryogenesis. mRNAs, including nanos (nos) and polar granule component (pgc), that function in germline development are localized to the germ plasm through their incorporation into germ granules, which deliver them to the primordial germ cells. Germ granules are nucleated by Oskar (Osk) protein and contain varying combinations and quantities of their constituent mRNAs, which are organized as spatially distinct, multi-copy homotypic clusters. The process that gives rise to such heterogeneous yet organized granules remains unknown. Here, we show that individual nos and pgc transcripts can populate the same nascent granule, and these first transcripts then act as seeds, recruiting additional like transcripts to form homotypic clusters. Within a granule, homotypic clusters grow independently of each other but depend on the simultaneous acquisition of additional Osk. Although granules can contain multiple clusters of a particular mRNA, granule mRNA content is dominated by cluster size. These results suggest that the accumulation of mRNAs in the germ plasm is controlled by the mRNAs themselves through their ability to form homotypic clusters; thus, RNA self-association drives germ granule mRNA localization. We propose that a stochastic seeding and self-recruitment mechanism enables granules to simultaneously incorporate many different mRNAs while ensuring that each becomes enriched to a functional threshold. Copyright © 2018 Elsevier Ltd. All rights reserved.
Pester, Michael; Rattei, Thomas; Flechl, Stefan; Gröngröft, Alexander; Richter, Andreas; Overmann, Jörg; Reinhold-Hurek, Barbara; Loy, Alexander; Wagner, Michael
2012-01-01
Ammonia-oxidizing archaea (AOA) play an important role in nitrification and many studies exploit their amoA genes as marker for their diversity and abundance. We present an archaeal amoA consensus phylogeny based on all publicly available sequences (status June 2010) and provide evidence for the diversification of AOA into four previously recognized clusters and one newly identified major cluster. These clusters, for which we suggest a new nomenclature, harboured 83 AOA species-level OTU (using an inferred species threshold of 85% amoA identity). 454 pyrosequencing of amoA amplicons from 16 soils sampled in Austria, Costa Rica, Greenland and Namibia revealed that only 2% of retrieved sequences had no database representative on the species-level and represented 30–37 additional species-level OTUs. With the exception of an acidic soil from which mostly amoA amplicons of the Nitrosotalea cluster were retrieved, all soils were dominated by amoA amplicons from the Nitrososphaera cluster (also called group I.1b), indicating that the previously reported AOA from the Nitrosopumilus cluster (also called group I.1a) are absent or represent minor populations in soils. AOA richness estimates on the species level ranged from 8–83 co-existing AOAs per soil. Presence/absence of amoA OTUs (97% identity level) correlated with geographic location, indicating that besides contemporary environmental conditions also dispersal limitation across different continents and/or historical environmental conditions might influence AOA biogeography in soils. PMID:22141924
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fukuda, Ryoichi, E-mail: fukuda@ims.ac.jp; Ehara, Masahiro
2015-12-31
The effects from solvent environment are specific to the electronic states; therefore, a computational scheme for solvent effects consistent with the electronic states is necessary to discuss electronic excitation of molecules in solution. The PCM (polarizable continuum model) SAC (symmetry-adapted cluster) and SAC-CI (configuration interaction) methods are developed for such purposes. The PCM SAC-CI adopts the state-specific (SS) solvation scheme where solvent effects are self-consistently considered for every ground and excited state. For efficient computations of many excited states, we develop a perturbative approximation for the PCM SAC-CI method, which is called the corrected linear response (cLR) scheme. Our test calculations show that the cLR PCM SAC-CI is a very good approximation of the SS PCM SAC-CI method for polar and nonpolar solvents.
A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings.
Pillow, Jonathan W; Shlens, Jonathon; Chichilnisky, E J; Simoncelli, Eero P
2013-01-01
We examine the problem of estimating the spike trains of multiple neurons from voltage traces recorded on one or more extracellular electrodes. Traditional spike-sorting methods rely on thresholding or clustering of recorded signals to identify spikes. While these methods can detect a large fraction of the spikes from a recording, they generally fail to identify synchronous or near-synchronous spikes: cases in which multiple spikes overlap. Here we investigate the geometry of failures in traditional sorting algorithms, and document the prevalence of such errors in multi-electrode recordings from primate retina. We then develop a method for multi-neuron spike sorting using a model that explicitly accounts for the superposition of spike waveforms. We model the recorded voltage traces as a linear combination of spike waveforms plus a stochastic background component of correlated Gaussian noise. Combining this measurement model with a Bernoulli prior over binary spike trains yields a posterior distribution for spikes given the recorded data. We introduce a greedy algorithm to maximize this posterior that we call "binary pursuit". The algorithm allows modest variability in spike waveforms and recovers spike times with higher precision than the voltage sampling rate. This method substantially corrects cross-correlation artifacts that arise with conventional methods, and substantially outperforms clustering methods on both real and simulated data. Finally, we develop diagnostic tools that can be used to assess errors in spike sorting in the absence of ground truth.
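The template-subtraction idea underlying the method above can be illustrated with a toy greedy loop: repeatedly find the shift where a known spike waveform best matches the residual trace, subtract it, and stop when no good match remains. This naive sketch omits the paper's Bernoulli prior and its careful handling of overlapping spikes (the case where plain greedy matching fails); synthetic trace, well-separated spikes.

```python
import numpy as np

rng = np.random.default_rng(4)
template = np.array([0.0, 2.0, 5.0, 2.0, 0.5])   # toy spike waveform
true_times = [30, 80, 150]
trace = rng.normal(0.0, 0.1, 200)
for t in true_times:
    trace[t:t + len(template)] += template

# greedy pursuit: best-matching shift, subtract, repeat
residual = trace.copy()
found = []
energy = float(template @ template)
for _ in range(10):
    corr = np.correlate(residual, template, mode="valid")
    t = int(np.argmax(corr))
    if corr[t] < 0.5 * energy:                   # no remaining good match
        break
    residual[t:t + len(template)] -= template
    found.append(t)
print(sorted(found))
```

Because subtraction removes each detected spike from the residual, partial matches at neighbouring shifts disappear, which is the property the full model exploits to resolve superimposed waveforms.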
Wang, Xueyi
2012-02-08
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high-dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high-dimensional spaces.
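A minimal sketch of the two stages, under simplifying assumptions (Euclidean distance, 1-NN only; the paper's bookkeeping for general k is more involved). The pruning rule is the triangle inequality d(q, x) ≥ d(q, c) − d(x, c): a point close to the center of a far cluster cannot beat the current best candidate.

```python
import numpy as np

def kmeans_index(points, k_clusters, iters=20, seed=0):
    """Buildup stage: plain k-means, then store each training point's
    distance to its cluster center for later pruning."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k_clusters, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        label = d.argmin(axis=1)
        for j in range(k_clusters):
            if np.any(label == j):           # keep old center if cluster empties
                centers[j] = points[label == j].mean(axis=0)
    d2c = np.linalg.norm(points - centers[label], axis=1)
    clusters = [np.nonzero(label == j)[0] for j in range(k_clusters)]
    return centers, clusters, d2c

def nearest(query, points, centers, clusters, d2c):
    """Searching stage (1-NN): visit clusters nearest-first; within a
    cluster, visit points far-from-center first so that once the triangle
    bound d(q,c) - d(x,c) reaches the current best, the rest can be skipped."""
    dqc = np.linalg.norm(centers - query, axis=1)
    best_d, best_i = np.inf, -1
    for j in np.argsort(dqc):
        for i in clusters[j][np.argsort(-d2c[clusters[j]])]:
            if dqc[j] - d2c[i] >= best_d:
                break                        # bound only grows for the rest
            d = np.linalg.norm(points[i] - query)
            if d < best_d:
                best_d, best_i = d, i
    return best_i, best_d
```

Because the bound is a true lower bound on d(q, x), the search stays exact regardless of how good the k-means partition is; clustering quality only affects how much gets pruned.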
Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina
2015-03-01
Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster and the difficulties they face concerning parameter tuning.
This fact was experimentally validated and moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques. Copyright © 2015 Elsevier B.V. All rights reserved.
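For orientation, plain Markov clustering (the base algorithm that EE-MC enhances and tunes) can be sketched as follows. The evolutionary adaptation of the inflation parameter and the other EE-MC enhancements are omitted; this is only the expansion/inflation core.

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, iters=50, tol=1e-6):
    """Plain Markov Cluster (MCL) sketch on a weighted adjacency matrix:
    alternate expansion (matrix power, i.e. random-walk flow) and inflation
    (entrywise power plus column renormalisation, which strengthens
    intra-cluster flow and starves inter-cluster flow)."""
    m = adj.astype(float) + np.eye(len(adj))       # add self-loops
    m /= m.sum(axis=0, keepdims=True)              # column-stochastic
    for _ in range(iters):
        prev = m
        m = np.linalg.matrix_power(m, expansion)   # expansion
        m = m ** inflation                         # inflation
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - prev).max() < tol:
            break
    # interpret nonzero rows of the limit matrix as clusters
    groups = set()
    for row in m:
        members = tuple(np.nonzero(row > 1e-8)[0].tolist())
        if members:
            groups.add(members)
    return sorted(groups)
```

On a graph of two disjoint triangles the flow never crosses components, so MCL returns the two triangles as clusters.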
On the multi-scale description of micro-structured fluids composed of aggregating rods
NASA Astrophysics Data System (ADS)
Perez, Marta; Scheuer, Adrien; Abisset-Chavanne, Emmanuelle; Ammar, Amine; Chinesta, Francisco; Keunings, Roland
2018-05-01
When addressing the flow of concentrated suspensions composed of rods, dense clusters are observed. Thus, the adequate modelling and simulation of such a flow requires addressing the kinematics of these dense clusters and their impact on the flow in which they are immersed. In a former work, we addressed a first modelling framework of these clusters, assumed so dense that they were considered rigid and their kinematics (flow-induced rotation) were totally defined by a symmetric tensor c with unit trace representing the cluster conformation. Then, the rigid nature of the clusters was relaxed, assuming them deformable, and a model giving the evolution of both the cluster shape and its microstructural orientation descriptor (the so-called shape and orientation tensors) was proposed. This paper compares the predictions coming from those models with finer-scale discrete simulations inspired from molecular dynamics modelling.
Sampling designs for HIV molecular epidemiology with application to Honduras.
Shepherd, Bryan E; Rossini, Anthony J; Soto, Ramon Jeremias; De Rivera, Ivette Lorenzana; Mullins, James I
2005-11-01
Proper sampling is essential to characterize the molecular epidemiology of human immunodeficiency virus (HIV). HIV sampling frames are difficult to identify, so most studies use convenience samples. We discuss statistically valid and feasible sampling techniques that overcome some of the potential for bias due to convenience sampling and ensure better representation of the study population. We employ a sampling design called stratified cluster sampling. This first divides the population into geographical and/or social strata. Within each stratum, a population of clusters is chosen from groups, locations, or facilities where HIV-positive individuals might be found. Some clusters are randomly selected within strata and individuals are randomly selected within clusters. Variation and cost help determine the number of clusters and the number of individuals within clusters that are to be sampled. We illustrate the approach through a study designed to survey the heterogeneity of subtype B strains in Honduras.
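The two-stage design can be sketched as follows. The stratum, cluster, and person names are illustrative, and equal allocation is assumed for simplicity, whereas the study sizes the number of clusters and individuals by variation and cost.

```python
import random

def stratified_cluster_sample(strata, n_clusters_per_stratum, n_per_cluster, seed=0):
    """Two-stage stratified cluster sampling sketch: randomly select
    clusters within each stratum, then randomly select individuals within
    each chosen cluster. `strata` maps a stratum name to a dict of
    {cluster name: [individuals]}."""
    rng = random.Random(seed)
    sample = []
    for stratum, clusters in strata.items():
        chosen = rng.sample(sorted(clusters), min(n_clusters_per_stratum, len(clusters)))
        for c in chosen:
            picked = rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
            sample.extend((stratum, c, person) for person in picked)
    return sample
```

Because selection is random at both stages, each stratum is guaranteed representation while the per-cluster sampling keeps field logistics feasible.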
Swarm Intelligence in Text Document Clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the overwhelming information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food foraging.
Hypervelocity stars from young stellar clusters in the Galactic Centre
NASA Astrophysics Data System (ADS)
Fragione, G.; Capuzzo-Dolcetta, R.; Kroupa, P.
2017-05-01
The enormous velocities of the so-called hypervelocity stars (HVSs) likely derive from close interactions with massive black holes, binary star encounters or supernova explosions. In this paper, we investigate the origin of HVSs as a consequence of the close interaction between the Milky Way central massive black hole and a passing-by young stellar cluster. We found that both single and binary HVSs may be generated in a burst-like event, as the cluster passes near the orbital pericentre. High-velocity stars will move close to the initial cluster orbital plane and in the direction of the cluster orbital motion at the pericentre. The binary fraction of these HVS jets depends on the primordial binary fraction in the young cluster. The level of initial mass segregation determines the value of the average mass of the ejected stars. Some binary stars will merge, continuing their travel across and out of the Galaxy as blue stragglers.
Multicolor photometry of the merging galaxy cluster A2319: Dynamics and star formation properties
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Peng-Fei; Yuan, Qi-Rong; Zhang, Li
2014-05-01
Asymmetric X-ray emission and a powerful cluster-scale radio halo indicate that A2319 is a merging cluster of galaxies. This paper presents our multicolor photometry for A2319 with 15 optical intermediate filters in the Beijing-Arizona-Taiwan-Connecticut (BATC) system. There are 142 galaxies with known spectroscopic redshifts within the viewing field of 58' × 58' centered on this rich cluster, including 128 member galaxies (called sample I). A large velocity dispersion in the rest frame, 1622 (+91/−70) km s⁻¹, suggests merger dynamics in A2319. The contour map of projected density and localized velocity structure confirm the so-called A2319B substructure, at ∼10' northwest to the main concentration A2319A. The spectral energy distributions (SEDs) of more than 30,000 sources are obtained in our BATC photometry down to V ∼ 20 mag. A u-band (∼3551 Å) image with better seeing and spatial resolution, obtained with the Bok 2.3 m telescope at Kitt Peak, is taken to make star-galaxy separation and distinguish the overlapping contamination in the BATC aperture photometry. With color-color diagrams and photometric redshift technique, 233 galaxies brighter than h_BATC = 19.0 are newly selected as member candidates after an exclusion of false candidates with contaminated BATC SEDs by eyeball-checking the u-band Bok image. The early-type galaxies are found to follow a tight color-magnitude correlation. Based on sample I and the enlarged sample of member galaxies (called sample II), subcluster A2319B is confirmed. The star formation properties of cluster galaxies are derived with the evolutionary synthesis model, PEGASE, assuming a Salpeter initial mass function and an exponentially decreasing star formation rate (SFR). 
A strong environmental effect on star formation histories is found, in the sense that galaxies in the sparse regions have various star formation histories, while galaxies in the dense regions are found to have shorter SFR time scales, older stellar ages, and higher interstellar medium metallicities. For the merging cluster A2319, local surface density is a better environmental indicator than the cluster-centric distance. Compared with the well-relaxed cluster A2589, a higher fraction of star-forming galaxies is found in A2319, indicating that the galaxy-scale turbulence stimulated by the subcluster merger might have played a role in triggering the star formation activity.
NASA Astrophysics Data System (ADS)
Uznir, U.; Anton, F.; Suhaibah, A.; Rahman, A. A.; Mioc, D.
2013-09-01
The advantages of three dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two dimensional (2D) spatial data. They involve extra geometrical and topological information together with semantic data. Without a proper spatial data clustering method and its corresponding spatial data access method, retrieving portions of and especially searching these 3D city models, will not be done optimally. Even though current developments are based on an open data model allotted by the Open Geospatial Consortium (OGC) called CityGML, its XML-based structure makes it challenging to cluster the 3D urban objects. In this research, we propose an opponent data constellation technique of space-filling curves (3D Hilbert curves) for 3D city model data representation. Unlike previous methods that try to project 3D or n-dimensional data down to 2D or 3D using Principal Component Analysis (PCA) or Hilbert mappings, in this research, we extend the Hilbert space-filling curve to one higher dimension for 3D city model data implementations. The query performance was tested using a CityGML dataset of 1,000 building blocks and the results are presented in this paper. The advantages of implementing space-filling curves in 3D city modeling will improve data retrieval time by means of optimized 3D adjacency, nearest neighbor information and 3D indexing. The Hilbert mapping, which maps a subinterval of the [0, 1] interval to the corresponding portion of the d-dimensional Hilbert's curve, preserves the Lebesgue measure and is Lipschitz continuous. 
Depending on the applications, several alternatives are possible in order to cluster spatial data together in the third dimension compared to its clustering in 2D.
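The indexing principle, ordering 3D objects by a one-dimensional space-filling-curve key, can be illustrated with the simpler Z-order (Morton) curve; the paper's 3D Hilbert curve preserves spatial locality better but its encoding is considerably more involved, so this sketch swaps in the Morton key purely for illustration.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of integer grid coordinates (x, y, z) into a
    single Z-order (Morton) key. Sorting 3D objects by such a key keeps
    spatially nearby objects close together in storage, which is the same
    principle the 3D Hilbert index exploits."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key
```

A storage layer can then cluster building blocks on disk by sorting them on this key, so range queries over a neighbourhood touch mostly contiguous records.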
A Sequential Ensemble Prediction System at Convection Permitting Scales
NASA Astrophysics Data System (ADS)
Milan, M.; Simmer, C.
2012-04-01
A Sequential Assimilation Method (SAM) following some aspects of particle filtering with resampling, also called SIR (Sequential Importance Resampling), is introduced and applied in the framework of an Ensemble Prediction System (EPS) for weather forecasting on convection permitting scales, with focus on precipitation forecast. At this scale and beyond, the atmosphere increasingly exhibits chaotic behaviour and nonlinear state space evolution due to convectively driven processes. One way to take full account of nonlinear state developments is particle filter methods; their basic idea is the representation of the model probability density function by a number of ensemble members weighted by their likelihood with respect to the observations. In particular, a particle filter with resampling abandons ensemble members (particles) with low weights, restoring the original number of particles by adding multiple copies of the members with high weights. In our SIR-like implementation we replace the likelihood-based definition of the weights and introduce a metric which quantifies the "distance" between the observed atmospheric state and the states simulated by the ensemble members. We also introduce a methodology to counteract filter degeneracy, i.e. the collapse of the simulated state space. To this end we propose a combination of resampling taking account of simulated state space clustering and nudging. By keeping cluster representatives during resampling and filtering, the method maintains the potential for nonlinear system state development. We assume that a particle cluster with initially low likelihood may evolve into a state with higher likelihood at a subsequent filter time, thus mimicking nonlinear system state developments (e.g. sudden convection initiation), and remedies timing errors for convection due to model errors and/or imperfect initial conditions. 
We apply a simplified version of the resampling: the particles with the highest weights in each cluster are duplicated. During the model evolution, one particle of each pair evolves using the forward model, while the second is nudged towards the radar and satellite observations during its forward-model evolution.
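The resampling step at the heart of SIR can be sketched as follows, using systematic resampling with plain weights; the paper's metric-based weights and its cluster-preserving guard against degeneracy are omitted from this sketch.

```python
import numpy as np

def sir_resample(particles, weights, rng):
    """Systematic resampling for an SIR particle filter: particles with
    large weights are duplicated, low-weight particles are dropped, and
    the ensemble size is preserved. Weights are reset to uniform after
    resampling."""
    n = len(particles)
    w = np.asarray(weights, float)
    w /= w.sum()
    # one stratified position per particle: (u + k) / n, u ~ U[0, 1)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(w), positions)
    return particles[idx], np.full(n, 1.0 / n)
```

When one member carries all the weight, every resampled particle is a copy of it, which is exactly the degenerate collapse the paper's clustering-plus-nudging scheme is designed to counteract.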
Hydrodynamic clustering of droplets in turbulence
NASA Astrophysics Data System (ADS)
Kunnen, Rudie; Yavuz, Altug; van Heijst, Gertjan; Clercx, Herman
2017-11-01
Small, inertial particles are known to cluster in turbulent flows: particles are centrifuged out of eddies and gather in the strain-dominated regions. This so-called preferential concentration is reflected in the radial distribution function (RDF; a quantitative measure of clustering). We study clustering of water droplets in a loudspeaker-driven turbulence chamber. We track the motion of droplets in 3D and calculate the RDF. At moderate scales (a few Kolmogorov lengths) we find the typical power-law scaling of preferential concentration in the RDF. However, at even smaller scales (a few droplet diameters), we encounter a hitherto unobserved additional clustering. We postulate that the additional clustering is due to hydrodynamic interactions, an effect which is typically disregarded in modeling. Using a perturbative expansion of inertial effects in a Stokes-flow description of two interacting spheres, we obtain an expression for the RDF which indeed includes the additional clustering. The additional clustering enhances the collision probability of droplets, which enhances their growth rate due to coalescence. The additional clustering is thus an essential effect in precipitation modeling.
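A minimal estimator of the RDF for point particles in a periodic cubic box is sketched below: count pair separations per radial shell and normalise by the count expected for an ideal (uniform) gas, so g(r) > 1 at small r indicates clustering. This is a sketch only; the experiment works with finite 3D droplet tracks and needs boundary and statistics corrections not shown here.

```python
import numpy as np

def radial_distribution(positions, box, bins):
    """Radial distribution function g(r) for particles in a periodic
    cubic box of side `box`. Returns (bin edges, g per bin)."""
    n = len(positions)
    # all pairwise separation vectors with the minimum-image convention
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box * np.round(diff / box)
    r = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(n, 1)]
    counts, edges = np.histogram(r, bins=bins, range=(0, box / 2))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box ** 3
    expected = 0.5 * n * density * shell_vol   # ideal-gas pair count per shell
    return edges, counts / expected
```

For two particles one unit apart, all pair mass lands in the shell containing r = 1, and every other shell reads zero.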
NASA Astrophysics Data System (ADS)
Aidelman, Y.; Cidale, L. S.; Zorec, J.; Panei, J. A.
2018-02-01
Context. Stellar physical properties of star clusters are poorly known and the cluster parameters are often very uncertain. Methods: Our goals are to perform a spectrophotometric study of the B star population in open clusters to derive accurate stellar parameters, search for the presence of circumstellar envelopes, and discuss the characteristics of these stars. The BCD spectrophotometric system is a powerful method to obtain stellar fundamental parameters from direct measurements of the Balmer discontinuity. To this end, we wrote the interactive code MIDE3700. The BCD parameters can also be used to infer the main properties of open clusters: distance modulus, color excess, and age. Furthermore, we inspected the Balmer discontinuity to provide evidence for the presence of circumstellar disks and identify Be star candidates. We used an additional set of high-resolution spectra in the Hα region to confirm the Be nature of these stars. Results: We provide Teff, log g, Mv, Mbol, and spectral types for a sample of 68 stars in the field of the open clusters NGC 6087, NGC 6250, NGC 6383, and NGC 6530, as well as the cluster distances, ages, and reddening. Then, based on a sample of 230 B stars in the direction of the 11 open clusters studied along this series of three papers, we report 6 new Be stars, 4 blue straggler candidates, and 15 B-type stars (called Bdd) with a double Balmer discontinuity, which indicates the presence of circumstellar envelopes. We discuss the distribution of the fraction of B, Be, and Bdd star cluster members per spectral subtype. The majority of the Be stars are dwarfs and present a maximum at the spectral type B2-B4 in young and intermediate-age open clusters (<40 Myr). Another maximum of Be stars is observed at the spectral type B6-B8 in open clusters older than 40 Myr, where the population of Bdd stars also becomes relevant. The Bdd stars seem to be in a passive emission phase. 
Conclusions: Our results support previous statements that the Be phenomenon is present along the whole main sequence band and occurs in very different evolutionary states. We find clear evidence of an increase of stars with circumstellar envelopes with cluster age. The Be phenomenon reaches its maximum in clusters of intermediate age (10-40 Myr) and the number of B stars with circumstellar envelopes (Be plus Bdd stars) is also high for the older clusters (40-100 Myr). Observations taken at CASLEO, operating under agreement of CONICET and the Universities of La Plata, Córdoba, and San Juan, Argentina.Tables 1, 2, 9-16 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/610/A30
Double Cluster Heads Model for Secure and Accurate Data Fusion in Wireless Sensor Networks
Fu, Jun-Song; Liu, Yun
2015-01-01
Secure and accurate data fusion is an important issue in wireless sensor networks (WSNs) and has been extensively researched in the literature. In this paper, by combining clustering techniques, reputation and trust systems, and data fusion algorithms, we propose a novel cluster-based data fusion model called Double Cluster Heads Model (DCHM) for secure and accurate data fusion in WSNs. Different from traditional clustering models in WSNs, two cluster heads are selected after clustering for each cluster based on the reputation and trust system and they perform data fusion independently of each other. Then, the results are sent to the base station where the dissimilarity coefficient is computed. If the dissimilarity coefficient of the two data fusion results exceeds the threshold preset by the users, the cluster heads will be added to the blacklist, and the cluster heads must be reelected by the sensor nodes in the cluster. Meanwhile, feedback is sent from the base station to the reputation and trust system, which can help us to identify and delete the compromised sensor nodes in time. Through a series of extensive simulations, we found that the DCHM performed very well in terms of data fusion security and accuracy. PMID:25608211
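The base-station consistency test can be sketched as follows. The dissimilarity coefficient is assumed here to be a mean absolute difference between the two fusion results; the paper's exact coefficient may differ, so treat the formula as a placeholder.

```python
def dchm_check(fusion_a, fusion_b, threshold):
    """Sketch of the DCHM consistency test: two cluster heads fuse the
    same sensor readings independently, and the base station compares the
    results. Large disagreement blacklists both heads and triggers
    re-election (plus feedback to the reputation and trust system)."""
    dis = sum(abs(a - b) for a, b in zip(fusion_a, fusion_b)) / len(fusion_a)
    if dis > threshold:
        return "blacklist-and-reelect", dis
    return "accept", dis
```

A compromised head that skews its fusion result is caught because its honest twin produces a diverging value for the same readings.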
ERIC Educational Resources Information Center
Bowers, Alex J.; Blitz, Mark; Modeste, Marsha; Salisbury, Jason; Halverson, Richard R.
2017-01-01
Background: Across the recent research on school leadership, leadership for learning has emerged as a strong framework for integrating current theories, such as instructional, transformational, and distributed leadership as well as effective human resource practices, instructional evaluation, and resource allocation. Yet, questions remain as to…
Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data
Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.
2003-01-01
Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r² test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
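One concrete instance of kernel density clustering is the mean-shift procedure: each profile climbs the kernel density estimate to a mode, and profiles sharing a mode form a cluster. This is a sketch for illustration; the paper's estimator and grouping rule may differ.

```python
import numpy as np

def mean_shift(points, bandwidth, iters=50, tol=1e-5):
    """Mode-seeking sketch of kernel density clustering: shift every point
    to the Gaussian-kernel weighted mean of the data until it settles on a
    density mode, then group points whose modes coincide."""
    points = np.asarray(points, float)
    modes = points.copy()
    for _ in range(iters):
        # w[i, k] = kernel weight of data point k as seen from mode i
        w = np.exp(-((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
                   / (2 * bandwidth ** 2))
        shifted = (w[:, :, None] * points).sum(1) / w.sum(1, keepdims=True)
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    # label points whose converged modes lie within bandwidth / 2
    labels = -np.ones(len(points), int)
    next_label = 0
    for i in range(len(points)):
        for j in range(i):
            if np.linalg.norm(modes[i] - modes[j]) < bandwidth / 2:
                labels[i] = labels[j]
                break
        if labels[i] < 0:
            labels[i] = next_label
            next_label += 1
    return labels
```

Unlike K-means, the number of clusters is not fixed in advance; it emerges from the number of density modes at the chosen bandwidth, which is one reason density-based methods tolerate noise well.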
The Cluster Concept and Community Services
ERIC Educational Resources Information Center
Collins, Charles; Tillery, Dale
1972-01-01
An innovative design for the organization of community services and continuing education in a community college district calls for the centralization of administration and the decentralization of program offerings. (RN)
Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A
1993-01-01
Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.
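A one-dimensional stand-in for hydrophobic cluster detection is a sliding-window hydropathy scan, shown below with the standard Kyte-Doolittle scale. This is only an illustration of the underlying idea; HCA proper detects clusters on a 2D helical-net representation of the sequence, which is not reproduced here.

```python
# Standard Kyte-Doolittle hydropathy values per amino acid
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydrophobic_segments(seq, window=7, cutoff=1.5):
    """Report (start, end) spans whose mean Kyte-Doolittle hydropathy over
    a sliding window exceeds `cutoff`, merging overlapping windows."""
    hits = []
    for i in range(len(seq) - window + 1):
        if sum(KD[a] for a in seq[i:i + window]) / window > cutoff:
            hits.append((i, i + window))
    merged = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], e)   # extend the previous span
        else:
            merged.append((s, e))
    return merged
```

On a sequence with a single hydrophobic core flanked by charged residues, the scan returns one merged span covering the core and its window overlap.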
First LOCSMITH locations of deep moonquakes
NASA Astrophysics Data System (ADS)
Hempel, S.; Knapmeyer, M.; Sens-Schönfelder, C.; Oberst, J.
2008-09-01
Introduction: Several thousand seismic events were recorded by the Apollo seismic network from 1969-1977. Different types of events can be distinguished: meteoroid impacts, thermal quakes and internally caused moonquakes. The latter subdivide into shallow (100 to 300 km) and deep moonquakes (700 to 1100 km), which are by far the most common events. The deep quakes would be no immediate danger to inhabited stations on the Earth's Moon because of their relatively low magnitude and great depth. However, they bear important information on lunar structure and evolution, and their distribution probably reflects their source mechanism. In this study, we reinvestigate location patterns of deep lunar quakes. LOCSMITH: The core of this study is a new location method (LOCSMITH, [1]). This algorithm uses time intervals rather than time instants as input, which contain the dedicated arrival with probability 1. LOCSMITH models and compares theoretical and actual travel times on a global scale and uses an adaptive grid to search source locations compatible with all observations. The output is a set of all possible hypocenters for each considered region of repeating, tidally triggered moonquake activity (such regions are called clusters). The shape and size of these sets gives a better estimate of the location uncertainty than the formal standard deviations returned by classical methods. This is used for grading of deep moonquake clusters according to the currently available data quality. Classification of deep moonquakes: As a first step, we establish a reciprocal dependence of the size and shape of LOCSMITH location clouds on the number of arrivals. Four different shapes are recognized, listed here in an order corresponding to decreasing spatial resolution: 1. "Balls", which are well defined and relatively small types of sets resembling the commonly assumed error ellipsoid. These are found in the best cases with many observations. 
Locations of this shape are obtained for clusters 1, 18 or 33; these were already well located by earlier works [2,3]. 2. The next best shape of a location set is the "banana", as found for clusters 5, 39 or 53 [Fig. 1]. In this case, only limited depth resolution is available, and the solution spreads over a large volume. The size of a "banana" could be minimized by either finding a not yet discovered shear wave arrival or estimating an S arrival time interval by considering the coda instead of a clear S arrival. 3. Clouds with a shape we call "cones" are formed by clusters for which no compressional wave arrivals, but three S arrivals, were picked. Such solutions were found for clusters 35, 201 or 218 [Fig. 2]. A depth limitation is given only by the surface of the Moon's far side. In previous works, locations of these clusters were usually determined with a fixed depth, thus neglecting all depth uncertainty [2]. 4. The fourth and worst class shows a "disc"-like shape with no depth resolution and almost no latitude resolution. Clusters of this class, like 4, 23 or 43, were not located so far. From class 1 ("ball") to 4 ("disc") the number of possible hypocenters increases. So we also found a correlation between the size and shape of volumes containing possible hypocenter solutions. Aim: We classified all clusters according to the solution set scheme by using arrival times of [2] with an estimated error of ±10s as input for LOCSMITH. We reprocess selected clusters of each class to meet the special requirements and possibilities of this new location method. As said above, one of the requirements of LOCSMITH is the definition of a time interval instead of a time instant as input, and an interesting option is using an estimated S arrival time interval derived from the coda and a scattering model when a clear S arrival is lacking. We try to find fully automated methods for each processing step, dependent on the quality of the data. 
Methods: For despiking we merged the methods of [4] and [5] and achieve very good results even in the worst cases, as already presented in [6]. Prior to stacking we developed a complex multiparameter correlation algorithm to calculate the optimum time shift. Results: We present relocations of selected deep moonquakes in the context of data availability and quality. Previous locations are often contained in our location clouds, but realistic location uncertainties allow large deviations from the best fitting solutions, including locations on the far side of the Moon. Perspective: By developing new methods for data processing and using the LOCSMITH locating algorithm we hope to reduce the location uncertainty sufficiently to make sure that all sources are on the near side, or to prove a far side origin of some of them. This would answer questions of hemispheric symmetry of lunar deep seismicity and the Moon's internal structure. References [1] Knapmeyer (2008) accepted to GJI. [2] Nakamura (2005) JGR, 110, E01001. [3] Lognonné (2003) EPSL, 211, 2744. [4] Bulow (2005) JGR, 110, E10003. [5] Sonnemann (2005) EGU05A07960. [6] Hempel, Knapmeyer, Oberst (2008) EGU2008A07989.
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates with the arrival of new data, making them unsuitable for dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
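The incremental behaviour claimed for One-Pass can be sketched with a threshold-based single-scan rule, which assigns each arriving vector in one pass and never retrains offline. The assignment rule below (nearest centroid within a fixed radius, else a new cluster) is an assumption for illustration; the paper's exact rule may differ.

```python
import numpy as np

def one_pass_cluster(stream, radius):
    """Single-scan threshold clustering sketch: assign each arriving
    vector to the nearest existing centroid if it lies within `radius`,
    updating that centroid as a running mean; otherwise open a new
    cluster. New data can keep arriving without any offline retraining."""
    centroids, counts, labels = [], [], []
    for x in stream:
        x = np.asarray(x, float)
        if centroids:
            d = [np.linalg.norm(x - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] <= radius:
                counts[j] += 1
                centroids[j] += (x - centroids[j]) / counts[j]  # running mean
                labels.append(j)
                continue
        centroids.append(x.copy())
        counts.append(1)
        labels.append(len(centroids) - 1)
    return labels, centroids
```

Each item costs one distance scan over the current centroids, so the method accommodates the incremental updates that offline K-means training cannot.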
Clustering in Cell Cycle Dynamics with General Response/Signaling Feedback
Young, Todd R.; Fernandez, Bastien; Buckalew, Richard; Moses, Gregory; Boczko, Erik M.
2011-01-01
Motivated by experimental and theoretical work on autonomous oscillations in yeast, we analyze ordinary differential equations models of large populations of cells with cell-cycle dependent feedback. We assume a particular type of feedback that we call Responsive/Signaling (RS), but do not specify a functional form of the feedback. We study the dynamics and emergent behaviour of solutions, particularly temporal clustering and stability of clustered solutions. We establish the existence of certain periodic clustered solutions as well as “uniform” solutions and add to the evidence that cell-cycle dependent feedback robustly leads to cell-cycle clustering. We highlight the fundamental differences in dynamics between systems with negative and positive feedback. For positive feedback systems the most important mechanism seems to be the stability of individual isolated clusters. On the other hand we find that in negative feedback systems, clusters must interact with each other to reinforce coherence. We conclude from various details of the mathematical analysis that negative feedback is most consistent with observations in yeast experiments. PMID:22001733
Reveles, J U; Khanna, S N; Roach, P J; Castleman, A W
2006-12-05
We recently demonstrated that, in gas phase clusters containing aluminum and iodine atoms, an Al(13) cluster behaves like a halogen atom, whereas an Al(14) cluster exhibits properties analogous to an alkaline earth atom. These observations, together with our findings that Al(13)(-) is inert like a rare gas atom, have reinforced the idea that chosen clusters can exhibit chemical behaviors reminiscent of atoms in the periodic table, offering the exciting prospect of a new dimension of the periodic table formed by cluster elements, called superatoms. As the behavior of clusters can be controlled by size and composition, the superatoms offer the potential to create unique compounds with tailored properties. In this article, we provide evidence of an additional class of superatoms, namely Al(7)(-), that exhibit multiple valences, like some of the elements in the periodic table, and hence have the potential to form stable compounds when combined with other atoms. These findings support the contention that there should be no limitation in finding clusters, which mimic virtually all members of the periodic table.
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop
NASA Astrophysics Data System (ADS)
Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.
2018-04-01
The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data-parallel applications. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background, improving the efficiency of image reading and application and solving the need for concurrent multi-user high-speed access to remotely sensed data. The rationality, reliability and superiority of the system design were verified by testing the storage efficiency of different image data with multiple users, and by analyzing how the distributed storage architecture improves the application efficiency of remote sensing images in an actual Hadoop service system.
Hirose, H
1997-01-01
This paper proposes a new treatment of electrical insulation degradation. Some types of insulation that have been used under various circumstances are considered to degrade at various rates in accordance with their stress circumstances. The cross-linked polyethylene (XLPE) insulated cables inspected by major Japanese electric companies clearly indicate such phenomena. By assuming that an inspected specimen is sampled from one of the clustered groups, a mixed degradation model can be constructed. Since the degradation of insulation under common circumstances is considered to follow a Weibull distribution, a mixture model and a Weibull power law can be combined; this is called the mixture Weibull power law model. Applying maximum likelihood estimation of the newly proposed model to Japanese 22 and 33 kV insulation-class cables, the cables are clustered into a certain number of groups using the AIC and the generalized likelihood ratio test method. The reliability of the cables at specified years is then assessed.
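As a rough illustration of the mixture idea, fitting a two-component Weibull mixture by maximum likelihood might look like the sketch below. This is only an analogue: the paper's model additionally couples the Weibull scale to stress through a power law, which is omitted here, and all parameter values are invented.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def mixture_weibull_nll(params, t):
    """Negative log-likelihood of a two-component Weibull mixture.
    params = (w, shape1, scale1, shape2, scale2); w is the mixing weight."""
    w, c1, s1, c2, s2 = params
    pdf = (w * weibull_min.pdf(t, c1, scale=s1)
           + (1 - w) * weibull_min.pdf(t, c2, scale=s2))
    return -np.sum(np.log(pdf + 1e-300))

# Synthetic "lifetimes" from two degradation groups (illustrative only).
rng = np.random.default_rng(0)
t = np.concatenate([weibull_min.rvs(2.0, scale=10.0, size=200, random_state=rng),
                    weibull_min.rvs(2.0, scale=30.0, size=200, random_state=rng)])

res = minimize(mixture_weibull_nll, x0=[0.5, 1.5, 8.0, 1.5, 25.0], args=(t,),
               bounds=[(0.01, 0.99), (0.1, 10), (0.1, 100), (0.1, 10), (0.1, 100)],
               method="L-BFGS-B")
# res.x approximates the Weibull parameters of the two clustered groups
```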
NASA Astrophysics Data System (ADS)
Ausloos, M.; Lambiotte, R.; Scharnhorst, A.; Hellsten, I.
Old and recent theoretical works by Andrzej Pȩkalski (APE) are recalled as possible sources of interest for describing network formation and clustering in complex (scientific) communities, through self-organization and percolation processes. Emphasis is placed on APE's self-citation network over four decades. The method is that used for detecting scientists' field mobility by focusing on an author's self-citations, co-authorships and article-topic networks, as in Refs. 1 and 2. It is shown that APE's self-citation patterns reveal important information on APE's interest in research topics over time, as well as APE's engagement with different scientific topics and different networks of collaboration. Its interesting complexity results from "degrees of freedom" and external fields leading to so-called internal shock resistance. It is found that APE's network of scientific interests splits into independent clusters, whose formation occurs through rare or drastic events as in irreversible "preferential attachment processes", similar to those found in usual mechanics and thermodynamics phase transitions.
Characterization of essential proteins based on network topology in proteins interaction networks
NASA Astrophysics Data System (ADS)
Bakar, Sakhinah Abu; Taheri, Javid; Zomaya, Albert Y.
2014-06-01
The identification of essential proteins is theoretically and practically important as (1) it is essential to understand the minimal survival requirements of cellular life, and (2) it provides a foundation for drug development. As conducting experimental studies to identify essential proteins is both time- and resource-consuming, here we present a computational approach to predicting them based on network topology properties from protein-protein interaction networks of Saccharomyces cerevisiae. The proposed method, namely EP3NN (Essential Proteins Prediction using Probabilistic Neural Network), employs a machine learning algorithm called a Probabilistic Neural Network as a classifier to identify essential proteins of the organism of interest; it uses the degree centrality, closeness centrality, local assortativity and local clustering coefficient of each protein in the network for such predictions. Results show that EP3NN managed to successfully predict essential proteins with an accuracy of 95% for our studied organism. Results also show that most of the essential proteins are close to other proteins, show assortative behavior and form clusters/sub-graphs in the network.
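Three of the four topological features named above are available directly in a graph library such as networkx; a toy sketch follows. The graph is invented, and local assortativity has no networkx builtin (only a global assortativity coefficient exists), so it is omitted here.

```python
import networkx as nx

# Toy stand-in for a protein-protein interaction network
# (nodes and edges are illustrative, not real proteins).
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

degree = nx.degree_centrality(G)       # normalized degree per node
closeness = nx.closeness_centrality(G)
clustering = nx.clustering(G)          # local clustering coefficient

# One feature vector per protein; in the paper these (plus local
# assortativity) feed a probabilistic neural network classifier.
features = {n: (degree[n], closeness[n], clustering[n]) for n in G}
```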
Examination of evidence for collinear cluster tri-partition
NASA Astrophysics Data System (ADS)
Pyatkov, Yu. V.; Kamanin, D. V.; Alexandrov, A. A.; Alexandrova, I. A.; Goryainova, Z. I.; Malaza, V.; Mkaza, N.; Kuznetsova, E. A.; Strekalovsky, A. O.; Strekalovsky, O. V.; Zhuchko, V. E.
2017-12-01
Background: In a series of experiments at different time-of-flight spectrometers of heavy ions we have observed manifestations of a new, at least ternary, decay channel of low-excited heavy nuclei. Due to specific features of the effect, it was called collinear cluster tri-partition (CCT). The obtained experimental results have initiated a number of theoretical articles dedicated to different aspects of the CCT. Special attention was paid to kinematic constraints and the stability of collinearity. Purpose: To compare theoretical predictions with our experimental data, only partially published so far. To develop a model of one of the most populated CCT modes, which gives rise to the so-called "Ni-bump." Method: The fission events under analysis form regular two-dimensional linear structures in the mass correlation distributions of the fission fragments. The structures were revealed both at a highly statistically reliable level, albeit on a background substrate, and at low statistics in an almost noiseless distribution. The structures are bounded by the known magic fragments and were reproduced at different spectrometers. All this provides high reliability of our experimental findings. The model of the CCT proposed here is based on recently published theoretical results and a detailed analysis of all available experimental data. Results: Under our model, the CCT mode giving rise to the Ni bump occurs as a two-stage breakup of an initial three-body chain-like nuclear configuration with an elongated central cluster. After the first scission at the touching point with one of the side clusters, predominantly the heavier one, the deformation energy of the central cluster allows the emission of up to four neutrons flying apart isotropically. The heavy side cluster and a dinuclear system, consisting of the light side cluster and the central one, relaxed to a less elongated shape, are accelerated in the mutual Coulomb field. 
The "tip" of the dinuclear system at the moment of its rupture faces either the heavy fragment or the opposite direction, due to a single turn of the system around its center of gravity. Conclusions: Additional experimental information regarding the energies of the CCT partners and the proposed model of the process respond to criticisms concerning the kinematic constraints and the stability of collinearity in the CCT. The octupole-deformed system formed after the first scission is oriented along the fission axis, and its rupture occurs predominantly after the full acceleration. Noncollinear true ternary fission and far asymmetric binary fission, observed earlier, appear to be special cases of the decay of the prescission configuration leading to the CCT. Detection of the 68,72Ni fission fragments with kinetic energies E < 25 MeV at the mass separator Lohengrin is proposed as an independent experimental verification of the CCT.
Encoding the local connectivity patterns of fMRI for cognitive task and state classification.
Onal Ertugrul, Itir; Ozay, Mete; Yarman Vural, Fatos T
2018-06-15
In this work, we propose a novel framework to encode the local connectivity patterns of the brain using Fisher vectors (FV), vectors of locally aggregated descriptors (VLAD) and bag-of-words (BoW) methods. We first obtain local descriptors, called mesh arc descriptors (MADs), from fMRI data by forming local meshes around anatomical regions and estimating their relationships within a neighborhood. Then, we extract a dictionary of relationships, called a brain connectivity dictionary, by fitting a generative Gaussian mixture model (GMM) to a set of MADs and selecting codewords at the mean of each component of the mixture. Codewords represent connectivity patterns among anatomical regions. We also encode MADs by VLAD and BoW methods using k-means clustering. We classify cognitive tasks using the Human Connectome Project (HCP) task fMRI dataset and cognitive states using the Emotional Memory Retrieval (EMR) dataset. We train support vector machines (SVMs) using the encoded MADs. Results demonstrate that FV encoding of MADs can be successfully employed for classification of cognitive tasks and outperforms VLAD and BoW representations. Moreover, we identify the significant Gaussians in the mixture models by computing the energy of their corresponding FV parts, and analyze their effect on classification accuracy. Finally, we suggest a new method to visualize the codewords of the learned brain connectivity dictionary.
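A minimal sketch of the BoW branch of this pipeline, learning a codeword dictionary with k-means and encoding each sample as a normalized histogram of codeword assignments. Random arrays stand in for real MADs, and all dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-ins for mesh arc descriptors (MADs): one matrix of local
# descriptors per sample (dimensions are illustrative).
descriptors_per_sample = [rng.normal(size=(50, 8)) for _ in range(10)]

# 1. Learn a "dictionary" of K codewords from all local descriptors.
K = 4
all_desc = np.vstack(descriptors_per_sample)
codebook = KMeans(n_clusters=K, n_init=10, random_state=0).fit(all_desc)

# 2. Bag-of-words encoding: histogram of nearest-codeword assignments.
def bow_encode(desc):
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()  # normalize so samples are comparable

X = np.array([bow_encode(d) for d in descriptors_per_sample])
# X (10 samples x K dims) would then train an SVM, as in the paper.
```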
Rain volume estimation over areas using satellite and radar data
NASA Technical Reports Server (NTRS)
Doneaud, A. A.; Vonderhaar, T. H.
1985-01-01
The feasibility of rain volume estimation over fixed and floating areas was investigated using rapid-scan satellite data, following a technique recently developed with radar data called the Area Time Integral (ATI) technique. The radar and rapid-scan GOES satellite data were collected during the Cooperative Convective Precipitation Experiment (CCOPE) and the North Dakota Cloud Modification Project (NDCMP). Six multicell clusters and cells have been analyzed to date. A two-cycle oscillation emphasizing the multicell character of the clusters is demonstrated. Three clusters were selected on each day, 12 June and 2 July. The 12 June clusters occurred during the daytime, while the 2 July clusters occurred at night. A total of 86 time steps of radar images and 79 time steps of satellite images were analyzed, with approximately 12-min time intervals between radar scans on average.
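The ATI idea, integrating echo area over time and scaling by an empirical rain-rate factor, can be sketched in a few lines of arithmetic. Every number below is invented for illustration, including the conversion factor; none are values from the study.

```python
# Area Time Integral sketch: rain volume ≈ k × Σ (echo area × Δt).
areas_km2 = [120.0, 340.0, 560.0, 480.0, 220.0]  # echo area per radar scan (invented)
dt_h = 12.0 / 60.0                               # ~12-min scan interval, in hours
ati_km2_h = sum(a * dt_h for a in areas_km2)     # the ATI itself, km^2·h
k_mm_per_h = 3.0                                 # assumed mean rain-rate factor
rain_volume_m3 = k_mm_per_h * ati_km2_h * 1e3    # 1 km^2 · 1 mm = 1e3 m^3
```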
Cluster structure in the correlation coefficient matrix can be characterized by abnormal eigenvalues
NASA Astrophysics Data System (ADS)
Nie, Chun-Xiao
2018-02-01
In a large number of previous studies, researchers have found that some of the eigenvalues of the financial correlation matrix are greater than the values predicted by random matrix theory (RMT). Here, we call these eigenvalues abnormal eigenvalues. In order to reveal their hidden meaning, we study a toy model with cluster structure and find that these eigenvalues are related to the cluster structure of the correlation coefficient matrix. In this paper, model-based experiments show that in most cases the number of abnormal eigenvalues of the correlation matrix is equal to the number of clusters. In addition, empirical studies show that the sum of the abnormal eigenvalues is related to the clarity of the cluster structure and is negatively correlated with the correlation dimension.
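The central claim can be checked on a toy model: build correlated series in two clusters and count correlation-matrix eigenvalues above the Marchenko-Pastur edge that RMT predicts for pure noise. The cluster sizes, correlation strength and seed below are illustrative, and this is a generic construction rather than the paper's specific toy model.

```python
import numpy as np

rng = np.random.default_rng(1)
# N series over T observations, split into two equal clusters that each
# share a hidden common factor (parameters are illustrative).
N, T, rho = 40, 400, 0.6
common = rng.normal(size=(T, 2))            # one factor per cluster
noise = rng.normal(size=(T, N))
membership = np.repeat([0, 1], N // 2)
X = np.sqrt(rho) * common[:, membership] + np.sqrt(1 - rho) * noise

C = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(C)

# RMT: eigenvalues of a pure-noise correlation matrix stay below the
# Marchenko-Pastur edge (1 + sqrt(N/T))^2; anything above is "abnormal".
lambda_max = (1 + np.sqrt(N / T)) ** 2
n_abnormal = int(np.sum(eigvals > lambda_max))
# Expect n_abnormal to equal the number of clusters (2 here).
```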
Speaker Linking and Applications using Non-Parametric Hashing Methods
2016-09-08
clustering method based on hashing—canopy clustering. We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs… and compare to other hashing methods. Index Terms: speaker recognition, clustering, hashing, locality sensitive hashing. 1. Introduction We assume… speaker in our corpus. Second, given a QBE method, how can we perform speaker clustering—each cluster should be a single speaker, and a cluster should
Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.
Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
2017-01-01
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys
Hund, Lauren; Bedrick, Edward J.; Pagano, Marcello
2015-01-01
Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis. PMID:26125967
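For the simpler case the paper mentions, where clustering can be ignored and the standard binomial model applies, an LQAS decision rule can be derived in a few lines. The thresholds below are the textbook 80%/50% coverage example, not values from this paper.

```python
from scipy.stats import binom

# Standard (non-clustered) LQAS sketch: sample n individuals and "accept"
# the lot only if at least d successes (e.g., vaccinated children) are seen.
# Choose d so the risk of accepting a truly low-coverage lot stays under alpha.
n, p_low, alpha = 19, 0.50, 0.10

# Smallest d such that P(X >= d | coverage = p_low) <= alpha.
for d in range(n + 1):
    if binom.sf(d - 1, n, p_low) <= alpha:
        break
# With n=19 and p_low=0.5 this recovers the classic decision rule d=13.
```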
Spotting effect in microarray experiments
Mary-Huard, Tristan; Daudin, Jean-Jacques; Robin, Stéphane; Bitton, Frédérique; Cabannes, Eric; Hilson, Pierre
2004-01-01
Background: Microarray data must be normalized because they suffer from multiple biases. We have identified a source of spatial experimental variability that significantly affects data obtained with Cy3/Cy5 spotted glass arrays. It yields a periodic pattern altering both signal (Cy3/Cy5 ratio) and intensity across the array. Results: Using the variogram, a geostatistical tool, we characterized the observed variability, called here the spotting effect because it most probably arises during steps in the array printing procedure. Conclusions: The spotting effect is not appropriately corrected by current normalization methods, even by those addressing spatial variability. Importantly, the spotting effect may alter differential and clustering analysis. PMID:15151695
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm
NASA Astrophysics Data System (ADS)
Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian
2017-03-01
DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key, routine jobs in molecular biology, in particular in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combined the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), into what we call hybrid clustering, and applied it, using a parallel K-Means algorithm and the DIANA algorithm, to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts by collecting DNA sequences of HPV from NCBI (National Center for Biotechnology Information) and then extracting characteristics of the DNA sequences. The extraction result is stored in matrix form; this matrix is normalized using min-max normalization, and genetic distance is calculated using Euclidean distance. The hybrid clustering is then applied using an implementation of the parallel K-Means algorithm and the DIANA algorithm. The aim of hybrid clustering is to obtain better cluster results. To validate the resulting clusters and find the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, parallel K-Means clustering grouped the data into 5 clusters with a minimal DBI value of 0.8741, while hybrid clustering grouped the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. These DBI values are smaller than the DBI value obtained by performing parallel K-Means clustering at the first stage alone, indicating that hybrid clustering gives better results for clustering the HPV DNA sequences than parallel K-Means clustering only.
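A runnable approximation of the two-stage idea using scikit-learn: K-Means partitions first, then a hierarchical pass refines each partition into sub-clusters, with the Davies-Bouldin index (lower is better) validating each stage. scikit-learn has no DIANA, so an agglomerative pass stands in for the divisive second stage; the data and cluster counts are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic feature vectors standing in for extracted DNA characteristics.
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=1.0, random_state=0)

# Stage 1: partitioning clustering.
stage1 = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
dbi_stage1 = davies_bouldin_score(X, stage1)

# Stage 2: hierarchical refinement of each partition into sub-clusters,
# scored separately, as the paper reports one DBI value per sub-clustering.
for k in np.unique(stage1):
    sub = X[stage1 == k]
    sub_labels = AgglomerativeClustering(n_clusters=2).fit_predict(sub)
    dbi_sub = davies_bouldin_score(sub, sub_labels)
```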
GibbsCluster: unsupervised clustering and alignment of peptide sequences.
Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten
2017-07-03
Receptor interactions with short linear peptide fragments (ligands) are at the basis of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertions and deletions to account for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Music viewed by its entropy content: A novel window for comparative analysis
Febres, Gerardo; Jaffe, Klaus
2017-01-01
Polyphonic music files were analyzed using the set of symbols that produced the Minimal Entropy Description, which we call the Fundamental Scale. This allowed us to create a novel space to represent music pieces by developing: (a) a method to adjust a textual description from its original scale of observation to an arbitrarily selected scale, (b) a method to model the structure of any textual description based on the shape of the symbol frequency profiles, and (c) the concept of higher order entropy as the entropy associated with the deviations of a frequency-ranked symbol profile from a perfect Zipfian profile. We call this diversity index the ‘2nd Order Entropy’. Applying these methods to a variety of musical pieces showed how the space of ‘symbolic specific diversity-entropy’ and that of ‘2nd order entropy’ capture characteristics that are unique to each music type, style, composer and genre. Some clustering of these properties around each musical category is shown. These methods allow us to visualize a historic trajectory of academic music across this space, from medieval to contemporary academic music. We show that the description of musical structures using entropy, symbol frequency profiles and specific symbolic diversity allows us to characterize traditional and popular expressions of music. These classification techniques promise to be useful in other disciplines for pattern recognition and machine learning. PMID:29040288
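The first-order quantity, entropy of a symbol-frequency profile, is easy to sketch. Note that the paper computes it over Fundamental Scale symbols rather than raw characters, and the Zipf-deviation function below is only a loose analogue of the authors' 2nd Order Entropy, not their definition.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Entropy (bits/symbol) of a sequence under its symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def zipf_deviation(symbols):
    """Toy analogue of a 'higher order' quantity: total deviation of the
    frequency-ranked profile from a perfect Zipfian (1/rank) profile."""
    freqs = sorted(Counter(symbols).values(), reverse=True)
    n = len(symbols)
    zipf = [1 / r for r in range(1, len(freqs) + 1)]
    z = sum(zipf)
    return sum(abs(f / n - p / z) for f, p in zip(freqs, zipf))

h = shannon_entropy("abracadabra")  # ~2.04 bits/symbol
```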
Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium
Grove, Megan L.; Yu, Bing; Cochran, Barbara J.; Haritunians, Talin; Bis, Joshua C.; Taylor, Kent D.; Hansen, Mark; Borecki, Ingrid B.; Cupples, L. Adrienne; Fornage, Myriam; Gudnason, Vilmundur; Harris, Tamara B.; Kathiresan, Sekar; Kraaij, Robert; Launer, Lenore J.; Levy, Daniel; Liu, Yongmei; Mosley, Thomas; Peloso, Gina M.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Siscovick, David S.; Smith, Albert V.; Uitterlinden, Andre; van Duijn, Cornelia M.; Wilson, James G.; O’Donnell, Christopher J.; Rotter, Jerome I.; Boerwinkle, Eric
2013-01-01
Genotyping arrays are a cost-effective approach when typing previously-identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays with rare variants (e.g., minor allele frequency [MAF] <0.01) is the difficulty automated clustering algorithms have in accurately detecting and assigning genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, concordance of genotypes in a subset of individuals having both exome chip and exome sequence data was analyzed. After exclusion of low-performing SNPs on the exome chip and non-overlap of SNPs derived from sequence data, genotypes of 185,119 variants (11,356 were monomorphic) were compared in 530 individuals that had whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested and 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling makes it possible to accurately genotype rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip. PMID:23874508
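The concordance bookkeeping described here (concordant, missing and discordant fractions over pairs of genotype calls) can be sketched with invented calls; the 0/1/2/-1 coding and the example arrays are illustrative assumptions.

```python
import numpy as np

# Toy genotype calls for the same variants from two platforms,
# coded 0/1/2 copies of the alternate allele, -1 = missing.
chip = np.array([0, 1, 2, 1, -1, 0, 2, 1])
seq  = np.array([0, 1, 2, 0,  1, 0, 2, -1])

missing = (chip == -1) | (seq == -1)
tested = ~missing
rates = {
    "concordant": (tested & (chip == seq)).sum() / len(chip),
    "missing": missing.sum() / len(chip),
    "discordant": (tested & (chip != seq)).sum() / len(chip),
}
# The three rates partition all pairs, mirroring the paper's
# 99.77% / 0.14% / 0.09% breakdown.
```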
Lund, Travis J; Pilarz, Matthew; Velasco, Jonathan B; Chakraverty, Devasmita; Rosploch, Kaitlyn; Undersander, Molly; Stains, Marilyne
2015-01-01
Researchers, university administrators, and faculty members are increasingly interested in measuring and describing instructional practices provided in science, technology, engineering, and mathematics (STEM) courses at the college level. Specifically, there is keen interest in comparing instructional practices between courses, monitoring changes over time, and mapping observed practices to research-based teaching. While increasingly common observation protocols (Reformed Teaching Observation Protocol [RTOP] and Classroom Observation Protocol in Undergraduate STEM [COPUS]) at the postsecondary level help achieve some of these goals, they also suffer from weaknesses that limit their applicability. In this study, we leverage the strengths of these protocols to provide an easy method that enables the reliable and valid characterization of instructional practices. This method was developed empirically via a cluster analysis using observations of 269 individual class periods, corresponding to 73 different faculty members, 28 different research-intensive institutions, and various STEM disciplines. Ten clusters, called COPUS profiles, emerged from this analysis; they represent the most common types of instructional practices enacted in the classrooms observed for this study. RTOP scores were used to validate the alignment of the 10 COPUS profiles with reformed teaching. Herein, we present a detailed description of the cluster analysis method, the COPUS profiles, and the distribution of the COPUS profiles across various STEM courses at research-intensive universities. © 2015 T. J. Lund et al. CBE—Life Sciences Education © 2015 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
Vaccines for preventing anthrax.
Donegan, Sarah; Bellamy, Richard; Gamble, Carrol L
2009-04-15
Anthrax is a bacterial zoonosis that occasionally causes human disease and is potentially fatal. Anthrax vaccines include a live-attenuated vaccine, an alum-precipitated cell-free filtrate vaccine, and a recombinant protein vaccine. To evaluate the effectiveness, immunogenicity, and safety of vaccines for preventing anthrax. We searched the following databases (November 2008): Cochrane Infectious Diseases Group Specialized Register; CENTRAL (The Cochrane Library 2008, Issue 4); MEDLINE; EMBASE; LILACS; and mRCT. We also searched reference lists. We included randomized controlled trials (RCTs) of individuals and cluster-RCTs comparing anthrax vaccine with placebo, other (non-anthrax) vaccines, or no intervention; or comparing administration routes or treatment regimens of anthrax vaccine. Two authors independently considered trial eligibility, assessed risk of bias, and extracted data. We presented cases of anthrax and seroconversion rates using risk ratios (RR) and 95% confidence intervals (CI). We summarized immunoglobulin G (IgG) concentrations using geometric means. We carried out a sensitivity analysis to investigate the effect of clustering on the results from one cluster-RCT. No meta-analysis was undertaken. One cluster-RCT (with 157,259 participants) and four RCTs of individuals (1917 participants) met the inclusion criteria. The cluster-RCT from the former USSR showed that, compared with no vaccine, a live-attenuated vaccine (called STI) protected against clinical anthrax whether given by a needleless device (RR 0.16; 102,737 participants, 154 clusters) or the scarification method (RR 0.25; 104,496 participants, 151 clusters). The differences were statistically significant in unadjusted calculations, but when a small amount of association within clusters was assumed, they were no longer statistically significant. 
The four RCTs (of individuals) of inactivated vaccines (anthrax vaccine absorbed and recombinant protective antigen) showed a dose response relationship for the anti-protective antigen IgG antibody titre. Intramuscular administration was associated with fewer injection site reactions than subcutaneous injection, and injection site reaction rates were lower when the dosage interval was longer. One cluster-RCT provides limited evidence that a live-attenuated vaccine is effective in preventing cutaneous anthrax. Vaccines based on anthrax antigens are immunogenic in most vaccinees with few adverse events or reactions. Ongoing randomized controlled trials are investigating the immunogenicity and safety of anthrax vaccines.
Omega Centauri Looks Radiant in Infrared
2008-04-10
A cluster brimming with millions of stars glistens like an iridescent opal in this image from NASA's Spitzer Space Telescope. Called Omega Centauri, the sparkling orb of stars is like a miniature galaxy.
Semi-supervised clustering methods.
Bair, Eric
2013-01-01
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
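As an illustration of the k-means modifications this review surveys, the sketch below implements seeded k-means, one of the simplest semi-supervised variants: points with known labels keep their cluster assignment throughout and anchor the initial centroids. This is a hedged, generic illustration (not code from the review), and it assumes every cluster has at least one labeled seed.

```python
def seeded_kmeans(points, labeled, k, iters=50):
    """Semi-supervised k-means. 'labeled' maps a point index to its known
    cluster; labeled points never change cluster and seed the centroids.
    Assumes each of the k clusters has at least one labeled seed."""
    # initial centroids: the mean of the labeled seeds of each cluster
    centroids = []
    for c in range(k):
        seeds = [points[i] for i, lab in labeled.items() if lab == c]
        centroids.append([sum(x) / len(seeds) for x in zip(*seeds)])
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            if i in labeled:
                assign[i] = labeled[i]  # supervision: membership is fixed
            else:  # ordinary k-means step: nearest centroid wins
                assign[i] = min(range(k), key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(p, centroids[c])))
        for c in range(k):  # recompute centroids from current members
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return assign, centroids
```

With two well-separated blobs and one labeled point per blob, the labeled points pull the centroids to the right neighborhoods and the unlabeled points follow.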
Clonal Selection Based Artificial Immune System for Generalized Pattern Recognition
NASA Technical Reports Server (NTRS)
Huntsberger, Terry
2011-01-01
The last two decades have seen a rapid increase in the application of AIS (Artificial Immune Systems), modeled after the human immune system, to a wide range of areas including network intrusion detection, job shop scheduling, classification, pattern recognition, and robot control. JPL (Jet Propulsion Laboratory) has developed an integrated pattern recognition/classification system called AISLE (Artificial Immune System for Learning and Exploration) based on biologically inspired models of B-cell dynamics in the immune system. When used for unsupervised or supervised classification, the method scales linearly with the number of dimensions, has performance that is relatively independent of the total size of the dataset, and has been shown to perform as well as traditional clustering methods. When used for pattern recognition, the method efficiently isolates the appropriate matches in the data set. The paper presents the underlying structure of AISLE and the results from a number of experimental studies.
Decision Making Based on Fuzzy Aggregation Operators for Medical Diagnosis from Dental X-ray images.
Ngan, Tran Thi; Tuan, Tran Manh; Son, Le Hoang; Minh, Nguyen Hai; Dey, Nilanjan
2016-12-01
Medical diagnosis is considered an important step in dental treatment, assisting clinicians in making decisions about a patient's diseases. It has been affirmed that the accuracy of medical diagnosis, which is much influenced by the clinician's experience and knowledge, plays an important role in effective treatment therapies. In this paper, we propose a novel decision-making method based on fuzzy aggregation operators for medical diagnosis from dental X-ray images. It first divides a dental X-ray image into segments and identifies the corresponding diseases using a classification method called Affinity Propagation Clustering (APC+). Lastly, the most likely disease is found using fuzzy aggregation operators. Experimental validation on real dental datasets from Hanoi Medical University Hospital, Vietnam showed the superiority of the proposed method over the relevant alternatives in terms of accuracy.
A spatial scan statistic for multiple clusters.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2011-10-01
Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of their shadowing effect on one another. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second. We propose a new extension of the spatial scan statistic that can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters during detection and evaluation. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method; it could not be evaluated as statistically significant by the standard method because of another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.
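The core machinery of a spatial scan statistic can be sketched in a few lines. The toy below (an illustration of the standard single-cluster scan, not the authors' multiple-cluster extension) scans all contiguous windows of one-dimensional regions and scores each with Kulldorff's Poisson log-likelihood ratio; the proposed method generalizes the alternative hypothesis to contain two or more such zones at once.

```python
import math

def poisson_llr(c, e, C):
    """Kulldorff log-likelihood ratio for a candidate zone:
    c observed and e expected cases inside the zone; C total cases."""
    if c <= e:
        return 0.0  # only elevated-risk zones count
    inside = c * math.log(c / e)
    outside = (C - c) * math.log((C - c) / (C - e)) if C > c else 0.0
    return inside + outside

def scan(cases, pops):
    """Scan every contiguous window of 1-D regions; return (best LLR, zone).
    Expected counts assume uniform risk proportional to population."""
    C, P = sum(cases), sum(pops)
    best = (0.0, None)
    for i in range(len(cases)):
        c = e = 0.0
        for j in range(i, len(cases)):
            c += cases[j]
            e += pops[j] * C / P  # expected cases under the null
            llr = poisson_llr(c, e, C)
            if llr > best[0]:
                best = (llr, (i, j))
    return best
```

In practice the maximum LLR is compared against its null distribution via Monte Carlo replication; the sketch stops at the maximization step.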
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, correct type I error rates, and acceptable coverage rates, regardless of the true random-effects distribution, and avoid the serious variance under-estimation of conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of clustered event times.
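The cluster-bootstrap resampling step itself is simple to sketch. The toy below is a hedged illustration only: it bootstraps the SE of a plain mean rather than a Cox regression coefficient, but it shows the key move, resampling whole clusters with replacement so that within-cluster correlation is preserved in every replicate.

```python
import random
import statistics

def cluster_bootstrap_se(clusters, stat, B=500, seed=0):
    """Cluster-bootstrap SE of a statistic: resample entire clusters with
    replacement, pool their observations, and recompute the statistic."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        sample = [rng.choice(clusters) for _ in clusters]  # whole clusters
        pooled = [x for cl in sample for x in cl]
        reps.append(stat(pooled))
    return statistics.stdev(reps)
```

With strongly correlated observations inside each cluster, this SE is far larger than the naive i.i.d. bootstrap SE, which is exactly the under-estimation the abstract warns about.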
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
Hubble Spies Spectacular Sombrero
2005-05-05
Lying at the southern edge of the rich Virgo cluster of galaxies, Messier 104, also called the Sombrero galaxy and one of the most famous objects in the sky, is seen in this image from NASA's Hubble Space Telescope.
Advances in the understanding of cluster headache.
Leone, Massimo; Proietti Cecchini, Alberto
2017-02-01
Cluster headache is the worst primary headache form; it occurs in paroxysmal, excruciatingly severe, unilateral head pain attacks, usually grouped in cluster periods. The familial occurrence of the disease indicates a genetic component, but a gene abnormality is yet to be disclosed. Activation of trigeminal afferents and cranial parasympathetic efferents, the so-called trigemino-parasympathetic reflex, can explain the pain and accompanying oculo-facial autonomic phenomena. In particular, pain in cluster headache is attributed, at least in part, to the increased CGRP plasma levels released by the activated trigeminal system. The posterior hypothalamus was hypothesized to be the cluster generator activating the trigemino-parasympathetic reflex. The efficacy of monoclonal antibodies against CGRP is under investigation in randomized clinical trials. Areas covered: This paper focuses on the main findings that contribute to considering cluster headache a neurovascular disorder with an origin from within the brain. Expert commentary: Accumulated evidence from hypothalamic stimulation in cluster headache patients indicates that the posterior hypothalamus terminates rather than triggers the attacks. More extensive studies on the genetics of cluster headache are necessary to disclose the anomalies behind the increased familial risk of the disease. Results from ongoing clinical trials of monoclonal antibodies against CGRP in cluster headache sufferers will soon open a new era.
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge or split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed.
Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that cannot possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
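A minimal ISODATA-style iteration (a rough sketch, not the accelerated kd-tree implementation described above, with the split and merge thresholds chosen arbitrarily for illustration) shows why the number of clusters reported can differ from the initial k: after each assignment pass, clusters that are too wide are split and centers that are too close are merged.

```python
import math

def assign(points, centers):
    """Nearest-center label for every point (the O(kn) naive pass)."""
    return [min(range(len(centers)), key=lambda c: sum(
        (a - b) ** 2 for a, b in zip(p, centers[c]))) for p in points]

def isodata_step(points, centers, split_width=2.0, merge_dist=1.0):
    """One simplified ISODATA iteration: update means, split clusters whose
    widest axis exceeds split_width, merge centers closer than merge_dist."""
    labels = assign(points, centers)
    groups = [[points[i] for i in range(len(points)) if labels[i] == c]
              for c in range(len(centers))]
    groups = [g for g in groups if g]  # drop empty clusters
    out = []
    for g in groups:
        ctr = tuple(sum(x) / len(g) for x in zip(*g))  # updated mean
        spreads = [max(p[d] for p in g) - min(p[d] for p in g)
                   for d in range(len(ctr))]
        axis, width = max(enumerate(spreads), key=lambda t: t[1])
        if width > split_width:  # split along the widest axis (heuristic)
            lo, hi = list(ctr), list(ctr)
            lo[axis] -= width / 4
            hi[axis] += width / 4
            out += [tuple(lo), tuple(hi)]
        else:
            out.append(ctr)
    merged = []  # collapse near-duplicate centers
    for ctr in out:
        if not any(math.dist(ctr, m) < merge_dist for m in merged):
            merged.append(ctr)
    return merged
```

Starting from a single center placed between two well-separated blobs, one step splits it in two and the next step settles the centers on the blobs.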
Complex patchy colloids shaped from deformable seed particles through capillary interactions.
Meester, V; Kraft, D J
2018-02-14
We investigate the mechanisms underlying the reconfiguration of random aggregates of spheres through capillary interactions, the so-called "colloidal recycling" method, to fabricate a wide variety of patchy particles. We explore the influence of capillary forces on clusters of deformable seed particles by systematically varying the crosslink density of the spherical seeds. Spheres with a poorly crosslinked polymer network deform strongly under capillary forces and merge into large spheres. With increasing crosslink density, and therefore rigidity, the shape of the spheres is increasingly preserved during reconfiguration, yielding patchy particles of well-defined shape for up to five spheres. In particular, we find that the aspect ratio between the length and width of dumbbells, L/W, increases with the crosslink density (cd) as L/W = B - A·exp(-cd/C). For clusters consisting of more than five spheres, the particle deformability furthermore determines the patch arrangement of the resulting particles. The reconfiguration pathway of clusters of six densely or poorly crosslinked seeds leads to octahedral or polytetrahedral patchy particles, respectively. For seven particles several geometries were obtained, with rigid spheres preferring pentagonal dipyramids, whereas soft spheres rarely arrive at these structures. Even larger clusters of over 15 particles form non-uniform, often aspherical shapes. We discuss how the reconfiguration pathway is largely influenced by confinement and geometric constraints. The key factor that dominates during reconfiguration depends on the deformability of the spherical seed particles.
Using the morphology and magnetic fields of tailed radio galaxies as environmental probes
NASA Astrophysics Data System (ADS)
Johnston-Hollitt, M.; Dehghan, S.; Pratley, L.
2015-03-01
Bent-tailed (BT) radio sources have long been known to trace overdensities in the Universe up to z ~ 1, and there is increasing evidence this association persists out to redshifts of 2. The morphology of the jets in BT galaxies is primarily a function of the environment they have resided in, so BTs provide invaluable clues to their local conditions. Thus, not only can samples of BT galaxies be used as signposts of large-scale structure, they are also valuable for obtaining statistical measurements of properties of the intra-cluster medium, including the presence of cluster accretion shocks and winds, and as historical anemometers, preserving the dynamical history of their surroundings in their jets. We discuss the use of BTs to unveil large-scale structure and provide an example in which a BT was used to unlock the dynamical history of its host cluster. In addition to their use as density and dynamical indicators, BTs are useful probes of the magnetic field of their environment on scales that are inaccessible to other methods. Here we discuss a novel way in which a particular sub-class of BTs, the so-called `corkscrew' galaxies, might further elucidate the coherence lengths of the magnetic fields in their vicinity. Given that BTs are estimated to make up a large population in next-generation surveys, we posit that the use of jets in this way could provide a unique source of environmental information for clusters and groups up to z = 2.
Identifying conserved gene clusters in the presence of homology families.
He, Xin; Goldwasser, Michael H
2005-01-01
The study of conserved gene clusters is important for understanding the forces behind genome organization and evolution, as well as the function of individual genes or gene groups. In this paper, we present a new model and algorithm for identifying conserved gene clusters from pairwise genome comparison. This generalizes a recent model called "gene teams." A gene team is a set of genes that appear homologously in two or more species, possibly in a different order, yet with the distance between adjacent genes in the team on each chromosome always no more than a certain threshold. We remove the constraint in the original model that each gene must have a unique occurrence in each chromosome, and thus allow the analysis of complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm analyzes a pair of chromosomes in O(mn) time and uses O(m+n) space, where m and n are the number of genes in the respective chromosomes. We demonstrate the utility of our methods by studying two bacterial genomes, E. coli K-12 and B. subtilis. Many of the teams identified by our algorithm correlate with documented E. coli operons, while several others match predicted operons previously suggested by computational techniques. Our implementation and data are publicly available at euler.slu.edu/~goldwasser/homologyteams/.
Wang, Huiya; Feng, Jun; Wang, Hongyu
2017-07-20
Detection of clustered microcalcification (MC) from mammograms plays an essential role in computer-aided diagnosis of early-stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM) (called G-FSVM) is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups based on the EM algorithm. Then a series of fuzzy SVMs are integrated for classification, with each group of samples drawn from the MC lesions and normal breast tissues. From the DDSM database, a total of 1,064 suspicious regions were selected from 239 mammograms, and the measurements of accuracy, True Positive Rate (TPR), False Positive Rate (FPR), and EVL = TPR*(1-FPR) are 0.82, 0.78, 0.14, and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into a series of simple two-class classification problems. Experimental results from synthetic data and the DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.
HRLSim: a high performance spiking neural network simulator for GPGPU clusters.
Minkovich, Kirill; Thibeault, Corey M; O'Brien, Michael John; Nogin, Aleksey; Cho, Youngkwan; Srinivasa, Narayan
2014-02-01
Modeling of large-scale spiking neural models is an important tool in the quest to understand brain function and subsequently create real-world applications. This paper describes a spiking neural network simulator environment called HRL Spiking Simulator (HRLSim). This simulator is suitable for implementation on a cluster of general purpose graphical processing units (GPGPUs). Novel aspects of HRLSim are described and an analysis of its performance is provided for various configurations of the cluster. With the advent of inexpensive GPGPU cards and compute power, HRLSim offers an affordable and scalable tool for design, real-time simulation, and analysis of large-scale spiking neural networks.
Physical-depth architectural requirements for generating universal photonic cluster states
NASA Astrophysics Data System (ADS)
Morley-Short, Sam; Bartolucci, Sara; Gimeno-Segovia, Mercedes; Shadbolt, Pete; Cable, Hugo; Rudolph, Terry
2018-01-01
Most leading proposals for linear-optical quantum computing (LOQC) use cluster states, which act as a universal resource for measurement-based (one-way) quantum computation. In ballistic approaches to LOQC, cluster states are generated passively from small entangled resource states using so-called fusion operations. Results from percolation theory have previously been used to argue that universal cluster states can be generated in the ballistic approach using schemes which exceed the critical threshold for percolation, but these results consider cluster states with unbounded size. Here we consider how successful percolation can be maintained using a physical architecture with fixed physical depth, assuming that the cluster state is continuously generated and measured, and therefore that only a finite portion of it is visible at any one point in time. We show that universal LOQC can be implemented using a constant-size device with modest physical depth, and that percolation can be exploited using simple pathfinding strategies without the need for high-complexity algorithms.
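The percolation question underlying the ballistic approach can be illustrated with a toy site-percolation simulation (a generic sketch, unrelated to any specific photonic hardware): occupy grid sites with probability p and test whether an occupied path spans the lattice, which succeeds with high probability above the critical threshold and almost never below it.

```python
import random

def percolates(n, p, seed=0):
    """Site percolation on an n x n grid: occupy each site with probability p,
    then search for a connected occupied path from the top row to the bottom."""
    rng = random.Random(seed)
    occupied = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    frontier = [(0, j) for j in range(n) if occupied[0][j]]
    seen = set(frontier)
    while frontier:  # depth-first flood fill from the top row
        i, j = frontier.pop()
        if i == n - 1:
            return True  # reached the bottom row: a spanning path exists
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n and occupied[ni][nj] \
                    and (ni, nj) not in seen:
                seen.add((ni, nj))
                frontier.append((ni, nj))
    return False
```

For square-lattice site percolation the critical probability is about 0.593, so trials at p = 0.8 almost always span while trials at p = 0.3 almost never do.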
Free Factories: Unified Infrastructure for Data Intensive Web Services
Zaranek, Alexander Wait; Clegg, Tom; Vandewege, Ward; Church, George M.
2010-01-01
We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general purpose web server, along with access to distributed batch processing, cache and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center. PMID:20514356
Composition Formulas of Inorganic Compounds in Terms of Cluster Plus Glue Atom Model.
Ma, Yanping; Dong, Dandan; Wu, Aimin; Dong, Chuang
2018-01-16
The present paper attempts to identify the molecule-like structural units in inorganic compounds, by applying the so-called "cluster plus glue atom model". This model, originating from metallic glasses and quasi-crystals, describes any structure in terms of a nearest-neighbor cluster and a few outer-shell glue atoms, expressed in the cluster formula [cluster](glue atoms). Similar to the case for normal molecules where the charge transfer occurs within the molecule to meet the commonly known octet electron rule, the octet state is reached after matching the nearest-neighbor cluster with certain outer-shell glue atoms. These kinds of structural units contain information on local atomic configuration, chemical composition, and electron numbers, just as for normal molecules. It is shown that the formulas of typical inorganic compounds, such as fluorides, oxides, and nitrides, satisfy a similar octet electron rule, with the total number of valence electrons per unit formula being multiples of eight.
Inflation data clustering of some cities in Indonesia
NASA Astrophysics Data System (ADS)
Setiawan, Adi; Susanto, Bambang; Mahatma, Tundjung
2017-06-01
In this paper, we present how to cluster inflation data for cities in Indonesia using the k-means and fuzzy c-means clustering methods. The data are limited to monthly inflation figures from 15 cities across Indonesia that carry the highest weights, supplemented with 5 additional cities used in the calculation of inflation in Indonesia. When two clusters are used, with k = 2 for the k-means method and c = 2, w = 1.25 for the fuzzy c-means method, Ambon, Manado, and Jayapura tend to form one cluster (high inflation) while the other cities tend to form the other cluster (low inflation). However, with c = 2, w = 1.5, Surabaya, Medan, Makasar, Samarinda, Manado, Ambon, and Jayapura tend to form one cluster (high inflation) while the other cities form the other cluster (low inflation). Furthermore, when three clusters are used, with k = 3 for the k-means method and c = 3, w = 1.25 for the fuzzy c-means method, Ambon tends to form the first cluster (high inflation), Manado and Jayapura the second cluster (moderate inflation), and the other cities the third cluster (low inflation). With c = 3, w = 1.5, Ambon, Manado, and Jayapura tend to form the first cluster (high inflation); Surabaya, Bandung, Medan, Makasar, Banyuwangi, Denpasar, Samarinda, and Mataram the second cluster (moderate inflation); and the other cities the third cluster (low inflation). A similar interpretation can be made for the results of applying 5 clusters.
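For reference, fuzzy c-means alternates a membership update and a weighted center update, with the fuzzifier w controlling how soft the memberships are (w close to 1 approaches hard k-means, which is why w = 1.25 and w = 1.5 give different groupings above). The sketch below is a generic 1-D implementation, not the authors' code, and its naive initialization from the first c data points is an assumption for illustration.

```python
def fuzzy_cmeans(points, c, w=1.25, iters=100):
    """Plain fuzzy c-means on 1-D data (e.g. a city's inflation series
    reduced to one number). Returns final centers and memberships."""
    centers = list(points[:c])  # naive initialization (illustrative)
    for _ in range(iters):
        u = []  # u[i][j]: membership of point i in cluster j
        for x in points:
            d = [abs(x - v) or 1e-12 for v in centers]  # avoid divide-by-zero
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (w - 1))
                                for k in range(c)) for j in range(c)])
        # centers: membership-weighted means, weights raised to the fuzzifier
        centers = [sum(u[i][j] ** w * points[i] for i in range(len(points))) /
                   sum(u[i][j] ** w for i in range(len(points)))
                   for j in range(c)]
    return centers, u
```

On two well-separated groups of values the centers converge to the group means, with each point's largest membership indicating its cluster.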
Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.
Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin
2005-01-01
DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. It is essential to identify relevant groups of genes in biomedical research. Clustering is helpful for pattern recognition in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly rely on shape-based assumptions or a distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a different perspective, we propose a fractal clustering method that clusters genes using the intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than with points in other clusters. We assess this method using annotation-based validation for gene clusters, and show that it is superior to other traditional methods at identifying functionally related gene groups.
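The intrinsic (fractal) dimension that drives such methods is commonly estimated by box counting. The sketch below is a generic estimator, not the paper's clustering algorithm: it counts the occupied boxes N(s) at several grid scales s and fits the slope of log N(s) against log(1/s) by least squares.

```python
import math

def box_count_dimension(points, scales=(1.0, 0.5, 0.25, 0.125)):
    """Estimate the box-counting dimension of a point set: count occupied
    grid boxes at each scale, then fit log N(s) vs. log(1/s)."""
    logs, logN = [], []
    for s in scales:
        boxes = {tuple(math.floor(x / s) for x in p) for p in points}
        logs.append(math.log(1 / s))
        logN.append(math.log(len(boxes)))
    n = len(scales)
    mx, my = sum(logs) / n, sum(logN) / n  # least-squares slope
    return sum((a - mx) * (b - my) for a, b in zip(logs, logN)) / \
        sum((a - mx) ** 2 for a in logs)
```

Points sampled along a line segment give a dimension near 1, while points filling a square give a dimension near 2; a fractal clustering scheme groups points so that adding a point to the "right" cluster barely changes that cluster's intrinsic dimension.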
Nuclear pasta in hot dense matter and its implications for neutrino scattering
NASA Astrophysics Data System (ADS)
Roggero, Alessandro; Margueron, Jérôme; Roberts, Luke F.; Reddy, Sanjay
2018-04-01
The abundance of large clusters of nucleons in neutron-rich matter at subnuclear density is found to be greatly reduced by finite-temperature effects when matter is close to β equilibrium, compared to the case where the electron fraction is fixed at Y_e > 0.1, as often considered in the literature. Large nuclei and exotic nonspherical nuclear configurations called pasta, favored in the vicinity of the transition to uniform matter at T = 0, dissolve at a relatively low temperature T_u as protons leak out of nuclei and pasta. For matter at β equilibrium with a negligible neutrino chemical potential we find that T_u^β ≃ 4 ± 1 MeV for realistic equations of state. This is lower than the maximum temperature T_max^β ≃ 9 ± 1 MeV at which nuclei can coexist with a gas of nucleons and can be explained by a change in the nature of the transition to uniform matter called retrograde condensation. An important new finding is that coherent neutrino scattering from nuclei and pasta makes a modest contribution to the opacity under the conditions encountered in supernovas and neutron star mergers. This is because large nuclear clusters dissolve at most relevant temperatures, and at lower temperatures, when clusters are present, Coulomb correlations between them suppress coherent neutrino scattering off individual clusters. Implications for neutrino signals from galactic supernovas are briefly discussed.
Interactive visual exploration and refinement of cluster assignments.
Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R
2017-09-12
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
Le Thomas, Adrien; Stuwe, Evelyn; Li, Sisi; Marinov, Georgi; Rozhkov, Nikolay; Chen, Yung-Chia Ariel; Luo, Yicheng; Sachidanandam, Ravi; Toth, Katalin Fejes; Patel, Dinshaw; Aravin, Alexei A.
2014-01-01
Small noncoding RNAs that associate with Piwi proteins, called piRNAs, serve as guides for repression of diverse transposable elements in germ cells of metazoa. In Drosophila, the genomic regions that give rise to piRNAs, the so-called piRNA clusters, are transcribed to generate long precursor molecules that are processed into mature piRNAs. How genomic regions that give rise to piRNA precursor transcripts are differentiated from the rest of the genome and how these transcripts are specifically channeled into the piRNA biogenesis pathway are not known. We found that transgenerationally inherited piRNAs provide the critical trigger for piRNA production from homologous genomic regions in the next generation by two different mechanisms. First, inherited piRNAs enhance processing of homologous transcripts into mature piRNAs by initiating the ping-pong cycle in the cytoplasm. Second, inherited piRNAs induce installment of the histone 3 Lys9 trimethylation (H3K9me3) mark on genomic piRNA cluster sequences. The heterochromatin protein 1 (HP1) homolog Rhino binds to the H3K9me3 mark through its chromodomain and is enriched over piRNA clusters. Rhino recruits the piRNA biogenesis factor Cutoff to piRNA clusters and is required for efficient transcription of piRNA precursors. We propose that transgenerationally inherited piRNAs act as an epigenetic memory for identification of substrates for piRNA biogenesis on two levels: by inducing a permissive chromatin environment for piRNA precursor synthesis and by enhancing processing of these precursors. PMID:25085419
Cluster of Stars in Kepler's Sight
2009-04-16
This image zooms into a small portion of the full field of view of NASA's Kepler, an expansive, 100-square-degree patch of sky in our Milky Way galaxy. An eight-billion-year-old cluster of stars 13,000 light-years from Earth, called NGC 6791, can be seen in the image. Clusters are families of stars that form together out of the same gas cloud. This particular cluster is called an open cluster, because the stars are loosely bound and have started to spread out from each other. The area pictured is 0.2 percent of Kepler's full field of view, and shows hundreds of stars in the constellation Lyra. The image has been color-coded so that brighter stars appear white, and fainter stars, red. It is a 60-second exposure, taken on April 8, 2009, one day after the spacecraft's dust cover was jettisoned. Kepler was designed to hunt for planets like Earth. The mission will spend the next three-and-a-half years staring at the same stars, looking for periodic dips in brightness. Such dips occur when planets cross in front of their stars from our point of view in the galaxy, partially blocking the starlight. To achieve the level of precision needed to spot planets as small as Earth, Kepler's images are intentionally blurred slightly. This minimizes the number of saturated stars. Saturation, or "blooming," occurs when the brightest stars overload the individual pixels in the detectors, causing the signal to spill out into nearby pixels. http://photojournal.jpl.nasa.gov/catalog/PIA11986
Varying face occlusion detection and iterative recovery for face recognition
NASA Astrophysics Data System (ADS)
Wang, Meng; Hu, Zhengping; Sun, Zhe; Zhao, Shuhuan; Sun, Mei
2017-05-01
In most sparse representation methods for face recognition (FR), occlusion problems were usually solved via removing the occlusion part of both query samples and training samples to perform the recognition process. This practice ignores the global feature of facial image and may lead to unsatisfactory results due to the limitation of local features. Considering the aforementioned drawback, we propose a method called varying occlusion detection and iterative recovery for FR. The main contributions of our method are as follows: (1) to detect an accurate occlusion area of facial images, an image processing and intersection-based clustering combination method is used for occlusion FR; (2) according to an accurate occlusion map, the new integrated facial images are recovered iteratively and put into a recognition process; and (3) the effectiveness on recognition accuracy of our method is verified by comparing it with three typical occlusion map detection methods. Experiments show that the proposed method has a highly accurate detection and recovery performance and that it outperforms several similar state-of-the-art methods against partial contiguous occlusion.
NASA Astrophysics Data System (ADS)
Lutz, Jesse J.; Duan, Xiaofeng F.; Burggraf, Larry W.
2018-03-01
Valence excitation spectra are computed for deep-center silicon-vacancy defects in 3C, 4H, and 6H silicon carbide (SiC), and comparisons are made with literature photoluminescence measurements. Optimizations of nuclear geometries surrounding the defect centers are performed within a Gaussian basis-set framework using many-body perturbation theory or density functional theory (DFT) methods, with computational expenses minimized by a QM/MM technique called SIMOMM. Vertical excitation energies are subsequently obtained by applying excitation-energy, electron-attached, and ionized equation-of-motion coupled-cluster (EOMCC) methods, where appropriate, as well as time-dependent (TD) DFT, to small models including only a few atoms adjacent to the defect center. We consider the relative quality of various EOMCC and TD-DFT methods for (i) energy-ordering potential ground states differing incrementally in charge and multiplicity, (ii) accurately reproducing experimentally measured photoluminescence peaks, and (iii) energy-ordering defects of different types occurring within a given polytype. The extensibility of this approach to transition-metal defects is also tested by applying it to silicon-substituted chromium defects in SiC and comparing with measurements. It is demonstrated that, when used in conjunction with SIMOMM-optimized geometries, EOMCC-based methods can provide a reliable prediction of the ground-state charge and multiplicity, while also giving a quantitative description of the photoluminescence spectra, accurate to within 0.1 eV of measurement for all cases considered.
2013-12-18
Beam waveguide antennas at Goldstone, known as the Beam Waveguide Cluster, are located in an area of Goldstone called Apollo Valley. The Goldstone Deep Space Communications Complex is located in the Mojave Desert in California, USA.
Evans, Christopher M; Love, Alyssa M; Weiss, Emily A
2012-10-17
This article reports control of the competition between step-growth and living chain-growth polymerization mechanisms in the formation of cadmium chalcogenide colloidal quantum dots (QDs) from CdSe(S) clusters by varying the concentration of anionic surfactant in the synthetic reaction mixture. The growth of the particles proceeds by step-addition from initially nucleated clusters in the absence of excess phosphinic or carboxylic acids, which adsorb as their anionic conjugate bases, and proceeds indirectly by dissolution of clusters, and subsequent chain-addition of monomers to stable clusters (Ostwald ripening) in the presence of excess phosphinic or carboxylic acid. Fusion of clusters by step-growth polymerization is an explanation for the consistent observation of so-called "magic-sized" clusters in QD growth reactions. Living chain-addition (chain addition with no explicit termination step) produces QDs over a larger range of sizes with better size dispersity than step-addition. Tuning the molar ratio of surfactant to Se(2-)(S(2-)), the limiting ionic reagent, within the living chain-addition polymerization allows for stoichiometric control of QD radius without relying on reaction time.
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering
NASA Technical Reports Server (NTRS)
Rizvi, Farheen
2013-01-01
The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. It is computationally expensive to calculate corresponding Earth elevation for these data pairs. Thus, the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster called the cluster centroid. The corresponding Earth elevation for each cluster centroid is assigned to the entire group. Results show that 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, sensitivity analysis is also performed to show a trade-off between algorithm performance versus computational efficiency as the number of cluster centroids and algorithm iterations are increased.
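The reduction step this abstract describes can be sketched in a few lines of plain k-means. The coordinates below are randomly generated stand-ins for the 400 (latitude, longitude) footprint pairs, not SMAP data, and the cluster count of 60 is taken from the abstract; everything else is an illustrative assumption.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Randomly generated stand-ins for the 400 (lat, lon) footprint pairs.
random.seed(1)
footprints = [(random.uniform(-60, 60), random.uniform(-180, 180))
              for _ in range(400)]
centroids, labels = kmeans(footprints, k=60)
# One elevation lookup per centroid now covers its whole group.
print(len(footprints), "points reduced to", len(centroids), "centroids")
```

Each of the 60 centroids then needs only one elevation lookup, which is assigned to every footprint with that label.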
A comparison of heuristic and model-based clustering methods for dietary pattern analysis.
Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia
2016-02-01
Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.
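The flexibility argument above rests on the GMM's soft, shape-aware assignments. This minimal E-step sketch, using two invented 1-D components rather than the IDEFICS data, shows how a point between clusters receives graded memberships where k-means would force a hard choice.

```python
import math

def responsibilities(x, comps):
    """E-step of a 1-D Gaussian mixture: soft cluster memberships of x."""
    def pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    weighted = [w * pdf(x, mu, sigma) for w, mu, sigma in comps]
    total = sum(weighted)
    return [v / total for v in weighted]

# Two hypothetical (weight, mean, sd) components with unequal spreads;
# k-means' equal, spherical clusters cannot express this structure.
comps = [(0.5, 0.0, 1.0), (0.5, 4.0, 3.0)]
r = responsibilities(2.0, comps)
print(r)  # a point between the components gets graded memberships
```

The per-component standard deviations are what give the GMM its freedom in cluster volume and shape; k-means corresponds to the special case of equal, vanishing-width components with hard assignment.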
Spadafore, Maxwell; Najarian, Kayvan; Boyle, Alan P
2017-11-29
Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to establish where a TF binds to DNA are well established, these methods provide no information describing how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.
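As a minimal sketch of the Markov Clustering step, assuming a small dense co-occurrence matrix: the paper's ChIP-seq filtering and normalization pipeline is not reproduced here, and the 5-node adjacency matrix is a toy stand-in for real TF co-occurrence data.

```python
def mcl(adj, inflation=2.0, iters=30):
    """Minimal dense Markov Clustering: expansion + inflation loop."""
    n = len(adj)
    # Add self-loops, then column-normalize into a stochastic matrix.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    def normalize(m):
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
        return m
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    m = normalize(m)
    for _ in range(iters):
        m = matmul(m, m)                                             # expansion
        m = normalize([[v ** inflation for v in row] for row in m])  # inflation
    # Attractors keep mass on the diagonal; each attractor's nonzero
    # columns form one cluster.
    return {frozenset(j for j in range(n) if m[i][j] > 1e-6)
            for i in range(n) if m[i][i] > 1e-6}

# Toy co-occurrence graph with two obvious blocks: {0,1,2} and {3,4}.
adj = [[0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0]]
clusters = mcl(adj)
print(clusters)
```

The inflation exponent controls cluster granularity: larger values fragment the graph into more, smaller clusters.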
NASA Astrophysics Data System (ADS)
Fensch, J.; Mieske, S.; Müller-Seidlitz, J.; Hilker, M.
2014-07-01
Aims: We investigate the colour-magnitude relation of metal-poor globular clusters, the so-called blue tilt, in the Hydra and Centaurus galaxy clusters and constrain the primordial conditions for star cluster self-enrichment. Methods: We analyse U,I photometry for about 2500 globular clusters in the central regions of Hydra and Centaurus, based on VLT/FORS1 data. We measure the relation between mean colour and luminosity for the blue and red subpopulation of the globular cluster samples. We convert these relations into mass-metallicity space and compare the obtained GC mass-metallicity relation with predictions from the star cluster self-enrichment model by Bailin & Harris (2009, ApJ, 695, 1082). For this we include effects of dynamical and stellar evolution and a physically well motivated primordial mass-radius scaling. Results: We obtain a mass-metallicity scaling of Z ∝ M0.27 ± 0.05 for Centaurus GCs and Z ∝ M0.40 ± 0.06 for Hydra GCs, consistent with the range of observed relations in other environments. We find that the GC mass-metallicity relation already sets in at present-day masses of a few and is well established in the luminosity range of massive MW clusters like ω Centauri. The inclusion of a primordial mass-radius scaling of star clusters significantly improves the fit of the self-enrichment model to the data. The self-enrichment model accurately reproduces the observed relations for average primordial half-light radii rh ~ 1-1.5 pc, star formation efficiencies f⋆ ~ 0.3-0.4, and pre-enrichment levels of [Fe/H] - 1.7 dex. The slightly steeper blue tilt for Hydra can be explained either by a ~30% smaller average rh at fixed f⋆ ~ 0.3, or analogously by a ~20% smaller f⋆ at fixed rh ~ 1.5 pc. Within the self-enrichment scenario, the observed blue tilt implies a correlation between GC mass and width of the stellar metallicity distribution. 
We find that this implied correlation matches the trend of width with GC mass measured in Galactic GCs, including extreme cases like ω Centauri and M 54. Conclusions: First, we found that a primordial star cluster mass-radius relation provides a significant improvement to the self-enrichment model fits. Second we show that broadened metallicity distributions as found in some massive MW globular clusters may have arisen naturally from self-enrichment processes, without the need of a dwarf galaxy progenitor.
Data depth based clustering analysis
Jeong, Myeong-Hun; Cai, Yaping; Sullivan, Clair J.; ...
2016-01-01
Here, this paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant; they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, being based on Euclidean distance, can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters by varying the parameter. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the robustness to noise of DBSCAN and HDBSCAN. The robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
NASA Astrophysics Data System (ADS)
Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.
2018-04-01
Clustering is a data mining technique used to analyse data that vary widely and are numerous. Clustering is the process of grouping data into clusters, so that each cluster contains data that are as similar as possible to each other and as different as possible from the objects of other clusters. SMEs in Indonesia have a variety of customers, but they lack a mapping of these customers, so they do not know which customers are loyal and which are not. Customer mapping is a grouping of customer profiles to facilitate analysis and to inform SME policy in the production of goods, especially batik sales. We use a combination of the K-means method with the elbow method to make K-means perform efficiently and effectively when processing large amounts of data. K-means clustering is a local optimization method that is sensitive to the choice of the starting positions of the cluster midpoints: a poor choice of starting positions causes the K-means clustering algorithm to produce high errors and poor cluster results. The K-means algorithm also has the problem of determining the best number of clusters, so the elbow method is used to find the best number of clusters for K-means. The results of this process show that the elbow method produces the same best number of clusters K on different amounts of data; this best number of clusters then becomes the default for the subsequent characterization process in the case study. Measuring K-means performance by the SSE values for the 500 batik visitors yielded the best clusters: the SSE curve shows a sharp decrease at K = 3, so K = 3 is taken as the cut-off point and the best number of clusters.
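The elbow procedure can be illustrated with a toy 1-D sketch. The data, the quantile seeding, and the 0.6 ratio threshold for "the SSE stops dropping sharply" are all illustrative choices, not the paper's actual setup.

```python
import random

def kmeans_sse(xs, k, iters=50):
    """1-D k-means with quantile seeding; returns within-cluster SSE."""
    s = sorted(xs)
    cents = [s[int(len(s) * (2 * i + 1) / (2 * k))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: (x - cents[c]) ** 2)].append(x)
        cents = [sum(g) / len(g) if g else cents[i] for i, g in enumerate(groups)]
    return sum(min((x - c) ** 2 for c in cents) for x in xs)

# Toy 1-D stand-in for the customer profiles: three separated groups.
random.seed(2)
xs = ([random.gauss(0, 1) for _ in range(50)]
      + [random.gauss(10, 1) for _ in range(50)]
      + [random.gauss(20, 1) for _ in range(50)])
sse = {k: kmeans_sse(xs, k) for k in range(1, 7)}
# One simple elbow rule: the first k after which adding a cluster no
# longer shrinks the SSE much (the 0.6 threshold is an arbitrary choice).
elbow = min(k for k in range(1, 6) if sse[k + 1] > 0.6 * sse[k])
print({k: round(v, 1) for k, v in sse.items()}, "elbow at k =", elbow)
```

On this data the SSE collapses going from k = 2 to k = 3 and barely improves afterwards, so the rule picks the elbow at k = 3, mirroring the sharp decrease at K = 3 reported in the abstract.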
Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.
Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G
2012-01-01
Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
NASA Astrophysics Data System (ADS)
Audebert, M.; Clément, R.; Touze-Foltz, N.; Günther, T.; Moreau, S.; Duquennoi, C.
2014-12-01
Leachate recirculation is a key process in municipal waste landfills functioning as bioreactors. To quantify the water content and to assess the leachate injection system, in-situ methods are required to obtain spatially distributed information, usually electrical resistivity tomography (ERT). This geophysical method is based on the inversion process, which presents two major problems in terms of delimiting the infiltration area. First, it is difficult for ERT users to choose an appropriate inversion parameter set. Indeed, it might not be sufficient to interpret only the optimum model (i.e. the model with the chosen regularisation strength) because it is not necessarily the model which best represents the physical process studied. Second, it is difficult to delineate the infiltration front based on resistivity models because of the smoothness of the inversion results. This paper proposes a new methodology called MICS (multiple inversions and clustering strategy), which allows ERT users to improve the delimitation of the infiltration area in leachate injection monitoring. The MICS methodology is based on (i) a multiple inversion step by varying the inversion parameter values to take a wide range of resistivity models into account and (ii) a clustering strategy to improve the delineation of the infiltration front. In this paper, MICS was assessed on two types of data. First, a numerical assessment allows us to optimise and test MICS for different infiltration area sizes, contrasts and shapes. Second, MICS was applied to a field data set gathered during leachate recirculation on a bioreactor.
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier.
Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± standard error) utilizing tenfold cross-validation methodology. A previously developed B-spline active rays segmentation method was also considered for comparison purposes. Results: Interobserver and intraobserver segmentation agreements (median and [25%, 75%] quartile range) were substantial with respect to the distance metrics HDIST_cluster (2.3 [1.8, 2.9] and 2.5 [2.1, 3.2] pixels) and AMINDIST_cluster (0.8 [0.6, 1.0] and 1.0 [0.8, 1.2] pixels), while moderate with respect to AOM_cluster (0.64 [0.55, 0.71] and 0.59 [0.52, 0.66]). The proposed segmentation method outperformed (0.80 ± 0.04) statistically significantly (Mann-Whitney U-test, p < 0.05) the B-spline active rays segmentation method (0.69 ± 0.04), suggesting the significance of the proposed semiautomated method. Conclusions: Results indicate a reliable semiautomated segmentation method for MC clusters offered by deformable models, which could be utilized in MC cluster quantitative image analysis.
Zone-Based Routing Protocol for Wireless Sensor Networks
Venkateswarlu Kumaramangalam, Muni; Adiyapatham, Kandasamy; Kandasamy, Chandrasekaran
2014-01-01
Extensive research across the globe has witnessed the importance of Wireless Sensor Networks (WSNs) in present-day applications. In the recent past, various routing algorithms have been proposed to extend WSN network lifetime. Clustering mechanisms are highly successful in conserving energy resources for network activities, and clustering has become a promising field for researchers. However, the problem of unbalanced energy consumption is still open, because cluster-head activities are tightly coupled with the role and location of a particular node in the network. Several unequal clustering algorithms have been proposed to solve this multihop hot-spot problem in wireless sensor networks. Current unequal clustering mechanisms consider only intra- and intercluster communication cost. Proper organization of a wireless sensor network into clusters enables efficient utilization of limited resources and enhances the lifetime of deployed sensor nodes. This paper considers a novel network organization scheme, an energy-efficient edge-based network partitioning scheme, to organize sensor nodes into clusters of equal size. It also proposes a cluster-based routing algorithm, called the zone-based routing protocol (ZBRP), for extending sensor network lifetime. Experimental results show that ZBRP outperforms existing protocols in terms of network lifetime and energy conservation, owing to its uniform energy consumption among the cluster heads. PMID:27437455
A new parallelization scheme for adaptive mesh refinement
Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.; ...
2016-05-06
Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e., wall time × processor count) of subcycling in time, but with the runtime performance (i.e., smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has less parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high-performance clusters. For the class of problem considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.
Extraction of Children's Friendship Relation from Activity Level
NASA Astrophysics Data System (ADS)
Kono, Aki; Shintani, Kimio; Katsuki, Takuya; Kihara, Shin'ya; Ueda, Mari; Kaneda, Shigeo; Haga, Hirohide
Children learn to fit into society through living in a group, and this is greatly influenced by their friend relations. Although preschool teachers need to observe children to assist the growth of their social skills and support the development of each child's personality, only experienced teachers can watch over children while providing high-quality guidance. To resolve this problem, this paper proposes a mathematical and objective method that assists teachers with observation. It uses numerical activity-level data recorded by pedometers, from which we make a tree diagram, called a dendrogram, based on hierarchical clustering of the recorded activity levels. We also calculate the "breadth" and "depth" of children's friend relations by using more than one dendrogram. When we recorded children's activity levels in a kindergarten for two months and evaluated the proposed method, the results usually coincided with the teachers' remarks about the children.
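A hedged sketch of how a dendrogram-style grouping might be computed from activity levels: the step counts, the child labels, and the single-linkage rule are illustrative assumptions, not the authors' exact data or procedure.

```python
def single_linkage(points, labels):
    """Agglomerative clustering on 1-D data; returns the merge history."""
    clusters = {l: [p] for l, p in zip(labels, points)}
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance is the minimum pairwise
        # distance between members of the two clusters.
        d, la, lb = min((min(abs(a - b) for a in ca for b in cb), la, lb)
                        for la, ca in clusters.items()
                        for lb, cb in clusters.items() if la < lb)
        merges.append((la, lb, d))
        clusters[la + "+" + lb] = clusters.pop(la) + clusters.pop(lb)
    return merges

# Hypothetical daily pedometer counts for six children.
steps = [8200, 8400, 8300, 12100, 12400, 5100]
names = ["A", "B", "C", "D", "E", "F"]
for la, lb, d in single_linkage(steps, names):
    print(f"merge {la} + {lb} at distance {d}")
```

The merge history is exactly what a dendrogram draws: each merge is a horizontal bar at the height of its distance, so children with similar activity levels (here A, B, C) join low in the tree.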
NASA Astrophysics Data System (ADS)
Horiba, Katsuhiro; Okawa, Keiko; Murai, Jun
On the 11th of March, 2011, a massive earthquake hit the northeast region of Japan. The government of Japan needed to publish information regarding the earthquake and its effects, but the capacity of its Web services was overwhelmed. It called on industry and academia for help in providing a stable information service to the people. Industry and academia formed a team to answer the call and named themselves the "EQ Project". This paper describes how the EQ Project was organized and operated, and gives analyses of its statistics. An academic organization took the lead in the EQ Project. Ten organizations, consisting of the commercial IT industry and academics specialized in Internet technology, participated in the EQ Project, and they were structured into three clusters based on their relationships and technological approach. In WIDE Cluster, one of the three clusters in the EQ Project structure, the peak number of file accesses per day was over 90 thousand, mobile browsers accounted for 3.4% of accesses, and foreign-language (translated) contents accounted for 35% of references. We also discuss future information distribution strategies for emergency situations based on the experiences of the EQ Project, and propose nine suggestions to MEXT as a future strategy.
An Accurate Framework for Arbitrary View Pedestrian Detection in Images
NASA Astrophysics Data System (ADS)
Fan, Y.; Wen, G.; Qiu, S.
2018-01-01
We consider the problem of detecting pedestrians in images collected under various viewpoints. This paper presents a novel framework called locality-constrained affine subspace coding (LASC). First, the positive training samples are clustered into entities representing similar viewpoints. Then principal component analysis (PCA) is used to obtain the shared features of each viewpoint. Finally, samples that can be reconstructed by linear approximation from their top-k nearest shared features with a small error are regarded as correct detections. No negative samples are required by our method. Histograms of oriented gradients (HOG) are used as the feature descriptors, and a sliding-window scheme is adopted to detect humans in images. The proposed method exploits the sparsity of the intrinsic information and the correlations among multiple-view samples. Experimental results on the INRIA and SDL human datasets show that the proposed method outperforms state-of-the-art methods in both effectiveness and efficiency.
Automated UMLS-Based Comparison of Medical Forms
Dugas, Martin; Fritz, Fleur; Krumm, Rainer; Breil, Bernhard
2013-01-01
Medical forms are very heterogeneous: on a European scale there are thousands of data items in several hundred different systems. To enable data exchange for clinical care and research purposes there is a need to develop interoperable documentation systems with harmonized forms for data capture. A prerequisite in this harmonization process is comparison of forms. So far – to our knowledge – an automated method for comparison of medical forms is not available. A form contains a list of data items with corresponding medical concepts. An automatic comparison needs data types, item names and, above all, items annotated with unique concept codes from medical terminologies. The scope of the proposed method is a comparison of these items by comparing their concept codes (coded in UMLS). Each data item is represented by item name, concept code and value domain. Two items are called identical, if item name, concept code and value domain are the same. Two items are called matching, if only concept code and value domain are the same. Two items are called similar, if their concept codes are the same, but the value domains are different. Based on these definitions an open-source implementation for automated comparison of medical forms in ODM format with UMLS-based semantic annotations was developed. It is available as package compareODM from http://cran.r-project.org. To evaluate this method, it was applied to a set of 7 real medical forms with 285 data items from a large public ODM repository with forms for different medical purposes (research, quality management, routine care). Comparison results were visualized with grid images and dendrograms. Automated comparison of semantically annotated medical forms is feasible. Dendrograms allow a view on clustered similar forms. The approach is scalable for a large set of real medical forms. PMID:23861827
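The identical/matching/similar rules above can be sketched directly; the item dictionaries and the concept code below are hypothetical illustrations, not taken from the compareODM package:

```python
def compare_items(a, b):
    """Classify two form items using the definitions above."""
    if a["concept"] != b["concept"]:
        return "different"
    if a["domain"] != b["domain"]:
        return "similar"      # same concept code, different value domains
    if a["name"] != b["name"]:
        return "matching"     # same concept code and value domain, other name
    return "identical"        # item name, concept code and value domain equal

# hypothetical items sharing a (made-up) UMLS concept code
x = {"name": "Systolic blood pressure", "concept": "CUI-0001", "domain": "integer"}
y = {"name": "SBP",                     "concept": "CUI-0001", "domain": "integer"}
print(compare_items(x, y))  # -> matching
```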
Saeed, Faisal; Salim, Naomie; Abdo, Ammar
2013-07-01
Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics, but few have been used for clustering chemical compounds. In this paper, an information-theoretic, voting-based algorithm (the Adaptive Cumulative Voting-based Aggregation Algorithm, A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The MDL Drug Data Report (MDDR) chemical dataset and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
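A-CVAA itself is not reproduced here; as a hedged illustration of combining multiple clusterings of the same items, the following sketch uses a simpler co-association (voting) matrix instead, with made-up cluster labels for five compounds:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# three input clusterings of five compounds (made-up labels)
clusterings = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
])

# co-association: fraction of clusterings in which each pair votes "together"
n = clusterings.shape[1]
co = np.zeros((n, n))
for labels in clusterings:
    co += (labels[:, None] == labels[None, :])
co /= len(clusterings)

# turn agreement into a distance and extract a consensus partition
dist = 1.0 - co
np.fill_diagonal(dist, 0.0)
consensus = fcluster(linkage(squareform(dist), "average"), t=2, criterion="maxclust")
print(consensus)
```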
Aspects of defects in 3d-3d correspondence
Gang, Dongmin; Kim, Nakwoo; Romo, Mauricio; ...
2016-10-12
In this paper we study supersymmetric co-dimension 2 and 4 defects in the compactification of the 6d (2,0) theory of type A_{N-1} on a 3-manifold M. The so-called 3d-3d correspondence is a relation between complexified Chern-Simons theory (with gauge group SL(N,C)) on M and a 3d N=2 theory T_N[M]. We study this correspondence in the presence of supersymmetric defects, which are knots/links inside the 3-manifold. Our study employs a number of different methods: state-integral models for complex Chern-Simons theory, cluster algebra techniques, the domain wall theory T[SU(N)], 5d N=2 SYM, and also supergravity analysis through holography. These methods are complementary and we find agreement between them. In some cases the results lead to highly non-trivial predictions on the partition function. Our discussion includes a general expression for the cluster partition function, which can be used to compute in the presence of maximal and a certain class of non-maximal punctures when N > 2. We also highlight the non-Abelian description of the 3d N=2 T_N[M] theory with defect included, when such a description is available. This paper is a companion to our shorter paper, which summarizes our main results.
Application of statistical mechanical methods to the modeling of social networks
NASA Astrophysics Data System (ADS)
Strathman, Anthony Robert
With the recent availability of large-scale social data sets, social networks have become open to quantitative analysis via the methods of statistical physics. We examine the statistical properties of a real large-scale social network, generated from cellular phone call-trace logs. We find this network, like many other social networks, to be assortative (r = 0.31) and clustered (i.e., strongly transitive, C = 0.21). We measure fluctuation scaling to identify the presence of internal structure in the network and find that structural inhomogeneity effectively disappears at the scale of a few hundred nodes, though there is no sharp cutoff. We introduce an agent-based model of social behavior, designed to model the formation and dissolution of social ties. The model is a modified Metropolis algorithm containing agents operating under the basic sociological constraints of reciprocity, communication need and transitivity, and it introduces the concept of a social temperature. We go on to show that this simple model reproduces the global statistical network features (including assortativity, connected fraction, mean degree, clustering, and mean shortest path length) of the real network data and undergoes two phase transitions as a function of this social temperature: one from a "gas" to a "liquid" state and a second from a liquid to a glassy state.
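For instance, the global clustering coefficient (transitivity) reported above (C = 0.21) can be computed from an edge list as follows; the toy graph is an assumption, not the call-trace network:

```python
from itertools import combinations

# toy undirected graph as an edge list (made up for illustration)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

triangles3 = 0  # each triangle counted once per centre node, i.e. three times
triads = 0      # connected triples (paths of length two)
for node, nbrs in adj.items():
    k = len(nbrs)
    triads += k * (k - 1) // 2
    triangles3 += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

# transitivity C = 3 * triangles / triads; the factor 3 is already in triangles3
transitivity = triangles3 / triads
print(transitivity)
```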
Botelho, Ana; Canto, Ana; Leão, Célia; Cunha, Mónica V
2015-01-01
Typical CRISPR (clustered regularly interspaced short palindromic repeat) regions are constituted by short direct repeats (DRs), interspersed with similarly sized non-repetitive spacers derived from transmissible genetic elements, acquired when the cell is challenged with foreign DNA. The analysis of the structure, in number and nature, of CRISPR spacers is a valuable tool for molecular typing, since these loci are polymorphic among strains and originate characteristic signatures. The existence of CRISPR structures in the genomes of the members of the Mycobacterium tuberculosis complex (MTBC) enabled the development of a genotyping method based on the analysis of the presence or absence of 43 oligonucleotide spacers separated by conserved DRs. This method, called spoligotyping, consists of PCR amplification of the DR chromosomal region and recognition, after hybridization, of the spacers that are present. The workflow underlying this methodology implies that the PCR products are brought onto a membrane containing synthetic oligonucleotides with sequences complementary to the spacer sequences. Lack of hybridization of the PCR products to a specific oligonucleotide sequence indicates absence of the corresponding spacer sequence in the examined strain. Spoligotyping gained great notoriety as a robust identification and typing tool for members of the MTBC, enabling multiple epidemiological studies on human and animal tuberculosis.
Hubble Unveils a Tapestry of Dazzling Diamond-Like Stars
2016-01-21
Resembling an opulent diamond tapestry, this image from NASA's Hubble Space Telescope shows a glittering star cluster called Trumpler 14, which contains some of the brightest stars seen in our Milky Way galaxy.
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
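The cluster mass statistic itself (as opposed to the paper's RFT-based inference on it) can be sketched on a toy 1-D statistic image; the data, threshold, and the excess-mass definition used here are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

# toy 1-D statistic image and cluster-forming threshold (both made up)
stat_img = np.array([0.1, 2.5, 3.0, 2.8, 0.2, 0.0, 2.2, 0.3])
u = 2.0

supra = stat_img > u
labels, n_clusters = ndimage.label(supra)   # connected suprathreshold clusters

# one common definition: mass = summed excess of the statistic over u
masses = [(stat_img[labels == i] - u).sum() for i in range(1, n_clusters + 1)]
print(n_clusters, masses)
```

Cluster extent would instead count the voxels in each labelled cluster, and voxel intensity would take the maximum of `stat_img`; mass combines both aspects.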
NCC-RANSAC: a fast plane extraction method for 3-D range data segmentation.
Qian, Xiangfei; Ye, Cang
2014-12-01
This paper presents a new plane extraction (PE) method based on the random sample consensus (RANSAC) approach. The generic RANSAC-based PE algorithm may over-extract a plane, and it may fail in case of a multistep scene where the RANSAC procedure results in multiple inlier patches that form a slant plane straddling the steps. The CC-RANSAC PE algorithm successfully overcomes the latter limitation if the inlier patches are separate. However, it fails if the inlier patches are connected. A typical scenario is a stairway with a stair wall, where the RANSAC plane-fitting procedure results in inlier patches in the tread, riser, and stair wall planes that connect together and form a plane. The proposed method, called normal-coherence CC-RANSAC (NCC-RANSAC), performs a normal coherence check on all data points of the inlier patches and removes the data points whose normal directions contradict that of the fitted plane. This process results in separate inlier patches, each of which is treated as a candidate plane. A recursive plane clustering process is then executed to grow each of the candidate planes until all planes are extracted in their entireties. The RANSAC plane-fitting and the recursive plane clustering processes are repeated until no more planes are found. A probabilistic model is introduced to predict the success probability of the NCC-RANSAC algorithm and validated with real data of a 3-D time-of-flight camera, the SwissRanger SR4000. Experimental results demonstrate that the proposed method extracts more accurate planes with less computational time than the existing RANSAC-based methods.
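The normal-coherence check at the heart of NCC-RANSAC can be sketched as follows, assuming per-point normals are already estimated; the 30-degree threshold and the sample normals are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

# normal of the RANSAC-fitted plane (here: a horizontal tread)
plane_normal = np.array([0.0, 0.0, 1.0])

# estimated per-point normals of the inlier patches (made-up values:
# two tread points and one stair-wall point with a contradictory normal)
point_normals = np.array([
    [0.00, 0.05, 1.00],
    [0.02, 0.00, 1.00],
    [1.00, 0.00, 0.05],
])
point_normals /= np.linalg.norm(point_normals, axis=1, keepdims=True)

# keep points whose normals agree with the plane within an assumed 30 degrees
cos_thresh = np.cos(np.deg2rad(30.0))
coherent = np.abs(point_normals @ plane_normal) > cos_thresh
print(coherent)
```

Points failing the check (here, the wall point) are removed, which splits the inliers into separate patches that can then be grown as candidate planes.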
Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.
1982-02-26
UPGMA), and Ward's method. Ling's papers describe a (k,r) clustering method. Each of these methods has individual characteristics which make them... (Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an
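UPGMA, mentioned in the fragment above, corresponds to average-linkage agglomerative clustering; a minimal sketch with made-up points:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # made up
Z = linkage(pts, method="average")   # scipy's "average" linkage is UPGMA
groups = fcluster(Z, t=2, criterion="maxclust")
print(groups)
```

Each merge joins the pair of clusters with the smallest average inter-point distance, which is how UPGMA absorbs new points into an existing cluster.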
The breakup of a main-belt asteroid 450 thousand years ago.
Nesvorný, David; Vokrouhlický, David; Bottke, William F
2006-06-09
Collisions in the asteroid belt frequently lead to catastrophic breakups, where more than half of the target's mass is ejected into space. Several dozen large asteroids have been disrupted by impacts over the past several billion years. These impact events have produced groups of fragments with similar orbits called asteroid families. Here we report the discovery of a very young asteroid family around the object 1270 Datura. Our work takes advantage of a method for identification of recent breakups in the asteroid belt using catalogs of osculating (i.e., instantaneous) asteroid orbits. The very young families show up in these catalogs as clusters in a five-dimensional space of osculating orbital elements.
Salehpour, Mehdi; Behrad, Alireza
2017-10-01
This study proposes a new algorithm for nonrigid coregistration of synthetic aperture radar (SAR) and optical images. The proposed algorithm employs point features extracted by the binary robust invariant scalable keypoints (BRISK) algorithm and a new method called weighted bidirectional matching for initial correspondence. To reject false matches, we assume that the transformation between SAR and optical images is locally rigid. This property is used to refine the matches by assigning scores to matched pairs and clustering local rigid transformations using a two-layer Kohonen network. Finally, the thin plate spline algorithm and mutual information are used for nonrigid coregistration of SAR and optical images.
NASA Astrophysics Data System (ADS)
Akhbardeh, Alireza; Junnila, Sakari; Koivuluoma, Mikko; Koivistoinen, Teemu; Värri, Alpo
2006-12-01
As we know, singular value decomposition (SVD) is designed for computing the singular values (SVs) of a matrix. If it is used to find the SVs of an n-by-1 or 1-by-n array whose elements represent samples of a signal, it will return only one singular value, which is not enough to express the whole signal. To overcome this problem, we designed a new feature extraction method which we call ``time-frequency moments singular value decomposition'' (TFM-SVD). In this new method, we use statistical features of the time series as well as of the frequency series (the Fourier transform of the signal). This information is extracted into a matrix with a fixed structure and the SVs of that matrix are sought. This transform can be used as a preprocessing stage in pattern clustering methods. Our results indicate that the performance of a combined system including this transform and classifiers is comparable with that of other feature extraction methods such as wavelet transforms. To evaluate TFM-SVD, we applied this new method and artificial neural networks (ANNs) to ballistocardiogram (BCG) data clustering to look for probable heart disease in six test subjects. BCG from the test subjects was recorded using a chair-like ballistocardiograph developed in our project. This kind of device, combined with automated recording and analysis, would be suitable for use in many places, such as the home or office. The results show that the method has high performance and is almost insensitive to BCG waveform latency or nonlinear disturbance.
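A hedged sketch of the TFM-SVD idea follows; the particular moments and the 2-by-3 matrix layout are assumptions for illustration, since the paper's exact matrix structure is not reproduced here:

```python
import numpy as np

def tfm_svd_features(x):
    """Singular values of a small matrix of time- and frequency-domain moments."""
    X = np.abs(np.fft.rfft(x))                      # frequency series
    def moments(s):
        # mean, spread and third central moment (assumed choice of statistics)
        return [s.mean(), s.std(), ((s - s.mean()) ** 3).mean()]
    M = np.array([moments(x), moments(X)])          # fixed 2x3 structure
    return np.linalg.svd(M, compute_uv=False)       # SVs as the feature vector

t = np.linspace(0.0, 1.0, 256, endpoint=False)
feats = tfm_svd_features(np.sin(2 * np.pi * 5 * t))
print(feats)
```

Unlike applying SVD to the raw 1-by-n sample array, the fixed moment matrix yields more than one singular value, so the feature vector carries both time- and frequency-domain information.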
Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine
2018-01-01
Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify treatment responders and non-responders. In the context of longitudinal cluster analyses, sample size and variability in the times of measurement are the main issues with current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline (hereafter, the extended-baseline method). The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the extended-baseline method with the latent-class mixed model. The extended-baseline method with the two model-based algorithms was the most robust. With all the non-parametric algorithms, it failed when there were unequal variances of treatment effect between clusters or when the subgroups had unbalanced sample sizes. The latent-class mixed model failed when the between-patient slope variability was high. Two real data sets, on neurodegenerative disease and on obesity, illustrate the extended-baseline method and show how clustering may help to identify the marker(s) of treatment response. Applying the extended-baseline method in exploratory analysis, as a first stage before setting up stratified designs, can provide a better estimation of treatment effect in future clinical trials.
Bullied youth: the impact of bullying through lesbian, gay, and bisexual name calling.
Evans, Caroline B R; Chapman, Mimi V
2014-11-01
Bullying is a common experience for many school-aged youth, but the majority of bullying research and intervention does not address the content of bullying behavior, particularly teasing. Understanding the various forms of bullying as well as the language used in bullying is important given that bullying can have persistent consequences, particularly for victims who are bullied through biased-based bullying, such as being called gay, lesbian, or queer. This study examines bullying experiences in a racially and ethnically diverse sample of 3,379 rural elementary-, middle-, and high-school youth. We use latent class analysis to establish clusters of bullying behaviors, including forms of biased-based bullying. The resulting classes are examined to ascertain if and how bullying by biased-based labeling is clustered with other forms of bullying behavior. This analysis identifies 3 classes of youth: youth who experience no bullying victimization, youth who experience social and emotional bullying, and youth who experience all forms of social and physical bullying, including being bullied by being called gay, lesbian, or queer. Youth in Classes 2 and 3 labeled their experiences as bullying. Results indicate that youth bullied by being called gay, lesbian, or queer are at a high risk of experiencing all forms of bullying behavior, highlighting the importance of increased support for this vulnerable group. (c) 2014 APA, all rights reserved.
Hsu, David
2015-09-27
Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
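Cluster stability via the Jaccard coefficient, as used above, can be sketched with toy label sets by matching each original cluster to its best counterpart in a perturbed re-clustering (the clusterings below are made up):

```python
def jaccard(a, b):
    """Jaccard coefficient of two clusters given as member sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# toy clusterings: original clusters vs. clusters from perturbed data
original = [{0, 1, 2, 3}, {4, 5, 6}]
perturbed = [{0, 1, 2}, {3, 4, 5, 6}]

# stability of each original cluster = Jaccard similarity to its best match
stability = [max(jaccard(c, p) for p in perturbed) for c in original]
print(stability)
```

In practice the perturbed clusterings come from bootstrap resamples, and a cluster with consistently low best-match Jaccard values is considered unstable.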
NASA Astrophysics Data System (ADS)
Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan
2018-02-01
Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We note that clustering information plays an important role in solving the link prediction problem. In the previous literature, the node clustering coefficient appears frequently in many link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish the different clustering abilities of a node with respect to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of the asymmetric link clustering (ALC) coefficient. Further, we improve three node-clustering-based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node-clustering-based methods, achieving especially remarkable improvements on food web, hamster friendship and Internet networks. Besides, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.
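One common node-clustering-coefficient predictor (a baseline of the kind the ALC methods improve on; the exact scoring used in the paper is not reproduced) sums the clustering coefficients of the common neighbours of a node pair:

```python
from itertools import combinations

# toy undirected graph (an assumption, not one of the paper's networks)
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def clustering(z):
    """Local clustering coefficient of node z."""
    nbrs, k = adj[z], len(adj[z])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def score(x, y):
    # sum clustering coefficients over the common neighbours of the pair
    return sum(clustering(z) for z in adj[x] & adj[y])

print(score(0, 3))
```

The limitation noted above is visible here: `clustering(z)` is a property of the common neighbour z alone, so it contributes the same value to every candidate pair, which is what the link-centred ALC coefficient is designed to fix.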
A new method to prepare colloids of size-controlled clusters from a matrix assembly cluster source
NASA Astrophysics Data System (ADS)
Cai, Rongsheng; Jian, Nan; Murphy, Shane; Bauer, Karl; Palmer, Richard E.
2017-05-01
A new method for the production of colloidal suspensions of physically deposited clusters is demonstrated. A cluster source has been used to deposit size-controlled clusters onto water-soluble polymer films, which are then dissolved to produce colloidal suspensions of clusters encapsulated with polymer molecules. This process has been demonstrated using different cluster materials (Au and Ag) and polymers (polyvinylpyrrolidone, polyvinyl alcohol, and polyethylene glycol). Scanning transmission electron microscopy of the clusters before and after colloidal dispersion confirms that the polymers act as stabilizing agents. We propose that this method is suitable for the production of biocompatible colloids of ultraprecise clusters.
Ultraviolet properties of individual hot stars in globular cluster cores. 1: NGC 1904 (M 79)
NASA Technical Reports Server (NTRS)
Altner, Bruce; Matilsky, Terry A.
1992-01-01
As part of an observing program using the International Ultraviolet Explorer (IUE) satellite to investigate the ultraviolet properties of stars found within the cores of galactic globular clusters with blue horizontal branches (HBs), we obtained three spectra of the cluster NGC 1904 (M 79). All three were long integration-time, short-wavelength (SWP) spectra obtained at the so-called 'center of light' and all three showed evidence of sources within the IUE large aperture (21.4 in. by 10 in.). In this paper we shall describe the analysis of these spectra and present evidence that the UV sources represent individual hot stars in the post-HB stage of evolution.
Osborne, Peter W; Benoit, Gérard; Laudet, Vincent; Schubert, Michael; Ferrier, David E K
2009-03-01
The ParaHox cluster is the evolutionary sister to the Hox cluster. Like the Hox cluster, the ParaHox cluster displays spatial and temporal regulation of the component genes along the anterior/posterior axis in a manner that correlates with the gene positions within the cluster (a feature called collinearity). The ParaHox cluster is however a simpler system to study because it is composed of only three genes. We provide a detailed analysis of the amphioxus ParaHox cluster and, for the first time in a single species, examine the regulation of the cluster in response to a single developmental signalling molecule, retinoic acid (RA). Embryos treated with either RA or RA antagonist display altered ParaHox gene expression: AmphiGsx expression shifts in the neural tube, and the endodermal boundary between AmphiXlox and AmphiCdx shifts its anterior/posterior position. We identified several putative retinoic acid response elements and in vitro assays suggest some may participate in RA regulation of the ParaHox genes. By comparison to vertebrate ParaHox gene regulation we explore the evolutionary implications. This work highlights how insights into the regulation and evolution of more complex vertebrate arrangements can be obtained through studies of a simpler, unduplicated amphioxus gene cluster.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the typical computing resources available to most research labs. Other infrastructures, like the cloud AWS environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible: infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling. We present a high-throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages a hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high-performance computing infrastructures. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers: SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and the ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and transferred only a total of 6 TB of data across the platforms.
Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
A gas-rich AGN near the centre of a galaxy cluster at z ~ 1.4
NASA Astrophysics Data System (ADS)
Casasola, V.; Magrini, L.; Combes, F.; Mignano, A.; Sani, E.; Paladino, R.; Fontani, F.
2013-10-01
Context. The formation of the first virialized structures in overdensities dates back to ~9 Gyr ago, i.e. in the redshift range z ~ 1.4-1.6. Some models of structure formation predict that the star formation activity in clusters was high at that epoch, implying large reservoirs of cold molecular gas. Aims: Aiming at finding a trace of this expected high molecular gas content in primeval clusters, we searched for the 12CO(2-1) line emission in the most luminous active galactic nucleus (AGN) of the cluster around the radio galaxy 7C 1756+6520 at z ~ 1.4, one of the farthest spectroscopic confirmed clusters. This AGN, called AGN.1317, is located in the neighbourhood of the central radio galaxy at a projected distance of ~780 kpc. Methods: The IRAM Plateau de Bure Interferometer was used to investigate the molecular gas quantity in AGN.1317, observing the 12CO(2-1) emission line. Results: We detect CO emission in an AGN belonging to a galaxy cluster at z ~ 1.4. We measured a molecular gas mass of 1.1 × 1010M⊙, comparable to that found in submillimeter galaxies. In optical images, AGN.1317 does not seem to be part of a galaxy interaction or merger. We also derived the nearly instantaneous star formation rate (SFR) from Hα flux obtaining a SFR ~ 65 M⊙ yr-1. This suggests that AGN.1317 is actively forming stars and will exhaust its reservoir of cold gas in ~0.2-1.0 Gyr. Based on observations carried out with the IRAM Plateau de Bure Interferometer. IRAM is supported by INSU/CNRS (France), MPG (Germany), and IGN (Spain).Reduced IRAM data is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/558/A60
Dynamics of fragment formation in neutron-rich matter
NASA Astrophysics Data System (ADS)
Alcain, P. N.; Dorso, C. O.
2018-01-01
Background: Neutron stars are astronomical systems with nucleons subjected to extreme conditions. Due to the longer range Coulomb repulsion between protons, the system has structural inhomogeneities. Several interactions tailored to reproduce nuclear matter plus a screened Coulomb term reproduce these inhomogeneities known as nuclear pasta. These structural inhomogeneities, located in the crusts of neutron stars, can also arise in expanding systems depending on the thermodynamic conditions (temperature, proton fraction, etc.) and the expansion velocity. Purpose: We aim to find the dynamics of the fragment formation for expanding systems simulated according to the little big bang model. This expansion resembles the evolution of merging neutron stars. Method: We study the dynamics of the nucleons with semiclassical molecular dynamics models. Starting with an equilibrium configuration, we expand the system homogeneously until we arrive at an asymptotic configuration (i.e., very low final densities). We study, with four different cluster recognition algorithms, the fragment distribution throughout this expansion and the dynamics of the cluster formation. Results: Studying the topology of the equilibrium states, before the expansion, we reproduced the known pasta phases plus a novel phase we called pregnocchi, consisting of proton aggregates embedded in a neutron sea. We have identified different fragmentation regimes, depending on the initial temperature and fragment velocity. In particular, for the already mentioned pregnocchi, a neutron cloud surrounds the clusters during the early stages of the expansion, resulting in systems that give rise to configurations compatible with the emergence of the r process. Conclusions: We showed that a proper identification of the cluster distribution is highly dependent on the cluster recognition algorithm chosen, and found that the early cluster recognition algorithm (ECRA) was the most stable one. 
This approach allowed us to identify the dynamics of the fragment formation. These calculations pave the way to a comparison between Earth experiments and neutron star studies.
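As a minimal illustration of what a cluster recognition algorithm does, the sketch below implements the simplest criterion used in such fragmentation studies: a distance-cutoff rule in which any two particles closer than r_cut belong to the same fragment (connected components via union-find). The coordinates and cutoff are invented for the example; ECRA additionally requires candidate fragments to be energy-bound, which this sketch omits.

```python
import numpy as np

def cutoff_clusters(positions, r_cut):
    """Distance-cutoff cluster recognition: two particles share a
    fragment if they are closer than r_cut; fragments are the
    connected components of that proximity graph (union-find)."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) < r_cut:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)  # relabel 0..k-1
    return labels

# two well-separated triplets should be recognised as two fragments
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.8, 0.0],
                [10.0, 0.0, 0.0], [11.0, 0.0, 0.0], [10.5, 0.8, 0.0]])
labels = cutoff_clusters(pos, r_cut=1.5)
```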
Asymmetric reproductive character displacement in male aggregation behaviour
Pfennig, Karin S.; Stewart, Alyssa B.
2011-01-01
Reproductive character displacement—the evolution of traits that minimize reproductive interactions between species—can promote striking divergence in male signals or female mate preferences between populations that do and do not occur with heterospecifics. However, reproductive character displacement can affect other aspects of mating behaviour. Indeed, avoidance of heterospecific interactions might contribute to spatial (or temporal) aggregation of conspecifics. We examined this possibility in two species of hybridizing spadefoot toad (genus Spea). We found that in Spea bombifrons sympatric males were more likely than allopatric males to associate with calling males. Moreover, contrary to allopatric males, sympatric S. bombifrons males preferentially associated with conspecific male calls. By contrast, Spea multiplicata showed no differences between sympatry and allopatry in likelihood to associate with calling males. Further, sympatric and allopatric males did not differ in preference for conspecifics. However, allopatric S. multiplicata were more variable than sympatric males in their responses. Thus, in S. multiplicata, character displacement may have refined pre-existing aggregation behaviour. Our results suggest that heterospecific interactions can foster aggregative behaviour that might ultimately contribute to clustering of conspecifics. Such clustering can generate spatial or temporal segregation of reproductive activities among species and ultimately promote reproductive isolation. PMID:21177683
Stochastic geometry in disordered systems, applications to quantum Hall transitions
NASA Astrophysics Data System (ADS)
Gruzberg, Ilya
2012-02-01
A spectacular success in the study of random fractal clusters and their boundaries in statistical mechanics systems at or near criticality using Schramm-Loewner Evolutions (SLE) naturally calls for extensions in various directions. Can this success be repeated for disordered and/or non-equilibrium systems? Naively, when one thinks about disordered systems and their average correlation functions, one of the very basic assumptions of SLE, the so-called domain Markov property, is lost. Also, in some lattice models of Anderson transitions (the network models) there are no natural clusters to consider. Nevertheless, in this talk I will argue that one can apply the so-called conformal restriction, a notion of stochastic conformal geometry closely related to SLE, to study the integer quantum Hall transition and its variants. I will focus on the Chalker-Coddington network model and will demonstrate that its average transport properties can be mapped to a classical problem where the basic objects are geometric shapes (loosely speaking, the current paths) that obey an important restriction property. At the transition point this allows one to use the theory of conformal restriction to derive exact expressions for point contact conductances in the presence of various non-trivial boundary conditions.
Zwarenstein, Merrick; Reeves, Scott; Russell, Ann; Kenaszchuk, Chris; Conn, Lesley Gotlib; Miller, Karen-Lee; Lingard, Lorelei; Thorpe, Kevin E
2007-01-01
Background Despite a burgeoning interest in using interprofessional approaches to promote effective collaboration in health care, systematic reviews find scant evidence of benefit. This protocol describes the first cluster randomized controlled trial (RCT) to design and evaluate an intervention intended to improve interprofessional collaborative communication and patient-centred care. Objectives The objective is to evaluate the effects of a four-component, hospital-based staff communication protocol designed to promote collaborative communication between healthcare professionals and enhance patient-centred care. Methods The study is a multi-centre mixed-methods cluster randomized controlled trial involving twenty clinical teaching teams (CTTs) in general internal medicine (GIM) divisions of five Toronto tertiary-care hospitals. CTTs will be randomly assigned either to receive an intervention designed to improve interprofessional collaborative communication, or to continue usual communication practices. Non-participant naturalistic observation, shadowing, and semi-structured, qualitative interviews were conducted to explore existing patterns of interprofessional collaboration in the CTTs, and to support intervention development. Interviews and shadowing will continue during intervention delivery in order to document interactions between the intervention settings and adopters, and changes in interprofessional communication. The primary outcome is the rate of unplanned hospital readmission. Secondary outcomes are length of stay (LOS); adherence to evidence-based prescription drug therapy; patients' satisfaction with care; self-report surveys of CTT staff perceptions of interprofessional collaboration; and frequency of calls to paging devices. Outcomes will be compared on an intention-to-treat basis using adjustment methods appropriate for data from a cluster randomized design. 
Discussion Pre-intervention qualitative analysis revealed that a substantial amount of interprofessional interaction lacks key core elements of collaborative communication such as self-introduction, description of professional role, and solicitation of other professional perspectives. Incorporating these findings, a four-component intervention was designed with a goal of creating a culture of communication in which the fundamentals of collaboration become a routine part of interprofessional interactions during unstructured work periods on GIM wards. Trial registration Registered with National Institutes of Health as NCT00466297. PMID:17877830
A New Method to Constrain Supernova Fractions Using X-ray Observations of Clusters of Galaxies
NASA Technical Reports Server (NTRS)
Bulbul, Esra; Smith, Randall K.; Loewenstein, Michael
2012-01-01
Supernova (SN) explosions enrich the intracluster medium (ICM) both by creating and dispersing metals. We introduce a method to measure the number of SNe and relative contribution of Type Ia supernovae (SNe Ia) and core-collapse supernovae (SNe cc) by directly fitting X-ray spectral observations. The method has been implemented as an XSPEC model called snapec. snapec utilizes a single-temperature thermal plasma code (apec) to model the spectral emission based on metal abundances calculated using the latest SN yields from SN Ia and SN cc explosion models. This approach provides a self-consistent single set of uncertainties on the total number of SN explosions and relative fraction of SN types in the ICM over the cluster lifetime by directly allowing these parameters to be determined by SN yields provided by simulations. We apply our approach to XMM-Newton European Photon Imaging Camera (EPIC), Reflection Grating Spectrometer (RGS), and 200 ks simulated Astro-H observations of a cooling flow cluster, A3112. We find that various sets of SN yields present in the literature produce an acceptable fit to the EPIC and RGS spectra of A3112. We infer that 30.3% ± 5.4% to 37.1% ± 7.1% of the total SN explosions are SNe Ia, and the total number of SN explosions required to create the observed metals is in the range of (1.06 ± 0.34) × 10^9 to (1.28 ± 0.43) × 10^9, from snapec fits to RGS spectra. These values may be compared to the enrichment expected based on well-established empirically measured SN rates per star formed. The proportions of SNe Ia and SNe cc inferred to have enriched the ICM in the inner 52 kiloparsecs of A3112 are consistent with these specific rates, if one applies a correction for the metals locked up in stars. At the same time, the inferred level of SN enrichment corresponds to a star-to-gas mass ratio that is several times greater than the 10% estimated globally for clusters in the A3112 mass range.
Christofides, Nicola J; Hatcher, Abigail M; Pino, Angelica; Rebombo, Dumisani; McBride, Ruari Santiago; Anderson, Althea; Peacock, Dean
2018-01-01
Objective This paper describes the design and methods of a cluster randomised controlled trial (C-RCT) to determine the effectiveness of a community mobilisation intervention that is designed to reduce the perpetration of violence against women (VAW). Methods and analysis A C-RCT of nine intervention and nine control clusters is being carried out in a periurban, semiformal settlement near Johannesburg, South Africa, between 2016 and 2018. A community mobilisation and advocacy intervention, called Sonke CHANGE, is being implemented over 18 months. It comprises local advocacy and group activities to engage community members to challenge harmful gender norms and reduce VAW. The intervention is hypothesised to improve equitable masculinities, reduce alcohol use and, ultimately, reduce VAW. Intervention effectiveness will be determined through an audio computer-assisted self-interview questionnaire with behavioural measures among 2600 men aged between 18 and 40 years at baseline, 12 months and 24 months. The primary trial outcome is men’s use of physical and/or sexual VAW. Secondary outcomes include harmful alcohol use, gender attitudes, controlling behaviours, transactional sex and social cohesion. The main analysis will be intention-to-treat based on the randomisation of clusters. A qualitative process evaluation is being conducted alongside the C-RCT. Implementers and men participating in the intervention will be interviewed longitudinally over the period of intervention implementation, and observations of the workshops and other intervention activities are being carried out. Ethics and dissemination Ethical approval was obtained from the University of the Witwatersrand Human Research Ethics Committee and procedures comply with ethical recommendations of the United Nations Multi-Country Study on Men and Violence.
Dissemination of research findings will take place with local stakeholders and through peer-reviewed publications, with data available on request or after 5 years of trial completion. Trial registration number NCT02823288; Pre-result. PMID:29574438
Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials
Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.
2012-01-01
Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450
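The two-stage bootstrap compared above can be sketched on synthetic data: stage 1 resamples clusters within each arm, stage 2 resamples individuals within each drawn cluster, and the incremental net benefit is recomputed each time. This simplification omits the shrinkage correction the article evaluates, and all numbers (cluster counts, cost and effect distributions, willingness-to-pay `lam`) are invented for illustration.

```python
import numpy as np

def two_stage_bootstrap_ci(costs, effects, cluster, arm, lam, B=500, seed=0):
    """Two-stage bootstrap CI for the incremental net benefit (INB):
    resample clusters within each arm, then individuals within each
    resampled cluster (no shrinkage correction in this sketch)."""
    rng = np.random.default_rng(seed)
    nb = lam * effects - costs            # individual-level net benefit
    draws = []
    for _ in range(B):
        arm_means = []
        for a in (0, 1):
            ids = np.unique(cluster[arm == a])
            chosen = rng.choice(ids, size=len(ids), replace=True)
            cl_means = []
            for c in chosen:
                members = nb[cluster == c]
                cl_means.append(rng.choice(members, size=len(members),
                                           replace=True).mean())
            arm_means.append(np.mean(cl_means))
        draws.append(arm_means[1] - arm_means[0])
    return np.percentile(draws, [2.5, 97.5])

# synthetic CRT: 20 clusters of 30 patients per arm, true INB = 1900
rng = np.random.default_rng(1)
n_cl, m = 20, 30
cluster = np.repeat(np.arange(2 * n_cl), m)
arm = (cluster >= n_cl).astype(int)
cl_u = rng.normal(0, 0.05, 2 * n_cl)[cluster]   # cluster effect on QALYs
effects = 0.5 + 0.1 * arm + cl_u + rng.normal(0, 0.1, len(cluster))
costs = 1000 + 100 * arm + rng.normal(0, 100, len(cluster))
lo, hi = two_stage_bootstrap_ci(costs, effects, cluster, arm, lam=20000)
```

Ignoring the cluster level in the first stage would shrink this interval artificially, which is the under-coverage the article documents for naive methods.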
Using Gaussian windows to explore a multivariate data set
NASA Technical Reports Server (NTRS)
Jaeckel, Louis A.
1991-01-01
In an earlier paper, I recounted an exploratory analysis, using Gaussian windows, of a data set derived from the Infrared Astronomical Satellite. Here, my goals are to develop strategies for finding structural features in a data set in a many-dimensional space, and to find ways to describe the shape of such a data set. After a brief review of Gaussian windows, I describe the current implementation of the method. I give some ways of describing features that we might find in the data, such as clusters and saddle points, and also extended structures such as a 'bar', which is an essentially one-dimensional concentration of data points. I then define a distance function, which I use to determine which data points are 'associated' with a feature. Data points not associated with any feature are called 'outliers'. I then explore the data set, giving the strategies that I used and quantitative descriptions of the features that I found, including clusters, bars, and a saddle point. I tried to use strategies and procedures that could, in principle, be used in any number of dimensions.
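A Gaussian window, in essence, weights every data point by a Gaussian centred at a chosen location and summarizes the weighted data. The sketch below iterates the windowed mean so that it drifts toward a local concentration of points, one way such a window can reveal a cluster; this is a mean-shift-style simplification, not Jaeckel's full implementation, and the two-cluster data are synthetic.

```python
import numpy as np

def gaussian_window_mean(data, center, width):
    """Weight every point by a Gaussian window around `center` and
    return the weighted mean, a local summary of the data's shape."""
    d2 = ((data - center) ** 2).sum(axis=1)
    w = np.exp(-0.5 * d2 / width ** 2)
    return (data * w[:, None]).sum(axis=0) / w.sum()

def find_feature(data, start, width, steps=50):
    """Iterate the windowed mean; it drifts toward a nearby
    concentration of points (a cluster-like feature)."""
    c = np.asarray(start, dtype=float)
    for _ in range(steps):
        c = gaussian_window_mean(data, c, width)
    return c

rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0.0, 0.0], 0.3, (200, 2)),
                  rng.normal([5.0, 5.0], 0.3, (200, 2))])
center = find_feature(data, start=[4.0, 4.0], width=1.0)  # converges near (5, 5)
```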
Brownian aggregation rate of colloid particles with several active sites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nekrasov, Vyacheslav M.; Yurkin, Maxim A.; Chernyshev, Andrei V., E-mail: chern@ns.kinetics.nsc.ru
2014-08-14
We theoretically analyze the aggregation kinetics of colloid particles with several active sites. Such particles (so-called “patchy particles”) are well known as chemically anisotropic reactants, but the corresponding rate constant of their aggregation has not yet been established in a convenient analytical form. Using the kinematic approximation for the diffusion problem, we derived an analytical formula for the diffusion-controlled reaction rate constant between two colloid particles (or clusters) with several small active sites under the following assumptions: the relative translational motion is Brownian diffusion, and the isotropic stochastic reorientation of each particle is Markovian and arbitrarily correlated. This formula was shown to produce accurate results in comparison with more sophisticated approaches. Also, to account for the case of a low number of active sites per particle, we used a Monte Carlo stochastic algorithm based on the Gillespie method. Simulations showed that such a discrete model is required when this number is less than 10. Finally, we applied the developed approach to the simulation of immunoagglutination, assuming that the formed clusters have fractal structure.
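The Gillespie-type discrete simulation mentioned above can be illustrated for the simplest case: irreversible pairwise aggregation with a constant, size-independent rate kernel (the paper's diffusion-controlled patchy-particle kernel is more elaborate). The rate constant and particle number below are arbitrary.

```python
import random

def gillespie_aggregation(n0, k_agg, t_end, seed=1):
    """Gillespie stochastic simulation of irreversible pairwise
    aggregation with a constant rate kernel. The state is a list of
    cluster sizes; each event merges one randomly chosen pair after
    an exponentially distributed waiting time."""
    rng = random.Random(seed)
    clusters = [1] * n0               # start from monomers
    t = 0.0
    while len(clusters) > 1:
        n = len(clusters)
        total_rate = k_agg * n * (n - 1) / 2.0   # all pairs equally reactive
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        i, j = rng.sample(range(n), 2)
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

sizes = gillespie_aggregation(n0=200, k_agg=0.01, t_end=1.0)
```

Total mass is conserved exactly while the number of clusters decays, which is the behaviour a mean-field Smoluchowski treatment only captures on average.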
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
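The core projection idea can be sketched for the two-group case: find the direction that maximizes the ratio of between-group to within-group variance of the projected attribute vectors (a Fisher discriminant) and use that ratio as the score. The paper additionally scores the projection with a non-parametric ANOVA test and evaluates the score's confidence, both omitted here; the attribute data are synthetic.

```python
import numpy as np

def fisher_projection_score(attrs, labels):
    """Project attribute vectors onto the two-group Fisher direction
    and return the between/within variance ratio of the projections
    as a clustering-quality score."""
    g0, g1 = attrs[labels == 0], attrs[labels == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    # pooled within-group scatter matrix (regularized for stability)
    sw = np.cov(g0, rowvar=False) * (len(g0) - 1) + \
         np.cov(g1, rowvar=False) * (len(g1) - 1)
    w = np.linalg.solve(sw + 1e-9 * np.eye(sw.shape[0]), m1 - m0)
    z = attrs @ w
    z0, z1 = z[labels == 0], z[labels == 1]
    between = len(z0) * (z0.mean() - z.mean()) ** 2 \
        + len(z1) * (z1.mean() - z.mean()) ** 2
    within = len(z0) * z0.var() + len(z1) * z1.var()
    return between / within

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
attrs = rng.normal(size=(100, 4))
attrs[labels == 1] += np.array([2.0, 0.0, 0.0, 0.0])  # attribute 0 tracks the grouping
good = fisher_projection_score(attrs, labels)           # informative partition
bad = fisher_projection_score(attrs, rng.permutation(labels))  # random partition
```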
Finding gene clusters for a replicated time course study
2014-01-01
Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
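The key difference from clustering raw measurements can be sketched as follows: fit each gene's expression against the sample covariate (here time, with two replicates pooled) and cluster the fitted coefficients. This is a strong simplification of the clustering-of-regression-models method (a plain linear fit plus a tiny k-means with deterministic seeding), and the expression data are simulated.

```python
import numpy as np

def regression_features(expr, time):
    """Fit a per-gene linear model expr ~ time and return
    (intercept, slope) rows; clustering these design-aware
    coefficients, not raw values, is the idea behind
    clustering-of-regression-models approaches."""
    X = np.column_stack([np.ones_like(time), time])
    coef, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    return coef.T

def kmeans(X, init_idx, iters=25):
    """Tiny Lloyd's k-means with fixed initial centers, keeping
    the sketch deterministic."""
    centers = X[list(init_idx)].astype(float)
    for _ in range(iters):
        lab = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(len(centers)):
            if np.any(lab == j):
                centers[j] = X[lab == j].mean(0)
    return lab

rng = np.random.default_rng(1)
time = np.array([0.0, 1, 2, 3, 0, 1, 2, 3])       # two technical replicates
rising = time + rng.normal(0, 0.1, (30, 8))        # 30 up-regulated genes
falling = -time + rng.normal(0, 0.1, (30, 8))      # 30 down-regulated genes
expr = np.vstack([rising, falling])
feats = regression_features(expr, time)
lab = kmeans(feats, init_idx=(0, 59))              # one seed from each pattern
```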
Progeny Clustering: A Method to Identify Biological Phenotypes
Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.
2015-01-01
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and computationally efficient, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (datasets of two dimensions and ten dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
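The general stability principle behind methods like Progeny Clustering can be illustrated without the Progeny Sampling machinery: re-cluster the data repeatedly, measure how consistently pairs of points land in the same cluster, and prefer the number of clusters that reproduces the same partition every run. This is a generic stability score, not the paper's algorithm, and the well-separated clusters are synthetic.

```python
import numpy as np

def farthest_point_init(X, k, rng):
    """Greedy farthest-point initial centers (first one random)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    return np.array(centers, dtype=float)

def kmeans(X, k, rng, iters=30):
    centers = farthest_point_init(X, k, rng)
    for _ in range(iters):
        lab = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(lab == j):
                centers[j] = X[lab == j].mean(0)
    return lab

def pair_agreement(a, b):
    """Label-permutation-invariant agreement: fraction of point pairs
    that are together in both labelings or apart in both."""
    sa = a[:, None] == a[None, :]
    sb = b[:, None] == b[None, :]
    mask = np.triu(np.ones(sa.shape, dtype=bool), 1)
    return float((sa == sb)[mask].mean())

def stability(X, k, runs=8, seed=0):
    """Average agreement between repeated clusterings: the true k
    should reproduce essentially the same partition every run."""
    rng = np.random.default_rng(seed)
    labs = [kmeans(X, k, rng) for _ in range(runs)]
    return float(np.mean([pair_agreement(labs[i], labs[j])
                          for i in range(runs)
                          for j in range(i + 1, runs)]))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, (40, 2))
               for c in ([0, 0], [4, 0], [0, 4])])
scores = {k: stability(X, k) for k in (2, 3, 4)}  # k = 3 is most stable
```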
A spatial scan statistic for nonisotropic two-level risk cluster.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2012-01-30
Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic could model a centralized high-risk kernel in the cluster. Because variations in disease risks are anisotropic owing to different social, economical, or transport factors, the real high-risk kernel will not necessarily take the central place in a whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which could be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods (the standard, isotonic, and proposed nonisotropic two-level scan statistics) was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision with two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using the hand-foot-mouth disease data in Pingdu City, Shandong, China in May 2009, and compared with the two other methods. In this practical study, the nonisotropic two-level method is the only way to precisely detect a high-risk area in a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.
Takeuchi, Hiroshi
2018-05-08
Since searching for the global minimum on the potential energy surface of a cluster is very difficult, many geometry optimization methods have been proposed, in which initial geometries are randomly generated and subsequently improved with different algorithms. In this study, a size-guided multi-seed heuristic method is developed and applied to benzene clusters. It produces initial configurations of the cluster with n molecules from the lowest-energy configurations of the cluster with n - 1 molecules (seeds). The initial geometries are further optimized with the geometrical perturbations previously used for molecular clusters. These steps are repeated until the size n reaches a predefined value. The method locates putative global minima of benzene clusters with up to 65 molecules. The performance of the method is discussed using the computational cost, rates of locating the global minima, and energies of initial geometries. © 2018 Wiley Periodicals, Inc.
Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T
2015-01-01
We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging. PMID:25786703
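A minimal version of the benchmark exercise can be sketched with synthetic sector-driven returns: build the correlation matrix, convert it to the usual metric distance sqrt(2(1 - rho)), cluster with average linkage (one of the Linkage methods the study compares; DBHT itself is not in SciPy), and score how well the retrieved clusters match the sector partition. All numbers are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# synthetic daily returns: 3 "sectors" of 10 stocks, one factor each
T = 500
sectors = np.repeat([0, 1, 2], 10)
factors = rng.normal(size=(T, 3))
returns = 0.8 * factors[:, sectors] + 0.6 * rng.normal(size=(T, 30))

corr = np.corrcoef(returns.T)
dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))  # correlation -> metric
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")

# purity: how much of the sector partition the dendrogram retrieves
purity = sum(int(np.bincount(sectors[labels == k]).max())
             for k in np.unique(labels)) / len(sectors)
```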
Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo
2015-02-01
Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
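As a rough illustration of how such sample-size formulae are used, the sketch below inflates a standard two-proportion sample size by a design effect of the form DE = 1 + (m - 1)*rho - m*eta, which appears in the literature for the balanced AB/BA cluster crossover design. That exact form is an assumption of the sketch, not the unbalanced formula derived in this article, and the ICU-style inputs are illustrative.

```python
import math

def n_per_arm_individual(p1, p2, alpha_z=1.959964, power_z=0.841621):
    """Per-arm sample size for comparing two proportions under
    individual randomisation (normal approximation; z values are for
    two-sided alpha = 0.05 and 80% power)."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (alpha_z + power_z) ** 2 * var / (p1 - p2) ** 2

def n_per_arm_cluster_crossover(p1, p2, m, rho, eta):
    """Inflate by the assumed balanced cluster-crossover design effect
    DE = 1 + (m-1)*rho - m*eta, with m the cluster-period size, rho
    the within-cluster-within-period correlation and eta the
    within-cluster-between-period correlation."""
    de = 1 + (m - 1) * rho - m * eta
    return math.ceil(n_per_arm_individual(p1, p2) * de)

# ICU-style scenario: large cluster-periods, small correlations
n = n_per_arm_cluster_crossover(p1=0.10, p2=0.08, m=400, rho=0.02, eta=0.015)
```

Note how a larger between-period correlation eta shrinks the design effect: reusing the same clusters in both periods removes part of the cluster-level variance.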
Fast optimization of binary clusters using a novel dynamic lattice searching method.
Wu, Xia; Cheng, Wen
2014-09-28
Global optimization of binary clusters has been a difficult task despite much effort and many efficient methods. To handle the two types of elements in binary clusters (the homotop problem), two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, the binary DLS (BDLS) method, is developed. However, it was found that the BDLS can only be utilized for the optimization of binary clusters of small sizes, because the homotop problem is hard to solve without an atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve the homotop problem, and an efficient method based on BDLS and ILS, named BDLS-ILS, is presented for global optimization of binary clusters. In order to assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated. Results show that the method is efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd)79 clusters with DFT-fitted parameters of the Gupta potential.
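The ILS component can be sketched for a tiny binary Lennard-Jones cluster: local minimization of a two-type LJ energy (Lorentz-Berthelot-style mixing assumed) alternated with type-swap perturbations that attack the homotop problem. The parameters, cluster size and step counts are toy values, and this is far simpler than BDLS-ILS itself.

```python
import numpy as np
from scipy.optimize import minimize

def lj_energy(x, types, eps, sig):
    """Binary Lennard-Jones energy; eps/sig are indexed by particle
    type, combined with Lorentz-Berthelot-style mixing rules."""
    pos = x.reshape(-1, 3)
    n = len(pos)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            eij = np.sqrt(eps[types[i]] * eps[types[j]])
            sij = 0.5 * (sig[types[i]] + sig[types[j]])
            sr6 = (sij / r) ** 6
            e += 4.0 * eij * (sr6 * sr6 - sr6)
    return e

def local_min(x0, types, eps, sig):
    return minimize(lj_energy, x0, args=(types, eps, sig), method="L-BFGS-B")

def ils_sketch(types, eps, sig, steps=10, seed=0):
    """Iterated local search: perturb the best structure by swapping
    the types of two unlike atoms (a homotop move), re-minimize, and
    keep the candidate if it improves the energy."""
    rng = np.random.default_rng(seed)
    best_types = list(types)
    best = local_min(rng.normal(scale=1.0, size=3 * len(types)),
                     best_types, eps, sig)
    for _ in range(steps):
        t = list(best_types)
        a = int(rng.choice([i for i, v in enumerate(t) if v == 0]))
        b = int(rng.choice([i for i, v in enumerate(t) if v == 1]))
        t[a], t[b] = t[b], t[a]
        cand = local_min(best.x + rng.normal(scale=0.05, size=best.x.size),
                         t, eps, sig)
        if cand.fun < best.fun:
            best, best_types = cand, t
    return best.fun, best_types

energy, final_types = ils_sketch(types=[0, 0, 0, 0, 1, 1, 1, 1],
                                 eps=np.array([1.0, 1.5]),
                                 sig=np.array([1.0, 1.1]))
```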
Nuclear pasta in hot dense matter and its implications for neutrino scattering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roggero, Alessandro; Margueron, Jerome; Roberts, Luke F.
The abundance of large clusters of nucleons in neutron-rich matter at subnuclear density is found to be greatly reduced by finite-temperature effects when matter is close to β equilibrium, compared to the case where the electron fraction is fixed at Y_e > 0.1, as often considered in the literature. Large nuclei and exotic nonspherical nuclear configurations called pasta, favored in the vicinity of the transition to uniform matter at T = 0, dissolve at a relatively low temperature T_u as protons leak out of nuclei and pasta. For matter at β-equilibrium with a negligible neutrino chemical potential we find that T_u^β ≃ 4 ± 1 MeV for realistic equations of state. This is lower than the maximum temperature T_max^β ≃ 9 ± 1 MeV at which nuclei can coexist with a gas of nucleons, and can be explained by a change in the nature of the transition to uniform matter called retrograde condensation. An important new finding is that coherent neutrino scattering from nuclei and pasta makes a modest contribution to the opacity under the conditions encountered in supernovas and neutron star mergers. This is because large nuclear clusters dissolve at most relevant temperatures, and at lower temperatures, when clusters are present, Coulomb correlations between them suppress coherent neutrino scattering off individual clusters. Lastly, implications for neutrino signals from galactic supernovas are briefly discussed.
The Scale Sizes of Globular Clusters: Tidal Limits, Evolution, and the Outer Halo
NASA Astrophysics Data System (ADS)
Harris, William
2011-10-01
The physical factors that determine the linear sizes of massive star clusters are not well understood. Their scale sizes were long thought to be governed by the tidal field of the parent galaxy, but major questions are now emerging. Globular clusters, for example, have mean sizes nearly independent of location in the halo. Paradoxically, the recently discovered "anomalous extended clusters" in M31 and elsewhere have scale sizes that fit much better with tidal theory, but they are puzzlingly rare. Lastly, the persistent size difference between metal-poor and metal-rich clusters still lacks a quantitative explanation. Many aspects of these observations call for better modelling of dynamical evolution in the outskirts of clusters, and also their conditions of formation including the early rapid mass loss phase of protoclusters. A new set of accurate measurements of scale sizes and structural parameters, for a large and homogeneous set of globular clusters, would represent a major advance in this subject. We propose to carry out a {WFC3+ACS} imaging survey of the globular clusters in the supergiant Virgo elliptical M87 to cover the complete run of the halo. M87 is an optimum target system because of its huge numbers of clusters and HST's ability to resolve the cluster profiles accurately. We will derive cluster effective radii, central concentrations, luminosities, and colors for more than 4000 clusters using PSF-convolved King-model profile fitting. In parallel, we are developing theoretical tools to model the expected distribution of cluster sizes versus galactocentric distance as functions of cluster mass, concentration, and orbital anisotropy.
NASA Astrophysics Data System (ADS)
Cordero, M. J.; Hénault-Brunet, V.; Pilachowski, C. A.; Balbinot, E.; Johnson, C. I.; Varri, A. L.
2017-03-01
We use radial velocities from spectra of giants obtained with the WIYN telescope, coupled with existing chemical abundance measurements of Na and O for the same stars, to probe the presence of kinematic differences among the multiple populations of the globular cluster (GC) M13. To characterize the kinematics of various chemical subsamples, we introduce a method using Bayesian inference along with a Markov chain Monte Carlo algorithm to fit a six-parameter kinematic model (including rotation) to these subsamples. We find that the so-called extreme population (Na-enhanced and extremely O-depleted) exhibits faster rotation around the centre of the cluster than the other cluster stars, in particular, when compared with the dominant `intermediate' population (moderately Na-enhanced and O-depleted). The most likely difference between the rotational amplitude of this extreme population and that of the intermediate population is found to be ˜4 km s-1 , with a 98.4 per cent probability that the rotational amplitude of the extreme population is larger than that of the intermediate population. We argue that the observed difference in rotational amplitudes, obtained when splitting subsamples according to their chemistry, is not a product of the long-term dynamical evolution of the cluster, but more likely a surviving feature imprinted early in the formation history of this GC and its multiple populations. We also find an agreement (within uncertainties) in the inferred position angle of the rotation axis of the different subpopulations considered. We discuss the constraints that these results may place on various formation scenarios.
Jovian Trojans: Orbital structures versus the WISE data
NASA Astrophysics Data System (ADS)
Rozehnal, Jakub; Broz, M.
2013-10-01
In this work, we study the relation between orbital characteristics of Jovian Trojans and their albedos and diameters as measured by the WISE/NEOWISE mission (Grav et al. 2011, 2012). In our previous work (Broz & Rozehnal 2011), we concluded that there is only one collisional family with a parent body larger than 100 km among Trojans, namely the Eurybates. This finding was based on the analysis of the observed size distributions, colour data from the Sloan Digital Sky Survey, and simulations of orbital evolution. The WISE albedos serve as an independent source of information which allows us to verify our previous results. We also update our database of suitable resonant elements (i.e. the libration amplitude D, eccentricity e, inclination I) of Trojans and we look for new (to-be-discovered) clusters with the Hierarchical Clustering Method. Using the WISE diameters, we can construct more precise size-frequency distributions of Trojans in both the leading and trailing clouds, which we compare to the SFD of the cluster(s) mentioned above. We then prepare a collisional model (based on the Boulder code, Morbidelli et al. 2009). Initial conditions of our model are based on the assumption that the Trojans were captured from a destabilised transplanetary disc while Jupiter jumped during its close encounter with a Neptune-mass planet, the so-called "jump capture" (Nesvorny et al. 2013). Within the framework of this model we try to constrain the age of the Eurybates family. The work of MB was supported by grant GACR 13-013085 of the Czech Science Foundation and the Research Programme MSM0021620860 of the Czech Ministry of Education.
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
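The feature-map idea behind the CCA can be sketched in a few lines: each pixel is encoded as the index of its nearest cluster feature, and image reconstruction is then a pure look-up into the table of features. This is an illustrative sketch of the concept only, not the original CCA implementation; the centroids and pixel values below are made up.

```python
# Sketch of the cluster-feature-map concept (illustrative, not the CCA itself).

def make_feature_map(pixels, centroids):
    """Encode each pixel as the index of its nearest cluster centroid."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
    return [nearest(p) for p in pixels]

def reconstruct(feature_map, centroids):
    """Decode via a simple look-up table: index -> cluster feature."""
    return [centroids[i] for i in feature_map]

centroids = [(0.0, 0.0), (10.0, 10.0)]          # hypothetical spectral features
pixels = [(0.5, 1.0), (9.0, 11.0), (0.2, 0.1)]  # hypothetical pixel vectors
fmap = make_feature_map(pixels, centroids)       # [0, 1, 0]
image = reconstruct(fmap, centroids)
```

Because the feature map is a sequence of small integers, it is well suited to the source-encoding step the abstract mentions, and the same table serves both reconstruction and computer interpretation.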
Review of methods for handling confounding by cluster and informative cluster size in clustered data
Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew
2014-01-01
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
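A minimal numerical illustration of why ICS matters, with made-up numbers: when cluster size is associated with the outcome, the mean over all observations and the mean of cluster means answer different questions and disagree.

```python
# Informative cluster size (ICS), toy illustration with hypothetical data:
# the large cluster has low outcomes, the small cluster a high outcome.
clusters = {
    "A": [1.0, 1.0, 1.0, 1.0],  # large cluster
    "B": [5.0],                 # small cluster
}

all_obs = [y for ys in clusters.values() for y in ys]
obs_level = sum(all_obs) / len(all_obs)  # observation-level mean: 1.8

cluster_means = [sum(ys) / len(ys) for ys in clusters.values()]
cluster_level = sum(cluster_means) / len(cluster_means)  # cluster-level mean: 3.0
```

Under ICS, a method implicitly targets one of these two estimands (member-level versus cluster-level inference), which is why the choice among the reviewed approaches is not merely technical.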
Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang
2017-01-01
Clustering algorithms are a basis of data analysis and are widely used in analysis systems. However, with high-dimensional data, a clustering algorithm may overlook the business relations between dimensions, especially in medical fields, so the clustering result often fails to meet the users' business goals. If the clustering process can incorporate the users' knowledge, that is, the doctor's knowledge or analysis intent, the result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve user satisfaction with the result. The core of this method is to use the user's feedback on the clustering result to optimize it. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, so that it reflects the user's business preference as closely as possible. After this parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we test our method on breast cancer data. The experiments show the better performance of our algorithm.
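The weighting idea in the paragraph above can be sketched as an attribute-weighted distance inside the k-means assignment step. The weights here are hand-set for illustration (in the paper they are tuned by PSO from user feedback); points and centroids are hypothetical.

```python
# Weighted k-means assignment step: per-dimension weights encode the
# user's preference for which attributes should drive the clustering.
# Illustrative sketch only; weights are hand-set, not PSO-optimized.

def weighted_sq_dist(x, c, w):
    return sum(wi * (xi - ci) ** 2 for wi, xi, ci in zip(w, x, c))

def assign(points, centroids, w):
    """One assignment step under the weighted distance."""
    return [min(range(len(centroids)),
                key=lambda j: weighted_sq_dist(p, centroids[j], w))
            for p in points]

points = [(1.0, 100.0), (1.1, 0.0)]
centroids = [(1.0, 0.0), (5.0, 50.0)]
# Heavily weighting dimension 0 groups both points with centroid 0,
# even though they are far apart in dimension 1.
labels = assign(points, centroids, w=(1000.0, 0.001))  # [0, 0]
```

Swapping the weights flips the grouping, which is exactly the lever the interactive feedback loop adjusts.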
Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M
2018-06-01
Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men who have sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. Source attribution can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data, and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index.
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
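The adjusted Rand index used above as the evaluation criterion is a chance-corrected agreement measure between two partitions. A minimal pure-Python version of the standard formula (pair counts in the contingency table, corrected by their expected value):

```python
# Adjusted Rand index between two labelings of the same items.
# Standard definition; the example labelings are made up.
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(v, 2) for v in contingency.values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions trivial
    return (sum_ij - expected) / (max_index - expected)

# Identical partitions score 1; the label names themselves do not matter.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

An index of 0 corresponds to chance-level agreement with the known classes (e.g. cancer types), which is what makes it a fair yardstick across the 2780 method combinations.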
Group-sparse representation with dictionary learning for medical image denoising and fusion.
Li, Shutao; Yin, Haitao; Fang, Leyuan
2012-12-01
Recently, sparse representation has attracted a lot of interest in various areas. However, the standard sparse representation does not consider the intrinsic structure, i.e., that the nonzero elements occur in clusters, called group sparsity. Furthermore, there is no dictionary learning method for group sparse representation considering the geometrical structure of the space spanned by atoms. In this paper, we propose a novel dictionary learning method, called Dictionary Learning with Group Sparsity and Graph Regularization (DL-GSGR). First, the geometrical structure of atoms is modeled as the graph regularization. Then, combining group sparsity and graph regularization, the DL-GSGR is presented, which is solved by alternating the group sparse coding and dictionary updating. In this way, the group coherence of the learned dictionary can be made small enough that any signal can be group sparse coded effectively. Finally, group sparse representation with DL-GSGR is applied to 3-D medical image denoising and image fusion. Specifically, in 3-D medical image denoising, a 3-D processing mechanism (using the similarity among nearby slices) and temporal regularization (to preserve the correlations across nearby slices) are exploited. The experimental results on 3-D image denoising and image fusion demonstrate the superiority of our proposed denoising and fusion approaches.
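The "nonzeros occur in clusters" structure can be made concrete with group soft-thresholding, the proximal step commonly used in group-sparse coding: each predefined group of coefficients is shrunk jointly, so a weak group vanishes as a whole. This is a generic sketch of the group-sparsity mechanism, not the DL-GSGR solver; the coefficients and groups are hypothetical.

```python
# Group soft-thresholding: shrink each coefficient group by its L2 norm,
# zeroing weak groups entirely. Generic sketch, not the DL-GSGR algorithm.
from math import sqrt

def group_soft_threshold(x, groups, lam):
    out = list(x)
    for g in groups:
        norm = sqrt(sum(x[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * x[i]
    return out

x = [3.0, 4.0, 0.3, 0.4]  # strong group (0, 1), weak group (2, 3)
# The weak group is zeroed as a whole; the strong group is only shrunk.
print(group_soft_threshold(x, groups=[[0, 1], [2, 3]], lam=1.0))
```

Plain (element-wise) soft-thresholding would instead zero coefficients individually, losing exactly the clustered support that group sparsity preserves.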
Calculation of the critical overdensity in the spherical-collapse approximation
NASA Astrophysics Data System (ADS)
Herrera, D.; Waga, I.; Jorás, S. E.
2017-03-01
Critical overdensity δc is a key concept in estimating the number count of halos for different redshift and halo-mass bins, and therefore, it is a powerful tool to compare cosmological models to observations. There are currently two different prescriptions in the literature for its calculation, namely, the differential-radius and the constant-infinity methods. In this work we show that the latter yields precise results only if we are careful in the definition of the so-called numerical infinities. Although the subtleties we point out are crucial ingredients for an accurate determination of δc both in general relativity and in any other gravity theory, we focus on f (R )-modified gravity models in the metric approach; in particular, we use the so-called large (F =1 /3 ) and small-field (F =0 ) limits. For both of them, we calculate the relative errors (between our method and the others) in the critical density δc, in the comoving number density of halos per logarithmic mass interval nln M, and in the number of clusters at a given redshift in a given mass bin Nbin, as functions of the redshift. We have also derived an analytical expression for the density contrast in the linear regime as a function of the collapse redshift zc and Ωm 0 for any F .
White Matter Tract Segmentation as Multiple Linear Assignment Problems
Sharmin, Nusrat; Olivetti, Emanuele; Avesani, Paolo
2018-01-01
Diffusion magnetic resonance imaging (dMRI) makes it possible to reconstruct the main pathways of axons within the white matter of the brain as a set of polylines, called streamlines. The set of streamlines of the whole brain is called the tractogram. Organizing tractograms into anatomically meaningful structures, called tracts, is known as the tract segmentation problem, with important applications to neurosurgical planning and tractometry. Automatic tract segmentation techniques can be unsupervised or supervised. A common criticism of unsupervised methods, like clustering, is that there is no guarantee of obtaining anatomically meaningful tracts. In this work, we focus on supervised tract segmentation, which is driven by prior knowledge from anatomical atlases or from examples, i.e., segmented tracts from different subjects. We present a supervised tract segmentation method that segments a given tract of interest in the tractogram of a new subject using multiple examples as prior information. Our proposed tract segmentation method is based on the idea of streamline correspondence, i.e., on finding corresponding streamlines across different tractograms. In the literature, streamline correspondence has been addressed with the nearest neighbor (NN) strategy. In contrast, here we formulate the problem of streamline correspondence as a linear assignment problem (LAP), which is a cornerstone of combinatorial optimization. With respect to the NN, the LAP introduces a constraint of one-to-one correspondence between streamlines, which forces the correspondences to follow the local anatomical differences between the example and the target tract, neglected by the NN. In the proposed solution, we combine the Jonker-Volgenant algorithm (LAPJV) for solving the LAP with an efficient way of computing the nearest neighbors of a streamline, which massively reduces the total amount of computations needed to segment a tract. 
Moreover, we propose a ranking strategy to merge correspondences coming from different examples. We validate the proposed method on tractograms generated from the human connectome project (HCP) dataset and compare the segmentations with the NN method and the ROI-based method. The results show that LAP-based segmentation is vastly more accurate than ROI-based segmentation and substantially more accurate than the NN strategy. We provide a Free/OpenSource implementation of the proposed method. PMID:29467600
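The difference between NN matching and the one-to-one LAP constraint can be shown on a toy problem where "streamlines" are reduced to scalar positions. This brute-force sketch enumerates permutations, which is only feasible for tiny inputs; real pipelines use a polynomial-time solver such as the Jonker-Volgenant algorithm named above. All values are made up.

```python
# Nearest-neighbor vs. linear-assignment matching, toy 1-D illustration.
# Brute force over permutations; illustrative only.
from itertools import permutations

def nn_match(src, dst):
    """Each source picks its nearest target; targets may be reused."""
    return [min(range(len(dst)), key=lambda j: abs(s - dst[j])) for s in src]

def lap_match(src, dst):
    """One-to-one matching minimizing total distance."""
    best = min(permutations(range(len(dst))),
               key=lambda p: sum(abs(s - dst[j]) for s, j in zip(src, p)))
    return list(best)

src = [0.0, 0.1]
dst = [0.05, 5.0]
print(nn_match(src, dst))   # [0, 0]: both sources collapse onto one target
print(lap_match(src, dst))  # [0, 1]: the constraint forces distinct targets
```

The collapse shown by `nn_match` is exactly the behavior the abstract criticizes: without the one-to-one constraint, local differences between example and target are ignored.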
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.
Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K
2013-03-01
Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
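The temporal-alignment idea underlying HACA's dynamic time alignment kernel can be illustrated with plain dynamic time warping (DTW) between two 1-D sequences. This is a sketch of the alignment principle only, not the kernelized, multidimensional version used in the paper.

```python
# Minimal dynamic time warping (DTW) distance between two 1-D sequences.
# Illustrative sketch of the time-alignment idea behind HACA.

def dtw(a, b):
    inf = float("inf")
    # d[i][j]: minimal cost of aligning a[:i] with b[:j]
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

# A time-stretched copy aligns at zero cost, unlike a fixed-length
# Euclidean comparison, which could not even be applied here.
print(dtw([0, 1, 2], [0, 1, 1, 2]))  # 0.0
```

This invariance to temporal scale is what lets a clustering built on such an alignment group instances of the same motion primitive performed at different speeds.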
ODE, RDE and SDE models of cell cycle dynamics and clustering in yeast.
Boczko, Erik M; Gedeon, Tomas; Stowers, Chris C; Young, Todd R
2010-07-01
Biologists have long observed periodic-like oxygen consumption oscillations in yeast populations under certain conditions, and several unsatisfactory explanations for this phenomenon have been proposed. These ‘autonomous oscillations’ have often appeared with periods that are nearly integer divisors of the calculated doubling time of the culture. We hypothesize that these oscillations could be caused by a form of cell cycle synchronization that we call clustering. We develop some novel ordinary differential equation models of the cell cycle. For these models, and for random and stochastic perturbations, we give both rigorous proofs and simulations showing that both positive and negative growth rate feedback within the cell cycle are possible agents that can cause clustering of populations within the cell cycle. It occurs for a variety of models and for a broad selection of parameter values. These results suggest that the clustering phenomenon is robust and is likely to be observed in nature. Since there are necessarily an integer number of clusters, clustering would lead to periodic-like behaviour with periods that are nearly integer divisors of the period of the cell cycle. Related experiments have shown conclusively that cell cycle clustering occurs in some oscillating yeast cultures.
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...
2016-11-29
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
Invasive advance of an advantageous mutation: nucleation theory.
O'Malley, Lauren; Basham, James; Yasi, Joseph A; Korniss, G; Allstadt, Andrew; Caraco, Thomas
2006-12-01
For sedentary organisms with localized reproduction, spatially clustered growth drives the invasive advance of a favorable mutation. We model competition between two alleles where recurrent mutation introduces a genotype with a rate of local propagation exceeding the resident's rate. We capture ecologically important properties of the rare invader's stochastic dynamics by assuming discrete individuals and local neighborhood interactions. To understand how individual-level processes may govern population patterns, we invoke the physical theory for nucleation of spatial systems. Nucleation theory discriminates between single-cluster and multi-cluster dynamics. A sufficiently low mutation rate, or a sufficiently small environment, generates single-cluster dynamics, an inherently stochastic process; a favorable mutation advances only if the invader cluster reaches a critical radius. For this mode of invasion, we identify the probability distribution of waiting times until the favored allele advances to competitive dominance, and we ask how the critical cluster size varies as propagation or mortality rates vary. Increasing the mutation rate or system size generates multi-cluster invasion, where spatial averaging produces nearly deterministic global dynamics. For this process, an analytical approximation from nucleation theory, called Avrami's Law, describes the time-dependent behavior of the genotype densities with remarkable accuracy.
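Avrami's law, invoked above for the multi-cluster regime, gives the transformed (here, invader-occupied) fraction as X(t) = 1 - exp(-K t^n). The sketch below evaluates it with arbitrary illustrative values of the rate constant K and exponent n, just to show the characteristic sigmoidal behavior.

```python
# Avrami's law: fraction of the system converted by time t under
# multi-cluster nucleation. K and n are arbitrary illustrative values.
from math import exp

def avrami_fraction(t, K, n):
    return 1.0 - exp(-K * t ** n)

# Sigmoidal growth: slow start, rapid middle phase, then saturation.
for t in (0.0, 1.0, 3.0):
    print(round(avrami_fraction(t, K=0.1, n=3.0), 4))
```

The near-deterministic global dynamics of the multi-cluster regime is what makes such a closed-form description accurate, whereas the single-cluster regime stays inherently stochastic.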
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. To increase the accuracy of the model, the following steps are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The crucial reason for this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually arise when building an ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated before resuming the clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build the ANFIS. Simulations based on numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, the convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
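A minimal sketch of the classic pick/drop rules behind ant colony clustering (the Lumer-Faieta formulation) is given below. The constants k1, k2, the similarity scale alpha, and the 3x3 neighbourhood size are common illustrative defaults, not the tuned values of the improved algorithm above.

```python
def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def neighbourhood_similarity(item, neighbours, alpha, dist=euclid, s2=9):
    """Local density f(i): mean similarity of `item` to the items in its
    local grid neighbourhood (s2 cells), clipped at zero."""
    if not neighbours:
        return 0.0
    total = sum(1.0 - dist(item, n) / alpha for n in neighbours)
    return max(0.0, total / s2)

def pick_probability(f, k1=0.1):
    """Probability that an unladen ant picks up an item of local density f."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a laden ant drops its item at local density f."""
    return 2.0 * f if f < k2 else 1.0
```

Isolated items (f near 0) are picked up almost surely, while items in dense, similar neighbourhoods are dropped and retained, which is what drives the corpse-clustering behaviour.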
Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M
2016-01-01
Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Topic modeling for cluster analysis of large biological and medical datasets.
Zhao, Weizhong; Zou, Wen; Chen, James J
2014-01-01
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. 
Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
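The first of the three methods, highest probable topic assignment, can be sketched as follows: given each document's topic-probability vector (e.g. from a fitted topic model), assign the document to its highest-probability topic, and let documents sharing a topic form a cluster. This is a generic sketch of the idea, not the authors' exact pipeline.

```python
def cluster_by_topic(doc_topic_matrix):
    """Assign each document (a row of topic probabilities) to its most
    probable topic; documents sharing a topic form one cluster."""
    clusters = {}
    for doc_id, probs in enumerate(doc_topic_matrix):
        topic = max(range(len(probs)), key=probs.__getitem__)
        clusters.setdefault(topic, []).append(doc_id)
    return clusters
```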
Study of cluster behavior in the riser of CFB by the DSMC method
NASA Astrophysics Data System (ADS)
Liu, H. P.; Liu, D. Y.; Liu, H.
2010-03-01
The flow behaviors of clusters in the riser of a two-dimensional (2D) circulating fluidized bed were numerically studied based on the Euler-Lagrangian approach. Gas turbulence was modeled by means of Large Eddy Simulation (LES). Particle collision was modeled by means of the direct simulation Monte Carlo (DSMC) method. Clusters' hydrodynamic characteristics were obtained using a cluster identification method proposed by Sharma et al. (2000). The descending clusters near the wall region and the up- and down-flowing clusters in the core were studied separately due to their different flow behaviors. The effects of superficial gas velocity on cluster behavior were analyzed. Simulated results showed that near-wall clusters flow downward with a descent velocity of about -45 cm/s. The occurrence frequency of up-flowing clusters is higher than that of down-flowing clusters in the core of the riser. With increasing superficial gas velocity, the solid concentration and occurrence frequency of clusters decrease, while the cluster axial velocity increases. Simulated results were in agreement with experimental data. The stochastic method used in the present paper is feasible for predicting cluster flow behavior in CFBs.
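Cluster-identification criteria of the kind attributed to Sharma et al. (2000) typically flag intervals where the local solids concentration exceeds the time-mean by a few standard deviations for a minimum duration. The sketch below assumes that form; the exact thresholds and duration used in the paper may differ.

```python
import statistics

def identify_clusters(solid_fraction_series, n_sigma=2.0, min_samples=3):
    """Flag index intervals [start, end) where the local solids fraction
    exceeds mean + n_sigma * std for at least min_samples samples."""
    mean = statistics.fmean(solid_fraction_series)
    sd = statistics.pstdev(solid_fraction_series)
    threshold = mean + n_sigma * sd
    clusters, start = [], None
    for i, v in enumerate(solid_fraction_series):
        if v > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_samples:
                clusters.append((start, i))
            start = None
    if start is not None and len(solid_fraction_series) - start >= min_samples:
        clusters.append((start, len(solid_fraction_series)))
    return clusters
```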
A hybrid protection approach for denial of service (DoS) attacks in wireless sensor networks
NASA Astrophysics Data System (ADS)
Gunasekaran, Mahalakshmi; Periakaruppan, Subathra
2017-06-01
A wireless sensor network (WSN) contains distributed autonomous devices that sense physical and environmental conditions. During clustering operations, high energy consumption drains battery power and shortens network lifetime. Hence, WSN devices initially operate in a low-power sleep mode to maximise lifetime, but arriving attacks disrupt this low-power operation; such attacks are denial-of-service (DoS) attacks. Conventional intrusion detection (ID) approaches, such as rule-based and anomaly-based methods, detect DoS attacks effectively, but their energy consumption and false-detection rates are high. The absence of attack information, and of any broadcast of its impact to the other cluster heads (CHs), makes DoS attacks easy to mount. This article combines isolation and routing tables to detect an attack in a specific cluster and broadcast the information to the other CHs. The intercommunication between the CHs prevents DoS attacks effectively. In addition, a swarm-based defence approach is proposed to migrate from the faulty channel to a normal operating channel through frequency hopping. A comparative analysis of the proposed table-based intrusion detection systems (IDSs) and swarm-based defence approaches against a traditional IDS, in terms of transmission overhead/efficiency, energy consumption, and false positive/negative rates, demonstrates their capability for DoS prediction/prevention in WSNs.
Hektor - an exceptional D-type family among Jovian Trojans
NASA Astrophysics Data System (ADS)
Rozehnal, J.; Brož, M.; Nesvorný, D.; Durda, D. D.; Walsh, K.; Richardson, D. C.; Asphaug, E.
2016-11-01
In this work, we analyse Jovian Trojans in the space of suitable resonant elements and we identify clusters of possible collisional origin by two independent methods: hierarchical clustering and a so-called randombox method. Compared to our previous work, we study a sample twice as large. Apart from the Eurybates, Ennomos and 1996 RJ families, we have found three more clusters, namely families around the asteroids (20961) Arkesilaos and (624) Hektor in the L4 libration zone, and (247341) 2001 UV209 in L5. The families fulfill our stringent criteria, i.e. a high statistical significance, albedo homogeneity and a steeper size-frequency distribution than that of the background. In order to understand their nature, we simulate their long-term collisional evolution with the Boulder code and their dynamical evolution using a modified SWIFT integrator. Within the framework of our evolutionary model, we were able to constrain the age of the Hektor family to be either 1-4 Gyr or, less likely, 0.1-2.5 Gyr, depending on the initial impact geometry. Since (624) Hektor itself seems to be a bilobed-shaped body with a satellite, i.e. an exceptional object, we address its association with the D-type family and we demonstrate that the moon and the family could have been created during a single impact event. We simulated the cratering event using smoothed particle hydrodynamics. This is also the first case of a family associated with a D-type parent body.
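The hierarchical clustering method (HCM) used for family identification can be sketched as single-linkage growth from a seed body: any body within a cutoff distance of a current member joins the family, and the process repeats until no more bodies qualify. The metric and cutoff below are placeholders standing in for the resonant-element metric and velocity cutoff used in practice.

```python
def hcm_family(bodies, seed, d_cutoff, metric):
    """Single-linkage family growth: starting from index `seed`, repeatedly
    add any body lying within d_cutoff of a current member."""
    members = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for i in range(len(bodies)):
            if i not in members and metric(bodies[current], bodies[i]) < d_cutoff:
                members.add(i)
                frontier.append(i)
    return members
```

With a 1-D toy "element space" and an absolute-difference metric, bodies chained within the cutoff join the seed's family while distant bodies are excluded.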
Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Onishi, Masaki; Yoda, Ikushi
In recent years, many human-tracking methods have been proposed in order to analyze human trajectories. These are general technologies applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a two-step clustering framework combining the k-means method and fuzzy clustering to detect human regions. In the initial clustering, the k-means method rapidly forms intermediate clusters from features extracted by stereo vision. In the final clustering, the fuzzy c-means method groups these intermediate clusters into human regions based on their attributes. By expressing ambiguity through fuzzy clustering, the proposed method clusters correctly even when many people are close to each other. The validity of our technique was evaluated by extracting the trajectories of doctors and nurses in the emergency room of a hospital.
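The fuzzy c-means membership that drives the second clustering step can be sketched with the standard FCM formula, u_k = 1 / sum_j (d_k/d_j)^(2/(m-1)). The fuzzifier m = 2 and the distance floor are common defaults, not necessarily the authors' settings.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster:
    u_k = 1 / sum_j (d_k / d_j)^(2/(m-1)), with a small floor on
    distances to avoid division by zero at a center."""
    dists = [max(1e-12, sum((p - c) ** 2 for p, c in zip(point, ctr)) ** 0.5)
             for ctr in centers]
    return [1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in dists)
            for dk in dists]
```

A point midway between two centers gets equal memberships, while a point near one center gets a membership close to 1 there; memberships always sum to 1.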
A cluster merging method for time series microarray with production values.
Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio
2014-09-01
A challenging task in time-course microarray data analysis is to cluster genes meaningfully combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal obtaining groups with highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured in the same time points) and merging them by taking into account the frequency by which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim to find co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
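The merging idea above, grouping genes by how often the per-replicate clusterings assemble them together, can be sketched with a pairwise co-assignment frequency and a union-find merge. The 0.5 threshold and the transitive merging are illustrative assumptions, not the paper's exact rule.

```python
from itertools import combinations

def coassignment_frequency(clusterings):
    """For each gene pair, the fraction of replicate clusterings (dicts of
    gene -> cluster label) in which the two genes share a cluster."""
    n = len(clusterings)
    genes = sorted(clusterings[0])
    return {(a, b): sum(1 for c in clusterings if c[a] == c[b]) / n
            for a, b in combinations(genes, 2)}

def merge_by_frequency(clusterings, threshold=0.5):
    """Merge genes whose co-assignment frequency meets the threshold,
    transitively, using union-find."""
    freq = coassignment_frequency(clusterings)
    parent = {g: g for g in clusterings[0]}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for (a, b), f in freq.items():
        if f >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for g in parent:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())
```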
The role of nickel in radiation damage of ferritic alloys
Osetsky, Y.; Anento, Napoleon; Serra, Anna; ...
2014-11-26
According to modern theory, damage evolution under neutron irradiation depends on the fraction of self-interstitial atoms (SIAs) produced in the form of one-dimensional glissile clusters. These clusters, having a low interaction cross-section with other defects, are absorbed mainly by grain boundaries and dislocations, creating the so-called production bias. It is known empirically that the addition of certain alloying elements influences many radiation effects, including swelling; however, the mechanisms are unknown in many cases. In this study, we report the results of an extensive multi-technique atomistic level modeling study of SIA clusters mobility in body-centered cubic Fe–Ni alloys. We have found that Ni interacts strongly with the periphery of clusters, affecting their mobility. The total effect is defined by the number of Ni atoms interacting with the cluster at the same time and can be significant, even in low-Ni alloys. Thus a 1 nm (37 SIAs) cluster is practically immobile at T < 500 K in the Fe–0.8 at.% Ni alloy. Increasing cluster size and Ni content enhances cluster immobilization. Finally, this effect should have quite broad consequences in void swelling, matrix damage accumulation and radiation induced hardening, and the results obtained help to better understand and predict the effects of radiation in Fe–Ni ferritic alloys.
Calibrating the Planck cluster mass scale with cluster velocity dispersions
NASA Astrophysics Data System (ADS)
Amodeo, S.; Mei, S.; Stanford, S. A.; Bartlett, J. G.; Lawrence, C. L.; Chary, R. R.; Shim, H.; Marleau, F.; Stern, D.
2017-12-01
The potential of galaxy clusters as cosmological probes critically depends on the capability to obtain accurate estimates of their mass. This will be a key measurement for the next generation of cosmological surveys, such as Euclid. The discrepancy between the cosmological parameters determined from anisotropies in the cosmic microwave background and those derived from cluster abundance measurements from the Planck satellite calls for careful evaluation of systematic biases in cluster mass estimates. For this purpose, it is crucial to use independent techniques, like analysis of the thermal emission of the intracluster medium (ICM), observed either in the X-rays or through the Sunyaev-Zeldovich (SZ) effect, dynamics of member galaxies or gravitational lensing. We discuss possible bias in the Planck SZ mass proxy, which is based on X-ray observations. Using optical spectroscopy from the Gemini Multi-Object Spectrograph of 17 Planck-selected clusters, we present new estimates of the cluster mass based on the velocity dispersion of the member galaxies and independently of the ICM properties. We show how the difference between the velocity dispersion of galaxy and dark matter particles in simulations is the primary factor limiting interpretation of dynamical cluster mass measurements at this time, and we give the first observational constraints on the velocity bias.
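Dynamical mass estimates of the kind discussed here typically invert a simulation-calibrated scaling relation between velocity dispersion and mass, sigma = A * (h(z) * M / 1e15 Msun)^alpha. The coefficients below (A ≈ 1090 km/s, alpha = 1/3) are illustrative values of the order found in the simulation literature, not this paper's calibration.

```python
def dynamical_mass(sigma_v_kms, A=1090.0, alpha=1.0 / 3.0, hz=1.0):
    """Invert sigma = A * (hz * M / 1e15 Msun)^alpha for the cluster mass
    in solar masses. A (km/s) and alpha are assumed calibration values;
    hz is the dimensionless Hubble parameter H(z)/H0."""
    return 1e15 / hz * (sigma_v_kms / A) ** (1.0 / alpha)
```

With alpha = 1/3, halving the velocity dispersion lowers the inferred mass by a factor of eight, which is why the velocity bias discussed above propagates so strongly into the mass estimate.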
Point counts from clustered populations: Lessons from an experiment with Hawaiian crows
Hayward, G.D.; Kepler, C.B.; Scott, J.M.
1991-01-01
We designed an experiment to identify factors contributing most to error in counts of Hawaiian Crow or Alala (Corvus hawaiiensis) groups that are detected aurally. Seven observers failed to detect calling Alala on 197 of 361 3-min point counts on four transects extending from cages with captive Alala. A detection curve describing the relation between frequency of flock detection and distance typified the distribution expected in transect or point counts. Failure to detect calling Alala was affected most by distance, observer, and Alala calling frequency. The number of individual Alala calling was not important in detection rate. Estimates of the number of Alala calling (flock size) were biased and imprecise: average difference between number of Alala calling and number heard was 3.24 (±0.277). Distance, observer, number of Alala calling, and Alala calling frequency all contributed to errors in estimates of group size (P < 0.0001). Multiple regression suggested that number of Alala calling contributed most to errors. These results suggest that well-designed point counts may be used to estimate the number of Alala flocks but cast doubt on attempts to estimate flock size when individuals are counted aurally.
Applications of Some Artificial Intelligence Methods to Satellite Soundings
NASA Technical Reports Server (NTRS)
Munteanu, M. J.; Jakubowicz, O.
1985-01-01
Hard clustering of temperature profiles and regression temperature retrievals were used, and the method was refined using the probabilities of membership of each pattern vector in each of the clusters, derived with discriminant analysis. In hard clustering, the maximum probability is taken, the corresponding cluster is considered the correct cluster, and the remaining probabilities are discarded. In fuzzy partitioned clustering, these probabilities are kept, and the final retrieval is a weighted regression retrieval over several clusters. This method was used in the clustering of brightness temperatures, where the purpose was to predict tropopause height. A further refinement is the division of temperature profiles into three major regions for classification purposes. The results are summarized in tables in which total r.m.s. errors are displayed. An approach based on fuzzy logic, which is intimately related to artificial intelligence methods, is recommended.
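The fuzzy weighted retrieval described above can be sketched as a membership-weighted sum of per-cluster regression retrievals. The list-of-levels representation below is an assumption for illustration.

```python
def fuzzy_retrieval(memberships, cluster_retrievals):
    """Final retrieval = sum over clusters of (membership * per-cluster
    regression retrieval), computed level by level. `cluster_retrievals`
    holds one retrieved profile (list of levels) per cluster."""
    levels = len(cluster_retrievals[0])
    return [sum(u * r[k] for u, r in zip(memberships, cluster_retrievals))
            for k in range(levels)]
```

Hard clustering is the special case where one membership is 1 and the rest are 0, so the weighted retrieval reduces to the single-cluster retrieval.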
Clustering-based Feature Learning on Variable Stars
NASA Astrophysics Data System (ADS)
Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos
2016-04-01
The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias introduced by using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline.
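The core of the feature-learning step, representing a lightcurve by its distances to learned local patterns, can be sketched as below. Sliding-window extraction and a min-distance encoding are assumptions consistent with the description; the authors' full pipeline also clusters the subsequences to learn the patterns in the first place.

```python
def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def subsequences(lightcurve, window, step=1):
    """All length-`window` subsequences of a lightcurve (sliding window)."""
    return [lightcurve[i:i + window]
            for i in range(0, len(lightcurve) - window + 1, step)]

def encode(lightcurve, window, patterns, metric=euclid):
    """New representation: for each learned pattern, the minimum distance
    from any subsequence of the lightcurve to that pattern."""
    subs = subsequences(lightcurve, window)
    return [min(metric(s, p) for s in subs) for p in patterns]
```

The resulting fixed-length vector (one entry per pattern) can be fed to any standard classifier, regardless of the original lightcurve length.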
The detection methods of dynamic objects
NASA Astrophysics Data System (ADS)
Knyazev, N. L.; Denisova, L. A.
2018-01-01
The article deals with the application of cluster-analysis methods to the task of aircraft detection, based on partitioning a sample of navigation parameters into groups (clusters). A modified cluster-analysis method is suggested in which objects are searched for and detected, then iteratively combined into clusters, and the clusters are subsequently counted to increase the accuracy of aircraft detection. The operation of the method and the features of its implementation are considered. In conclusion, the efficiency of the proposed cluster-analysis method for finding targets is demonstrated.
CALL FOR PAPERS: Special cluster issue on `Experimental studies of zonal flow and turbulence'
NASA Astrophysics Data System (ADS)
Itoh, S.-I.
2005-07-01
Plasma Physics and Controlled Fusion (PPCF) invites submissions on the topic of `Experimental studies of zonal flow and turbulence', for consideration for a special topical cluster of articles to be published early in 2006. The topical cluster will be published in an issue of PPCF, combined with regular articles. The Guest Editor for the special cluster will be S-I Itoh, Kyushu University, Japan. There has been remarkable progress in the area of structure formation by turbulence. One of the highlights has been the physics of zonal flow and drift wave turbulence in toroidal plasmas. Extensive theoretical as well as computational studies have revealed the various mechanisms in turbulence and zonal flows. At the same time, experimental research on the zonal flow, geodesic acoustic modes and generation of global electric field by turbulence has evolved rapidly. Fast growth in reports of experimental results has stimulated further efforts to develop increased knowledge and systematic understanding. Each paper considered for the special cluster should describe the present research status and new scientific knowledge/results from the authors on experimental studies of zonal flow, geodesic acoustic modes and generation of electric field by turbulence (including studies of Reynolds-Maxwell stresses, etc). Manuscripts submitted to this special cluster in Plasma Physics and Controlled Fusion will be refereed according to the normal criteria and procedures of the journal. The Guest Editor guides the progress of the cluster from the initial open call, through the standard refereeing process, to publication. To be considered for inclusion in the special cluster, articles must be submitted by 2 September 2005 and must clearly state `for inclusion in the Turbulent Plasma Cluster'. Articles submitted after this deadline may not be included in the cluster issue but may be published in a later issue of the journal. 
Please submit your manuscript electronically via our web site at www.iop.org/journals/ppcf or by e-mail to mailto:ppcf@iop.org. Electronic submissions are encouraged but if you wish to submit a hard copy of your article then please send your typescript, a set of original figures and a covering letter to: The Publishing Administrator Plasma Physics and Controlled Fusion Institute of Physics Publishing Dirac House Temple Back Bristol BS1 6BE UK Further information on how to submit may be obtained on request by e-mailing the journal at the above address. Alternatively, visit the homepage of the journal (www.iop.org/journals/ppcf).
New Target for an Old Method: Hubble Measures Globular Cluster Parallax
NASA Astrophysics Data System (ADS)
Hensley, Kerry
2018-05-01
Measuring precise distances to faraway objects has long been a challenge in astrophysics. Now, one of the earliest techniques used to measure the distance to astrophysical objects has been applied to a metal-poor globular cluster for the first time. A Classic Technique: [Figure: An artist's impression of the European Space Agency's Gaia spacecraft. Gaia is on track to map the positions and motions of a billion stars. (ESA)] Distances to nearby stars are often measured using the parallax technique: tracing the tiny apparent motion of a target star against the background of more distant stars as Earth orbits the Sun. This technique has come a long way since it was first used in the 1800s to measure the distance to stars a few tens of light-years away; with the advent of space observatories like Hipparcos and Gaia, parallax can now be used to map the positions of stars out to thousands of light-years. Precise distance measurements aren't only important for setting the scale of the universe, however; they can also help us better understand stellar evolution over the course of cosmic history. Stellar evolution models are often anchored to a reference star cluster, the properties of which must be known precisely. These precise properties can be readily determined for young, nearby open clusters using parallax measurements. But stellar evolution models that anchor on the more-distant, ancient, metal-poor globular clusters have been hampered by the less-precise indirect methods used to measure distance to these faraway clusters, until now. [Figure: Top: An image of NGC 6397 overlaid with the area scanned by Hubble (dashed green) and the footprint of the camera (solid green). The blue ellipse represents the parallax motion of a star in the cluster, exaggerated by a factor of ten thousand. Bottom: An example scan from this field. Adapted from Brown et al. 2018.]
New Measurement to an Old Cluster: Thomas Brown (Space Telescope Science Institute) and collaborators used the Hubble Space Telescope to determine the distance to NGC 6397, one of the nearest metal-poor globular clusters and the anchor for one stellar population model. Brown and coauthors used a technique called spatial scanning to greatly broaden the reach of the parallax method. Spatial scanning was initially developed as a way to increase the signal-to-noise of exoplanet transit observations, but it has also greatly improved the prospects of astrometry, the precise determination of the separations between astronomical objects. In spatial scanning, the telescope moves while the exposure is being taken, spreading the light out across many pixels. Unprecedented Precision: This technique allowed the authors to achieve a precision of 20-100 microarcseconds. From the observed parallax angle of just 0.418 milliarcseconds (for reference, the moon's angular size is about 5 million times larger on the sky!), Brown and collaborators refined the distance to NGC 6397 to 7,795 light-years, with a measurement error of only a few percent. Using spatial scanning, Hubble can make parallax measurements of nearby globular clusters, while Gaia has the potential to reach even farther. Looking ahead, the measurement made by Brown and collaborators can be combined with the recently released Gaia data to trim the uncertainty down to just 1%. This highlights the power of space telescopes to make extremely precise measurements of astoundingly large distances, informing our models and helping us measure the universe. Citation: Thomas Brown et al 2018 ApJL 856 L6. doi:10.3847/2041-8213/aab55a
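The quoted distance follows directly from the parallax arithmetic: d[pc] = 1 / p[arcsec], converted to light-years. A quick check with the measured 0.418 mas:

```python
MAS_PER_ARCSEC = 1000.0
LY_PER_PARSEC = 3.26156

def parallax_to_lightyears(parallax_mas):
    """Distance from a parallax angle: d[pc] = 1 / p[arcsec], then
    converted from parsecs to light-years."""
    parsecs = MAS_PER_ARCSEC / parallax_mas
    return parsecs * LY_PER_PARSEC
```

parallax_to_lightyears(0.418) gives roughly 7.8e3 light-years, consistent with the ~7,795 ly quoted above; the small difference reflects rounding of the parallax value.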
Weak-lensing detection of intracluster filaments with ground-based data
NASA Astrophysics Data System (ADS)
Maturi, Matteo; Merten, Julian
2013-11-01
According to the current standard model of cosmology, matter in the Universe arranges itself along a network of filamentary structure. These filaments connect the main nodes of this so-called "cosmic web", which are clusters of galaxies. Although its large-scale distribution is clearly characterized by numerical simulations, constraining the dark-matter content of the cosmic web in reality turns out to be difficult. The natural method of choice is gravitational lensing. However, the direct detection and mapping of the elusive filament signal is challenging, and in this work we present two methods that are specifically tailored to achieve this task. A linear matched filter aims at detecting the smooth mass component of filaments and is optimized to perform a shear decomposition that follows the anisotropic component of the lensing signal. Filaments clearly inherit this property due to their morphology. At the same time, the contamination arising from the central massive cluster is controlled in a natural way. The 1σ filament detection limit is about κ ~ 0.005-0.01, depending on the filter template's width and length, enabling the detection of structures beyond the reach of other approaches. The second, complementary method seeks to detect the clumpy component of filaments. The detection is determined by the number density of subclump identifications in an area enclosing the potential filament, as found within the observed field with the filter approach. We tested both methods against mock observations based on realistic N-body simulations of filamentary structure and proved the feasibility of detecting filaments with ground-based data.
NASA Astrophysics Data System (ADS)
Panahi, Nima S.
We studied the problem of understanding and computing the essential features and dynamics of molecular motions through the development of two theories for two different systems. First, we studied the process of the Berry Pseudorotation of PF5 and the rotations it induces in the molecule through its natural and intrinsic geometric nature by setting it in the language of fiber bundles and graph theory. With these tools, we successfully extracted the essentials of the process' loops and induced rotations. The infinite number of pseudorotation loops were broken down into a small set of essential loops called "super loops", with their intrinsic properties and link to the physical movements of the molecule extensively studied. In addition, only the three "self-edge loops" generated any induced rotations, and then only a finite number of classes of them. Second, we studied applying the statistical methods of Principal Components Analysis (PCA) and Principal Coordinate Analysis (PCO) to argon clusters, to capture only the most important changes (reducing computational costs) and to graph the potential energy surface (PES) in three dimensions, respectively. Both methods proved successful, though PCA only partially so: its advantages appear only for PES databases much larger than those currently studied, or than those likely to be computationally tractable in the coming decades. In addition, PCA is only needed for the very rare case of a PES database that does not already include Hessian eigenvalues.
Gauge-free cluster variational method by maximal messages and moment matching.
Domínguez, Eduardo; Lage-Castellanos, Alejandro; Mulet, Roberto; Ricci-Tersenghi, Federico
2017-04-01
We present an implementation of the cluster variational method (CVM) as a message passing algorithm. The kind of message passing algorithm used for CVM, usually named generalized belief propagation (GBP), is a generalization of the belief propagation algorithm in the same way that CVM is a generalization of the Bethe approximation for estimating the partition function. However, the connection between fixed points of GBP and the extremal points of the CVM free energy is usually not a one-to-one correspondence because of the existence of a gauge transformation involving the GBP messages. Our contribution is twofold. First, we propose a way of defining messages (fields) in a generic CVM approximation, such that messages arrive on a given region from all its ancestors, and not only from its direct parents, as in the standard parent-to-child GBP. We call this approach maximal messages. Second, we focus on the case of binary variables, reinterpreting the messages as fields enforcing the consistency between the moments of the local (marginal) probability distributions. We provide a precise rule to enforce all consistencies, avoiding any redundancy, that would otherwise lead to a gauge transformation on the messages. This moment matching method is gauge free, i.e., it guarantees that the resulting GBP is not gauge invariant. We apply our maximal messages and moment matching GBP to obtain an analytical expression for the critical temperature of the Ising model in general dimensions at the level of plaquette CVM. The values obtained outperform Bethe estimates, and are comparable with loop corrected belief propagation equations. The method allows for a straightforward generalization to disordered systems.
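As background to GBP, ordinary belief propagation already illustrates the message-passing idea. The toy sketch below (not the paper's plaquette CVM, and with assumed parameter values) runs BP on a small Ising chain, where the Bethe/BP fixed point is exact because the graph is a tree, and checks the resulting marginals against brute-force enumeration.

```python
import numpy as np
from itertools import product

# Toy illustration: plain belief propagation on a small Ising chain.
J, h, n = 0.5, 0.2, 5          # coupling, field, chain length (assumed values)
psi = lambda s: np.exp(h * s)  # single-site factor

def exact_marginals():
    # Brute-force magnetizations <s_i> by summing over all 2^n states.
    Z, m = 0.0, np.zeros(n)
    for s in product([-1, 1], repeat=n):
        w = np.exp(h * sum(s) + J * sum(s[i] * s[i + 1] for i in range(n - 1)))
        Z += w
        m += w * np.array(s)
    return m / Z

def pass_messages():
    # One directional sweep of BP messages along the chain; index 0 <-> s = -1.
    msgs = [np.ones(2)]  # uniform message into the first site
    for _ in range(n - 1):
        m_in = msgs[-1]
        out = np.array([sum(m_in[j] * psi(sj) * np.exp(J * sj * si)
                            for j, sj in enumerate([-1, 1]))
                        for si in [-1, 1]])
        msgs.append(out / out.sum())
    return msgs

def bp_marginals():
    # Uniform symmetric chain: the backward sweep is the forward sweep mirrored.
    fwd, bwd = pass_messages(), pass_messages()[::-1]
    mags = []
    for i in range(n):
        b = np.array([psi(-1), psi(1)]) * fwd[i] * bwd[i]
        b /= b.sum()
        mags.append(b[1] - b[0])
    return np.array(mags)

print(np.max(np.abs(exact_marginals() - bp_marginals())))  # essentially zero on a tree
```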
Template based protein structure modeling by global optimization in CASP11.
Joo, Keehyoung; Joung, InSuk; Lee, Sun Young; Kim, Jong Yun; Cheng, Qianyi; Manavalan, Balachandran; Joung, Jong Young; Heo, Seungryong; Lee, Juyong; Nam, Mikyung; Lee, In-Ho; Lee, Sung Jong; Lee, Jooyoung
2016-09-01
For the template-based modeling (TBM) of CASP11 targets, we have developed three new protein modeling protocols (nns for server prediction and LEE and LEER for human prediction) by improving upon our previous CASP protocols (CASP7 through CASP10). We applied the powerful global optimization method of conformational space annealing to three stages of optimization, including multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain remodeling. For more successful fold recognition, a new alignment method called CRFalign was developed. It can incorporate sensitive positional and environmental dependence in alignment scores as well as strong nonlinear correlations among various features. Modifications and adjustments were made to the form of the energy function and weight parameters pertaining to the chain building procedure. For the side-chain remodeling step, residue-type dependence was introduced to the cutoff value that determines the entry of a rotamer to the side-chain modeling library. The improved performance of the nns server method is attributed to successful fold recognition achieved by combining several methods including CRFalign and to the current modeling formulation that can incorporate native-like structural aspects present in multiple templates. The LEE protocol is identical to the nns one except that CASP11-released server models are used as templates. The success of LEE in utilizing CASP11 server models indicates that proper template screening and template clustering assisted by appropriate cluster ranking promises a new direction to enhance protein 3D modeling. Proteins 2016; 84(Suppl 1):221-232. © 2015 Wiley Periodicals, Inc.
A VLBI resolution of the Pleiades distance controversy.
Melis, Carl; Reid, Mark J; Mioduszewski, Amy J; Stauffer, John R; Bower, Geoffrey C
2014-08-29
Because of its proximity and its youth, the Pleiades open cluster of stars has been extensively studied and serves as a cornerstone for our understanding of the physical properties of young stars. This role is called into question by the "Pleiades distance controversy," wherein the cluster distance of 120.2 ± 1.5 parsecs (pc) as measured by the optical space astrometry mission Hipparcos is significantly different from the distance of 133.5 ± 1.2 pc derived with other techniques. We present an absolute trigonometric parallax distance measurement to the Pleiades cluster that uses very long baseline radio interferometry (VLBI). This distance of 136.2 ± 1.2 pc is the most accurate and precise yet presented for the cluster and is incompatible with the Hipparcos distance determination. Our results cement existing astrophysical models for Pleiades-age stars. Copyright © 2014, American Association for the Advancement of Science.
2015-11-03
The galaxy cluster called MOO J1142+1527 can be seen here as it existed when light left it 8.5 billion years ago. The red galaxies at the center of the image make up the heart of the galaxy cluster. This color image is constructed from multi-wavelength observations: Infrared observations from NASA's Spitzer Space Telescope are shown in red; near-infrared and visible light captured by the Gemini Observatory atop Mauna Kea in Hawaii is green and blue; and radio light from the Combined Array for Research in Millimeter-wave Astronomy (CARMA), near Owens Valley in California, is purple. In addition to galaxies, clusters also contain a reservoir of hot gas with temperatures in the tens of millions of degrees Celsius/Kelvin. CARMA was used to detect this gas, and to determine the mass of this cluster. http://photojournal.jpl.nasa.gov/catalog/PIA20052
Featured Image: The Birth of Spiral Arms
NASA Astrophysics Data System (ADS)
Kohler, Susanna
2017-01-01
In this figure, the top panels show three spiral galaxies in the Virgo cluster, imaged with the Sloan Digital Sky Survey. The bottom panels provide a comparison with three morphologically similar galaxies generated in simulations. The simulations, run by Marcin Semczuk, Ewa Łokas, and Andrés del Pino (Nicolaus Copernicus Astronomical Center, Poland), were designed to examine how the spiral arms of galaxies like the Milky Way may have formed. In particular, the group explored the possibility that so-called grand-design spiral arms are caused by tidal effects as a Milky-Way-like galaxy orbits a cluster of galaxies. The authors show that the gravitational potential of the cluster can trigger the formation of two spiral arms each time the galaxy passes through the pericenter of its orbit around the cluster. Check out the original paper below for more information! Citation: Marcin Semczuk et al 2017 ApJ 834 7. doi:10.3847/1538-4357/834/1/7
Rain volume estimation over areas using satellite and radar data
NASA Technical Reports Server (NTRS)
Doneaud, A. A.; Vonderhaar, T. H.
1985-01-01
An investigation of the feasibility of rain volume estimation using satellite data, following a technique recently developed with radar data called the Area Time Integral, was undertaken. Case studies were selected on the basis of existing radar and satellite data sets which match in space and time. Four multicell clusters were analyzed. Routines for navigation, remapping, and smoothing of satellite images were performed. Visible counts were normalized for solar zenith angle. A radar sector of interest was defined to delineate specific radar echo clusters for each radar time throughout the radar echo cluster lifetime. A satellite sector of interest was defined by applying small adjustments to the radar sector using a manual processing technique. The radar echo area, the IR maximum counts and the IR counts matching radar echo areas were found to evolve similarly, except for the decaying phase of the cluster where the cirrus debris keeps the IR counts high.
Electron–vibration coupling induced renormalization in the photoemission spectrum of diamondoids
Gali, Adam; Demján, Tamás; Vörös, Márton; ...
2016-04-22
The development of theories and methods devoted to the accurate calculation of the electronic quasi-particle states and levels of molecules, clusters and solids is of prime importance to interpret the experimental data. These quantum systems are often modelled by using the Born–Oppenheimer approximation where the coupling between the electrons and vibrational modes is not fully taken into account, and the electrons are treated as pure quasi-particles. Here, we show that in small diamond cages, called diamondoids, the electron–vibration coupling leads to the breakdown of the electron quasi-particle picture. More importantly, we demonstrate that the strong electron–vibration coupling is essential to properly describe the overall lineshape of the experimental photoemission spectrum. This cannot be obtained by methods within the Born–Oppenheimer approximation. Furthermore, we deduce a link between the vibronic states found by our many-body perturbation theory approach and the well-known Jahn–Teller effect.
Detection of Tephra Layers in Antarctic Sediment Cores with Hyperspectral Imaging
Aymerich, Ismael F.; Oliva, Marc; Giralt, Santiago; Martín-Herrero, Julio
2016-01-01
Tephrochronology uses recognizable volcanic ash layers (from airborne pyroclastic deposits, or tephras) in geological strata to set unique time references for paleoenvironmental events across wide geographic areas. This involves the detection of tephra layers which sometimes are not evident to the naked eye, including the so-called cryptotephras. Tests that are expensive, time-consuming, and/or destructive are often required. Destructive testing for tephra layers of cores from difficult regions, such as Antarctica, which are useful sources of other kinds of information beyond tephras, is always undesirable. Here we propose hyperspectral imaging of cores, Self-Organizing Map (SOM) clustering of the preprocessed spectral signatures, and spatial analysis of the classified images as a convenient, fast, non-destructive method for tephra detection. We test the method in five sediment cores from three Antarctic lakes, and show its potential for detection of tephras and cryptotephras. PMID:26815202
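A minimal 1-D self-organizing map, in the spirit of the SOM clustering step described above, can be sketched as follows. The synthetic data, node count, and learning schedules are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

# Minimal 1-D self-organizing map on synthetic "spectral signatures".
rng = np.random.default_rng(1)
sediment = rng.normal(0.2, 0.05, (50, 8))   # one synthetic spectral class
tephra = rng.normal(0.8, 0.05, (50, 8))     # a second, well-separated class
data = np.vstack([sediment, tephra])

n_nodes, n_epochs = 4, 30
weights = rng.random((n_nodes, 8))
for epoch in range(n_epochs):
    lr = 0.5 * (1 - epoch / n_epochs)                       # decaying learning rate
    radius = max(1.0, (n_nodes / 2) * (1 - epoch / n_epochs))  # shrinking neighborhood
    for x in rng.permutation(data):
        bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # best-matching unit
        grid_dist = np.abs(np.arange(n_nodes) - bmu)
        influence = np.exp(-grid_dist ** 2 / (2 * radius ** 2))
        weights += lr * influence[:, None] * (x - weights)

bmus = [int(np.argmin(np.linalg.norm(weights - x, axis=1))) for x in data]
# The two classes should occupy disjoint sets of map nodes:
print(set(bmus[:50]), set(bmus[50:]))
```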
New imaging algorithm in diffusion tomography
NASA Astrophysics Data System (ADS)
Klibanov, Michael V.; Lucas, Thomas R.; Frank, Robert M.
1997-08-01
A novel imaging algorithm for diffusion/optical tomography is presented for the case of the time dependent diffusion equation. Numerical tests are conducted for ranges of parameters realistic for applications to early breast cancer diagnosis using ultrafast laser pulses. This is a perturbation-like method which works for both homogeneous and heterogeneous background media. Its main innovation lies in a new approach for a novel linearized problem (LP). Such an LP is derived and reduced to a boundary value problem for a coupled system of elliptic partial differential equations. As is well known, the solution of such a system amounts to the factorization of well-conditioned, sparse matrices with few non-zero entries clustered along the diagonal, which can be done very rapidly. Thus, the main advantages of this technique are that it is fast and accurate. The authors call this approach the elliptic systems method (ESM). The ESM can be extended to other data collection schemes.
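The claim that sparse matrices with entries clustered along the diagonal factor rapidly can be illustrated with the tridiagonal special case: the Thomas algorithm solves such a system in O(n). This sketch is illustrative only and is not the authors' ESM discretization.

```python
import numpy as np

# Thomas algorithm: solve a tridiagonal system in O(n).
def thomas(a, b, c, d):
    """a = sub-diagonal (a[0] unused), b = diagonal, c = super-diagonal, d = rhs."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                     # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Classic (2, -1) tridiagonal test matrix with a known solution.
n = 6
a = np.full(n, -1.0); b = np.full(n, 2.0); c = np.full(n, -1.0)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
x_true = np.arange(1.0, n + 1)
x = thomas(a, b, c, A @ x_true)
print(np.allclose(x, x_true))  # True
```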
Swerts, Ben; Chibotaru, Liviu F; Lindh, Roland; Seijo, Luis; Barandiaran, Zoila; Clima, Sergiu; Pierloot, Kristin; Hendrickx, Marc F A
2008-04-01
In this article, we present a fragment model potential approach for the description of the crystalline environment as an extension of the use of embedding ab initio model potentials (AIMPs). The biggest limitation of the embedding AIMP method is the spherical nature of its model potentials. This poses problems as soon as the method is applied to crystals containing strongly covalently bonded structures with highly nonspherical electron densities. The newly proposed method addresses this problem by keeping the full electron density as its model potential, thus allowing one to group sets of covalently bonded atoms into fragments. The implementation in the MOLCAS 7.0 quantum chemistry package of the new method, which we call the embedding fragment ab initio model potential method (embedding FAIMP), is reported here, together with results of CASSCF/CASPT2 calculations. The developed methodology is applied to two test problems: (i) the investigation of the lowest ligand field states (2)A1 and (2)B1 of the Cr(V) defect in the YVO4 crystal and (ii) the investigation of the lowest ligand field and ligand-metal charge transfer (LMCT) states at the Mn(II) substitutional impurity doped into CaCO3. Comparison with similar calculations involving AIMPs for all environmental atoms, including those from covalently bonded units, shows that the FAIMP treatment of the YVO4 units surrounding the CrO4(3-) cluster increases the excitation energy (2)B1 → (2)A1 by ca. 1000 cm(-1) at the CASSCF level of calculation. In the case of the Mn(CO3)6(10-) cluster, the FAIMP treatment of the CO3(2-) units of the environment gives smaller corrections, of ca. 100 cm(-1), for the ligand-field excitation energies, which is explained by the larger ligands of this cluster. However, the correction for the energy of the lowest LMCT transition is found to be ca. 600 cm(-1) for the CASSCF and ca. 1300 cm(-1) for the CASPT2 calculation.
NASA Astrophysics Data System (ADS)
Noguchi, Hiroshi
2013-02-01
We briefly review our recent studies on self-assembly and vesicle rupture of lipid membranes using coarse-grained molecular simulations. For single-component membranes, lipid molecules self-assemble from random gas states to vesicles via disk-shaped clusters. Clusters aggregate into larger clusters, and subsequently the large disks close into vesicles. The size of the vesicles is determined by kinetics rather than by thermodynamics. When a vesicle composed of lipid and detergent types of molecules is ruptured, a disk-shaped micelle called a bicelle can be formed. When both surfactants have negligibly low critical micelle concentration, bicelles connected with worm-like micelles are also found to form, depending on the surfactant ratio and the spontaneous curvature of the membrane monolayer.
Comulang: towards a collaborative e-learning system that supports student group modeling.
Troussas, Christos; Virvou, Maria; Alepis, Efthimios
2013-01-01
This paper describes an e-learning system that is expected to further enhance the educational process in computer-based tutoring systems by incorporating collaboration between students and work in groups. The resulting system is called "Comulang", while a multiple language learning system is used as a test bed for its effectiveness. Collaboration is supported by a user modeling module that is responsible for the initial creation of student clusters, from which, as a next step, working groups of students are created. A machine learning clustering algorithm works towards group formation, so that cooperation between students from different clusters is attained. One of the resulting system's basic aims is to provide efficient student groups whose limitations and capabilities are well balanced.
Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.
Nasr Azadani, Mozhgan; Ghadiri, Nasser; Davoodijam, Ensieh
2018-06-12
Automatic text summarization offers an efficient solution to access the ever-growing amounts of both scientific and clinical literature in the biomedical domain by summarizing the source documents while maintaining their most informative contents. In this paper, we propose a novel graph-based summarization method that takes advantage of the domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document and mapping the document to the concepts. Then, it discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to propose a similarity function based on which a represented graph is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover various subthemes of the document. Eventually, it generates the final summary by selecting the most informative and relative sentences from all subthemes within the text. We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. The carried out research suggests that the incorporation of domain-specific knowledge and frequent itemset mining equips the summarization system in a better way to address the informativeness measurement of the sentences. Moreover, clustering the graph nodes (sentences) can enable the summarizer to target different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of the summarization systems in the biomedical domain. Copyright © 2018. Published by Elsevier Inc.
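Frequent itemset mining, the data mining step named above, can be sketched with a tiny Apriori-style routine. The "sentences" and concept names below are invented placeholders, not UMLS concepts from the paper.

```python
from itertools import combinations

# Apriori-style frequent itemset mining: each "transaction" is the set of
# concepts found in one sentence; frequent itemsets capture concept co-occurrence.
sentences = [
    {"tumor", "therapy", "gene"},
    {"tumor", "gene"},
    {"therapy", "patient"},
    {"tumor", "gene", "patient"},
]
min_support = 2  # keep itemsets occurring in at least two sentences

def frequent_itemsets(transactions, min_support):
    items = sorted({i for t in transactions for i in t})
    result, k = {}, 1
    current = [frozenset([i]) for i in items]
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        frequent = {c: m for c, m in counts.items() if m >= min_support}
        result.update(frequent)
        # candidate (k+1)-itemsets: unions of frequent k-itemsets
        current = list({a | b for a, b in combinations(frequent, 2)
                        if len(a | b) == k + 1})
        k += 1
    return result

freq = frequent_itemsets(sentences, min_support)
print(freq[frozenset({"gene", "tumor"})])  # co-occurrence support of the pair: 3
```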
FUSE: a profit maximization approach for functional summarization of biological networks.
Seah, Boon-Siew; Bhowmick, Sourav S; Dewey, C Forbes; Yu, Hanry
2012-03-21
The availability of large-scale curated protein interaction datasets has given rise to the opportunity to investigate higher level organization and modularity within the protein interaction network (PPI) using graph theoretic analysis. Despite the recent progress, systems level analysis of PPIs remains a daunting task as it is challenging to make sense out of the deluge of high-dimensional interaction data. Specifically, techniques that automatically abstract and summarize PPIs at multiple resolutions to provide high level views of its functional landscape are still lacking. We present a novel data-driven and generic algorithm called FUSE (Functional Summary Generator) that generates functional maps of a PPI at different levels of organization, from broad process-process level interactions to in-depth complex-complex level interactions, through a profit maximization approach that exploits the Minimum Description Length (MDL) principle to maximize information gain of the summary graph while satisfying the level of detail constraint. We evaluate the performance of FUSE on several real-world PPIs. We also compare FUSE to state-of-the-art graph clustering methods with GO term enrichment by constructing the biological process landscape of the PPIs. Using the AD network as our case study, we further demonstrate the ability of FUSE to quickly summarize the network and identify many different processes and complexes that regulate it. Finally, we study the higher-order connectivity of the human PPI. By simultaneously evaluating interaction and annotation data, FUSE abstracts higher-order interaction maps by reducing the details of the underlying PPI to form a functional summary graph of interconnected functional clusters. Our results demonstrate its effectiveness and superiority over state-of-the-art graph clustering methods with GO term enrichment.
A cluster of measles linked to an imported case, Finland, 2017.
Seppälä, Elina; Zöldi, Viktor; Vuorinen, Sakari; Murtopuro, Satu; Elonsalo, Ulpu; van Beek, Janko; Haveri, Anu; Kontio, Mia; Savolainen-Kopra, Carita; Puumalainen, Taneli; Sane, Jussi
2017-08-17
One imported and five secondary cases of measles were detected in Finland between June and August 2017. The measles sequences available for five laboratory-confirmed cases were identical and belonged to serotype D8. The large number of potentially exposed Finnish and foreign individuals called for close cooperation of national and international public health authorities and other stakeholders. Raising awareness among healthcare providers and ensuring universally high vaccination coverage is crucial to prevent future clusters and outbreaks. This article is copyright of The Authors, 2017.
3D Viewer Platform of Cloud Clustering Management System: Google Map 3D
NASA Astrophysics Data System (ADS)
Choi, Sung-Ja; Lee, Gang-Soo
A new framework for managing cloud environments is needed as platforms converge with changing computing environments. Management systems offered by large vendors are difficult for ISVs and small businesses to adopt. This article proposes a clustering management system for cloud computing environments aimed at ISVs and small-business enterprises. It applies a 3D viewer adapted from Google Map 3D and Google Earth, and is called 3DV_CCMS, an extension of the CCMS [1].
Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine
2009-01-01
Clustered regularly interspaced short palindromic repeats (CRISPRs) are DNA sequences composed of a succession of repeats (23- to 47-bp long) separated by unique sequences called spacers. Polymorphism can be observed in different strains of a species and may be used for genotyping. We describe protocols and bioinformatics tools that allow the identification of CRISPRs from sequenced genomes, their comparison, and their component determination (the direct repeats and the spacers). A schematic representation of the spacer organization can be produced, allowing an easy comparison between strains.
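Given a known direct repeat, extracting the spacers between successive repeat copies is straightforward string processing. The sketch below uses toy sequences (not real CRISPR data or the described bioinformatics tools).

```python
# Recover the spacers lying between successive copies of a CRISPR direct repeat.
def extract_spacers(locus: str, repeat: str):
    spacers, pos = [], locus.find(repeat)
    while pos != -1:
        nxt = locus.find(repeat, pos + len(repeat))
        if nxt == -1:
            break
        spacers.append(locus[pos + len(repeat):nxt])  # sequence between two repeats
        pos = nxt
    return spacers

repeat = "GTTTTAGAGC"                         # toy 10-bp direct repeat
spacers_in = ["ACGTACGTACGT", "TTGGCCAATT"]  # toy unique spacers
locus = repeat + spacers_in[0] + repeat + spacers_in[1] + repeat
found = extract_spacers(locus, repeat)
print(found)  # recovers the two spacers in order
```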
Omega Centauri Looks Radiant in Infrared
NASA Technical Reports Server (NTRS)
2008-01-01
A cluster brimming with millions of stars glistens like an iridescent opal in this image from NASA's Spitzer Space Telescope. Called Omega Centauri, the sparkling orb of stars is like a miniature galaxy. It is the biggest and brightest of the 150 or so similar objects, called globular clusters, that orbit around the outside of our Milky Way galaxy. Stargazers at southern latitudes can spot the stellar gem with the naked eye in the constellation Centaurus. Globular clusters are some of the oldest objects in our universe. Their stars are over 12 billion years old, and, in most cases, formed all at once when the universe was just a toddler. Omega Centauri is unusual in that its stars are of different ages and possess varying levels of metals, or elements heavier than boron. Astronomers say this points to a different origin for Omega Centauri than other globular clusters: they think it might be the core of a dwarf galaxy that was ripped apart and absorbed by our Milky Way long ago. In this new view of Omega Centauri, Spitzer's infrared observations have been combined with visible-light data from the National Science Foundation's Blanco 4-meter telescope at Cerro Tololo Inter-American Observatory in Chile. Visible-light data with a wavelength of .55 microns is colored blue, 3.6-micron infrared light captured by Spitzer's infrared array camera is colored green and 24-micron infrared light taken by Spitzer's multiband imaging photometer is colored red. Where green and red overlap, the color yellow appears. Thus, the yellow and red dots are stars revealed by Spitzer. These stars, called red giants, are more evolved, larger and dustier. The stars that appear blue were spotted in both visible and 3.6-micron-, or near-, infrared light. They are less evolved, like our own sun. Some of the red spots in the picture are distant galaxies beyond our own.
Spitzer found very little dust around any but the most luminous, coolest red giants, implying that the dimmer red giants do not form significant amounts of dust. The space between the stars in Omega Centauri was also found to lack dust, which means the dust is rapidly destroyed or leaves the cluster.
An improved clustering algorithm based on reverse learning in intelligent transportation
NASA Astrophysics Data System (ADS)
Qiu, Guoqing; Kou, Qianqian; Niu, Ting
2017-05-01
With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. Clustering is an important method for processing large data. By introducing a reverse learning method into the clustering process of the PAM clustering algorithm, we further address the limitations of one-time clustering in unsupervised clustering learning and increase the diversity of the resulting clusters, thereby improving clustering quality. The algorithm analysis and experimental results show that the algorithm is feasible.
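For reference, the base PAM (k-medoids) iteration that the reverse-learning step builds on can be sketched as follows. The reverse-learning refinement itself is not reproduced here, and the data points and naive initialization are illustrative.

```python
# Minimal PAM (k-medoids) sketch: assign points to the nearest medoid, then
# replace each medoid with the cluster member minimizing total within-cluster cost.
def pam(points, k, iters=100):
    dist = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    medoids = points[:k]  # naive deterministic initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, medoids[i]))].append(p)
        # update step: best medoid within each cluster
        new = [min(c, key=lambda m: sum(dist(m, p) for p in c)) if c else medoids[i]
               for i, c in enumerate(clusters)]
        if new == medoids:
            break
        medoids = new
    return medoids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, _ = pam(pts, 2)
print(sorted(medoids))  # [(0, 0), (10, 10)]
```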
Relics in galaxy clusters at high radio frequencies
NASA Astrophysics Data System (ADS)
Kierdorf, M.; Beck, R.; Hoeft, M.; Klein, U.; van Weeren, R. J.; Forman, W. R.; Jones, C.
2017-04-01
Aims: We investigated the magnetic properties of radio relics located at the peripheries of galaxy clusters at high radio frequencies, where the emission is expected to be free of Faraday depolarization. The degree of polarization is a measure of the magnetic field compression and, hence, the Mach number. Polarization observations can also be used to confirm relic candidates. Methods: We observed three radio relics in galaxy clusters and one radio relic candidate at 4.85 and 8.35 GHz in total emission and linearly polarized emission with the Effelsberg 100-m telescope. In addition, we observed one radio relic candidate in X-rays with the Chandra telescope. We derived maps of polarization angle, polarization degree, and Faraday rotation measures. Results: The radio spectra of the integrated emission below 8.35 GHz can be well fitted by single power laws for all four relics. The flat spectra (spectral indices of 0.9 and 1.0) for the so-called Sausage relic in cluster CIZA J2242+53 and the so-called Toothbrush relic in cluster 1RXS 06+42 indicate that models describing the origin of relics have to include effects beyond the assumptions of diffuse shock acceleration. The spectra of the radio relics in ZwCl 0008+52 and in Abell 1612 are steep, as expected from weak shocks (Mach number ≈2.4). Polarization observations of radio relics offer a method of measuring the strength and geometry of the shock front. We find polarization degrees of more than 50% in the two prominent Mpc-sized radio relics, the Sausage and the Toothbrush, which are among the highest percentages of linear polarization detected in any extragalactic radio source to date. This is remarkable because the large beam size of the Effelsberg single-dish telescope corresponds to linear extensions of about 300 kpc at 8.35 GHz at the distances of the relics. 
The high degree of polarization indicates that the magnetic field vectors are almost perfectly aligned along the relic structure, as expected for shock fronts that are observed edge-on. The polarization degrees correspond to Mach numbers of >2.2. Polarized emission is also detected in the radio relics in ZwCl 0008+52 and, for the first time, in Abell 1612. The smaller sizes and lower degrees of polarization of the latter relics indicate a weaker shock and/or an inclination between the relic and the sky plane. Abell 1612 shows a complex X-ray surface brightness distribution, indicating a recent major merger and supporting the classification of the radio emission as a radio relic. In our cluster sample, no wavelength-dependent Faraday depolarization is detected between 4.85 GHz and 8.35 GHz, except for one component of the Toothbrush relic. Faraday depolarization between 1.38 GHz and 8.35 GHz varies with distance from the center of the host cluster 1RXS 06+42, which can be explained by a decrease in electron density and/or in strength of a turbulent (or tangled) magnetic field. Faraday rotation measures show large-scale gradients along the relics, which cannot be explained by variations in the Milky Way foreground. Conclusions: Single-dish telescopes are ideal tools to confirm relic candidates and to search for new ones. Measurement of the wavelength-dependent depolarization along the Toothbrush relic shows that the electron density of the intra-cluster medium (ICM) and strength of the tangled magnetic field decrease with distance from the center of the foreground cluster. Large-scale regular fields appear to be present in intergalactic space around galaxy clusters.
Based on observations with the 100-m telescope at Effelsberg, operated by the Max-Planck-Institut für Radioastronomie (MPIfR) on behalf of the Max-Planck-Gesellschaft. The reduced Stokes parameter images (FITS files) are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/600/A18
Multiconstrained gene clustering based on generalized projections
2010-01-01
Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution still remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence belonging to more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions. PMID:20356386
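The core mechanism described above, cyclically projecting a candidate solution onto each constraint set until it lands in their intersection, can be illustrated with a toy POCS loop over half-space constraints. This is my own minimal sketch of the general alternating-projection idea, not the paper's generalized projector or its gene-clustering formulation:

```python
import numpy as np

def project_halfspace(x, a, b):
    """Project point x onto the convex half-space {y : a·y <= b}."""
    violation = a @ x - b
    if violation <= 0:          # already feasible: projection is x itself
        return x
    return x - violation * a / (a @ a)

def pocs(x0, constraints, iters=100):
    """Cyclically project onto each convex set; if the intersection
    is non-empty, the iterate converges to a point inside it."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        for a, b in constraints:
            x = project_halfspace(x, np.asarray(a, dtype=float), b)
    return x

# Intersection of x >= 1 (written -x <= -1) and y >= 2 (written -y <= -2):
sets = [(np.array([-1.0, 0.0]), -1.0), (np.array([0.0, -1.0]), -2.0)]
print(pocs([0.0, 0.0], sets))   # converges to a point with x >= 1, y >= 2
```

Each projection enforces one constraint without "distorting" the others, which is the property the abstract highlights as POCS's advantage over earlier MGC methods.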
Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David
2001-01-01
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
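A consensus pattern search of the kind the abstract contrasts with its PEA metric scans upstream sequences for matches to a degenerate IUPAC motif. The following is a generic, self-contained sketch of such a naïve scan (the motif and sequence are illustrative examples, not data from the study):

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

def consensus_hits(pattern, sequence):
    """Return the start positions of all (non-overlapping) matches
    of an IUPAC consensus pattern in an upstream DNA sequence."""
    regex = "".join(IUPAC[c] for c in pattern.upper())
    return [m.start() for m in re.finditer(regex, sequence.upper())]

# The classic Gcn4p-type element TGASTCA (S = C or G):
print(consensus_hits("TGASTCA", "aaTGACTCAggTGAGTCAtt"))  # → [2, 11]
```

Because every sequence matching the degenerate pattern is reported regardless of context, such scans produce the large false-positive counts that motivated weighting candidate sites by the expression phenotypes of the downstream genes.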
Multistrategy Self-Organizing Map Learning for Classification Problems
Hasan, S.; Shamsuddin, S. M.
2011-01-01
Multistrategy learning of Self-Organizing Map (SOM) and Particle Swarm Optimization (PSO) is commonly implemented in the clustering domain due to its capabilities in handling complex data characteristics. However, some of these multistrategy learning architectures have weaknesses, such as slow convergence and a tendency to become trapped in local minima. This paper proposes multistrategy learning of the SOM lattice structure with Particle Swarm Optimization, called ESOMPSO, for solving various classification problems. The enhancement of the SOM lattice structure is implemented by introducing a new hexagon formulation for better mapping quality in data classification and labeling. The weights of the enhanced SOM are optimized using PSO to obtain better output quality. The proposed method has been tested on various standard datasets with substantial comparisons against the existing SOM network and various distance measurements. The results show that our proposed method yields promising results, with better average accuracy and quantization errors than the other methods, as well as convincing significance tests. PMID:21876686
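The quantization error that the abstract uses to compare map quality is the average distance from each sample to its best-matching unit (BMU). The sketch below is my own minimal implementation of that standard measure, not code from the ESOMPSO paper:

```python
import numpy as np

def quantization_error(data, weights):
    """Mean Euclidean distance from each data sample to its
    best-matching unit (the nearest SOM prototype vector).
    Lower values indicate a map that represents the data more
    faithfully."""
    data = np.asarray(data, dtype=float)       # shape (n_samples, dim)
    weights = np.asarray(weights, dtype=float)  # shape (n_units, dim)
    # Pairwise distances via broadcasting: (n_samples, n_units)
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

# Prototypes that coincide with the samples give zero error:
print(quantization_error([[0, 0], [1, 1]], [[0, 0], [1, 1]]))  # → 0.0
```

In a SOM+PSO hybrid of the kind described, this scalar would be a natural fitness value for the swarm: each particle encodes a candidate weight matrix, and PSO searches for the weights minimizing the quantization error.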