Sample records for model based clustering

  1. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

    PubMed

    Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

    2013-05-01

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
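
    How a Dirichlet process mixture can open new clusters as data arrive may be easier to see in a small sketch. The code below shows only the standard Chinese restaurant process prior, not the incremental algorithm or the tDPMM of the paper; the function name `crp_assignment_probs` and the concentration value `alpha` are illustrative assumptions.

```python
def crp_assignment_probs(cluster_sizes, alpha=1.0):
    """Prior probabilities that the next item joins each existing cluster
    or starts a new one, under a Chinese restaurant process."""
    n = sum(cluster_sizes)
    denom = n + alpha
    # An existing cluster is chosen in proportion to its current size.
    probs = [size / denom for size in cluster_sizes]
    # The last entry is the probability of opening a new cluster.
    probs.append(alpha / denom)
    return probs


probs = crp_assignment_probs([5, 3, 2], alpha=1.0)
print(probs)
```

    In a full mixture model these prior weights would be multiplied by the likelihood of the new item under each cluster before normalizing; that step is omitted here.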

  2. A user credit assessment model based on clustering ensemble for broadband network new media service supervision

    NASA Astrophysics Data System (ADS)

    Liu, Fang; Cao, San-xing; Lu, Rui

    2012-04-01

    This paper proposes a user credit assessment model based on a clustering ensemble, aiming to solve the problem of users illegally spreading pirated and pornographic media content on user self-service oriented broadband network new media platforms. The idea is to assess new media user credit by establishing an index system based on user credit behaviors, so that illegal users can be identified from the credit assessment results and the transmission of harmful video and audio over the network can be curbed. The proposed clustering ensemble model combines two advantages: swarm intelligence clustering is well suited to user credit behavior analysis, and K-means clustering can eliminate the scattered users left in the result of swarm intelligence clustering, so that all users' credit is classified automatically. Verification experiments were carried out on a standard credit application dataset from the UCI machine learning repository. The statistical results of a comparative experiment with a single swarm intelligence clustering model indicate that the clustering ensemble model has a stronger ability to distinguish creditworthiness, especially in finding the user clusters with the best and worst credit, which will help operators apply incentive or punitive measures accurately. In addition, compared with the experimental results of a logistic regression based model under the same conditions, the clustering ensemble model is robust and has better prediction accuracy.

  3. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
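
    The BIC-based comparison of mixture models mentioned above can be illustrated with a minimal sketch. It assumes one-dimensional data, a hand-written EM fit for a two-component Gaussian mixture, and the sign convention in which a lower BIC is preferred; the toy data and all function names are hypothetical and are not taken from the article.

```python
import math


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bic(log_likelihood, n_params, n_obs):
    # Sign convention used here: lower BIC indicates the preferred model.
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_one_gaussian(data):
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    loglik = sum(gaussian_logpdf(x, mean, var) for x in data)
    return loglik, 2  # free parameters: mean and variance


def fit_two_gaussians(data, n_iter=50, var_floor=1e-6):
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialization (assumption)
    variances = [grand_var, grand_var]

    def component_logs(x):
        return [math.log(weights[k]) + gaussian_logpdf(x, means[k], variances[k])
                for k in range(2)]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x)
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, var_floor)
    loglik = sum(logsumexp(component_logs(x)) for x in data)
    return loglik, 5  # one free weight, two means, two variances


data = [0.0, 0.2, -0.1, 0.1, -0.2, 5.0, 5.2, 4.9, 5.1, 4.8]
bic_one = bic(*fit_one_gaussian(data), len(data))
bic_two = bic(*fit_two_gaussians(data), len(data))
print(bic_one, bic_two)
```

    On these two well-separated groups the two-component model obtains the lower BIC, which is the kind of comparison between nonnested models that the abstract refers to.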

  4. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…

  5. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting outliers. We find that the proposed methods perform well and are applicable to the circular regression model.
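
    The combination of a circular distance with single linkage clustering can be sketched as follows. This is a minimal illustration that assumes the common shortest-arc distance between two angles in radians, which is not necessarily the similarity distance used in the paper, and it omits the stopping rule; the function names are hypothetical.

```python
import math


def circular_distance(a, b):
    """Shortest arc length between two angles given in radians."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def single_linkage_heights(angles):
    """Naive single linkage: repeatedly merge the two closest clusters and
    record the distance (tree height) at which each merge happens."""
    clusters = [[i] for i in range(len(angles))]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(circular_distance(angles[p], angles[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return heights


angles = [0.0, 0.1, 6.2, 3.0]  # 6.2 rad is close to 0.0 on the circle
heights = single_linkage_heights(angles)
print(heights)
```

    The angle 3.0 joins the tree only at a much larger height than the other three points, which is the kind of late merge that a stopping rule on tree height could flag as a potential outlier.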

  6. Double Cluster Heads Model for Secure and Accurate Data Fusion in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    Secure and accurate data fusion is an important issue in wireless sensor networks (WSNs) and has been extensively researched in the literature. In this paper, by combining clustering techniques, reputation and trust systems, and data fusion algorithms, we propose a novel cluster-based data fusion model called Double Cluster Heads Model (DCHM) for secure and accurate data fusion in WSNs. Different from traditional clustering models in WSNs, two cluster heads are selected after clustering for each cluster based on the reputation and trust system and they perform data fusion independently of each other. Then, the results are sent to the base station where the dissimilarity coefficient is computed. If the dissimilarity coefficient of the two data fusion results exceeds the threshold preset by the users, the cluster heads will be added to blacklist, and the cluster heads must be reelected by the sensor nodes in a cluster. Meanwhile, feedback is sent from the base station to the reputation and trust system, which can help us to identify and delete the compromised sensor nodes in time. Through a series of extensive simulations, we found that the DCHM performed very well in data fusion security and accuracy. PMID:25608211

  7. Inference from clustering with application to gene-expression microarrays.

    PubMed

    Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M

    2002-01-01

    There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. 
Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
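
    The evaluation loop described above (model each process as a mean plus independent noise, generate points, cluster them, and count misclustered points) can be sketched in a few lines. This is a simplified one-dimensional illustration with two classes and a basic two-means routine, not the toolbox itself; all names and parameter values are assumptions.

```python
import random


def simulate(means, sd, n_per_class, rng):
    """Generate points as class mean plus independent Gaussian noise."""
    points, labels = [], []
    for label, mu in enumerate(means):
        for _ in range(n_per_class):
            points.append(rng.gauss(mu, sd))
            labels.append(label)
    return points, labels


def two_means_1d(points, n_iter=20):
    """Basic K-means for K = 2 in one dimension."""
    centers = [min(points), max(points)]
    assign = [0] * len(points)
    for _ in range(n_iter):
        assign = [0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = sum(members) / len(members)
    return assign


def clustering_error(true_labels, assigned):
    """Number of misclustered points under the better of the two possible
    matchings between cluster ids and class labels."""
    mismatches = sum(t != a for t, a in zip(true_labels, assigned))
    return min(mismatches, len(true_labels) - mismatches)


rng = random.Random(0)
points, labels = simulate(means=[0.0, 10.0], sd=1.0, n_per_class=20, rng=rng)
error = clustering_error(labels, two_means_1d(points))
print(error)
```

    Repeating the simulation for different noise levels or numbers of replicates and averaging the error would give a rough version of the performance curves the abstract describes.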

  8. Finding gene clusters for a replicated time course study

    PubMed Central

    2014-01-01

    Background: Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings: In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions: The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656

  9. Model-based clustering for RNA-seq data.

    PubMed

    Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P

    2014-01-15

    RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org

  10. Network-based spatial clustering technique for exploring features in regional industry

    NASA Astrophysics Data System (ADS)

    Chou, Tien-Yin; Huang, Pi-Hui; Yang, Lung-Shih; Lin, Wen-Tzu

    2008-10-01

    Past research on industrial clusters has focused mainly on a single or particular industry and less on spatial industrial structure and mutual relations. Industrial clusters can generate three kinds of spillover effects: knowledge, labor market pooling, and input sharing. In addition, industrial clusters benefit industry development. A full understanding of the status and characteristics of district industrial clusters can help improve the competitive advantage of district industry. Research on industrial spatial clusters is therefore of great significance for setting industrial policies and promoting district economic development. In this study, an improved model, GeoSOM, that combines DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and SOM (Self-Organizing Map) was developed for analyzing industrial clusters. Unlike earlier distance-based algorithms for industrial clusters, the proposed GeoSOM model calculates spatial characteristics between firms using the DBSCAN algorithm and evaluates the similarity between firms using SOM clustering analysis. A demonstration data set, the manufacturers around Taichung County in Taiwan, was analyzed to verify the practicability of the proposed model. The results indicate that GeoSOM is suitable for evaluating spatial industrial clusters.

  11. [Predicting Incidence of Hepatitis E in China using Fuzzy Time Series Based on Fuzzy C-Means Clustering Analysis].

    PubMed

    Luo, Yi; Zhang, Tao; Li, Xiao-song

    2016-05-01

    To explore the application of a fuzzy time series model based on fuzzy c-means clustering in forecasting the monthly incidence of Hepatitis E in mainland China. A predictive model (fuzzy time series method based on fuzzy c-means clustering) was developed using Hepatitis E incidence data in mainland China between January 2004 and July 2014. The incidence data from August 2014 to November 2014 were used to test the fitness of the predictive model. The forecasting results were compared with those from traditional fuzzy time series models. The fuzzy time series model based on fuzzy c-means clustering had a mean squared error (MSE) of 0.0011 for fitting and 6.9775 × 10⁻⁴ for forecasting, compared with 0.0017 and 0.0014 from the traditional forecasting model. The results indicate that the fuzzy time series model based on fuzzy c-means clustering has a better performance in forecasting incidence of Hepatitis E.
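
    Two ingredients of the abstract above, fuzzy c-means memberships and the mean squared error used for comparison, can be sketched briefly. The code assumes one-dimensional data, fixed cluster centers, and the usual fuzzifier m = 2; it does not reproduce the forecasting model, and the numbers are made up for illustration.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of a scalar x in each cluster,
    given fixed cluster centers and fuzzifier m > 1."""
    dists = [abs(x - c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def mean_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


u = fcm_memberships(2.0, centers=[1.0, 4.0])
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(u, mse)
```

    Unlike a hard assignment, the memberships are graded and sum to one, so the point at 2.0 belongs mostly, but not entirely, to the nearer center.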

  12. Parameters of oscillation generation regions in open star cluster models

    NASA Astrophysics Data System (ADS)

    Danilov, V. M.; Putkov, S. I.

    2017-07-01

    We determine the masses and radii of central regions of open star cluster (OCL) models with small or zero entropy production and estimate the masses of oscillation generation regions in cluster models based on the data of the phase-space coordinates of stars. The radii of such regions are close to the core radii of the OCL models. We develop a new method for estimating the total OCL masses based on the cluster core mass, the cluster and cluster core radii, and radial distribution of stars. This method yields estimates of dynamical masses of Pleiades, Praesepe, and M67, which agree well with the estimates of the total masses of the corresponding clusters based on proper motions and spectroscopic data for cluster stars. We construct the spectra and dispersion curves of the oscillations of the field of azimuthal velocities v_φ in OCL models. Weak, low-amplitude unstable oscillations of v_φ develop in cluster models near the cluster core boundary, and weak damped oscillations of v_φ often develop at frequencies close to the frequencies of more powerful oscillations, which may reduce the non-stationarity degree in OCL models. We determine the number and parameters of such oscillations near the core boundaries of cluster models. Such oscillations point to the possible role that gradient instability near the core of cluster models plays in the decrease of the mass of the oscillation generation regions and production of entropy in the cores of OCL models with massive extended cores.

  13. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. 
It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.

  14. Cluster-based analysis of multi-model climate ensembles

    NASA Astrophysics Data System (ADS)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ~20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ~62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. 
We further demonstrate that clustering can provide a viable and useful framework in which to assess and visualise model spread, offering insight into geographical areas of agreement among models and a measure of diversity across an ensemble. Finally, we discuss caveats of the clustering techniques and note that while we have focused on tropospheric ozone, the principles underlying the cluster-based MMMs are applicable to other prognostic variables from climate models.
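
    The idea of a multi-model mean restricted to the most populous cluster at one location can be sketched as follows. The DDC algorithm is not reproduced here; a simple gap-based grouping of sorted values is swapped in as a stand-in, and the threshold `gap`, the function name, and the sample values are all hypothetical.

```python
def most_populous_cluster_mean(values, gap=5.0):
    """Group sorted 1-D values wherever neighbours differ by at most `gap`,
    then return the mean of the largest group (stand-in for DDC)."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] > gap:
            groups.append([v])  # large gap: start a new group
        else:
            groups[-1].append(v)
    largest = max(groups, key=len)
    return sum(largest) / len(largest)


# Hypothetical values from four models at a single location.
model_values = [30.0, 31.0, 32.0, 50.0]
all_model_mean = sum(model_values) / len(model_values)
cluster_mean = most_populous_cluster_mean(model_values)
print(all_model_mean, cluster_mean)
```

    Whether the cluster-based mean is closer to observations than the all-model mean depends on the location, as the abstract notes: the excluded outlier may sometimes be the model nearest to the observed value.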

  15. Density-based cluster algorithms for the identification of core sets

    NASA Astrophysics Data System (ADS)

    Lemke, Oliver; Keller, Bettina G.

    2016-10-01

    The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.

  16. Cluster-specific small airway modeling for imaging-based CFD analysis of pulmonary air flow and particle deposition in COPD smokers

    NASA Astrophysics Data System (ADS)

    Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

    2017-11-01

    Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify structural characteristics of airways in each cluster and use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation, and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a greater decrease in diameter with increasing generation than the other clusters. Cluster 4 had more rapid decreases of airway diameters in the upper lobes, while cluster 2 had them in the lower lobes. We then used these regression models to estimate airway diameters in CT unresolved regions to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can be used to provide the patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.

  17. Study of clusters and hypernuclei production within PHSD+FRIGA model

    NASA Astrophysics Data System (ADS)

    Kireyeu, Viktar; Le Fèvre, Arnaud; Bratkovskaya, Elena

    2017-03-01

    We report on the results on the dynamical modelling of cluster formation with the new combined PHSD+FRIGA model at Nuclotron and NICA energies. The FRIGA clusterization algorithm, which can be applied to the transport models, is based on the simulated annealing technique to obtain the most bound configuration of fragments and nucleons. The PHSD+FRIGA model is able to predict isotope yields as well as hypernucleus production. Based on present predictions of the combined model we study the possibility to detect such clusters and hypernuclei in the BM@N and MPD/NICA detectors.

  18. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    PubMed

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  19. Towards Semantically Sensitive Text Clustering: A Feature Space Modeling Technology Based on Dimension Extension

    PubMed Central

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172

  20. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  1. Clustering of financial time series

    NASA Astrophysics Data System (ADS)

    D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo

    2013-05-01

    This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, requires the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on the partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance, based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp version.

  2. MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS

    EPA Science Inventory

    Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...

  3. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    PubMed

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
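
    The two resampling schemes described above differ only in whether individuals are resampled within the selected clusters. A minimal sketch of the resampling step alone is given below; fitting the Cox model to each replicate is omitted, and the data layout (a dictionary mapping cluster ids to lists of observations) and the function names are assumptions made for illustration.

```python
import random


def cluster_bootstrap(clusters, rng):
    """Resample whole clusters with replacement; members are kept intact."""
    ids = list(clusters)
    return [list(clusters[rng.choice(ids)]) for _ in ids]


def two_step_bootstrap(clusters, rng):
    """Resample clusters with replacement, then resample individuals with
    replacement within each selected cluster."""
    resampled = []
    for members in cluster_bootstrap(clusters, rng):
        resampled.append([rng.choice(members) for _ in members])
    return resampled


# Hypothetical clustered observations (for example, event times per center).
clusters = {"A": [1.2, 3.4, 2.2], "B": [0.7, 5.1], "C": [4.0, 4.4, 1.9, 2.8]}
rng = random.Random(42)
one_step = cluster_bootstrap(clusters, rng)
two_step = two_step_bootstrap(clusters, rng)
print(one_step)
print(two_step)
```

    In practice the model would be refitted on each of many bootstrap replicates, and the standard deviation of the resulting coefficient estimates would serve as the bootstrap standard error.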

  4. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  5. Markov Chain Model-Based Optimal Cluster Heads Selection for Wireless Sensor Networks

    PubMed Central

    Ahmed, Gulnaz; Zou, Jianhua; Zhao, Xi; Sadiq Fareed, Mian Muhammad

    2017-01-01

    Extending the network lifetime of Wireless Sensor Networks (WSNs) is a goal directly related to energy consumption. The energy consumption issue becomes more challenging when the energy load is not properly distributed in the sensing area. The hierarchical clustering architecture is the best choice for this kind of issue. In this paper, we introduce a novel clustering protocol called Markov chain model-based optimal cluster heads (MOCHs) selection for WSNs. In our proposed model, we introduce a simple strategy for selecting the optimal number of cluster heads to overcome the problem of uneven energy distribution in the network. The attractiveness of our model is that the base station (BS) controls the number of cluster heads while the cluster heads control the cluster members in each cluster in such a restricted manner that a uniform and even load is ensured in each cluster. We perform an extensive range of simulations using five quality measures to analyze the proposed model: the lifetime of the network, the stable and unstable regions in the lifetime of the network, the throughput of the network, the number of cluster heads in the network, and the transmission time of the network. We compare MOCHs against Sleep-awake Energy Efficient Distributed (SEED) clustering, Artificial Bee Colony (ABC), Zone Based Routing (ZBR), and Centralized Energy Efficient Clustering (CEEC) using the above-discussed quality metrics and find that the lifetime of the proposed model is almost 1095, 2630, 3599, and 2045 rounds (time steps) greater than SEED, ABC, ZBR, and CEEC, respectively. The obtained results demonstrate that MOCHs is better than SEED, ABC, ZBR, and CEEC in terms of energy efficiency and network throughput. PMID:28241492

  6. DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

    PubMed

    Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei

    2018-01-01

    Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. Availability: DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~wec47/singlecell.html. Contact: wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering based partial least square (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least square (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the effect on prediction performance of the PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and the full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoids prediction using synchronous fluorescence spectra.

  8. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    NASA Astrophysics Data System (ADS)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership, we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule by using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.
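
    The classification step in a finite mixture of first-order Markov chains can be sketched as follows. This is a simplified illustration under assumptions made here, not the Bayesian procedure of the abstract: the transition matrices and mixture weights are fixed, hypothetical values rather than MCMC draws, the weights are constants rather than the output of a multinomial logit model, and the likelihood conditions on the first observed state.

```python
import math

def transition_counts(seq, n_states):
    """Count first-order transitions a -> b in a categorical sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def log_likelihood(counts, P):
    """Log-likelihood of the observed transitions under transition matrix P."""
    return sum(
        counts[i][j] * math.log(P[i][j])
        for i in range(len(P))
        for j in range(len(P))
        if counts[i][j] > 0
    )

def posterior_membership(seq, weights, matrices):
    """Posterior cluster probabilities for one sequence (Bayes' rule)."""
    counts = transition_counts(seq, len(matrices[0]))
    log_post = [
        math.log(w) + log_likelihood(counts, P)
        for w, P in zip(weights, matrices)
    ]
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-cluster example: a "sticky" chain and a "mobile" chain.
matrices = [
    [[0.9, 0.1], [0.1, 0.9]],  # cluster 0: states tend to persist
    [[0.2, 0.8], [0.8, 0.2]],  # cluster 1: states tend to alternate
]
weights = [0.5, 0.5]
seq = [0, 0, 0, 0, 1, 1, 1, 1]

probs = posterior_membership(seq, weights, matrices)
print(probs)
```

    The example sequence changes state only once, so almost all posterior mass falls on the persistent cluster. In a full sampler, this classification step would alternate with updates of the transition matrices and of the membership model.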

  9. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering based partial least square (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least square (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the effect on prediction performance of the PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and the full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoids prediction using synchronous fluorescence spectra.

  10. Using experimental data to test an n -body dynamical model coupled with an energy-based clusterization algorithm at low incident energies

    NASA Astrophysics Data System (ADS)

    Kumar, Rohit; Puri, Rajeev K.

    2018-03-01

    Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulated annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are restricted to the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study, which spans different system masses and system-mass asymmetries of the colliding partners, shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.

  11. Local-world and cluster-growing weighted networks with controllable clustering

    NASA Astrophysics Data System (ADS)

    Yang, Chun-Xia; Tang, Min-Xuan; Tang, Hai-Qiang; Deng, Qiang-Qiang

    2014-12-01

    We construct an improved weighted network model by introducing a local-world selection mechanism and a triangle coupling mechanism into the traditional BBV model. The model gives power-law distributions of degree, strength and edge weight, and exhibits linear relationships both between degree and strength and between degree and clustering coefficient. In particular, the model can make strength grow faster than degree. Besides, the model is more sound and efficient in tuning the clustering coefficient than the original BBV model. Finally, based on our improved model, we analyze the virus spread process and find that reducing the size of the local world strongly inhibits virus spread.

  12. A hybrid algorithm for clustering of time series data based on affinity search technique.

    PubMed

    Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza

    2014-01-01

    Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.

  13. A Model-Based Cluster Analysis of Maternal Emotion Regulation and Relations to Parenting Behavior.

    PubMed

    Shaffer, Anne; Whitehead, Monica; Davis, Molly; Morelen, Diana; Suveg, Cynthia

    2017-10-15

    In a diverse community sample of mothers (N = 108) and their preschool-aged children (M age = 3.50 years), this study conducted person-oriented analyses of maternal emotion regulation (ER) based on a multimethod assessment incorporating physiological, observational, and self-report indicators. A model-based cluster analysis was applied to five indicators of maternal ER: maternal self-report, observed negative affect in a parent-child interaction, baseline respiratory sinus arrhythmia (RSA), and RSA suppression across two laboratory tasks. Model-based cluster analyses revealed four maternal ER profiles, including a group of mothers with average ER functioning, characterized by socioeconomic advantage and more positive parenting behavior. A dysregulated cluster demonstrated the greatest challenges with parenting and dyadic interactions. Two clusters of intermediate dysregulation were also identified. Implications for assessment and applications to parenting interventions are discussed. © 2017 Family Process Institute.

  14. A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique

    PubMed Central

    Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza

    2014-01-01

    Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966

  15. Cluster management.

    PubMed

    Katz, R

    1992-11-01

    Cluster management is a management model that fosters decentralization of management, develops leadership potential of staff, and creates ownership of unit-based goals. Unlike shared governance models, there is no formal structure created by committees and it is less threatening for managers. There are two parts to the cluster management model. One is the formation of cluster groups, consisting of all staff and facilitated by a cluster leader. The cluster groups function for communication and problem-solving. The second part of the cluster management model is the creation of task forces. These task forces are designed to work on short-term goals, usually in response to solving one of the unit's goals. Sometimes the task forces are used for quality improvement or system problems. Clusters are groups of not more than five or six staff members, facilitated by a cluster leader. A cluster is made up of individuals who work the same shift. For example, people with job titles who work days would be in a cluster. There would be registered nurses, licensed practical nurses, nursing assistants, and unit clerks in the cluster. The cluster leader is chosen by the manager based on certain criteria and is trained for this specialized role. The concept of cluster management, criteria for choosing leaders, training for leaders, using cluster groups to solve quality improvement issues, and the learning process necessary for manager support are described.

  16. Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks.

    PubMed

    Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf

    2017-09-01

    Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, which in most cases leads to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (i.e., from 1 to 6 clusters). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% higher accuracy than classification without clustering, while with the percentage split it achieves 2%-5% higher accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
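
    The weighted average accuracy mentioned in this abstract can be illustrated with a short sketch. The abstract does not state the weights; the version below assumes each per-cluster accuracy is weighted by the number of accidents in that cluster. The cluster sizes and accuracies are hypothetical values chosen only to show the arithmetic.

```python
def weighted_average_accuracy(cluster_sizes, accuracies):
    """Average per-cluster accuracies, weighting each by its cluster size.

    Assumption: weights are cluster sizes; the abstract does not specify them.
    """
    total = sum(cluster_sizes)
    return sum(n * a for n, a in zip(cluster_sizes, accuracies)) / total

# Hypothetical example: three clusters covering 6000 accidents.
sizes = [3000, 2000, 1000]
accs = [0.80, 0.70, 0.60]

overall = weighted_average_accuracy(sizes, accs)
print(round(overall, 4))
```

    Weighting by size keeps a small, easy-to-classify cluster from dominating the overall figure, which matters when comparing partitions with different numbers of clusters.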

  17. Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model

    NASA Astrophysics Data System (ADS)

    Li, X. L.; Zhao, Q. H.; Li, Y.

    2017-09-01

    Most stochastic-based fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model is proposed in this paper. First, some generating points are initialized randomly on the image, and the image domain is divided into many sub-regions using the Voronoi tessellation technique. Each sub-region is regarded as a homogeneous area in which the pixels share the same cluster label. Then, the probability of each pixel is assumed to follow a Gamma mixture model whose parameters correspond to the cluster to which the pixel belongs. The negative logarithm of the probability represents the dissimilarity measure between the pixel and the cluster. The regional dissimilarity measure of a sub-region is defined as the sum of the measures of the pixels in the region. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to Voronoi sub-regions, and the regional objective function is then established under the framework of fuzzy clustering. The optimal segmentation results are obtained by solving for the model parameters and generating points. Finally, the effectiveness of the proposed algorithm is demonstrated by qualitative and quantitative analysis of the segmentation results for simulated and real SAR images.
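
    The two dissimilarity measures described in this abstract can be written compactly. The notation is assumed here for illustration and does not come from the paper: x_i is the intensity of pixel i, \theta_j denotes the parameters of the Gamma mixture associated with cluster j, and R_k is the k-th Voronoi sub-region.

```latex
% Pixel-level dissimilarity: negative log-probability of pixel i under cluster j
d(x_i, j) = -\ln p(x_i \mid \theta_j)

% Regional dissimilarity: sum of pixel-level measures over sub-region R_k
D(R_k, j) = \sum_{i \in R_k} d(x_i, j) = -\sum_{i \in R_k} \ln p(x_i \mid \theta_j)
```

    Summing negative log-probabilities over a sub-region amounts to treating its pixels as conditionally independent given the cluster label, so a single noisy pixel has limited influence on the label assigned to the whole region.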

  18. Physical model of protein cluster positioning in growing bacteria

    NASA Astrophysics Data System (ADS)

    Wasnik, Vaibhav; Wang, Hui; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    2017-10-01

    Chemotactic receptors in bacteria form clusters at cell poles and also laterally, and this clustering plays an important role in signal transduction. These clusters were found to be periodically arranged on the surface of the bacterium Escherichia coli, independent of any known positioning mechanism. In this work we extend a model based on diffusion and aggregation to more realistic geometries and present a means based on ‘bursty’ protein production to distinguish spontaneous positioning from an independently existing positioning mechanism. We also consider the case of isotropic cellular growth and characterize the degree of order arising spontaneously. Our model could also be relevant for other examples of periodically positioned protein clusters in bacteria.

  19. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    NASA Astrophysics Data System (ADS)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    To select effective samples from the large volume of multi-year PV power generation data and improve the accuracy of the PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes a forecasting model based on a neural network. Based on three weather types (sunny, cloudy and rainy days), this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using the screened data as training data. It then compares the six photovoltaic power generation prediction models obtained before and after data screening. Results show that the prediction model combining clustering analysis with BP neural networks is an effective method to improve the precision of photovoltaic power generation prediction.

  20. The galaxy clustering crisis in abundance matching

    NASA Astrophysics Data System (ADS)

    Campbell, Duncan; van den Bosch, Frank C.; Padmanabhan, Nikhil; Mao, Yao-Yuan; Zentner, Andrew R.; Lange, Johannes U.; Jiang, Fangzhou; Villarreal, Antonio

    2018-06-01

    Galaxy clustering on small scales is significantly underpredicted by sub-halo abundance matching (SHAM) models that populate (sub-)haloes with galaxies based on peak halo mass, Mpeak. SHAM models based on the peak maximum circular velocity, Vpeak, have had much better success. The primary reason Mpeak-based models fail is the relatively low abundance of satellite galaxies produced in these models compared to those based on Vpeak. Despite success in predicting clustering, a simple Vpeak-based SHAM model results in predictions for galaxy growth that are at odds with observations. We evaluate three possible remedies that could 'save' mass-based SHAM: (1) SHAM models require a significant population of 'orphan' galaxies as a result of artificial disruption/merging of sub-haloes in modern high-resolution dark matter simulations; (2) satellites must grow significantly after their accretion; and (3) stellar mass is significantly affected by halo assembly history. No solution is entirely satisfactory. However, regardless of the particulars, we show that popular SHAM models based on Mpeak cannot be complete physical models as presented. Either Vpeak truly is a better predictor of stellar mass at z ∼ 0 and it remains to be seen how the correlation between stellar mass and Vpeak comes about, or SHAM models are missing vital component(s) that significantly affect galaxy clustering.

  1. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
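
    The cluster-size bias described in this abstract can be shown with a small calculation. The sketch below is an illustration under assumptions made here, not the hierarchical model of the paper: the population size distribution and the detection function `detect` are hypothetical, chosen only so that detectability increases with cluster size.

```python
def mean_size(dist):
    """Mean cluster size for a distribution given as {size: probability}."""
    return sum(size * prob for size, prob in dist.items())

def observed_size_distribution(dist, detect):
    """Size distribution among detected clusters: reweight by detection probability."""
    weights = {size: prob * detect(size) for size, prob in dist.items()}
    total = sum(weights.values())
    return {size: w / total for size, w in weights.items()}

def detect(size):
    # Hypothetical detection function: larger clusters are easier to see.
    return min(1.0, 0.2 * size)

# Hypothetical population distribution of cluster sizes.
population = {1: 0.5, 2: 0.3, 5: 0.2}

observed = observed_size_distribution(population, detect)
pop_mean = mean_size(population)
obs_mean = mean_size(observed)
print(pop_mean, obs_mean)
```

    With these values the mean size among detected clusters is about 3.19, against a population mean of 2.1. Multiplying an estimated number of clusters by the sample mean size would therefore overstate abundance, which is the positive bias the abstract refers to.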

  2. Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

    PubMed Central

    Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2010-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138
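
    For reference, the criterion used here to compare candidate models has a simple closed form. The block below gives the standard definition and, as an assumption about the model class, the parameter count for a mixture of G multivariate normal components in d dimensions with unconstrained covariance matrices. The symbols are introduced for illustration and do not appear in the abstract.

```latex
% \hat{L}: maximized likelihood, k: number of free parameters, n: sample size
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n

% Free parameters for G components in d dimensions, unconstrained covariances:
% mixing proportions + means + covariance entries
k = (G - 1) + G d + G \, \frac{d (d + 1)}{2}
```

    Under this sign convention a smaller BIC is preferred; some software reports the negated quantity, in which case larger is better. Because k grows quickly with G and d, a sample as small as the 36 participants in this study penalizes models with many clusters heavily.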

  3. A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison

    PubMed Central

    Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.

    2017-01-01

    Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge‐based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach to cluster ranking based not on a single molecular descriptor (e.g., an energy function) but on a large number of descriptors integrated in a machine learning model, in which an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc. PMID:27935158

  4. Study of Clusters and Hypernuclei production within PHSD+FRIGA model

    NASA Astrophysics Data System (ADS)

    Kireyeu, V.; Le Fèvre, A.; Bratkovskaya, E.

    2017-01-01

    We report results on the dynamical modelling of cluster formation with the new combined PHSD+FRIGA model at Nuclotron and NICA energies. The FRIGA clusterisation algorithm, which can be applied to transport models, is based on the simulated annealing technique to obtain the most bound configuration of fragments and nucleons. The PHSD+FRIGA model is able to predict isotope yields as well as hypernucleus production. Based on present predictions of the combined model, we study the possibility of detecting such clusters and hypernuclei in the BM@N and MPD/NICA detectors.

  5. Recognition of genetically modified product based on affinity propagation clustering and terahertz spectroscopy

    NASA Astrophysics Data System (ADS)

    Liu, Jianjun; Kan, Jianquan

    2018-04-01

    In this paper, a new terahertz-spectrum-based method for identifying genetically modified material is proposed, using a support vector machine (SVM) based on affinity propagation clustering. The algorithm uses affinity propagation clustering to cluster and label unlabeled training samples, and the existing SVM training data are continuously updated during the iterative process. Because the training samples do not need to be labeled manually when the identification model is established, the error caused by human-labeled samples is reduced and the identification accuracy of the model is greatly improved.

  6. Clustering change patterns using Fourier transformation with time-course gene expression data.

    PubMed

    Kim, Jaehee

    2011-01-01

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. The results showed that, because the method clusters probability-neighboring data, model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.

  7. An Efficient Data Compression Model Based on Spatial Clustering and Principal Component Analysis in Wireless Sensor Networks.

    PubMed

    Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong

    2015-08-07

    Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.

  8. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
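    As a rough illustration of the idea in the record above, the sketch below scores candidate windows by the hypergeometric probability of their observed case count under the null hypothesis that cases are spread at random over the population, and picks the window with the smallest probability. This is a simplified assumption about how such a statistic could look, not the authors' exact formulation; the function names and the toy windows are hypothetical, and the Monte Carlo significance test is omitted. Note that a small probability flags unusually low counts as well as unusually high ones.

```python
from math import comb


def hypergeom_pmf(k, population, cases, window_pop):
    """Probability that exactly k of `cases` fall inside a window holding
    `window_pop` people, when cases are scattered at random over `population`."""
    return (comb(cases, k) * comb(population - cases, window_pop - k)
            / comb(population, window_pop))


def most_unusual_window(windows, population, cases):
    """windows: list of (name, window_pop, observed_cases).
    Returns (probability, name) for the window whose observed count is
    least likely under the null hypothesis."""
    scored = [(hypergeom_pmf(obs, population, cases, n), name)
              for name, n, obs in windows]
    return min(scored)


# Hypothetical toy data: 100 people, 10 cases, three candidate windows.
windows = [("A", 10, 1), ("B", 10, 5), ("C", 20, 2)]
prob, name = most_unusual_window(windows, population=100, cases=10)
print(name, prob)
```

    In a full analysis the smallest probability would be compared against its distribution over randomly relabelled data sets to obtain a p-value.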

  9. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    PubMed Central

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  10. Information Clustering Based on Fuzzy Multisets.

    ERIC Educational Resources Information Center

    Miyamoto, Sadaaki

    2003-01-01

    Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…

  11. MOCCA-SURVEY Database I: Is NGC 6535 a dark star cluster harbouring an IMBH?

    NASA Astrophysics Data System (ADS)

    Askar, Abbas; Bianchini, Paolo; de Vita, Ruggero; Giersz, Mirek; Hypki, Arkadiusz; Kamann, Sebastian

    2017-01-01

    We describe the dynamical evolution of a unique type of dark star cluster model in which the majority of the cluster mass at Hubble time is dominated by an intermediate-mass black hole (IMBH). We analysed results from about 2000 star cluster models (Survey Database I) simulated using the Monte Carlo code MOnte Carlo Cluster simulAtor and identified these dark star cluster models. Taking one of these models, we apply the method of simulating realistic 'mock observations' by utilizing the Cluster simulatiOn Comparison with ObservAtions (COCOA) and Simulating Stellar Cluster Observation (SISCO) codes to obtain the photometric and kinematic observational properties of the dark star cluster model at 12 Gyr. We find that the perplexing Galactic globular cluster NGC 6535 closely matches the observational photometric and kinematic properties of the dark star cluster model presented in this paper. Based on our analysis and currently observed properties of NGC 6535, we suggest that this globular cluster could potentially harbour an IMBH. If it exists, the presence of this IMBH can be detected robustly with proposed kinematic observations of NGC 6535.

  12. Clustering of longitudinal data by using an extended baseline: A new method for treatment efficacy clustering in longitudinal data.

    PubMed

    Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine

    2018-01-01

    Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify the treatment responders and the non-responders. In the context of longitudinal cluster analyses, sample size and variability of the times of measurements are the main issues with the current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the clustering of longitudinal data by using an extended baseline method with the latent-class mixed model. The clustering of longitudinal data by using an extended baseline method with the two model-based algorithms was the more robust model. The clustering of longitudinal data by using an extended baseline method with all the non-parametric algorithms failed when there were unequal variances of treatment effect between clusters or when the subgroups had unbalanced sample sizes. The latent-class mixed model failed when the between-patients slope variability is high. Two real data sets on neurodegenerative disease and on obesity illustrate the clustering of longitudinal data by using an extended baseline method and show how clustering may help to identify the marker(s) of the treatment response. The application of the clustering of longitudinal data by using an extended baseline method in exploratory analysis as the first stage before setting up stratified designs can provide a better estimation of treatment effect in future clinical trials.

  13. Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.

    PubMed

    Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai

    2016-03-01

    Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.

  14. CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS

    PubMed Central

    McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.

    2014-01-01

    The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026

  15. Distributive Education Competency-Based Curriculum Models by Occupational Clusters. Final Report.

    ERIC Educational Resources Information Center

    Davis, Rodney E.; Husted, Stewart W.

    To meet the needs of distributive education teachers and students, a project was initiated to develop competency-based curriculum models for marketing and distributive education clusters. The models which were developed incorporate competencies, materials and resources, teaching methodologies/learning activities, and evaluative criteria for the…

  16. A Symmetric Time-Varying Cluster Rate of Descent Model

    NASA Technical Reports Server (NTRS)

    Ray, Eric S.

    2015-01-01

    A model of the time-varying rate of descent of the Orion vehicle was developed based on the observed correlation between canopy projected area and drag coefficient. This initial version of the model assumes cluster symmetry and only varies the vertical component of velocity. The cluster fly-out angle is modeled as a series of sine waves based on flight test data. The projected area of each canopy is synchronized with the primary fly-out angle mode. The sudden loss of projected area during canopy collisions is modeled at minimum fly-out angles, leading to brief increases in rate of descent. The cluster geometry is converted to drag coefficient using empirically derived constants. A more complete model is under development, which computes the aerodynamic response of each canopy to its local incidence angle.

  17. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.

  18. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  19. Ontology-based topic clustering for online discussion data

    NASA Astrophysics Data System (ADS)

    Wang, Yongheng; Cao, Kening; Zhang, Xiaoming

    2013-03-01

    With the rapid development of online communities, mining and extracting quality knowledge from online discussions becomes very important for the industrial and marketing sector, as well as for e-commerce applications and government. Most of the existing techniques model a discussion as a social network of users represented by a user-based graph without considering the content of the discussion. In this paper we propose a new multilayered model to analyze online discussions. The user-based and message-based representations are combined in this model. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into topic space. Domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large scale online discussion data.

  20. Price Formation Based on Particle-Cluster Aggregation

    NASA Astrophysics Data System (ADS)

    Wang, Shijun; Zhang, Changshui

    In the present work, we propose a microscopic model of financial markets based on particle-cluster aggregation on a two-dimensional small-world information network in order to simulate the dynamics of the stock markets. "Stylized facts" of the financial market time series, such as fat-tail distribution of returns, volatility clustering and multifractality, are observed in the model. The results of the model agree with empirical data taken from historical records of the daily closures of the NYSE composite index.

  1. Optimal Partitioning of a Data Set Based on the "p"-Median Model

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2008-01-01

    Although the "K"-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The "p"-median model is an especially well-studied clustering problem that requires the selection of "p" objects to serve as…

  2. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  3. Hierarchical Dirichlet process model for gene expression clustering

    PubMed Central

    2013-01-01

    Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
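    The Chinese restaurant metaphor mentioned in the record above can be sketched for a single (non-hierarchical) Dirichlet process: each new customer joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to a concentration parameter. The sketch below only draws from this prior; it is not the authors' HDP Gibbs sampler, and the function name, the parameter `alpha` and the seed are illustrative assumptions.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Draw table assignments from a Chinese restaurant process prior.

    alpha > 0 is the concentration parameter: larger values open new
    tables (clusters) more readily.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = customers currently at table k
    assignments = []   # assignments[i] = table chosen by customer i
    for _ in range(n_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = crp_assignments(n_customers=20, alpha=1.0)
print(assignments)
print(table_sizes)
```

    The number of occupied tables is not fixed in advance, which is why this construction is used when the number of clusters is unknown.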

  4. A Novel Wireless Power Transfer-Based Weighed Clustering Cooperative Spectrum Sensing Method for Cognitive Sensor Networks.

    PubMed

    Liu, Xin

    2015-10-30

    In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists optimal sensing time and clustering number to maximize the spectrum access probability.

  5. Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

    PubMed

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.
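    The labelling step described in the record above can be illustrated with a small union-find implementation in the spirit of the Hoshen-Kopelman algorithm: a raster scan assigns provisional labels, merges labels when a cell touches two different clusters, and a second pass resolves every label to its root and records cluster sizes. This is a minimal 2D sketch with 4-connectivity; the grid, the function name and the connectivity choice are assumptions, and the subsequent K-means splitting of large clusters is not shown.

```python
def label_clusters(grid):
    """Label 4-connected clusters of truthy cells in a 2D grid.

    Returns (labels, sizes): labels is a grid of integers (0 = background),
    sizes maps each cluster label to its number of cells.
    """
    rows, cols = len(grid), len(grid[0])
    parent = [0]  # parent[0] is unused; label 0 means background

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))        # start a new cluster
                labels[r][c] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                low, high = min(root_up, root_left), max(root_up, root_left)
                parent[high] = low                # merge the two clusters
                labels[r][c] = low
            else:
                labels[r][c] = find(up or left)

    # Second pass: replace provisional labels by their roots and count sizes.
    sizes = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = root
                sizes[root] = sizes.get(root, 0) + 1
    return labels, sizes


# Hypothetical toy image: a U-shaped cluster plus one isolated cell.
grid = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
labels, sizes = label_clusters(grid)
print(sizes)
```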

  6. Online clustering algorithms for radar emitter classification.

    PubMed

    Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

    2005-08-01

    Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.

  7. Automatic pole-like object modeling via 3D part-based analysis of point cloud

    NASA Astrophysics Data System (ADS)

    He, Liu; Yang, Haoxiang; Huang, Yuchun

    2016-10-01

    Pole-like objects, including trees, lampposts and traffic signs, are an indispensable part of urban infrastructure. With the advance of vehicle-based laser scanning (VLS), massive point clouds of roadside urban areas are increasingly applied in 3D digital city modeling. Based on the property that different pole-like objects have various canopy parts and similar trunk parts, this paper proposes a 3D part-based shape analysis to robustly extract, identify and model the pole-like objects. The proposed method includes: 3D clustering and recognition of trunks, voxel growing and part-based 3D modeling. After preprocessing, the trunk center is identified as the point that has a local density peak and the largest minimum inter-cluster distance. Starting from the trunk centers, the remaining points are iteratively clustered to the same centers as their nearest point with higher density. To eliminate noisy points, the cluster border is refined by trimming boundary outliers. Then, candidate trunks are extracted based on the clustering results in three orthogonal planes by shape analysis. Voxel growing obtains the completed pole-like objects regardless of overlaying. Finally, the entire trunk, branch and crown parts are analyzed to obtain seven feature parameters. These parameters are utilized to model the three parts respectively and obtain a single part-assembled 3D model. The proposed method is tested using the VLS-based point cloud of Wuhan University, China. The point cloud includes many kinds of trees, lampposts and other pole-like posters under different occlusions and overlaying. Experimental results show that the proposed method can extract the exact attributes and model the roadside pole-like objects efficiently.
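    The clustering step in the record above (centers at local density peaks that lie far from any denser point, then each remaining point joining the cluster of its nearest denser neighbour) resembles density-peak clustering. The sketch below illustrates that general scheme on 2D points. It is an assumption-laden toy, not the authors' pipeline: the cutoff radius, the use of the product of density and distance to rank centers, and the function name are all illustrative choices, and the border trimming is omitted.

```python
from math import dist


def density_peak_clusters(points, cutoff, n_centers):
    """Cluster points by density peaks; returns (labels, centers)."""
    n = len(points)
    # Local density: number of other points closer than `cutoff`.
    rho = [sum(1 for j in range(n)
               if j != i and dist(points[i], points[j]) < cutoff)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n            # distance to the nearest denser point
    nearest_denser = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist(points[i], p) for p in points)
            continue
        j = min(order[:pos], key=lambda h: dist(points[i], points[h]))
        nearest_denser[i] = j
        delta[i] = dist(points[i], points[j])

    # Centers: high density and far from denser points. Sorting `order`
    # (a stable sort) guarantees the densest point is always a center.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for k, c in enumerate(centers):
        labels[c] = k
    # Each remaining point inherits the label of its nearest denser point,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers


# Hypothetical toy data: two well-separated groups of five points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peak_clusters(points, cutoff=1.0, n_centers=2)
print(labels, centers)
```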

  8. Cluster analysis of dynamic contrast enhanced MRI reveals tumor subregions related to locoregional relapse for cervical cancer patients.

    PubMed

    Torheim, Turid; Groendahl, Aurora R; Andersen, Erlend K F; Lyng, Heidi; Malinen, Eirik; Kvaal, Knut; Futsaether, Cecilia M

    2016-11-01

    Solid tumors are known to be spatially heterogeneous. Detection of treatment-resistant tumor regions can improve clinical outcome, by enabling implementation of strategies targeting such regions. In this study, K-means clustering was used to group voxels in dynamic contrast enhanced magnetic resonance images (DCE-MRI) of cervical cancers. The aim was to identify clusters reflecting treatment resistance that could be used for targeted radiotherapy with a dose-painting approach. Eighty-one patients with locally advanced cervical cancer underwent DCE-MRI prior to chemoradiotherapy. The resulting image time series were fitted to two pharmacokinetic models, the Tofts model (yielding parameters K^trans and ν_e) and the Brix model (A_Brix, k_ep and k_el). K-means clustering was used to group similar voxels based on either the pharmacokinetic parameter maps or the relative signal increase (RSI) time series. The associations between voxel clusters and treatment outcome (measured as locoregional control) were evaluated using the volume fraction or the spatial distribution of each cluster. One voxel cluster based on the RSI time series was significantly related to locoregional control (adjusted p-value 0.048). This cluster consisted of low-enhancing voxels. We found that tumors with poor prognosis had this RSI-based cluster gathered into few patches, making this cluster a potential candidate for targeted radiotherapy. None of the voxel clusters based on Tofts or Brix parameter maps were significantly related to treatment outcome. We identified one group of tumor voxels significantly associated with locoregional relapse that could potentially be used for dose painting. This tumor voxel cluster was identified using the raw MRI time series rather than the pharmacokinetic maps.

  9. A Linear Algebra Measure of Cluster Quality.

    ERIC Educational Resources Information Center

    Mather, Laura A.

    2000-01-01

    Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)

  10. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

    PubMed

    Shen, Chung-Wei; Chen, Yi-Hau

    2018-03-13

    We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.

  11. Robustness of cluster synchronous patterns in small-world networks with inter-cluster co-competition balance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jianbao; Ma, Zhongjun; Chen, Guanrong

    All edges in the classical Watts and Strogatz's small-world network model are unweighted and cooperative (positive). By introducing competitive (negative) inter-cluster edges and assigning edge weights to mimic more realistic networks, this paper develops a modified model which possesses co-competitive weighted couplings and cluster structures while maintaining the common small-world network properties of small average shortest path lengths and large clustering coefficients. Based on theoretical analysis, it is proved that the new model with inter-cluster co-competition balance has an important dynamical property of robust cluster synchronous pattern formation. More precisely, clusters will neither merge nor split regardless of adding or deleting nodes and edges, under the condition of inter-cluster co-competition balance. Numerical simulations demonstrate the robustness of the model against the increase of the coupling strength and several topological variations.

  12. Robustness of cluster synchronous patterns in small-world networks with inter-cluster co-competition balance

    NASA Astrophysics Data System (ADS)

    Zhang, Jianbao; Ma, Zhongjun; Chen, Guanrong

    2014-06-01

    All edges in the classical Watts and Strogatz's small-world network model are unweighted and cooperative (positive). By introducing competitive (negative) inter-cluster edges and assigning edge weights to mimic more realistic networks, this paper develops a modified model which possesses co-competitive weighted couplings and cluster structures while maintaining the common small-world network properties of small average shortest path lengths and large clustering coefficients. Based on theoretical analysis, it is proved that the new model with inter-cluster co-competition balance has an important dynamical property of robust cluster synchronous pattern formation. More precisely, clusters will neither merge nor split regardless of adding or deleting nodes and edges, under the condition of inter-cluster co-competition balance. Numerical simulations demonstrate the robustness of the model against the increase of the coupling strength and several topological variations.

  13. Using hierarchical cluster models to systematically identify groups of jobs with similar occupational questionnaire response patterns to assist rule-based expert exposure assessment in population-based studies.

    PubMed

    Friesen, Melissa C; Shortreed, Susan M; Wheeler, David C; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S; Baris, Dalsu; Karagas, Margaret R; Schwenn, Molly; Johnson, Alison; Armenti, Karla R; Silverman, Debra T; Yu, Kai

    2015-05-01

    Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m^-3 respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. First, the clusters' homogeneity (defined as >75% with the same estimate) was examined compared to a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job's estimate and the mean estimate for all jobs within the cluster. Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster level and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.

  14. An algebraic cluster model based on the harmonic oscillator basis

    NASA Technical Reports Server (NTRS)

    Levai, Geza; Cseh, J.

    1995-01-01

    We discuss the semimicroscopic algebraic cluster model introduced recently, in which the internal structure of the nuclear clusters is described by the harmonic oscillator shell model, while their relative motion is accounted for by the Vibron model. The algebraic formulation of the model makes extensive use of techniques associated with harmonic oscillators and their symmetry group, SU(3). The model is applied to some cluster systems and is found to reproduce important characteristics of nuclei in the sd-shell region. An approximate SU(3) dynamical symmetry is also found to hold for the C-12 + C-12 system.

  15. Multi-mode clustering model for hierarchical wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Hu, Xiangdong; Li, Yongfu; Xu, Huifen

    2017-03-01

    Topology management, i.e., cluster maintenance, of wireless sensor networks (WSNs) remains a challenge due to their numerous nodes, diverse application scenarios, limited resources, and complex dynamics. To address this issue, a multi-mode clustering model (M2 CM) is proposed in this study to maintain the clusters of hierarchical WSNs. In particular, unlike the traditional time-triggered model, which operates periodically over the whole network, the M2 CM is based on local, event-triggered operations. In addition, an adaptive local maintenance algorithm is designed for the broken clusters in the WSNs using the spatial-temporal demand changes accordingly. Numerical experiments are performed using the NS2 network simulation platform. Results validate the effectiveness of the proposed model with respect to the network maintenance costs, node energy consumption and transmitted data as well as the network lifetime.

  16. Dynamic Fuzzy Model Development for a Drum-type Boiler-turbine Plant Through GK Clustering

    NASA Astrophysics Data System (ADS)

    Habbi, Ahcène; Zelmat, Mimoun

    2008-10-01

    This paper discusses a TS fuzzy model identification method for an industrial drum-type boiler plant using the GK fuzzy clustering approach. The fuzzy model is constructed from a set of input-output data that covers a wide operating range of the physical plant. The reference data is generated using a complex first-principle-based mathematical model that describes the key dynamical properties of the boiler-turbine dynamics. The proposed fuzzy model is derived by means of a fuzzy clustering method, with particular attention to structure flexibility and model interpretability issues. This may provide a basis for a new way to design model-based control and diagnosis mechanisms for the complex nonlinear plant.

  17. Combining Mixture Components for Clustering*

    PubMed Central

    Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël

    2010-01-01

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302
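
    The hierarchical combining step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian mixture has already been fitted and K chosen by BIC, and that `tau` holds the resulting posterior membership probabilities (one row per observation, one column per component). The names `entropy`, `merge_columns`, `best_merge`, and `combine`, and the toy matrix, are hypothetical.

```python
import math

def entropy(tau):
    """Classification entropy: -sum_i sum_k t_ik * log(t_ik), with 0*log(0) = 0."""
    return -sum(t * math.log(t) for row in tau for t in row if t > 0)

def merge_columns(tau, j, k):
    """Return a new posterior matrix with components j and k (j < k) combined."""
    merged = []
    for row in tau:
        new_row = [t for idx, t in enumerate(row) if idx not in (j, k)]
        new_row.insert(j, row[j] + row[k])  # combined posterior is the sum
        merged.append(new_row)
    return merged

def best_merge(tau):
    """Pick the pair whose merging leaves the lowest entropy (largest decrease)."""
    n_comp = len(tau[0])
    pairs = [(j, k) for j in range(n_comp) for k in range(j + 1, n_comp)]
    return min(pairs, key=lambda p: entropy(merge_columns(tau, *p)))

def combine(tau):
    """Merge from K components down to 1, recording (n_clusters, entropy)."""
    steps = [(len(tau[0]), entropy(tau))]
    while len(tau[0]) > 1:
        j, k = best_merge(tau)
        tau = merge_columns(tau, j, k)
        steps.append((len(tau[0]), entropy(tau)))
    return steps

# Toy posteriors (assumed): components 0 and 1 overlap, component 2 is separate.
tau = [
    [0.5, 0.5, 0.0],
    [0.6, 0.4, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
]
steps = combine(tau)
for n_clusters, ent in steps:
    print(n_clusters, round(ent, 3))
```

    Plotting the recorded entropies against the number of clusters gives the kind of entropy plot the abstract refers to; the rescaling and the piecewise linear fit are not shown here.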

  18. Regression analysis of clustered failure time data with informative cluster size under the additive transformation models.

    PubMed

    Chen, Ling; Feng, Yanqin; Sun, Jianguo

    2017-10-01

    This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.

  19. Using Hierarchical Cluster Models to Systematically Identify Groups of Jobs With Similar Occupational Questionnaire Response Patterns to Assist Rule-Based Expert Exposure Assessment in Population-Based Studies

    PubMed Central

    Friesen, Melissa C.; Shortreed, Susan M.; Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Armenti, Karla R.; Silverman, Debra T.; Yu, Kai

    2015-01-01

    Objectives: Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Methods: Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m−3 respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. 
First, the clusters’ homogeneity (defined as >75% with the same estimate) was examined compared to a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job’s estimate and the mean estimate for all jobs within the cluster. Results: Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster-level assignment and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. Conclusions: This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. PMID:25477475
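
    The homogeneity criterion in the abstract (a cluster counts as homogeneous when more than 75% of its jobs share the same dichotomized estimate) can be sketched as follows. The cluster contents, labels, and function names are made up for illustration and are not data from the study.

```python
from collections import Counter

def is_homogeneous(labels, threshold=0.75):
    """True if strictly more than `threshold` of the jobs share the modal label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels) > threshold

def share_homogeneous(clusters, threshold=0.75):
    """Fraction of clusters that meet the homogeneity criterion."""
    flags = [is_homogeneous(labels, threshold) for labels in clusters.values()]
    return sum(flags) / len(flags)

# Hypothetical clusters of jobs with dichotomized probability estimates.
clusters = {
    "c1": ["<5%", "<5%", "<5%", "<5%"],          # 100% agree: homogeneous
    "c2": ["<5%", ">=5%", ">=5%", ">=5%"],       # exactly 75%: not homogeneous
    "c3": [">=5%", ">=5%", ">=5%", ">=5%", "<5%"],  # 80% agree: homogeneous
}
share = share_homogeneous(clusters)
print(round(share, 3))
```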

  20. Possible world based consistency learning model for clustering and classifying uncertain data.

    PubMed

    Liu, Han; Zhang, Xianchao; Zhang, Xiaotong

    2018-06-01

    The possible world model has been shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms based on possible worlds have been proposed. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require an extra post-processing procedure to obtain the final result, so their effectiveness relies heavily on the post-processing method and their efficiency is also poor. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended both for clustering and classifying uncertain data. This model utilizes the consistency principle to learn a consensus affinity matrix for uncertain data, which can make full use of the information across different possible worlds and then improve the clustering and classification performance. Meanwhile, this model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, thereby ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be directly obtained without any post-processing procedure. Furthermore, for the clustering and classification tasks, we derive efficient optimization methods to solve the proposed model. Experimental results on real benchmark datasets and real world uncertain datasets show that the proposed model outperforms the state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
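
    The difference between the two distance metrics compared in the abstract can be illustrated with a small sketch. The vectors below are arbitrary toy values, not GMM mean supervectors, and the function names are assumptions. The point shown is only that cosine distance depends on direction and ignores vector length, whereas Euclidean distance does not.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus the cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

x = [1.0, 2.0, 2.0]
y = [3.0, 6.0, 6.0]    # same direction as x, three times the length
z = [2.0, -1.0, 0.0]   # orthogonal to x

print(euclidean(x, y), cosine_distance(x, y))
print(euclidean(x, z), cosine_distance(x, z))
```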

  2. Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition.

    PubMed

    Bianne-Bernard, Anne-Laure; Menasri, Farès; Al-Hajj Mohamad, Rami; Mokbel, Chafic; Kermorvant, Christopher; Likforman-Sulem, Laurence

    2011-10-01

    This study aims at building an efficient word recognition system resulting from the combination of three handwriting recognizers. The main component of this combined system is an HMM-based recognizer which considers dynamic and contextual information for a better modeling of writing units. For modeling the contextual units, a state-tying process based on decision tree clustering is introduced. Decision trees are built according to a set of expert-based questions on how characters are written. Questions are divided into global questions, yielding larger clusters, and precise questions, yielding smaller ones. Such clustering enables us to reduce the total number of models and Gaussian densities by 10. We then apply this modeling to the recognition of handwritten words. Experiments are conducted on three publicly available databases based on Latin or Arabic languages: Rimes, IAM, and OpenHart. The results obtained show that contextual information embedded with dynamic modeling significantly improves recognition.

  3. An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network

    PubMed Central

    Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian

    2015-01-01

    Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming at enhancing the detection rate and reducing the false rate. The Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Back Propagation optimized by the Cultural-Algorithm and the Artificial-Fish-Swarm-Algorithm is applied to misuse detection of the Sink node. Extensive simulations demonstrate that this integrated model has strong intrusion detection performance. PMID:26447696

  4. An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network.

    PubMed

    Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian

    2015-01-01

    Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming at enhancing the detection rate and reducing the false rate. The Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Back Propagation optimized by the Cultural-Algorithm and the Artificial-Fish-Swarm-Algorithm is applied to misuse detection of the Sink node. Extensive simulations demonstrate that this integrated model has strong intrusion detection performance.

  5. A cluster expansion model for predicting activation barrier of atomic processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rehman, Tafizur; Jaipal, M.; Chatterjee, Abhijit, E-mail: achatter@iitk.ac.in

    2013-06-15

    We introduce a procedure based on cluster expansion models for predicting the activation barrier of atomic processes encountered while studying the dynamics of a material system using the kinetic Monte Carlo (KMC) method. Starting with an interatomic potential description, a mathematical derivation is presented to show that the local environment dependence of the activation barrier can be captured using cluster interaction models. Next, we develop a systematic procedure for training the cluster interaction model on-the-fly, which involves: (i) obtaining activation barriers for a handful of local environments using nudged elastic band (NEB) calculations, (ii) identifying the local environment by analyzing the NEB results, and (iii) estimating the cluster interaction model parameters from the activation barrier data. Once a cluster expansion model has been trained, it is used to predict activation barriers without requiring any additional NEB calculations. Numerical studies are performed to validate the cluster expansion model by studying hop processes in Ag/Ag(100). We show that the use of a cluster expansion model with KMC enables efficient generation of an accurate process rate catalog.

  6. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

    PubMed

    Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay

    2015-09-01

    The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.

  7. Testing prediction methods: Earthquake clustering versus the Poisson model

    USGS Publications Warehouse

    Michael, A.J.

    1997-01-01

    Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.

  8. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    NASA Astrophysics Data System (ADS)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which covers basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We then describe the system evaluation module and the cluster analysis module in detail and explain how the two modules were implemented. Finally, we give the results of the system.

  9. Rumor Diffusion in an Interests-Based Dynamic Social Network

    PubMed Central

    Tang, Mingsheng; Mao, Xinjun; Guessoum, Zahia; Zhou, Huiping

    2013-01-01

    To study rumor diffusion in social friend networks, an interests-based dynamic friend network is proposed, which has the characteristics of clustering and community, and a diffusion model is also proposed. With this friend network and rumor diffusion model, based on the zombie-city model, simulation experiments have been conducted to analyze the characteristics of rumor diffusion in social friend networks. The results show some interesting observations: (1) positive information may evolve into a rumor during diffusion, because people may modify the information as it spreads by word of mouth; (2) with the same average degree, a random social network has a smaller clustering coefficient and is more beneficial for rumor diffusion than the dynamic friend network; (3) a rumor is spread more widely in a social network with a smaller global clustering coefficient than in a social network with a larger global clustering coefficient; and (4) a network with a smaller clustering coefficient has a larger efficiency. PMID:24453911

  10. Rumor diffusion in an interests-based dynamic social network.

    PubMed

    Tang, Mingsheng; Mao, Xinjun; Guessoum, Zahia; Zhou, Huiping

    2013-01-01

    To study rumor diffusion in social friend networks, an interests-based dynamic friend network is proposed, which has the characteristics of clustering and community, and a diffusion model is also proposed. With this friend network and rumor diffusion model, based on the zombie-city model, simulation experiments have been conducted to analyze the characteristics of rumor diffusion in social friend networks. The results show some interesting observations: (1) positive information may evolve into a rumor during diffusion, because people may modify the information as it spreads by word of mouth; (2) with the same average degree, a random social network has a smaller clustering coefficient and is more beneficial for rumor diffusion than the dynamic friend network; (3) a rumor is spread more widely in a social network with a smaller global clustering coefficient than in a social network with a larger global clustering coefficient; and (4) a network with a smaller clustering coefficient has a larger efficiency.
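
    Since the observations above are stated in terms of the global clustering coefficient, a short sketch of one common definition may help: the ratio of closed connected triplets to all connected triplets (transitivity). Whether the paper uses exactly this definition is an assumption, and the toy graphs and function names are hypothetical.

```python
from itertools import combinations

def neighbors_map(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def global_clustering(edges):
    """Closed triplets divided by all connected triplets (0.0 if there are none)."""
    adj = neighbors_map(edges)
    closed = 0
    triplets = 0
    for nbrs in adj.values():
        # Every pair of neighbors of a node forms a triplet centered on it.
        for u, v in combinations(sorted(nbrs), 2):
            triplets += 1
            if v in adj[u]:
                closed += 1
    return closed / triplets if triplets else 0.0

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
star = [("a", "b"), ("a", "c"), ("a", "d")]
triangle_with_tail = triangle + [("c", "d")]

print(global_clustering(triangle))
print(global_clustering(star))
print(global_clustering(triangle_with_tail))
```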

  11. Automated modal parameter estimation using correlation analysis and bootstrap sampling

    NASA Astrophysics Data System (ADS)

    Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.

    2018-02-01

    The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
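
    The fuzzy c-means step can be illustrated with the standard membership formula, u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)), where d_j is the distance from a point to cluster center j and m > 1 is the fuzzifier. The sketch below is generic: the three-dimensional points and centers are placeholder values, not the paper's features, and the function name is an assumption.

```python
import math

def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]

# Placeholder centers in a three-dimensional feature space.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
u = memberships((0.25, 0.0, 0.0), centers)
print([round(x, 3) for x in u])
```

    A full fuzzy c-means run would alternate this membership update with a weighted update of the centers until convergence; only the membership step is shown.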

  12. Mechanism for Collective Cell Alignment in Myxococcus xanthus Bacteria

    PubMed Central

    Balagam, Rajesh; Igoshin, Oleg A.

    2015-01-01

    Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type and non-reversing M. xanthus mutants in recent experiments. Our results are robust to variation in model parameters, match the experimentally observed trends and can be applied to understand surface motility patterns of other bacterial species. PMID:26308508

  13. Exemplar-Based Clustering via Simulated Annealing

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2009-01-01

    Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of "exemplars" as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed…

  14. Elastic K-means using posterior probability.

    PubMed

    Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris

    2017-01-01

    The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability, with a soft capability whereby each data point can belong to multiple clusters fractionally, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, besides vector attribute information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of the EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.

  15. Composition formulas of Fe-based transition metals-metalloid bulk metallic glasses derived from dual-cluster model of binary eutectics.

    PubMed

    Naz, Gul Jabeen; Dong, Dandan; Geng, Yaoxiang; Wang, Yingmin; Dong, Chuang

    2017-08-22

    It is known that bulk metallic glasses follow simple composition formulas [cluster](glue atom)$_{1\text{ or }3}$ with 24 valence electrons within the framework of the cluster-plus-glue-atom model. Though the relevant nearest-neighbor cluster can be readily identified from a devitrification phase, the glue atoms remain poorly defined. The present work is devoted to understanding the composition rule of Fe-(B,P,C) based multi-component bulk metallic glasses, by introducing a cluster-based eutectic liquid model. This model regards a eutectic liquid as composed of two stable liquids, each formulated by the cluster formula for an ideal metallic glass derived from one of the two eutectic phases. The dual-cluster formulas are first established for binary Fe-(B,C,P) eutectics: [Fe-Fe$_{14}$]B$_2$Fe + [B-B$_2$Fe$_8$]Fe ≈ Fe$_{83.3}$B$_{16.7}$ for eutectic Fe$_{83}$B$_{17}$, [P-Fe$_{14}$]P + [P-Fe$_9$]P$_2$Fe ≈ Fe$_{82.8}$P$_{17.2}$ for Fe$_{83}$P$_{17}$, and [C-Fe$_6$]Fe$_3$ + [C-Fe$_9$]C$_2$Fe ≈ Fe$_{82.6}$C$_{17.4}$ for Fe$_{82.7}$C$_{17.3}$. The second formulas in these dual-cluster formulas, being respectively relevant to devitrification phases Fe$_2$B, Fe$_3$P, and Fe$_3$C, explain well the compositions of existing Fe-based transition metals-metalloid bulk metallic glasses. These formulas also satisfy the 24-electron rule. The proposition of the composition formulas for good glass formers, directly from known eutectic points, constitutes a new route towards understanding and eventually designing metallic glasses of high glass forming abilities.
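
    The atomic percentages quoted for the three dual-cluster formulas can be checked by simple counting. The sketch below is only an arithmetic check: the atom counts are read off the formulas in the abstract (center atom, shell atoms, and glue atoms added together), and the helper name `atomic_percent` is introduced here for illustration.

```python
from collections import Counter

def atomic_percent(*units):
    """Sum atom counts over formula units and convert to atomic percent."""
    total = Counter()
    for unit in units:
        total.update(unit)
    n = sum(total.values())
    return {el: round(100.0 * count / n, 1) for el, count in total.items()}

# [Fe-Fe14]B2Fe -> Fe16 B2 ; [B-B2Fe8]Fe -> Fe9 B3
fe_b = atomic_percent({"Fe": 16, "B": 2}, {"Fe": 9, "B": 3})
# [P-Fe14]P -> Fe14 P2 ; [P-Fe9]P2Fe -> Fe10 P3
fe_p = atomic_percent({"Fe": 14, "P": 2}, {"Fe": 10, "P": 3})
# [C-Fe6]Fe3 -> Fe9 C1 ; [C-Fe9]C2Fe -> Fe10 C3
fe_c = atomic_percent({"Fe": 9, "C": 1}, {"Fe": 10, "C": 3})

print(fe_b)  # Fe 83.3, B 16.7
print(fe_p)  # Fe 82.8, P 17.2
print(fe_c)  # Fe 82.6, C 17.4
```

    Each sum reproduces the composition stated in the abstract (25 Fe + 5 B, 24 Fe + 5 P, and 19 Fe + 4 C atoms, respectively).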

  16. TESTING STELLAR POPULATION SYNTHESIS MODELS WITH SLOAN DIGITAL SKY SURVEY COLORS OF M31's GLOBULAR CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peacock, Mark B.; Zepf, Stephen E.; Maccarone, Thomas J.

    2011-08-10

    Accurate stellar population synthesis models are vital in understanding the properties and formation histories of galaxies. In order to calibrate and test the reliability of these models, they are often compared with observations of star clusters. However, relatively little work has compared these models in the ugriz filters, despite the recent widespread use of this filter set. In this paper, we compare the integrated colors of globular clusters in the Sloan Digital Sky Survey (SDSS) with those predicted from commonly used simple stellar population (SSP) models. The colors are based on SDSS observations of M31's clusters and provide the largest population of star clusters with accurate photometry available from the survey. As such, it is a unique sample with which to compare SSP models with SDSS observations. From this work, we identify a significant offset between the SSP models and the clusters' g - r colors, with the models predicting colors which are too red by g - r ≈ 0.1. This finding is consistent with previous observations of luminous red galaxies in the SDSS, which show a similar discrepancy. The identification of this offset in globular clusters suggests that it is very unlikely to be due to a minority population of young stars. The recently updated SSP model of Maraston and Stroembaeck better represents the observed g - r colors. This model is based on the empirical MILES stellar library, rather than theoretical libraries, suggesting an explanation for the g - r discrepancy.

  17. A Game Theoretic Optimization Method for Energy Efficient Global Connectivity in Hybrid Wireless Sensor Networks

    PubMed Central

    Lee, JongHyup; Pak, Dohyun

    2016-01-01

    For practical deployment of wireless sensor networks (WSNs), WSNs construct clusters, where a sensor node communicates with other nodes in its cluster, and a cluster head supports connectivity between the sensor nodes and a sink node. In hybrid WSNs, cluster heads have cellular network interfaces for global connectivity. However, when WSNs are active and the load of cellular networks is high, the optimal assignment of cluster heads to base stations becomes critical. Therefore, in this paper, we propose a game theoretic model to find the optimal assignment of base stations for hybrid WSNs. Since the communication and energy costs differ between cellular systems, we devise two game models for TDMA/FDMA and CDMA systems employing power prices to adapt to the varying efficiency of recent wireless technologies. The proposed model is defined on the assumptions of the ideal sensing field, but our evaluation shows that the proposed model is more adaptive and energy efficient than local selections. PMID:27589743

  18. Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network

    PubMed Central

    Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong

    2015-01-01

    In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes fuzzy c-means combined with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting the resource demands. PMID:25691896

  19. Novel layered clustering-based approach for generating ensemble of classifiers.

    PubMed

    Rahman, Ashfaqur; Verma, Brijesh

    2011-05-01

    This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clusterings of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results, as evidenced by the experimental results.
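
    The layered scheme described above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the authors' implementation: each layer is one randomly initialized k-means run, the "base classifier" of a cluster is just a majority-label rule (the abstract does not specify the base learner), and all names and the toy data are hypothetical.

```python
import random
from collections import Counter

def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))

def kmeans(points, k, rng, iters=10):
    """Plain k-means started from k randomly chosen points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def train_layers(points, labels, k, n_layers, seed=0):
    """One layer = one random k-means run plus a majority-label rule per cluster."""
    rng = random.Random(seed)
    fallback = Counter(labels).most_common(1)[0][0]
    layers = []
    for _ in range(n_layers):
        centers = kmeans(points, k, rng)
        votes = [Counter() for _ in range(k)]
        for p, y in zip(points, labels):
            votes[nearest(p, centers)][y] += 1
        rules = [v.most_common(1)[0][0] if v else fallback for v in votes]
        layers.append((centers, rules))
    return layers

def predict(point, layers):
    """Find the cluster at each layer, then fuse the decisions by majority vote."""
    decisions = [rules[nearest(point, centers)] for centers, rules in layers]
    return Counter(decisions).most_common(1)[0][0]

# Hypothetical toy data with two well-separated classes.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels = ["a", "a", "a", "b", "b", "b"]
layers = train_layers(points, labels, k=2, n_layers=5)
print(predict([0.1, 0.1], layers), predict([3.0, 3.0], layers))
```

    Replacing the majority-label rule with a trained classifier per cluster would bring the sketch closer to the described method; the random initialization per layer is what produces the differing, overlapping partitions.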

  20. Application of multi-scale wavelet entropy and multi-resolution Volterra models for climatic downscaling

    NASA Astrophysics Data System (ADS)

    Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana

    2018-01-01

    This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations in the Krishna Basin, India. A climatic dataset from NCEP is used to train the proposed models (Jan.'69 to Dec.'94), which are then applied to the corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data are obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better than the traditional Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA help reduce the dimensionality of the input climatic variables, while capturing more variability than stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods, whereas the models with clustering without MWE over-estimate the rainfall during the dry season.

  1. Population Structure With Localized Haplotype Clusters

    PubMed Central

    Browning, Sharon R.; Weir, Bruce S.

    2010-01-01

    We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877

  2. Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

    NASA Technical Reports Server (NTRS)

    Mjoisness, Eric; Castano, Rebecca; Gray, Alexander

    1999-01-01

    We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

  3. Quantification by qPCR of Pathobionts in Chronic Periodontitis: Development of Predictive Models of Disease Severity at Site-Specific Level.

    PubMed

    Tomás, Inmaculada; Regueira-Iglesias, Alba; López, Maria; Arias-Bujanda, Nora; Novoa, Lourdes; Balsa-Castro, Carlos; Tomás, Maria

    2017-01-01

    Currently, there is little evidence available on the development of predictive models for the diagnosis or prognosis of chronic periodontitis based on the qPCR quantification of subgingival pathobionts. Our objectives were to: (1) analyze and internally validate pathobiont-based models that could be used to distinguish different periodontal conditions at site-specific level within the same patient with chronic periodontitis; (2) develop nomograms derived from predictive models. Subgingival plaque samples were obtained from control and periodontal sites (probing pocket depth and clinical attachment loss <4 mm and >4 mm, respectively) from 40 patients with moderate-severe generalized chronic periodontitis. The samples were analyzed by qPCR using TaqMan probes and specific primers to determine the concentrations of Actinobacillus actinomycetemcomitans (Aa), Fusobacterium nucleatum (Fn), Parvimonas micra (Pm), Porphyromonas gingivalis (Pg), Prevotella intermedia (Pi), Tannerella forsythia (Tf), and Treponema denticola (Td). The pathobiont-based models were obtained using multivariate binary logistic regression. The best models were selected according to specified criteria. The discrimination was assessed using receiver operating characteristic curves and numerous classification measures were thus obtained. The nomograms were built based on the best predictive models. Eight bacterial cluster-based models showed an area under the curve (AUC) ≥0.760 and a sensitivity and specificity ≥75.0%. The PiTfFn cluster showed an AUC of 0.773 (sensitivity and specificity = 75.0%). When Pm and AaPm were incorporated in the TdPiTfFn cluster, we detected the two best predictive models with an AUC of 0.788 and 0.789, respectively (sensitivity and specificity = 77.5%). The TdPiTfAa cluster had an AUC of 0.785 (sensitivity and specificity = 75.0%). When Pm was incorporated in this cluster, a new predictive model appeared with better AUC and specificity values (0.787 and 80.0%, respectively). Distinct clusters formed by species with different etiopathogenic roles (belonging to different Socransky's complexes) had a good predictive accuracy for distinguishing a site with periodontal destruction in a periodontal patient. The predictive clusters with the lowest number of bacteria were PiTfFn and TdPiTfAa, while TdPiTfAaFnPm had the highest number. In all the developed nomograms, high concentrations of these clusters were associated with an increased probability of having a periodontal site in a patient with chronic periodontitis.
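
    The discrimination measures reported above (AUC, sensitivity, specificity) can be computed from predicted probabilities as in the following sketch. The scores, labels, and the 0.5 threshold are hypothetical illustration data, not values from the study, and the function names are introduced here.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model outputs for 4 periodontal (1) and 4 control (0) sites.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(scores, labels))             # 0.8125 (13 of 16 pairs ranked correctly)
print(sens_spec(scores, labels, 0.5))  # (0.75, 0.75)
```

    The pairwise definition of AUC used here equals the area under the ROC curve and is practical for small samples; for large datasets a rank-based computation is more efficient.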

  4. Quantification by qPCR of Pathobionts in Chronic Periodontitis: Development of Predictive Models of Disease Severity at Site-Specific Level

    PubMed Central

    Tomás, Inmaculada; Regueira-Iglesias, Alba; López, Maria; Arias-Bujanda, Nora; Novoa, Lourdes; Balsa-Castro, Carlos; Tomás, Maria

    2017-01-01

    Currently, there is little evidence available on the development of predictive models for the diagnosis or prognosis of chronic periodontitis based on the qPCR quantification of subgingival pathobionts. Our objectives were to: (1) analyze and internally validate pathobiont-based models that could be used to distinguish different periodontal conditions at site-specific level within the same patient with chronic periodontitis; (2) develop nomograms derived from predictive models. Subgingival plaque samples were obtained from control and periodontal sites (probing pocket depth and clinical attachment loss <4 mm and >4 mm, respectively) from 40 patients with moderate-severe generalized chronic periodontitis. The samples were analyzed by qPCR using TaqMan probes and specific primers to determine the concentrations of Actinobacillus actinomycetemcomitans (Aa), Fusobacterium nucleatum (Fn), Parvimonas micra (Pm), Porphyromonas gingivalis (Pg), Prevotella intermedia (Pi), Tannerella forsythia (Tf), and Treponema denticola (Td). The pathobiont-based models were obtained using multivariate binary logistic regression. The best models were selected according to specified criteria. The discrimination was assessed using receiver operating characteristic curves and numerous classification measures were thus obtained. The nomograms were built based on the best predictive models. Eight bacterial cluster-based models showed an area under the curve (AUC) ≥0.760 and a sensitivity and specificity ≥75.0%. The PiTfFn cluster showed an AUC of 0.773 (sensitivity and specificity = 75.0%). When Pm and AaPm were incorporated in the TdPiTfFn cluster, we detected the two best predictive models with an AUC of 0.788 and 0.789, respectively (sensitivity and specificity = 77.5%). The TdPiTfAa cluster had an AUC of 0.785 (sensitivity and specificity = 75.0%). When Pm was incorporated in this cluster, a new predictive model appeared with better AUC and specificity values (0.787 and 80.0%, respectively). Distinct clusters formed by species with different etiopathogenic roles (belonging to different Socransky’s complexes) had a good predictive accuracy for distinguishing a site with periodontal destruction in a periodontal patient. The predictive clusters with the lowest number of bacteria were PiTfFn and TdPiTfAa, while TdPiTfAaFnPm had the highest number. In all the developed nomograms, high concentrations of these clusters were associated with an increased probability of having a periodontal site in a patient with chronic periodontitis. PMID:28848499

  5. Cluster and propensity based approximation of a network

    PubMed Central

    2013-01-01

    Background: The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results: Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). Conclusions: The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424

  6. Elastic K-means using posterior probability

    PubMed Central

    Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris

    2017-01-01

    The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) that uses posterior probability to provide soft assignments, so that each data point can belong to multiple clusters fractionally, and we show the benefit of the proposed Elastic K-means. Furthermore, in many applications, pairwise relations (graph information) are available in addition to vector attribute information. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model. PMID:29240756

  7. SAR image segmentation using skeleton-based fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Cao, Yun Yi; Chen, Yan Qiu

    2003-06-01

    SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering of characteristic descriptors extracted from local patches. A mixture model of the characteristic descriptor, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces appearing in SAR images automatically without any user input.

  8. A spatial hazard model for cluster detection on continuous indicators of disease: application to somatic cell score.

    PubMed

    Gay, Emilie; Senoussi, Rachid; Barnouin, Jacques

    2007-01-01

    Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached by continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposed to analyze spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34 142 French dairy herds for the year 2000, and important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with the presence of 3 clusters was highly significant, and the 3 clusters were attractive, i.e. closeness to cluster center increased the occurrence of high SCS. The three localizations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. The semi-parametric method based on spatial hazard modeling applies to continuous variables, and takes account of both risk factors and potential heterogeneity of the background population. This tool allows a quantitative detection but assumes a spatially specified form for clusters.

  9. Generating clustered scale-free networks using Poisson based localization of edges

    NASA Astrophysics Data System (ADS)

    Türker, İlker

    2018-05-01

    We introduce a variety of network models using a Poisson-based edge localization strategy, which result in clustered scale-free topologies. We first verify the success of our localization strategy by realizing a variant of the well-known Watts-Strogatz model with an inverse approach, implying a small-world regime of rewiring from a random network through a regular one. We then apply the rewiring strategy to a pure Barabasi-Albert model and successfully achieve a small-world regime, with the scale-free property preserved only to a limited extent. To imitate the high clustering property of scale-free networks with higher accuracy, we adapted the Poisson-based wiring strategy to a growing network with the ingredients of both preferential attachment and local connectivity. To achieve the collocation of these properties, we used a routine of flattening the edges array, sorting it, and applying a mixing procedure to assemble both global connections with preferential attachment and local clusters. As a result, we obtained clustered scale-free networks computationally, departing from recent studies by following a simple but efficient approach.

  10. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures

    PubMed Central

    Chen, Yun; Yang, Hui

    2016-01-01

    In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
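
    Why mutual information is suited to nonlinear interdependence can be seen from a small example: for y = x² on a symmetric range, the linear (Pearson) correlation is essentially zero while the mutual information is clearly positive. The sketch below uses a simple equal-width histogram (plug-in) estimator, which is an assumption made here for illustration; the abstract does not state which estimator the authors used.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def discretize(values, bins):
    """Map each value to the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in nats from an equal-width 2-D histogram."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # p(i,j) * log(p(i,j) / (p(i) p(j))) written in terms of raw counts.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

# y depends on x deterministically but not linearly.
x = [i / 100.0 for i in range(-100, 101)]
y = [v * v for v in x]

print("Pearson r:", round(pearson(x, y), 6))
print("Mutual information (nats):", round(mutual_information(x, y), 3))
```

    A variable-clustering method driven by correlation alone would treat x and y here as unrelated, whereas a mutual-information measure would group them.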

  11. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures.

    PubMed

    Chen, Yun; Yang, Hui

    2016-12-14

    In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.

  12. Modeling tensional homeostasis in multicellular clusters.

    PubMed

    Tam, Sze Nok; Smith, Michael L; Stamenović, Dimitrije

    2017-03-01

    Homeostasis of mechanical stress in cells, or tensional homeostasis, is essential for normal physiological function of tissues and organs and is protective against disease progression, including atherosclerosis and cancer. Recent experimental studies have shown that isolated cells are not capable of maintaining tensional homeostasis, whereas multicellular clusters are, with stability increasing with the size of the clusters. Here, we proposed simple mathematical models to interpret experimental results and to obtain insight into factors that determine homeostasis. Multicellular clusters were modeled as one-dimensional arrays of linearly elastic blocks that were either jointed or disjointed. Fluctuating forces that mimicked experimentally measured cell-substrate tractions were obtained from Monte Carlo simulations. These forces were applied to the cluster models, and the corresponding stress field in the cluster was calculated by solving the equilibrium equation. It was found that temporal fluctuations of the cluster stress field became attenuated with increasing cluster size, indicating that the cluster approached tensional homeostasis. These results were consistent with previously reported experimental data. Furthermore, the models revealed that key determinants of tensional homeostasis in multicellular clusters included the cluster size, the distribution of traction forces, and mechanical coupling between adjacent cells. Based on these findings, we concluded that tensional homeostasis was a multicellular phenomenon. Copyright © 2016 John Wiley & Sons, Ltd.
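
    The attenuation of fluctuations with cluster size can be illustrated with a much cruder Monte Carlo sketch than the elastic-block model of the paper. The assumptions here are deliberately strong and are not taken from the study: each cell's traction is an independent gamma-distributed draw with mean 1, and the cluster-level quantity is simply the mean traction over cells, with no mechanical coupling.

```python
import random
import statistics

def cluster_cv(n_cells, n_steps=2000, seed=0):
    """Coefficient of variation over time of the mean traction in a cluster."""
    rng = random.Random(seed)
    series = []
    for _ in range(n_steps):
        # Hypothetical traction per cell: positive, mean 1, independent draws.
        forces = [rng.gammavariate(4.0, 0.25) for _ in range(n_cells)]
        series.append(sum(forces) / n_cells)
    return statistics.pstdev(series) / statistics.mean(series)

# Temporal fluctuation relative to the mean for growing cluster sizes.
for n in (1, 4, 16):
    print(n, round(cluster_cv(n), 3))
```

    Under independence the coefficient of variation falls roughly as one over the square root of the number of cells. The abstract identifies the traction-force distribution and the coupling between adjacent cells as further determinants, which this simplified sketch does not capture.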

  13. Factors that cause genotype by environment interaction and use of a multiple-trait herd-cluster model for milk yield of Holstein cattle from Brazil and Colombia.

    PubMed

    Cerón-Muñoz, M F; Tonhati, H; Costa, C N; Rojas-Sarmiento, D; Echeverri Echeverri, D M

    2004-08-01

    Descriptive herd variables (DVHE) were used to explain genotype by environment interactions (G x E) for milk yield (MY) in Brazilian and Colombian production environments and to develop a herd-cluster model to estimate covariance components and genetic parameters for each herd environment group. Data consisted of 180,522 lactation records of 94,558 Holstein cows from 937 Brazilian and 400 Colombian herds. Herds in both countries were jointly grouped in thirds according to 8 DVHE: production level, phenotypic variability, age at first calving, calving interval, percentage of imported semen, lactation length, and herd size. For each DVHE, REML bivariate animal model analyses were used to estimate genetic correlations for MY between upper and lower thirds of the data. Based on estimates of genetic correlations, weights were assigned to each DVHE to group herds in a cluster analysis using the FASTCLUS procedure in SAS. Three clusters were defined, and genetic and residual variance components were heterogeneous among herd clusters. Estimates of heritability in clusters 1 and 3 were 0.28 and 0.29, respectively, but the estimate was larger (0.39) in Cluster 2. The genetic correlations of MY from different clusters ranged from 0.89 to 0.97. The herd-cluster model based on DVHE properly takes into account G x E by grouping similar environments accordingly and seems to be an alternative to simply considering country borders to distinguish between environments.

  14. Diversity and Community Can Coexist.

    PubMed

    Stivala, Alex; Robins, Garry; Kashima, Yoshihisa; Kirley, Michael

    2016-03-01

    We examine the (in)compatibility of diversity and sense of community by means of agent-based models based on the well-known Schelling model of residential segregation and Axelrod model of cultural dissemination. We find that diversity and highly clustered social networks, on the assumptions of social tie formation based on spatial proximity and homophily, are incompatible when agent features are immutable, and this holds even for multiple independent features. We include both mutable and immutable features into a model that integrates Schelling and Axelrod models, and we find that even for multiple independent features, diversity and highly clustered social networks can be incompatible on the assumptions of social tie formation based on spatial proximity and homophily. However, this incompatibility breaks down when cultural diversity can be sufficiently large, at which point diversity and clustering need not be negatively correlated. This implies that segregation based on immutable characteristics such as race can possibly be overcome by sufficient similarity on mutable characteristics based on culture, which are subject to a process of social influence, provided a sufficiently large "scope of cultural possibilities" exists. © Society for Community Research and Action 2016.

  15. Low Temperature Kinetics of the First Steps of Water Cluster Formation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bourgalais, J.; Roussel, V.; Capron, M.

    2016-03-01

    We present a combined experimental and theoretical low temperature kinetic study of water cluster formation. Water cluster growth takes place in low temperature (23-69 K) supersonic flows. The observed kinetics of formation of water clusters are reproduced with a kinetic model based on theoretical predictions for the first steps of clusterization. The temperature- and pressure-dependent association and dissociation rate coefficients are predicted with an ab initio transition state theory based master equation approach over a wide range of temperatures (20-100 K) and pressures (10^-6 to 10 bar).

  16. Observing the clustering properties of galaxy clusters in dynamical dark-energy cosmologies

    NASA Astrophysics Data System (ADS)

    Fedeli, C.; Moscardini, L.; Bartelmann, M.

    2009-06-01

    We study the clustering properties of galaxy clusters expected to be observed by various forthcoming surveys both in the X-ray and sub-mm regimes by the thermal Sunyaev-Zel'dovich effect. Several different background cosmological models are assumed, including the concordance ΛCDM and various cosmologies with dynamical evolution of the dark energy. Particular attention is paid to models with a significant contribution of dark energy at early times which affects the process of structure formation. Past light cone and selection effects in cluster catalogs are carefully modeled by realistic scaling relations between cluster mass and observables and by properly taking into account the selection functions of the different instruments. The results show that early dark-energy models are expected to produce significantly lower values of effective bias and both spatial and angular correlation amplitudes with respect to the standard ΛCDM model. Among the cluster catalogs studied in this work, it turns out that those based on eRosita, Planck, and South Pole Telescope observations are the most promising for distinguishing between various dark-energy models.

  17. A density-based clustering model for community detection in complex networks

    NASA Astrophysics Data System (ADS)

    Zhao, Xiang; Li, Yantao; Qu, Zehui

    2018-04-01

    Network clustering (or graph partitioning) is an important technique for uncovering the underlying community structures in complex networks, which has been widely applied in various fields including astronomy, bioinformatics, sociology, and bibliometrics. In this paper, we propose a density-based clustering model for community detection in complex networks (DCCN). The key idea is to find group centers with a higher density than their neighbors and a relatively large integrated-distance from nodes with higher density. The experimental results indicate that our approach is efficient and effective for community detection of complex networks.
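
    The key idea in this abstract (centers are points that are denser than their neighbors and far from any denser point) can be illustrated with a minimal sketch. It assumes plain 2D points and Euclidean distance rather than network nodes and the paper's integrated-distance, and the function name, cutoff, and scoring rule are illustrative choices, not the DCCN method itself.

```python
import math

def density_peaks(points, cutoff, n_centers):
    """Pick centers with high local density and a large distance to any denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points within the cutoff radius (assumed definition).
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    # Delta: distance to the nearest point of higher density.
    # Ties in density are broken by index so only one point is treated as the global peak.
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(dist[i]))
    # Centers score highly on both quantities; the product is one simple ranking rule.
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:n_centers]

# Two well-separated toy groups; the middle point of each group is the densest.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
centers = density_peaks(points, cutoff=0.12, n_centers=2)
```

    On this toy input the two selected centers are the middle points of the two groups; remaining points would then be assigned to the group of their nearest denser neighbor.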

  18. 3D morphology-based clustering and simulation of human pyramidal cell dendritic spines.

    PubMed

    Luengo-Sanchez, Sergio; Fernaud-Espinosa, Isabel; Bielza, Concha; Benavides-Piccione, Ruth; Larrañaga, Pedro; DeFelipe, Javier

    2018-06-13

    The dendritic spines of pyramidal neurons are the targets of most excitatory synapses in the cerebral cortex. They have a wide variety of morphologies, and their morphology appears to be critical from the functional point of view. To further characterize dendritic spine geometry, we used in this paper over 7,000 individually 3D reconstructed dendritic spines from human cortical pyramidal neurons to group dendritic spines using model-based clustering. This approach uncovered six separate groups of human dendritic spines. To better understand the differences between these groups, the discriminative characteristics of each group were identified as a set of rules. Model-based clustering was also useful for simulating accurate 3D virtual representations of spines that matched the morphological definitions of each cluster. This mathematical approach could provide a useful tool for theoretical predictions on the functional features of human pyramidal neurons based on the morphology of dendritic spines.

  19. Cluster-based control of a separating flow over a smoothly contoured ramp

    NASA Astrophysics Data System (ADS)

    Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek

    2017-12-01

    The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
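
    The step from a clustered state sequence to a Markov model and its long-term probability distribution can be sketched as follows. This is a simplified illustration for a single fixed control law: the label sequence is made up, and the function names and the power-iteration scheme are assumptions rather than the authors' implementation.

```python
def transition_matrix(labels, n_clusters):
    """Estimate P[i][j] = Pr(next cluster = j | current cluster = i) from a label sequence."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A cluster with no observed outgoing transition gets a uniform row
        # so that every row still sums to one.
        matrix.append([c / total for c in row] if total else [1.0 / n_clusters] * n_clusters)
    return matrix

def stationary_distribution(matrix, n_steps=1000):
    """Approximate the long-term distribution by repeatedly applying the transition matrix."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(n_steps):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p

# Hypothetical sequence of cluster indices visited by the flow over time.
labels = [0, 1, 2, 0, 1, 2, 0, 1, 1, 2, 0]
P = transition_matrix(labels, 3)
pi = stationary_distribution(P)
```

    In a control-dependent setting, one such matrix would be estimated per control input, and the resulting long-term distributions compared under a cost function.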

  20. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data

    DOE PAGES

    Hsu, David

    2015-09-27

    Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
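
    The Jaccard coefficient mentioned above compares two clusters as sets of members. The sketch below shows one plausible way to turn it into a stability score; the best-match rule and the toy clusters are assumptions, since the abstract does not describe the exact procedure.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| between two collections of member ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_stability(original, perturbed):
    """For each original cluster, the best Jaccard match among the perturbed clusters."""
    return [max(jaccard(c, d) for d in perturbed) for c in original]

# Hypothetical building ids: one building switches cluster after perturbation.
original = [{1, 2, 3, 4}, {5, 6, 7}]
perturbed = [{1, 2, 3}, {4, 5, 6, 7}]
scores = cluster_stability(original, perturbed)
```

    A score of 1.0 means a cluster was recovered exactly after perturbation; values near 0 indicate that the cluster dissolved.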

  1. Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses

    DTIC Science & Technology

    2007-01-01

    including tree-based methods such as the unweighted pair group method of analysis (UPGMA) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ clustering methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA

  2. Analytical halo model of galactic conformity

    NASA Astrophysics Data System (ADS)

    Pahwa, Isha; Paranjape, Aseem

    2017-09-01

    We present a fully analytical halo model of colour-dependent clustering that incorporates the effects of galactic conformity in a halo occupation distribution framework. The model, based on our previous numerical work, describes conformity through a correlation between the colour of a galaxy and the concentration of its parent halo, leading to a correlation between central and satellite galaxy colours at fixed halo mass. The strength of the correlation is set by a tunable 'group quenching efficiency', and the model can separately describe group-level correlations between galaxy colour (1-halo conformity) and large-scale correlations induced by assembly bias (2-halo conformity). We validate our analytical results using clustering measurements in mock galaxy catalogues, finding that the model is accurate at the 10-20 per cent level for a wide range of luminosities and length-scales. We apply the formalism to interpret the colour-dependent clustering of galaxies in the Sloan Digital Sky Survey (SDSS). We find good overall agreement between the data and a model that has 1-halo conformity at a level consistent with previous results based on an SDSS group catalogue, although the clustering data require satellites to be redder than suggested by the group catalogue. Within our modelling uncertainties, however, we do not find strong evidence of 2-halo conformity driven by assembly bias in SDSS clustering.

  3. Automated method to differentiate between native and mirror protein models obtained from contact maps.

    PubMed

    Kurczynska, Monika; Kotulska, Malgorzata

    2018-01-01

    Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not yield appropriate clusters: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
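
    Separating two groups of models by k-means on a pair of energy terms can be sketched as below. The feature values are invented, no PyRosetta call is made, and the initialization from the first k points is an arbitrary choice; the abstract does not state these details.

```python
import math

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; centroids start from the first k points (arbitrary choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up (energy term 1, energy term 2) pairs for six hypothetical protein models.
models = [(-1.0, -0.9), (2.0, 2.1), (-1.2, -1.1), (1.9, 2.2), (-0.8, -1.0), (2.1, 1.8)]
labels, centroids = kmeans(models, k=2)
```

    With k = 2, one cluster would be read as the native orientation and the other as the mirror orientation; deciding which is which requires an additional criterion that this sketch does not cover.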

  4. Riemannian multi-manifold modeling and clustering in brain networks

    NASA Astrophysics Data System (ADS)

    Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.

    2017-08-01

    This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brain-network time series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points in the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.

  5. Galaxy Cluster Mass Reconstruction Project - II. Quantifying scatter and bias using contrasting mock catalogues

    DOE PAGES

    Old, L.; Wojtak, R.; Mamon, G. A.; ...

    2015-03-26

    Our paper is the second in a series in which we perform an extensive comparison of various galaxy-based cluster mass estimation techniques that utilize the positions, velocities and colours of galaxies. Our aim is to quantify the scatter, systematic bias and completeness of cluster masses derived from a diverse set of 25 galaxy-based methods using two contrasting mock galaxy catalogues based on a sophisticated halo occupation model and a semi-analytic model. Analysing 968 clusters, we find a wide range in the rms errors in log M200c delivered by the different methods (0.18–1.08 dex, i.e. a factor of ~1.5–12), with abundance-matching and richness methods providing the best results, irrespective of the input model assumptions. In addition, certain methods produce a significant number of catastrophic cases where the mass is under- or overestimated by a factor greater than 10. Given the steeply falling high-mass end of the cluster mass function, we recommend that richness- or abundance-matching-based methods are used in conjunction with these methods as a sanity check for studies selecting high-mass clusters. We also see a stronger correlation of the recovered to input number of galaxies for both catalogues in comparison with the group/cluster mass, however, this does not guarantee that the correct member galaxies are being selected. Finally, we did not observe significantly higher scatter for either mock galaxy catalogues. These results have implications for cosmological analyses that utilize the masses, richnesses, or abundances of clusters, which have different uncertainties when different methods are used.

  6. Text Summarization Model based on Facility Location Problem

    NASA Astrophysics Data System (ADS)

    Takamura, Hiroya; Okumura, Manabu

    We propose a novel multi-document generic summarization model based on the budgeted median problem, which is a facility location problem. The summarization method based on our model is an extractive method, which selects sentences from the given document cluster and generates a summary. Each sentence in the document cluster will be assigned to one of the selected sentences, where the former sentence is supposed to be represented by the latter. Our method selects sentences to generate a summary that yields a good sentence assignment and hence covers the whole content of the document cluster. An advantage of this method is that it can incorporate asymmetric relations between sentences such as textual entailment. Through experiments, we showed that the proposed method yields good summaries on the dataset of DUC'04.
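
    The sentence-assignment idea can be illustrated with a small greedy heuristic. The abstract does not say how the budgeted median problem is solved, so the greedy rule here is a swapped-in approximation, and the similarity matrix, sentence lengths, and budget are all made-up values.

```python
def greedy_summary(sim, lengths, budget):
    """Greedily pick sentences so that every sentence is well represented by a chosen one.

    sim[i][j] is how well sentence i represents sentence j (it may be asymmetric).
    """
    n = len(lengths)
    chosen, used = [], 0
    best = [0.0] * n  # current coverage of each sentence by the chosen set

    while True:
        top_gain, top_i = 0.0, None
        for i in range(n):
            # Skip sentences already chosen or too long for the remaining budget.
            if i in chosen or used + lengths[i] > budget:
                continue
            gain = sum(max(sim[i][j] - best[j], 0.0) for j in range(n))
            if gain > top_gain:
                top_gain, top_i = gain, i
        if top_i is None:
            return chosen
        chosen.append(top_i)
        used += lengths[top_i]
        best = [max(best[j], sim[top_i][j]) for j in range(n)]

# Hypothetical asymmetric similarities: row i says how well sentence i covers sentence j.
sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.3, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.7],
    [0.0, 0.0, 0.2, 1.0],
]
summary = greedy_summary(sim, lengths=[10, 8, 12, 9], budget=25)
```

    Because `sim` need not be symmetric, a sentence that entails another can cover it without the reverse being true, which is the kind of asymmetric relation the abstract mentions.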

  7. Study of lithium cation in water clusters: based on atom-bond electronegativity equalization method fused into molecular mechanics.

    PubMed

    Li, Xin; Yang, Zhong-Zhi

    2005-05-12

    We present a potential model for Li(+)-water clusters based on a combination of the atom-bond electronegativity equalization and molecular mechanics (ABEEM/MM) that is to take ABEEM charges of the cation and all atoms, bonds, and lone pairs of water molecules into the intermolecular electrostatic interaction term in molecular mechanics. The model allows point charges on cationic site and seven sites of an ABEEM-7P water molecule to fluctuate responding to the cluster geometry. The water molecules in the first sphere of Li(+) are strongly structured and there is obvious charge transfer between the cation and the water molecules; therefore, the charge constraint on the ionic cluster includes the charged constraint on the Li(+) and the first-shell water molecules and the charge neutrality constraint on each water molecule in the external hydration shells. The newly constructed potential model based on ABEEM/MM is first applied to ionic clusters and reproduces gas-phase state properties of Li(+)(H(2)O)(n) (n = 1-6 and 8) including optimized geometries, ABEEM charges, binding energies, frequencies, and so on, which are in fair agreement with those measured by available experiments and calculated by ab initio methods. Prospects and benefits introduced by this potential model are pointed out.

  8. Modeling online social signed networks

    NASA Astrophysics Data System (ADS)

    Li, Le; Gu, Ke; Zeng, An; Fan, Ying; Di, Zengru

    2018-04-01

    People's online rating behavior can be modeled by user-object bipartite networks directly. However, few works have been devoted to revealing the hidden relations between users, especially from the perspective of signed networks. We analyze the signed monopartite networks projected by the signed user-object bipartite networks, finding that the networks are highly clustered with obvious community structure. Interestingly, the positive clustering coefficient is remarkably higher than the negative clustering coefficient. Then, a Signed Growing Network model (SGN) based on local preferential attachment is proposed to generate a user's signed network that has community structure and high positive clustering coefficient. Other structural properties of the modeled networks are also found to be similar to the empirical networks.

  9. Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks

    PubMed Central

    Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta

    2017-01-01

    Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic. PMID:28245222

  10. Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks.

    PubMed

    Wu, Jibing; Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta

    2017-01-01

    Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic.

  11. Using Agent Base Models to Optimize Large Scale Network for Large System Inventories

    NASA Technical Reports Server (NTRS)

    Shameldin, Ramez Ahmed; Bowling, Shannon R.

    2010-01-01

    The aim of this paper is to use Agent Base Models (ABM) to optimize large scale network handling capabilities for large system inventories and to implement strategies for reducing capital expenses. The models used in this paper rely on either computational algorithms or procedure implementations developed in Matlab to simulate agent based models in a principal programming language and mathematical theory using clusters; these clusters serve as a high performance computing platform for running the program in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.

  12. Diagnosis-based Cost Groups in the Dutch Risk-equalization Model: Effects of Clustering Diagnoses and of Allowing Patients to be Classified into Multiple Risk-classes.

    PubMed

    Eijkenaar, Frank; van Vliet, René C J A; van Kleef, Richard C

    2018-01-01

    The risk-equalization (RE) model in the Dutch health insurance market has evolved to a sophisticated model containing direct proxies for health. However, it still has important imperfections, leaving incentives for risk selection. This paper focuses on refining an important health-based risk-adjuster in this model: the diagnosis-based cost groups (DCGs). The current (2017) DCGs are calibrated on "old" data of 2011/2012, are mutually exclusive, and are essentially clusters of about 200 diagnosis-groups ("dxgroups"). Hospital claims data (2013), administrative data (2014) on costs and risk-characteristics for the entire Dutch population (N≈16.9 million), and health survey data (2012, N≈387,000) are used. The survey data are used to identify subgroups of individuals in poor or in good health. The claims and administrative data are used to develop alternative DCG-modalities to examine the impact on individual-level and group-level fit of recalibrating the DCGs based on new data, of allowing patients to be classified in multiple DCGs, and of refraining from clustering. Recalibrating the DCGs and allowing enrolees to be classified into multiple DCGs lead to nontrivial improvements in individual-level and group-level fit (especially for cancer patients and people with comorbid conditions). The improvement resulting from refraining from clustering does not seem to justify the increase in model complexity this would entail. The performance of the sophisticated Dutch RE-model can be improved by allowing classification in multiple (clustered) DCGs and using new data. Irrespective of the modality used, however, various subgroups remain significantly undercompensated. Further improvement of the RE-model merits high priority.

  13. Density-based clustering analyses to identify heterogeneous cellular sub-populations

    NASA Astrophysics Data System (ADS)

    Heaster, Tiffany M.; Walsh, Alex J.; Landman, Bennett A.; Skala, Melissa C.

    2017-02-01

    Autofluorescence microscopy of NAD(P)H and FAD provides functional metabolic measurements at the single-cell level. Here, density-based clustering algorithms were applied to metabolic autofluorescence measurements to identify cell-level heterogeneity in tumor cell cultures. The performance of the density-based clustering algorithm, DENCLUE, was tested in samples with known heterogeneity (co-cultures of breast carcinoma lines). DENCLUE was found to better represent the distribution of cell clusters compared to Gaussian mixture modeling. Overall, DENCLUE is a promising approach to quantify cell-level heterogeneity, and could be used to understand single cell population dynamics in cancer progression and treatment.

  14. Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

    PubMed Central

    Liu, Wenfen

    2017-01-01

    The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering and has thus attracted wide academic attention. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm asymptotically yields similar results as its model size increases; compared with the most efficient CSC algorithm known, the new algorithm runs faster and has a wider range of suitable data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate by presenting theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of random projection in clustering accuracy proved in the stage of consensus clustering is also suitable for the weighted k-means clustering and thus gives the theoretical guarantee to this special kind of k-means clustering where each point has its corresponding weight. PMID:29312447

  15. Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.

    PubMed

    Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin

    2018-05-03

    Flying ad-hoc networks (FANETs) are a very vibrant research area nowadays. They have many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. Optimal transmission range will have minimum packet loss ratio (PLR) and better link quality, which ultimately save the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm for selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as the Ant Colony Optimization-based clustering algorithm and the Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in terms of number of clusters, cluster building time, cluster lifetime and energy consumption.

  16. A screened independent atom model for the description of ion collisions from atomic and molecular clusters

    NASA Astrophysics Data System (ADS)

    Lüdde, Hans Jürgen; Horbatsch, Marko; Kirchner, Tom

    2018-05-01

    We apply a recently introduced model for an independent-atom-like calculation of ion-impact electron transfer and ionization cross sections to proton collisions from water, neon, and carbon clusters. The model is based on a geometrical interpretation of the cluster cross section as an effective area composed of overlapping circular disks that are representative of the atomic contributions. The latter are calculated using a time-dependent density-functional-theory-based single-particle description with accurate exchange-only ground-state potentials. We find that the net capture and ionization cross sections in p-X_n collisions are proportional to n^α with 2/3 ≤ α ≤ 1. For capture from water clusters at 100 keV impact energy α is close to one, which is substantially different from the value α = 2/3 predicted by a previous theoretical work based on the simplest-level electron nuclear dynamics method. For ionization at 100 keV and for capture at lower energies we find smaller α values than for capture at 100 keV. This can be understood by considering the magnitude of the atomic cross sections and the resulting overlaps of the circular disks that make up the cluster cross section in our model. Results for neon and carbon clusters confirm these trends. Simple parametrizations are found which fit the cross sections remarkably well and suggest that they depend on the relevant bond lengths.
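
    A scaling exponent of this kind is commonly read off as the slope of a log-log fit of cross section against cluster size. The sketch below shows that fit on synthetic values generated with a known exponent; the numbers are not data from the paper, and the function name is illustrative.

```python
import math

def fit_exponent(ns, sigmas):
    """Least-squares slope of log(sigma) against log(n), i.e. alpha in sigma ~ n**alpha."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(s) for s in sigmas]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

ns = [1, 2, 4, 8, 16]
sigmas = [3.0 * n ** 0.8 for n in ns]  # synthetic values, not data from the paper
alpha = fit_exponent(ns, sigmas)
```

    In the geometric picture of the abstract, disks that barely overlap give an exponent near 1, while strong overlap pushes it toward 2/3.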

  17. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Optical Materials with a Genome: Nanophotonics with DNA-Stabilized Silver Clusters

    NASA Astrophysics Data System (ADS)

    Copp, Stacy M.

    Fluorescent silver clusters with unique rod-like geometries are stabilized by DNA. The sizes and colors of these clusters, or AgN-DNA, are selected by DNA base sequence, which can tune peak emission from blue-green into the near-infrared. Combined with DNA nanostructures, AgN-DNA promise exciting applications in nanophotonics and sensing. Until recently, however, a lack of understanding of the mechanisms controlling AgN-DNA fluorescence has challenged such applications. This dissertation discusses progress toward understanding the role of DNA as a "genome" for silver clusters and toward using DNA to achieve atomic-scale precision of silver cluster size and nanometer-scale precision of silver cluster position on a DNA breadboard. We also investigate sensitivity of AgN-DNA to local solvent environment, with an eye toward applications in chemical and biochemical sensing. Using robotic techniques to generate large data sets, we show that fluorescent silver clusters are templated by certain DNA base motifs that select "magic-sized" cluster cores of enhanced stabilities. The linear arrangement of bases on the phosphate backbone imposes a unique rod-like geometry on the clusters. Harnessing machine learning and bioinformatics techniques, we also demonstrate that sequences of DNA templates can be selected to stabilize silver clusters with desired optical properties, including high fluorescence intensity and specific fluorescence wavelengths, with much higher rates of success as compared to current strategies. The discovered base motifs can be also used to design modular DNA host strands that enable individual silver clusters with atomically precise sizes to bind at specific programmed locations on a DNA nanostructure. We show that DNA-mediated nanoscale arrangement enables near-field coupling of distinct clusters, demonstrated by dual-color cluster assemblies exhibiting resonant energy transfer. 
These results demonstrate a new degree of control over the optical properties and relative positions of nanoparticles, selected almost solely by the sequence of DNA. AgN-DNA are promising chemical and biochemical sensors due to the sensitivity of their fluorescence to local environment. However, the mechanisms behind many sensing schemes are not understood, and the nature of the excited state of the silver cluster itself remains unknown. To probe the fluorescence mechanisms of AgN-DNA, we investigate the behavior of purified solutions of these clusters in various solvents. We find that standard models for fluorophore solvatochromism, including the Lippert-Mataga model, do not describe AgN-DNA fluorescence because such models neglect specific interactions between the cluster and surrounding solvent molecules. Fluorescence colors are well-modeled by Mie-Gans theory, suggesting that the local dielectric environment of the cluster does play a role in fluorescence, although additional specific solvent interactions and cluster shape changes may also determine fluorescence color and intensity. These results suggest that AgN-DNA may be sensitive to changes in local dielectric environment on nanometer length scales and may also act as sensors for small molecules with affinity for DNA.

  19. Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    PubMed Central

    Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy

    2011-01-01

    Background: We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology: We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.
Conclusions: PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291
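
    One of the techniques named in this abstract, tf-idf cosine similarity followed by keeping the top-n similarities per document, can be sketched as follows. This is a toy illustration, not the study's pipeline: the tokenized input, the plain log(N/df) weighting, and all function names are assumptions made here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Assumed weighting: raw term count times log(N / document frequency).
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_similarities(docs, n):
    """For each document index, keep the n most similar other documents."""
    vectors = tfidf_vectors(docs)
    result = {}
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by index
        result[i] = [(j, score) for score, j in scored[:n]]
    return result
```

    The all-pairs loop is quadratic in the number of documents, so it only suits small examples; a corpus of millions of records would need an indexed or approximate search instead.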

  20. Cluster-based upper body marker models for three-dimensional kinematic analysis: Comparison with an anatomical model and reliability analysis.

    PubMed

    Boser, Quinn A; Valevicius, Aïda M; Lavoie, Ewen B; Chapman, Craig S; Pilarski, Patrick M; Hebert, Jacqueline S; Vette, Albert H

    2018-04-27

    Quantifying angular joint kinematics of the upper body is a useful method for assessing upper limb function. Joint angles are commonly obtained via motion capture, tracking markers placed on anatomical landmarks. This method is associated with limitations including administrative burden, soft tissue artifacts, and intra- and inter-tester variability. An alternative method involves the tracking of rigid marker clusters affixed to body segments, calibrated relative to anatomical landmarks or known joint angles. The accuracy and reliability of applying this cluster method to the upper body has, however, not been comprehensively explored. Our objective was to compare three different upper body cluster models with an anatomical model, with respect to joint angles and reliability. Non-disabled participants performed two standardized functional upper limb tasks with anatomical and cluster markers applied concurrently. Joint angle curves obtained via the marker clusters with three different calibration methods were compared to those from an anatomical model, and between-session reliability was assessed for all models. The cluster models produced joint angle curves which were comparable to and highly correlated with those from the anatomical model, but exhibited notable offsets and differences in sensitivity for some degrees of freedom. Between-session reliability was comparable between all models, and good for most degrees of freedom. Overall, the cluster models produced reliable joint angles that, however, cannot be used interchangeably with anatomical model outputs to calculate kinematic metrics. Cluster models appear to be an adequate, and possibly advantageous alternative to anatomical models when the objective is to assess trends in movement behavior. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks

    NASA Astrophysics Data System (ADS)

    Franke, R.

    2016-11-01

    In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. It is not trivial choosing an appropriate detection algorithm because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime, memory requirements but also in allowed network and cluster properties. They are based on a specific definition of what a cluster is, too. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.

  2. Clustering consumers based on trust, confidence and giving behaviour: data-driven model building for charitable involvement in the Australian not-for-profit sector.

    PubMed

    de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo

    2015-01-01

    Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt for-profit's research, marketing and targeting strategies. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better.
If charities and not-for-profit organisations adopt these strategies, they will be more successful in today's competitive environment.

  3. Clustering Consumers Based on Trust, Confidence and Giving Behaviour: Data-Driven Model Building for Charitable Involvement in the Australian Not-For-Profit Sector

    PubMed Central

    de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo

    2015-01-01

    Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt for-profit's research, marketing and targeting strategies. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better.
If charities and not-for-profit organisations adopt these strategies, they will be more successful in today's competitive environment. PMID:25849547

  4. A Hidden Markov Model for Urban-Scale Traffic Estimation Using Floating Car Data.

    PubMed

    Wang, Xiaomeng; Peng, Ling; Chi, Tianhe; Li, Mengzhu; Yao, Xiaojing; Shao, Jing

    2015-01-01

    Urban-scale traffic monitoring plays a vital role in reducing traffic congestion. Owing to its low cost and wide coverage, floating car data (FCD) serves as a novel approach to collecting traffic data. However, sparse probe data represents the vast majority of the data available on arterial roads in most urban environments. In order to overcome the problem of data sparseness, this paper proposes a hidden Markov model (HMM)-based traffic estimation model, in which the traffic condition on a road segment is considered as a hidden state that can be estimated according to the conditions of road segments having similar traffic characteristics. An algorithm based on clustering and pattern mining rather than on adjacency relationships is proposed to find clusters with road segments having similar traffic characteristics. A multi-clustering strategy is adopted to achieve a trade-off between clustering accuracy and coverage. Finally, the proposed model is designed and implemented on the basis of a real-time algorithm. Results of experiments based on real FCD confirm the applicability, accuracy, and efficiency of the model. In addition, the results indicate that the model is practicable for traffic estimation on urban arterials and works well even when more than 70% of the probe data are missing.

  5. Kernel spectral clustering with memory effect

    NASA Astrophysics Data System (ADS)

    Langone, Rocco; Alzate, Carlos; Suykens, Johan A. K.

    2013-05-01

    Evolving graphs describe many natural phenomena changing over time, such as social relationships, trade markets, metabolic networks etc. In this framework, performing community detection and analyzing the cluster evolution represents a critical task. Here we propose a new model for this purpose, where the smoothness of the clustering results over time can be considered as a valid prior knowledge. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness. The latter allows the model to cluster the current data well and to be consistent with the recent history. We also propose new model selection criteria in order to carefully choose the hyper-parameters of our model, which is a crucial issue to achieve good performances. We successfully test the model on four toy problems and on a real world network. We also compare our model with Evolutionary Spectral Clustering, which is a state-of-the-art algorithm for community detection of evolving networks, illustrating that the kernel spectral clustering with memory effect can achieve better or equal performances.

  6. Does faint galaxy clustering contradict gravitational instability?

    NASA Technical Reports Server (NTRS)

    Melott, Adrian L.

    1992-01-01

    It has been argued, based on the weakness of clustering of faint galaxies, that these objects cannot be the precursors of present galaxies in a simple Einstein-de Sitter model universe with clustering driven by gravitational instability. It is shown that the assumptions made about the growth of clustering were too restrictive. In such a universe, the growth of clustering can easily be fast enough to match the data.

  7. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    PubMed

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.

  8. Complex networks as a unified framework for descriptive analysis and predictive modeling in climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R

    The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.

  9. Search for Directed Networks by Different Random Walk Strategies

    NASA Astrophysics Data System (ADS)

    Zhu, Zi-Qi; Jin, Xiao-Ling; Huang, Zhi-Long

    2012-03-01

    A comparative study is carried out on the efficiency of five different random walk strategies searching on directed networks constructed based on several typical complex networks. Due to the difference in search efficiency of the strategies rooted in network clustering, the clustering coefficient in a random walker's eye on directed networks is defined and computed to be half of that of the corresponding undirected networks. The search processes are performed on the directed networks based on the Erdős-Rényi model, Watts-Strogatz model, Barabási-Albert model and clustered scale-free network model. It is found that the self-avoiding random walk strategy is the best search strategy for such directed networks. Compared to the unrestricted random walk strategy, path-iteration-avoiding random walks can also make the search process much more efficient. However, no-triangle-loop and no-quadrangle-loop random walks do not improve the search efficiency as expected, which is different from those on undirected networks since the clustering coefficient of directed networks is smaller than that of undirected networks.
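
    Two of the strategies compared in this abstract, the unrestricted and the self-avoiding random walk, can be contrasted in a short sketch. The adjacency-list representation, the fallback rule when every out-neighbor has already been visited, and the function name are assumptions made here; the abstract does not give the authors' exact definitions.

```python
import random


def random_walk_search(graph, start, target, self_avoiding, rng, max_steps=10_000):
    """Walk a directed graph until `target` is reached.

    `graph` maps each node to a list of out-neighbors. Returns the number of
    steps taken, or None if the walker hits a dead end or exceeds `max_steps`.
    """
    current = start
    visited = {start}
    for step in range(max_steps):
        if current == target:
            return step
        neighbors = graph.get(current, [])
        if self_avoiding:
            unvisited = [node for node in neighbors if node not in visited]
            # Assumed fallback: if every out-neighbor was visited, allow a revisit.
            candidates = unvisited or neighbors
        else:
            candidates = neighbors
        if not candidates:
            return None  # no outgoing edge to follow
        current = rng.choice(candidates)
        visited.add(current)
    return None
```

    Averaging the returned step counts over many seeded runs and many start and target pairs would give a search-efficiency estimate for each strategy on a given network.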

  10. Collaborative Filtering Based on Sequential Extraction of User-Item Clusters

    NASA Astrophysics Data System (ADS)

    Honda, Katsuhiro; Notsu, Akira; Ichihashi, Hidetomo

    Collaborative filtering is a computational realization of “word-of-mouth” in a network community, in which the items preferred by “neighbors” are recommended. This paper proposes a new item-selection model for extracting user-item clusters from rectangular relation matrices, in which mutual relations between users and items are denoted in an alternative process of “liking or not”. A technique for sequential co-cluster extraction from rectangular relational data is given by combining the structural balancing-based user-item clustering method with a sequential fuzzy cluster extraction approach. Then, the technique is applied to the collaborative filtering problem, in which some items may be shared by several user clusters.

  11. Model-based Clustering of High-Dimensional Data in Astrophysics

    NASA Astrophysics Data System (ADS)

    Bouveyron, C.

    2016-05-01

    The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase of the measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in mass or stream. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces which is mainly due to their dramatic over-parametrization. The recent developments in model-based classification overcome these drawbacks and make it possible to classify high-dimensional data efficiently, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.

  12. ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices.

    PubMed

    Wilderjans, Tom F; Ceulemans, Eva; Van Mechelen, Iven; Depril, Dirk

    2011-03-01

    In many areas of psychology, one is interested in disclosing the underlying structural mechanisms that generated an object by variable data set. Often, based on theoretical or empirical arguments, it may be expected that these underlying mechanisms imply that the objects are grouped into clusters that are allowed to overlap (i.e., an object may belong to more than one cluster). In such cases, analyzing the data with Mirkin's additive profile clustering model may be appropriate. In this model: (1) each object may belong to no, one or several clusters, (2) there is a specific variable profile associated with each cluster, and (3) the scores of the objects on the variables can be reconstructed by adding the cluster-specific variable profiles of the clusters the object in question belongs to. Until now, however, no software program has been publicly available to perform an additive profile clustering analysis. For this purpose, in this article, the ADPROCLUS program, steered by a graphical user interface, is presented. We further illustrate its use by means of the analysis of a patient by symptom data matrix.
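
    Point (3) of the model described in this abstract, reconstructing an object's scores by adding the profiles of the clusters it belongs to, can be written out as a small sketch. The list-of-lists layout and the function name are assumptions made here, and the fitting procedure used by the ADPROCLUS program is not shown.

```python
def reconstruct_scores(memberships, profiles):
    """Reconstruct an object-by-variable matrix from overlapping clusters.

    memberships[i][k] is 1 if object i belongs to cluster k, else 0; an object
    may belong to no, one, or several clusters. profiles[k][j] is the value of
    variable j in the profile of cluster k.
    """
    n_vars = len(profiles[0])
    return [
        [
            sum(member * profile[j] for member, profile in zip(row, profiles))
            for j in range(n_vars)
        ]
        for row in memberships
    ]
```

    For example, an object in both of two clusters receives the sum of the two profiles, while an object in no cluster is reconstructed as all zeros.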

  13. Space station ECLSS integration analysis: Simplified General Cluster Systems Model, ECLS System Assessment Program enhancements

    NASA Technical Reports Server (NTRS)

    Ferguson, R. E.

    1985-01-01

    The data base verification of the ECLS Systems Assessment Program (ESAP) was documented and changes made to enhance the flexibility of the water recovery subsystem simulations are given. All changes which were made to the data base values are described and the software enhancements performed. The refined model documented herein constitutes the submittal of the General Cluster Systems Model. A source listing of the current version of ESAP is provided in Appendix A.

  14. Galaxy cluster lensing masses in modified lensing potentials

    DOE PAGES

    Barreira, Alexandre; Li, Baojiu; Jennings, Elise; ...

    2015-10-28

    In this study, we determine the concentration–mass relation of 19 X-ray selected galaxy clusters from the Cluster Lensing and Supernova Survey with Hubble survey in theories of gravity that directly modify the lensing potential. We model the clusters as Navarro–Frenk–White haloes and fit their lensing signal, in the Cubic Galileon and Nonlocal gravity models, to the lensing convergence profiles of the clusters. We discuss a number of important issues that need to be taken into account, associated with the use of non-parametric and parametric lensing methods, as well as assumptions about the background cosmology. Our results show that the concentration and mass estimates in the modified gravity models are, within the error bars, the same as in Λ cold dark matter. This result demonstrates that, for the Nonlocal model, the modifications to gravity are too weak at the cluster redshifts, and for the Galileon model, the screening mechanism is very efficient inside the cluster radius. However, at distances ~ [2–20] Mpc/h from the cluster centre, we find that the surrounding force profiles are enhanced by ~ 20–40% in the Cubic Galileon model. This has an impact on dynamical mass estimates, which means that tests of gravity based on comparisons between lensing and dynamical masses can also be applied to the Cubic Galileon model.

  15. Enrichment 2.0 Gifted and Talented Education for the 21st Century

    ERIC Educational Resources Information Center

    Eckstein, Michelle

    2009-01-01

    Enrichment clusters, a component of the Schoolwide Enrichment Model, are multigrade investigative groups based on constructivist learning methodology. Enrichment clusters are organized around major disciplines, interdisciplinary themes, or cross-disciplinary topics. Within clusters, students are grouped across grade levels by interests and focused…

  16. Bivariate functional data clustering: grouping streams based on a varying coefficient model of the stream water and air temperature relationship

    Treesearch

    H. Li; X. Deng; Andy Dolloff; E. P. Smith

    2015-01-01

    A novel clustering method for bivariate functional data is proposed to group streams based on their water–air temperature relationship. A distance measure is developed for bivariate curves by using a time-varying coefficient model and a weighting scheme. This distance is also adjusted by spatial correlation of streams via the variogram. Therefore, the proposed...

  17. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    PubMed

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  18. Solving the scalability issue in quantum-based refinement: Q|R#1.

    PubMed

    Zheng, Min; Moriarty, Nigel W; Xu, Yanting; Reimers, Jeffrey R; Afonine, Pavel V; Waller, Mark P

    2017-12-01

    Accurately refining biomacromolecules using a quantum-chemical method is challenging because the cost of a quantum-chemical calculation scales approximately as n^m, where n is the number of atoms and m (≥3) is based on the quantum method of choice. This fundamental problem means that quantum-chemical calculations become intractable when the size of the system requires more computational resources than are available. In the development of the software package called Q|R, this issue is referred to as Q|R#1. A divide-and-conquer approach has been developed that fragments the atomic model into small manageable pieces in order to solve Q|R#1. Firstly, the atomic model of a crystal structure is analyzed to detect noncovalent interactions between residues, and the results of the analysis are represented as an interaction graph. Secondly, a graph-clustering algorithm is used to partition the interaction graph into a set of clusters in such a way as to minimize disruption to the noncovalent interaction network. Thirdly, the environment surrounding each individual cluster is analyzed and any residue that is interacting with a particular cluster is assigned to the buffer region of that particular cluster. A fragment is defined as a cluster plus its buffer region. The gradients for all atoms from each of the fragments are computed, and only the gradients from each cluster are combined to create the total gradients. A quantum-based refinement is carried out using the total gradients as chemical restraints. In order to validate this interaction graph-based fragmentation approach in Q|R, the entire atomic model of an amyloid cross-β spine crystal structure (PDB entry 2oNA) was refined.
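
    The fragment construction and gradient combination described in this abstract can be sketched as follows. This is a simplified illustration and not the Q|R code: the interaction graph is assumed to be an undirected mapping from residue to interacting residues, the clusters are taken as given, gradients are stored per residue rather than per atom, and both function names are hypothetical.

```python
def build_fragments(interactions, clusters):
    """Pair each cluster with its buffer region.

    The buffer holds every residue outside the cluster that interacts with at
    least one residue inside it; a fragment is the cluster plus its buffer.
    """
    fragments = []
    for cluster in clusters:
        buffer_region = set()
        for residue in cluster:
            buffer_region |= interactions.get(residue, set())
        buffer_region -= cluster
        fragments.append((cluster, buffer_region))
    return fragments


def combine_gradients(fragments, fragment_gradients):
    """Assemble total gradients, keeping only cluster residues of each fragment.

    fragment_gradients[k] maps every residue of fragment k (cluster and buffer)
    to its gradient; gradients computed for buffer residues are discarded.
    """
    total = {}
    for (cluster, _buffer_region), gradients in zip(fragments, fragment_gradients):
        for residue in cluster:
            total[residue] = gradients[residue]
    return total
```

    Because the clusters partition the residues, each residue receives its gradient from exactly one fragment, the one in which it is a cluster member rather than a buffer member.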

  19. Scattering of clusters of spherical particles—Modeling and inverse problem solution in the Rayleigh-Gans approximation

    NASA Astrophysics Data System (ADS)

    Eliçabe, Guillermo E.

    2013-09-01

    In this work, an exact scattering model for a system of clusters of spherical particles, based on the Rayleigh-Gans approximation, has been parameterized in such a way that it can be solved in inverse form using Tikhonov regularization to obtain the morphological parameters of the clusters: the average number of particles per cluster, the size of the primary spherical units that form the cluster, and the Discrete Distance Distribution Function from which the z-average square radius of gyration of the system of clusters is obtained. The methodology is validated through a series of simulated and experimental examples of x-ray and light scattering, which show that it works satisfactorily in nonideal situations such as error in the measurements, error in the model, and several types of nonidealities present in the experimental cases.

  20. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and cluster analysis. We quantify the observed qualitative audit event sequences via quantification method IV, and collect similar audit event sequences into the same groups by cluster analysis. Simulation experiments show that our model can improve cyber-attack detection accuracy in some realistic cases where normal and attack activities are intermingled.

  1. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy over regularity vs. clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. Interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis is carried out of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
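
    As a rough illustration of a nearest-neighbor cumulative distribution statistic, the sketch below compares the empirical nearest-neighbor CDF of a point pattern with the value expected for a homogeneous two-dimensional Poisson process, 1 - exp(-λπr²). It is not the authors' procedure: the function names, the brute-force distance search, the toy grid, and the neglect of edge effects are all assumptions made for the example.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(best)
    return distances


def empirical_cdf(distances, r):
    """Fraction of nearest-neighbor distances that are <= r."""
    return sum(d <= r for d in distances) / len(distances)


def poisson_nn_cdf(r, density):
    """Nearest-neighbor CDF of a homogeneous 2-D Poisson process."""
    return 1.0 - math.exp(-density * math.pi * r * r)


# Hypothetical pattern: a perfectly regular 4 x 4 grid with unit spacing.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
nn = nearest_neighbor_distances(grid)
density = 1.0  # assumed: one point per unit cell, edge effects ignored
```

    For this grid the empirical CDF at r = 0.5 is zero while the Poisson value is about 0.54; an empirical curve lying below the Poisson curve at short distances points towards regularity, and one lying above it points towards clustering.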

  2. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  3. Cluster kinetics model for mixtures of glassformers

    NASA Astrophysics Data System (ADS)

    Brenskelle, Lisa A.; McCoy, Benjamin J.

    2007-10-01

    For glassformers we propose a binary mixture relation for parameters in a cluster kinetics model previously shown to represent pure compound data for viscosity and dielectric relaxation as functions of either temperature or pressure. The model parameters are based on activation energies and activation volumes for cluster association-dissociation processes. With the mixture parameters, we calculated dielectric relaxation times and compared the results to experimental values for binary mixtures. Mixtures of sorbitol and glycerol (seven compositions), sorbitol and xylitol (three compositions), and polychloroepihydrin and polyvinylmethylether (three compositions) were studied.

  4. Multi-exemplar affinity propagation.

    PubMed

    Wang, Chang-Dong; Lai, Jian-Huang; Suen, Ching Y; Zhu, Jun-Yong

    2013-09-01

    The affinity propagation (AP) clustering algorithm has received much attention in the past few years. AP is appealing because it is efficient, insensitive to initialization, and it produces clusters at a lower error rate than other exemplar-based methods. However, its single-exemplar model becomes inadequate when applied to model multisubclasses in some situations such as scene analysis and character recognition. To remedy this deficiency, we have extended the single-exemplar model to a multi-exemplar one to create a new multi-exemplar affinity propagation (MEAP) algorithm. This new model automatically determines the number of exemplars in each cluster associated with a super exemplar to approximate the subclasses in the category. Solving the model is NP-hard and we tackle it with the max-sum belief propagation to produce neighborhood maximum clusters, with no need to specify beforehand the number of clusters, multi-exemplars, and superexemplars. Also, utilizing the sparsity in the data, we are able to reduce substantially the computational time and storage. Experimental studies have shown MEAP's significant improvements over other algorithms on unsupervised image categorization and the clustering of handwritten digits.

  5. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    PubMed

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  6. Predicting stabilizing treatment outcomes for complex posttraumatic stress disorder and dissociative identity disorder: an expertise-based prognostic model.

    PubMed

    Baars, Erik W; van der Hart, Onno; Nijenhuis, Ellert R S; Chu, James A; Glas, Gerrit; Draijer, Nel

    2011-01-01

    The purpose of this study was to develop an expertise-based prognostic model for the treatment of complex posttraumatic stress disorder (PTSD) and dissociative identity disorder (DID). We developed a survey in 2 rounds: In the first round we surveyed 42 experienced therapists (22 DID and 20 complex PTSD therapists), and in the second round we surveyed a subset of 22 of the 42 therapists (13 DID and 9 complex PTSD therapists). First, we drew on therapists' knowledge of prognostic factors for stabilization-oriented treatment of complex PTSD and DID. Second, therapists prioritized a list of prognostic factors by estimating the size of each variable's prognostic effect; we clustered these factors according to content and named the clusters. Next, concept mapping methodology and statistical analyses (including principal components analyses) were used to transform individual judgments into weighted group judgments for clusters of items. A prognostic model, based on consensually determined estimates of effect sizes, of 8 clusters containing 51 factors for both complex PTSD and DID was formed. It includes the clusters lack of motivation, lack of healthy relationships, lack of healthy therapeutic relationships, lack of other internal and external resources, serious Axis I comorbidity, serious Axis II comorbidity, poor attachment, and self-destruction. In addition, a set of 5 DID-specific items was constructed. The model is supportive of the current phase-oriented treatment model, emphasizing the strengthening of the therapeutic relationship and the patient's resources in the initial stabilization phase. Further research is needed to test the model's statistical and clinical validity.

  7. Similarity measure and domain adaptation in multiple mixture model clustering: An application to image processing.

    PubMed

    Leong, Siow Hoo; Ong, Seng Huat

    2017-01-01

    This paper considers three crucial issues in processing scaled-down images: the representation of partial images, similarity measure, and domain adaptation. Two Gaussian mixture model based algorithms are proposed to effectively preserve image details and avoid image degradation. Multiple partial images are clustered separately through Gaussian mixture model clustering with a scan and select procedure to enhance the inclusion of small image details. The local image features, represented by maximum likelihood estimates of the mixture components, are classified by using the modified Bayes factor (MBF) as a similarity measure. The detection of novel local features from MBF will suggest domain adaptation, that is, changing the number of components of the Gaussian mixture model. The performance of the proposed algorithms is evaluated with simulated data and real images, and they are shown to perform much better than existing Gaussian mixture model based algorithms in reproducing images with a higher structural similarity index.

  8. Similarity measure and domain adaptation in multiple mixture model clustering: An application to image processing

    PubMed Central

    Leong, Siow Hoo

    2017-01-01

    This paper considers three crucial issues in processing scaled-down images: the representation of partial images, similarity measure, and domain adaptation. Two Gaussian mixture model based algorithms are proposed to effectively preserve image details and avoid image degradation. Multiple partial images are clustered separately through Gaussian mixture model clustering with a scan and select procedure to enhance the inclusion of small image details. The local image features, represented by maximum likelihood estimates of the mixture components, are classified by using the modified Bayes factor (MBF) as a similarity measure. The detection of novel local features from MBF will suggest domain adaptation, that is, changing the number of components of the Gaussian mixture model. The performance of the proposed algorithms is evaluated with simulated data and real images, and they are shown to perform much better than existing Gaussian mixture model based algorithms in reproducing images with a higher structural similarity index. PMID:28686634

  9. The effect of mining data k-means clustering toward students profile model drop out potential

    NASA Astrophysics Data System (ADS)

    Purba, Windania; Tamba, Saut; Saragih, Jepronel

    2018-04-01

    High student success and low student failure can reflect the quality of a college. One factor in student failure is dropping out. To address this problem, data mining with K-means clustering was applied. The K-means clustering method was used to cluster students with dropout potential. First, the result data were clustered to obtain information on the condition of all students. The resulting model indicated that students are at risk of dropping out because of a lack of enthusiasm for learning, unsupportive parents, diffidence, and less student behavior time. The K-means clustering results showed that the students most likely to drop out were in Cluster 1, owing to its Credit Total System, Quality Total, and the lowest Grade Point Average (GPA) compared with Clusters 2 and 3.

  10. Hidden electronic rule in the “cluster-plus-glue-atom” model

    PubMed Central

    Du, Jinglian; Dong, Chuang; Melnik, Roderick; Kawazoe, Yoshiyuki; Wen, Bin

    2016-01-01

    Electrons and their interactions are intrinsic factors that affect the structure and properties of materials. Based on the “cluster-plus-glue-atom” model, an electron counting rule for complex metallic alloys (CMAs) has been revealed in this work (i.e., the CPGAMEC rule). Our results on the cluster structure and electron concentration of CMAs with apparent cluster features indicate that the number of valence electrons per unit cluster formula for these CMAs takes specific constant values that are multiples of eight and twelve. It is thus termed the specific electrons cluster formula. This CPGAMEC rule has been demonstrated to be a useful guide for the design of CMAs with desired properties, while its practical applications and underlying mechanism have been illustrated on the basis of the CMAs’ cluster structural features. Our investigation provides an aggregate picture of the intriguing electronic rule and atomic structural features of CMAs. PMID:27642002

  11. A cluster merging method for time series microarray with production values.

    PubMed

    Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

    2014-09-01

    A challenging task in time-course microarray data analysis is to cluster genes meaningfully by combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal, obtaining groups of highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured at the same time points) and merging them by taking into account the frequency with which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim of finding co-expressed genes related to the production and growth of a certain bacterium. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
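
    The idea of merging per-replicate clusterings by how often two genes are grouped together can be illustrated with a simple co-association count. The sketch below is an assumption-laden stand-in, not the method of the paper: `co_association` and `merge_by_agreement` are hypothetical names, every replicate is assumed to label the same genes, and genes are merged by a plain vote threshold with union-find.

```python
from itertools import combinations


def co_association(clusterings):
    """Count, for every gene pair, how many clusterings group the pair together."""
    counts = {}
    for labels in clusterings:  # labels: dict mapping gene -> cluster id
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def merge_by_agreement(clusterings, min_votes):
    """Join genes whose co-occurrence count reaches min_votes."""
    genes = sorted(set().union(*clusterings))
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), votes in co_association(clusterings).items():
        if votes >= min_votes:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


# Hypothetical labels from three biological replicates.
replicates = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 1, "g2": 1, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 2},
]
merged = merge_by_agreement(replicates, min_votes=2)
```

    Here g1 and g2 are grouped together in all three replicates and g3 joins them only once, so a threshold of two votes keeps g3 separate; the vote counts also give a natural ranking of groups by agreement.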

  12. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are studied using several medical examples. We present two clustering examples with ordinal variables, which are more challenging to analyze, using Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: By using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance can be achieved. Moreover, descriptive and inferential statistics, in addition to the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  13. Fierz Convergence Criterion: A Controlled Approach to Strongly Interacting Systems with Small Embedded Clusters.

    PubMed

    Ayral, Thomas; Vučičević, Jaksa; Parcollet, Olivier

    2017-10-20

    We present an embedded-cluster method, based on the triply irreducible local expansion formalism. It turns the Fierz ambiguity, inherent to approaches based on a bosonic decoupling of local fermionic interactions, into a convergence criterion. It is based on the approximation of the three-leg vertex by a coarse-grained vertex computed from a self-consistently determined cluster impurity model. The computed self-energies are, by construction, continuous functions of momentum. We show that, in three interaction and doping regimes of the two-dimensional Hubbard model, self-energies obtained with clusters of size four only are very close to numerically exact benchmark results. We show that the Fierz parameter, which parametrizes the freedom in the Hubbard-Stratonovich decoupling, can be used as a quality control parameter. By contrast, the GW+extended dynamical mean field theory approximation with four cluster sites is shown to yield good results only in the weak-coupling regime and for a particular decoupling. Finally, we show that the vertex has spatially nonlocal components only at low Matsubara frequencies.

  14. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    NASA Astrophysics Data System (ADS)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  15. Inherent Structure versus Geometric Metric for State Space Discretization

    PubMed Central

    Liu, Hanzhong; Li, Minghai; Fan, Jue; Huo, Shuanghong

    2016-01-01

    Inherent structure (IS) and geometry-based clustering methods are commonly used for analyzing molecular dynamics trajectories. ISs are obtained by minimizing the sampled conformations into local minima on the potential/effective energy surface. The conformations that are minimized into the same energy basin belong to one cluster. We investigate the influence of the applications of these two methods of trajectory decomposition on our understanding of the thermodynamics and kinetics of alanine tetrapeptide. We find that at the micro cluster level, the IS approach and root-mean-square deviation (RMSD) based clustering method give totally different results. Depending on the local features of the energy landscape, conformations with close RMSDs can be minimized into different minima, while conformations with large RMSDs can be minimized into the same basin. However, the relaxation timescales calculated based on the transition matrices built from the micro clusters are similar. The discrepancy at the micro cluster level leads to different macro clusters. Although the dynamic models established through both clustering methods are validated as approximately Markovian, the IS approach seems to give a meaningful state space discretization at the macro cluster level. PMID:26915811

  16. Simulating star clusters with the AMUSE software framework. I. Dependence of cluster lifetimes on model assumptions and cluster dissolution modes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitehead, Alfred J.; McMillan, Stephen L. W.; Vesperini, Enrico

    2013-12-01

    We perform a series of simulations of evolving star clusters using the Astrophysical Multipurpose Software Environment (AMUSE), a new community-based multi-physics simulation package, and compare our results to existing work. These simulations model a star cluster beginning with a King model distribution and a selection of power-law initial mass functions and contain a tidal cutoff. They are evolved using collisional stellar dynamics and include mass loss due to stellar evolution. After establishing that the differences between AMUSE results and results from previous studies are understood, we explored the variation in cluster lifetimes due to the random realization noise introduced by transforming a King model to specific initial conditions. This random realization noise can affect the lifetime of a simulated star cluster by up to 30%. Two modes of star cluster dissolution were identified: a mass evolution curve that contains a runaway cluster dissolution with a sudden loss of mass, and a dissolution mode that does not contain this feature. We refer to these dissolution modes as 'dynamical' and 'relaxation' dominated, respectively. For Salpeter-like initial mass functions, we determined the boundary between these two modes in terms of the dynamical and relaxation timescales.

  17. Dynamic Evolution Model Based on Social Network Services

    NASA Astrophysics Data System (ADS)

    Xiong, Xi; Gou, Zhi-Jian; Zhang, Shi-Bin; Zhao, Wen

    2013-11-01

    Based on an analysis of the evolutionary characteristics of public opinion in social networking services (SNS), in this paper we propose a dynamic evolution model in which opinions are coupled with topology. This model shows the clustering phenomenon of opinions in dynamic network evolution. The simulation results show that the model can fit the data from a social network site. The dynamic evolution of networks accelerates opinion separation and aggregation. The scale and the number of clusters are influenced by the confidence limit and the rewiring probability. Dynamic changes of the topology reduce the number of isolated nodes, while an increased confidence limit allows nodes to communicate more fully. The two effects make the distribution of opinion more neutral. The dynamic evolution of networks generates central clusters with high connectivity and high betweenness, which make it difficult to control public opinions in SNS.

  18. Community detection using Kernel Spectral Clustering with memory

    NASA Astrophysics Data System (ADS)

    Langone, Rocco; Suykens, Johan A. K.

    2013-02-01

    This work is related to the problem of community detection in dynamic scenarios, which arises for instance in the segmentation of moving objects and the clustering of telephone traffic data, time-series micro-array data, etc. A desirable feature of a clustering model which has to capture the evolution of communities over time is temporal smoothness between clusters in successive time-steps. In this way the model is able to track the long-term trend and at the same time smooth out short-term variation due to noise. We use the Kernel Spectral Clustering with Memory effect (MKSC), which allows cluster memberships of new nodes to be predicted via out-of-sample extension and has a proper model selection scheme. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness as valid prior knowledge. The latter, in fact, allows the model to cluster the current data well and to be consistent with the recent history. Here we propose a generalization of the MKSC model with an arbitrary memory, not only one time-step in the past. The experiments conducted on toy problems confirm our expectations: the more memory we add to the model, the smoother the clustering results are over time. We also compare with the Evolutionary Spectral Clustering (ESC) algorithm, which is a state-of-the-art method, and we obtain comparable or better results.

  19. Chemical models for simulating single-walled nanotube production in arc vaporization and laser ablation processes

    NASA Technical Reports Server (NTRS)

    Scott, Carl D.

    2004-01-01

    Chemical kinetic models for the nucleation and growth of clusters and single-walled carbon nanotube (SWNT) growth are developed for numerical simulations of the production of SWNTs. Two models that involve evaporation and condensation of carbon and metal catalysts, a full model involving all carbon clusters up to C80, and a reduced model are discussed. The full model is based on a fullerene model, but nickel and carbon/nickel cluster reactions are added to form SWNTs from soot and fullerenes. The full model has a large number of species, so large that incorporating them into a flow field computation for simulating laser ablation and arc processes requires that the model be simplified. The model is reduced by defining large clusters that represent many clusters of various sizes. Comparisons are given between these models for cases that may be applicable to arc and laser ablation production. Solutions to the system of chemical rate equations of these models for a ramped temperature profile show that the production of various species, including SWNTs, agrees to within about 50% for a fast ramp, and within 10% for a slower temperature decay time.

  20. A clustering-based fuzzy wavelet neural network model for short-term load forecasting.

    PubMed

    Kodogiannis, Vassilis S; Amina, Mahdi; Petrounias, Ilias

    2013-10-01

    Load forecasting is a critical element of power system operation, involving prediction of the future level of demand to serve as the basis for supply and demand planning. This paper presents the development of a novel clustering-based fuzzy wavelet neural network (CB-FWNN) model and validates its prediction on the short-term electric load forecasting of the Power System of the Greek Island of Crete. The proposed model is obtained from the traditional Takagi-Sugeno-Kang fuzzy system by replacing the THEN part of fuzzy rules with a "multiplication" wavelet neural network (MWNN). Multidimensional Gaussian-type activation functions have been used in the IF part of the fuzzy rules. A Fuzzy Subtractive Clustering scheme is employed as a pre-processing technique to find the initial set and adequate number of clusters and ultimately the number of multiplication nodes in the MWNN, while Gaussian Mixture Models with the Expectation Maximization algorithm are utilized for the definition of the multidimensional Gaussians. The results corresponding to the minimum and maximum power load indicate that the proposed load forecasting model provides significantly more accurate forecasts than conventional neural network models.

  1. Buried landmine detection using multivariate normal clustering

    NASA Astrophysics Data System (ADS)

    Duston, Brian M.

    2001-10-01

    A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criterion (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
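
    Choosing the number of clusters with the BIC can be sketched as below. This is a generic illustration, not the MVNCC code: the function names are hypothetical, the mixture is assumed to use full covariance matrices, the maximized log-likelihoods are made-up inputs that would normally come from fitting each candidate model, and the sign convention is the one in which lower BIC is better (some software reports the negated quantity).

```python
import math


def mvn_mixture_param_count(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance mixture."""
    per_component = d + d * (d + 1) // 2  # mean vector + symmetric covariance
    return k * per_component + (k - 1)    # plus mixing weights that sum to one


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better in this convention."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_k(log_likelihoods, d, n_samples):
    """Return the component count with the lowest BIC, and all scores."""
    scores = {
        k: bic(ll, mvn_mixture_param_count(k, d), n_samples)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Made-up maximized log-likelihoods for 1, 2, and 3 clusters (100 samples, 1-D).
best_k, scores = select_k({1: -250.0, 2: -200.0, 3: -198.0}, d=1, n_samples=100)
```

    With these illustrative numbers the two-cluster model wins: the third component improves the log-likelihood only slightly, which does not offset the penalty for its extra parameters.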

  2. Mississippi State University Center for Air Sea Technology. FY93 and FY 94 Research Program in Navy Ocean Modeling and Prediction

    DTIC Science & Technology

    1994-09-30

    relational versus object oriented DBMS, knowledge discovery, data models, metadata, data filtering, clustering techniques, and synthetic data. A secondary...The first was the investigation of AI/ES applications (knowledge discovery, data mining, and clustering). Here CAST collaborated with Dr. Fred Petry...knowledge discovery system based on clustering techniques; implemented an on-line data browser to the DBMS; completed preliminary efforts to apply object

  3. Cluster Physics with Merging Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Molnar, Sandor

    Collisions between galaxy clusters provide a unique opportunity to study matter in a parameter space which cannot be explored in our laboratories on Earth. In the standard ΛCDM model, where the total density is dominated by the cosmological constant (Λ) and the matter density by cold dark matter (CDM), structure formation is hierarchical, and clusters grow mostly by merging. Mergers of two massive clusters are the most energetic events in the universe after the Big Bang, hence they provide a unique laboratory to study cluster physics. The two main mass components in clusters behave differently during collisions: the dark matter is nearly collisionless, responding only to gravity, while the gas is subject to pressure forces and dissipation, and shocks and turbulence are developed during collisions. In the present contribution we review the different methods used to derive the physical properties of merging clusters. Different physical processes leave their signatures on different wavelengths, thus our review is based on a multifrequency analysis. In principle, the best way to analyze multifrequency observations of merging clusters is to model them using N-body/HYDRO numerical simulations. We discuss the results of such detailed analyses. New high spatial and spectral resolution ground and space based telescopes will come online in the near future. Motivated by these new opportunities, we briefly discuss methods which will be feasible in the near future in studying merging clusters.

  4. Characterizing the spatial structure of endangered species habitat using geostatistical analysis of IKONOS imagery

    USGS Publications Warehouse

    Wallace, C.S.A.; Marsh, S.E.

    2005-01-01

    Our study used geostatistics to extract measures that characterize the spatial structure of vegetated landscapes from satellite imagery for mapping endangered Sonoran pronghorn habitat. Fine spatial resolution IKONOS data provided information at the scale of individual trees or shrubs that permitted analysis of vegetation structure and pattern. We derived images of landscape structure by calculating local estimates of the nugget, sill, and range variogram parameters within 25 × 25-m image windows. These variogram parameters, which describe the spatial autocorrelation of the 1-m image pixels, are shown in previous studies to discriminate between different species-specific vegetation associations. We constructed two independent models of pronghorn landscape preference by coupling the derived measures with Sonoran pronghorn sighting data: a distribution-based model and a cluster-based model. The distribution-based model used the descriptive statistics for variogram measures at pronghorn sightings, whereas the cluster-based model used the distribution of pronghorn sightings within clusters of an unsupervised classification of derived images. Both models define similar landscapes, and validation results confirm they effectively predict the locations of an independent set of pronghorn sightings. Such information, although not a substitute for field-based knowledge of the landscape and associated ecological processes, can provide valuable reconnaissance information to guide natural resource management efforts. © 2005 Taylor & Francis Group Ltd.
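
    The nugget, sill, and range mentioned in this abstract are read off an empirical semivariogram. The sketch below shows only that first step, under simplifying assumptions: a one-dimensional transect of pixel values rather than a 25 × 25-m window, integer lags, and a hypothetical function name (semivariogram). Fitting a variogram model to extract the three parameters is not shown.

```python
def semivariogram(values, max_lag):
    """Empirical semivariance gamma(h) for integer lags h = 1..max_lag.

    gamma(h) = sum((z[i] - z[i + h])**2) / (2 * N(h)),
    where N(h) is the number of pixel pairs separated by lag h.
    """
    gammas = []
    for h in range(1, max_lag + 1):
        sq_diffs = [(values[i] - values[i + h]) ** 2 for i in range(len(values) - h)]
        gammas.append(sum(sq_diffs) / (2 * len(sq_diffs)))
    return gammas

# Illustrative transect: an alternating pattern with period two pixels.
transect = [0, 1, 0, 1, 0, 1, 0, 1]
print(semivariogram(transect, 2))
```

    For the alternating transect, semivariance is high at lag 1 and zero at lag 2, reflecting the two-pixel periodicity; in real imagery the curve typically rises from the nugget toward the sill over a distance given by the range.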

  5. Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes.

    PubMed

    Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H

    2015-11-30

    We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.
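
    The two-stage idea in this abstract (reduce each cluster to its mean, then analyze the means with a univariate model) can be sketched as follows. This is a simplified, unweighted version with a pooled-variance t statistic; the inverse-variance weighting and variance constraint that the authors found best for unbalanced data are not implemented, and the data and function names are hypothetical.

```python
import math

def cluster_means(clusters):
    """Stage one: collapse each cluster's observations to a single mean."""
    return [sum(c) / len(c) for c in clusters]

def two_stage_t(arm_a, arm_b):
    """Stage two: pooled-variance t statistic comparing cluster means of two arms.

    Returns (t, degrees_of_freedom). Cluster means are treated as equally
    weighted, which is a simplification when cluster sizes differ.
    """
    means_a, means_b = cluster_means(arm_a), cluster_means(arm_b)
    n_a, n_b = len(means_a), len(means_b)
    grand_a, grand_b = sum(means_a) / n_a, sum(means_b) / n_b
    ss_a = sum((m - grand_a) ** 2 for m in means_a)
    ss_b = sum((m - grand_b) ** 2 for m in means_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (grand_a - grand_b) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, df

# Illustrative unbalanced design: three clusters per arm with unequal sizes.
treatment = [[1, 3], [2, 4, 6], [5]]
control = [[0, 2], [1, 1, 1], [3]]
t_stat, dof = two_stage_t(treatment, control)
print("t =", t_stat, "df =", dof)
```

    The degrees of freedom depend on the number of clusters, not the number of individuals, which is why the abstract states its recommendations in terms of clusters per arm.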

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sills, Alison; Glebbeek, Evert; Chatterjee, Sourav

    We created artificial color-magnitude diagrams of Monte Carlo dynamical models of globular clusters and then used observational methods to determine the number of blue stragglers in those clusters. We compared these blue stragglers to various cluster properties, mimicking work that has been done for blue stragglers in Milky Way globular clusters to determine the dominant formation mechanism(s) of this unusual stellar population. We find that a mass-based prescription for selecting blue stragglers will select approximately twice as many blue stragglers as a selection criterion that was developed for observations of real clusters. However, the two numbers of blue stragglers are well-correlated, so either selection criterion can be used to characterize the blue straggler population of a cluster. We confirm previous results that the simplified prescription for the evolution of a collision or merger product in the BSE code overestimates their lifetimes. We show that our model blue stragglers follow similar trends with cluster properties (core mass, binary fraction, total mass, collision rate) as the true Milky Way blue stragglers as long as we restrict ourselves to model clusters with an initial binary fraction higher than 5%. We also show that, in contrast to earlier work, the number of blue stragglers in the cluster core does have a weak dependence on the collisional parameter Γ in both our models and in Milky Way globular clusters.

  7. Cosmic variance of the galaxy cluster weak lensing signal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gruen, D.; Seitz, S.; Becker, M. R.

    Intrinsic variations of the projected density profiles of clusters of galaxies at fixed mass are a source of uncertainty for cluster weak lensing. We present a semi-analytical model to account for this effect, based on a combination of variations in halo concentration, ellipticity and orientation, and the presence of correlated haloes. We calibrate the parameters of our model at the 10 per cent level to match the empirical cosmic variance of cluster profiles at M_200m ≈ 10^14…10^15 h^-1 M_⊙, z = 0.25…0.5 in a cosmological simulation. We show that weak lensing measurements of clusters significantly underestimate mass uncertainties if intrinsic profile variations are ignored, and that our model can be used to provide correct mass likelihoods. Effects on the achievable accuracy of weak lensing cluster mass measurements are particularly strong for the most massive clusters and deep observations (with ≈20 per cent uncertainty from cosmic variance alone at M_200m ≈ 10^15 h^-1 M_⊙ and z = 0.25), but significant also under typical ground-based conditions. We show that neglecting intrinsic profile variations leads to biases in the mass-observable relation constrained with weak lensing, both for intrinsic scatter and overall scale (the latter at the 15 per cent level). Furthermore, these biases are in excess of the statistical errors of upcoming surveys and can be avoided if the cosmic variance of cluster profiles is accounted for.

  8. Cosmic variance of the galaxy cluster weak lensing signal

    DOE PAGES

    Gruen, D.; Seitz, S.; Becker, M. R.; ...

    2015-04-13

    Intrinsic variations of the projected density profiles of clusters of galaxies at fixed mass are a source of uncertainty for cluster weak lensing. We present a semi-analytical model to account for this effect, based on a combination of variations in halo concentration, ellipticity and orientation, and the presence of correlated haloes. We calibrate the parameters of our model at the 10 per cent level to match the empirical cosmic variance of cluster profiles at M_200m ≈ 10^14…10^15 h^-1 M_⊙, z = 0.25…0.5 in a cosmological simulation. We show that weak lensing measurements of clusters significantly underestimate mass uncertainties if intrinsic profile variations are ignored, and that our model can be used to provide correct mass likelihoods. Effects on the achievable accuracy of weak lensing cluster mass measurements are particularly strong for the most massive clusters and deep observations (with ≈20 per cent uncertainty from cosmic variance alone at M_200m ≈ 10^15 h^-1 M_⊙ and z = 0.25), but significant also under typical ground-based conditions. We show that neglecting intrinsic profile variations leads to biases in the mass-observable relation constrained with weak lensing, both for intrinsic scatter and overall scale (the latter at the 15 per cent level). Furthermore, these biases are in excess of the statistical errors of upcoming surveys and can be avoided if the cosmic variance of cluster profiles is accounted for.

  9. Hard X-ray emission from accretion shocks around galaxy clusters

    NASA Astrophysics Data System (ADS)

    Kushnir, Doron; Waxman, Eli

    2010-02-01

    We show that the hard X-ray (HXR) emission observed from several galaxy clusters is consistent with a simple model, in which the nonthermal emission is produced by inverse Compton scattering of cosmic microwave background photons by electrons accelerated in cluster accretion shocks: The dependence of HXR surface brightness on cluster temperature is consistent with that predicted by the model, and the observed HXR luminosity is consistent with the fraction of shock thermal energy deposited in relativistic electrons being ≲0.1. Alternative models, where the HXR emission is predicted to be correlated with the cluster thermal emission, are disfavored by the data. The implications of our predictions to future HXR observations (e.g. by NuStar, Simbol-X) and to (space/ground based) γ-ray observations (e.g. by Fermi, HESS, MAGIC, VERITAS) are discussed.

  10. The cosmological analysis of X-ray cluster surveys. III. 4D X-ray observable diagrams

    NASA Astrophysics Data System (ADS)

    Pierre, M.; Valotti, A.; Faccioli, L.; Clerc, N.; Gastaud, R.; Koulouridis, E.; Pacaud, F.

    2017-11-01

    Context. Despite compelling theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties surrounding cluster-mass estimates. Aims: Our aim is to develop a fully self-consistent cosmological approach of X-ray cluster surveys, exclusively based on observable quantities rather than masses. This procedure is justified given the possibility to directly derive the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. Methods: We model the population of detected clusters in the count-rate - hardness-ratio - angular size - redshift space and compare the corresponding four-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are estimated by averaging the results from ten independent survey realisations. The method allows a simultaneous fit of the cosmological parameters of the cluster evolutionary physics and of the selection effects. Results: When using information from the X-ray survey alone plus redshifts, this approach is shown to be as accurate as the modelling of the mass function for the cosmological parameters and to perform better for the cluster physics, for a similar level of assumptions on the scaling relations. It enables the identification of degenerate combinations of parameter values. Conclusions: Given the considerably shorter computer times involved for running the minimisation procedure in the observed parameter space, this method appears to clearly outperform traditional mass-based approaches when X-ray survey data alone are available.

  11. Automated method to differentiate between native and mirror protein models obtained from contact maps

    PubMed Central

    Kurczynska, Monika

    2018-01-01

    Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not yield appropriate clusters: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models. PMID:29787567
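
    The clustering step in this abstract is standard k-means with two clusters. A minimal sketch of Lloyd's algorithm follows; it assumes each model is represented by a short tuple of energy-term values, uses made-up numbers rather than PyRosetta output, and starts from the first k points, which is a simplification chosen here for determinism.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, centers)."""
    centers = [tuple(p) for p in points[:k]]  # deterministic start (assumption)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

# Hypothetical two-feature energy-term vectors for six models.
energy_terms = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3), (4.9, 5.0)]
labels, centers = kmeans(energy_terms, 2)
print(labels, centers)
```

    In the study's setting the two resulting clusters would then be compared against the known native or mirror orientation of each model to compute accuracy.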

  12. Numerical study of base pressure characteristic curve for a four-engine clustered nozzle configuration

    NASA Technical Reports Server (NTRS)

    Wang, Ten-See

    1993-01-01

    The objective of this study is to benchmark a four-engine clustered nozzle base flowfield with a computational fluid dynamics (CFD) model. The CFD model is a three-dimensional pressure-based, viscous flow formulation. An adaptive upwind scheme is employed for the spatial discretization. The upwind scheme is based on second and fourth order central differencing with adaptive artificial dissipation. Qualitative base flow features such as the reverse jet, wall jet, recompression shock, and plume-plume impingement have been captured. The computed quantitative flow properties such as the radial base pressure distribution, model centerline Mach number and static pressure variation, and base pressure characteristic curve agreed reasonably well with those of the measurement. Parametric study on the effect of grid resolution, turbulence model, inlet boundary condition and difference scheme on convective terms has been performed. The results showed that grid resolution had a strong influence on the accuracy of the base flowfield prediction.

  13. Enrichment Clusters: A Practical Plan for Real-World, Student-Driven Learning.

    ERIC Educational Resources Information Center

    Renzulli, Joseph S.; Gentry, Marcia; Reis, Sally M.

    This guidebook provides a rationale and guidelines for implementing a student-driven learning approach using enrichment clusters. Enrichment clusters allow students who share a common interest to meet each week to produce a product, performance, or targeted service based on that common interest. Chapter 1 discusses different models of learning.…

  14. Globular Clusters: Absolute Proper Motions and Galactic Orbits

    NASA Astrophysics Data System (ADS)

    Chemel, A. A.; Glushkova, E. V.; Dambis, A. K.; Rastorguev, A. S.; Yalyalieva, L. N.; Klinichev, A. D.

    2018-04-01

    We cross-match objects from several different astronomical catalogs to determine the absolute proper motions of stars within the 30-arcmin radius fields of 115 Milky-Way globular clusters with the accuracy of 1-2 mas yr^-1. The proper motions are based on positional data recovered from the USNO-B1, 2MASS, URAT1, ALLWISE, UCAC5, and Gaia DR1 surveys with up to ten positions spanning an epoch difference of up to about 65 years, and reduced to Gaia DR1 TGAS frame using UCAC5 as the reference catalog. Cluster members are photometrically identified by selecting horizontal- and red-giant branch stars on color-magnitude diagrams, and the mean absolute proper motions of the clusters with a typical formal error of about 0.4 mas yr^-1 are computed by averaging the proper motions of selected members. The inferred absolute proper motions of clusters are combined with available radial-velocity data and heliocentric distance estimates to compute the cluster orbits in terms of the Galactic potential models based on Miyamoto and Nagai disk, Hernquist spheroid, and modified isothermal dark-matter halo (axisymmetric model without a bar) and the same model + rotating Ferrers bar (non-axisymmetric). Five distant clusters have higher-than-escape velocities, most likely due to large errors of computed transversal velocities, whereas the computed orbits of all other clusters remain bound to the Galaxy. Unlike previously published results, we find the bar to affect substantially the orbits of most of the clusters, even those at large Galactocentric distances, bringing appreciable chaotization, especially in the portions of the orbits close to the Galactic center, and stretching out the orbits of some of the thick-disk clusters.
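
    Two steps in this abstract lend themselves to a short sketch: deriving a star's proper motion from positions at several epochs, and averaging member proper motions into a cluster mean. The code below assumes an unweighted least-squares line per coordinate and takes the formal error as the standard error of the mean; both choices, the function names, and the numbers are illustrative assumptions, not the authors' reduction.

```python
import math

def proper_motion(epochs_yr, positions_mas):
    """Least-squares slope of position against epoch, in mas per year."""
    n = len(epochs_yr)
    t_mean = sum(epochs_yr) / n
    x_mean = sum(positions_mas) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in zip(epochs_yr, positions_mas))
    den = sum((t - t_mean) ** 2 for t in epochs_yr)
    return num / den

def mean_with_error(values):
    """Mean of member proper motions and its standard error."""
    n = len(values)
    mean = sum(values) / n
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(sample_var / n)

# Hypothetical star observed at four epochs, moving at exactly 2 mas/yr.
epochs = [1950.0, 1980.0, 2000.0, 2015.0]
positions = [10.0 + 2.0 * (t - 1950.0) for t in epochs]
pm_star = proper_motion(epochs, positions)

# Hypothetical member proper motions (mas/yr) averaged into a cluster value.
pm_cluster, pm_error = mean_with_error([1.0, 2.0, 3.0])
print(pm_star, pm_cluster, pm_error)
```

    A long epoch baseline shrinks the slope uncertainty, which is why combining old and recent catalogs is attractive.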

  15. A stress sensitivity model for the permeability of porous media based on bi-dispersed fractal theory

    NASA Astrophysics Data System (ADS)

    Tan, X.-H.; Liu, C.-Y.; Li, X.-P.; Wang, H.-Q.; Deng, H.

    A stress sensitivity model for the permeability of porous media based on bi-dispersed fractal theory is established, considering the change of the flow path, the fractal geometry approach and the mechanics of porous media. It is noted that the two fractal parameters of the porous media construction perform differently when the stress changes. The tortuosity fractal dimension of the solid cluster, DcTσ, becomes larger with an increase of stress. However, the pore fractal dimensions of the solid cluster, Dcfσ, and of the capillary bundle, Dpfσ, remain the same with an increase of stress. The definition of normalized permeability is introduced for the analysis of the impacts of stress sensitivity on permeability. The normalized permeability is related to solid cluster tortuosity dimension, pore fractal dimension, solid cluster maximum diameter, Young’s modulus and Poisson’s ratio. Every parameter has clear physical meaning without the use of empirical constants. The model's permeability predictions agree with the obtained experimental data. Thus, the proposed model can precisely depict the flow of fluid in porous media under stress.

  16. Spatial cluster detection using dynamic programming.

    PubMed

    Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

    2012-03-25

    The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm.

  17. Spatial cluster detection using dynamic programming

    PubMed Central

    2012-01-01

    Background: The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods: We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results: When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions: We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103

  18. Coupling of Acoustic Cavitation with Dem-Based Particle Solvers for Modeling De-agglomeration of Particle Clusters in Liquid Metals

    NASA Astrophysics Data System (ADS)

    Manoylov, Anton; Lebon, Bruno; Djambazov, Georgi; Pericleous, Koulis

    2017-11-01

    The aerospace and automotive industries are seeking advanced materials with low weight yet high strength and durability. Aluminum and magnesium-based metal matrix composites with ceramic micro- and nano-reinforcements promise the desirable properties. However, larger surface-area-to-volume ratio in micro- and especially nanoparticles gives rise to van der Waals and adhesion forces that cause the particles to agglomerate in clusters. Such clusters lead to adverse effects on final properties, no longer acting as dislocation anchors but instead becoming defects. Also, agglomeration causes the particle distribution to become uneven, leading to inconsistent properties. To break up clusters, ultrasonic processing may be used via an immersed sonotrode, or alternatively via electromagnetic vibration. This paper combines a fundamental study of acoustic cavitation in liquid aluminum with a study of the interaction forces causing particles to agglomerate, as well as mechanisms of cluster breakup. A non-linear acoustic cavitation model utilizing pressure waves produced by an immersed horn is presented, and then applied to cavitation in liquid aluminum. Physical quantities related to fluid flow and quantities specific to the cavitation solver are passed to a discrete element method particles model. The coupled system is then used for a detailed study of clusters' breakup by cavitation.

  19. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the 'accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with M_vir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the 'standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  20. Cluster based architecture and network maintenance protocol for medical priority aware cognitive radio based hospital.

    PubMed

    Al Mamoon, Ishtiak; Muzahidul Islam, A K M; Baharun, Sabariah; Ahmed, Ashir; Komaki, Shozo

    2016-08-01

    Due to the rapid growth of wireless medical devices expected in the near future, wireless healthcare services may face some inescapable issues such as medical spectrum scarcity, electromagnetic interference (EMI), bandwidth constraints, security and, finally, the medical data communication model. To mitigate these issues, cognitive radio (CR) or opportunistic radio network enabled wireless technology is suitable for the upcoming wireless healthcare system. Up-to-date research on CR based healthcare has reported some developments on EMI and spectrum problems. However, investigations and recommendations on system design and network models for CR enabled hospitals are rare. Thus, this research designs a hierarchy based hybrid network architecture and network maintenance protocols for a previously proposed CR hospital system, known as CogMed. In the previous study, the detailed architecture of CogMed and its maintenance protocols were not presented. The proposed architecture includes clustering concepts for cognitive base stations and non-medical devices. Two cluster head (CH) selector equations are formulated based on priority of location, device, mobility rate of devices and number of accessible channels. In order to maintain the integrity of the proposed network model, node joining and node leaving protocols are also proposed. Finally, the simulation results show that the proposed network maintenance time is very low for emergency medical devices (average maintenance period 9.5 ms) and the re-clustering effects for different mobility enabled non-medical devices are also balanced.

  1. Analysis of heterogeneous water vapor uptake by metal iodide cluster ions via differential mobility analysis-mass spectrometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oberreit, Derek; Fluid Measurement Technologies, Inc., Saint Paul, Minnesota 55110; Rawat, Vivek K.

    The sorption of vapor molecules onto pre-existing nanometer sized clusters is of importance in understanding particle formation and growth in gas phase environments and devising gas phase separation schemes. Here, we apply a differential mobility analyzer-mass spectrometer based approach to observe directly the sorption of vapor molecules onto iodide cluster ions of the form (MI)_x M^+ (x = 1-13, M = Na, K, Rb, or Cs) in air at 300 K and with water saturation ratios in the 0.01-0.64 range. The extent of vapor sorption is quantified in measurements by the shift in collision cross section (CCS) for each ion. We find that CCS measurements are sensitive enough to detect the transient binding of several vapor molecules to clusters, which shift CCSs by only several percent. At the same time, for the highest saturation ratios examined, we observed CCS shifts of up to 45%. For x < 4, cesium, rubidium, and potassium iodide cluster ions are found to uptake water to a similar extent, while sodium iodide clusters uptake less water. For x ≥ 4, sodium iodide cluster ions uptake proportionally more water vapor than rubidium and potassium iodide cluster ions, while cesium iodide ions exhibit less uptake. Measured CCS shifts are compared to predictions based upon a Kelvin-Thomson-Raoult (KTR) model as well as a Langmuir adsorption model. We find that the Langmuir adsorption model can be fit well to measurements. Meanwhile, KTR predictions deviate from measurements, which suggests that the earliest stages of vapor uptake by nanometer scale species are not well described by the KTR model.
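
    For orientation, the textbook single-site Langmuir isotherm gives fractional coverage θ = K·S / (1 + K·S). The sketch below evaluates that generic form over the saturation-ratio range quoted in the abstract. The equilibrium constant is a made-up value, and the mapping from coverage to CCS shift used in the paper is not reproduced here.

```python
def langmuir_coverage(saturation_ratio, k_eq):
    """Fractional site coverage theta = K*S / (1 + K*S) for the generic Langmuir form."""
    ks = k_eq * saturation_ratio
    return ks / (1.0 + ks)

# Hypothetical equilibrium constant; saturation ratios span the 0.01-0.64 range.
K_EQ = 5.0
ratios = [0.01, 0.16, 0.64]
coverages = [langmuir_coverage(s, K_EQ) for s in ratios]
for s, theta in zip(ratios, coverages):
    print(s, theta)
```

    Coverage rises with saturation ratio and levels off toward one as sites fill, which is the qualitative behavior a Langmuir fit imposes on uptake data.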

  2. Cancer detection based on Raman spectra super-paramagnetic clustering

    NASA Astrophysics Data System (ADS)

    González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual

    2016-08-01

    The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, and where the interactions are functions of the Raman spectra's individual band intensities. For this study, we used two groups of 58 and 102 control Raman spectra and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast and cervical cancer and leukemia patients, respectively. The spectra were collected from patients at different hospitals in Mexico. By using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between the control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical observed structure allowed the identification of the patients' leukemia type. The goal of this study is to apply a model of statistical physics, such as super-paramagnetic clustering, to find these natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in the discipline of spectroscopy, where it is used for classification of spectra.

  3. Lane detection based on color probability model and fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Yu, Yang; Jo, Kang-Hyun

    2018-04-01

    In vehicle driver assistance systems, the accuracy and speed of lane line detection are of the utmost importance. This paper is based on a color probability model and the Fuzzy Local Information C-Means (FLICM) clustering algorithm. The Hough transform and the constraints of the structural road are used to detect the lane line accurately. The global map of the lane line is drawn by the lane curve fitting equation. The experimental results show that the algorithm has good robustness.

  4. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background: The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results: In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion: Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106

  5. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.

  6. How large are the consequences of covariate imbalance in cluster randomized trials: a simulation study with a continuous outcome and a binary covariate at the cluster level.

    PubMed

    Moerbeek, Mirjam; van Schie, Sander

    2016-07-11

    The number of clusters in a cluster randomized trial is often low. It is therefore likely that random assignment of clusters to treatment conditions results in covariate imbalance. There are no studies that quantify the consequences of covariate imbalance in cluster randomized trials on parameter and standard error bias and on power to detect treatment effects. The consequences of covariate imbalance in unadjusted and adjusted linear mixed models are investigated by means of a simulation study. The factors in this study are the degree of imbalance, the covariate effect size, the cluster size and the intraclass correlation coefficient. The covariate is binary and measured at the cluster level; the outcome is continuous and measured at the individual level. The results show that covariate imbalance results in negligible parameter bias and small standard error bias in adjusted linear mixed models. Ignoring the possibility of covariate imbalance while calculating the sample size at the cluster level may result in a loss in power of at most 25% in the adjusted linear mixed model. The results are more severe for the unadjusted linear mixed model: parameter biases up to 100% and standard error biases up to 200% may be observed. Power levels based on the unadjusted linear mixed model are often too low. The consequences are most severe for large clusters and/or small intraclass correlation coefficients, since then the required number of clusters to achieve a desired power level is smallest. The possibility of covariate imbalance should be taken into account while calculating the sample size of a cluster randomized trial. Otherwise, more sophisticated methods to randomize clusters to treatments should be used, such as stratification or balance algorithms. All relevant covariates should be carefully identified, actually measured, and included in the statistical model to avoid severe levels of parameter and standard error bias and insufficient power levels.

  7. Copula based flexible modeling of associations between clustered event times.

    PubMed

    Geerdens, Candida; Claeskens, Gerda; Janssen, Paul

    2016-07-01

    Multivariate survival data are characterized by the presence of correlation between event times within the same cluster. First, we build multi-dimensional copulas with flexible and possibly symmetric dependence structures for such data. In particular, clustered right-censored survival data are modeled using mixtures of max-infinitely divisible bivariate copulas. Second, these copulas are fit by a likelihood approach where the vast amount of copula derivatives present in the likelihood is approximated by finite differences. Third, we formulate conditions for clustered right-censored survival data under which an information criterion for model selection is either weakly consistent or consistent. Several of the familiar selection criteria are included. A set of four-dimensional data on time-to-mastitis is used to demonstrate the developed methodology.

  8. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    PubMed

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  9. An "Alternating-Curvature" Model for the Nanometer-scale Structure of the Nafion Ionomer, Based on Backbone Properties Detected by NMR

    NASA Astrophysics Data System (ADS)

    Schmidt-Rohr, Klaus; Chen, Q.

    2006-03-01

    The perfluorinated ionomer, Nafion, which consists of a (-CF2-)n backbone and charged side branches, is useful as a proton exchange membrane in H2/O2 fuel cells. A modified model of the nanometer-scale structure of hydrated Nafion will be presented. It features hydrated ionic clusters familiar from some previous models, but is based most prominently on pronounced backbone rigidity between branch points and limited orientational correlation of local chain axes. These features have been revealed by solid-state NMR measurements, which take advantage of fast rotations of the backbones around their local axes. The resulting alternating curvature of the backbones towards the hydrated clusters also better satisfies the requirement of dense space filling in solids. Simulations based on this "alternating curvature" model reproduce orientational correlation data from NMR, as well as scattering features such as the ionomer peak and the I(q) ~ 1/q power law at small q values, which can be attributed to modulated cylinders resulting from the chain stiffness. The shortcomings of previous models, including Gierke's cluster model and more recent lamellar or bundle models, in matching all requirements imposed by the experimental data will be discussed.

  10. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.

  11. Study of cluster behavior in the riser of CFB by the DSMC method

    NASA Astrophysics Data System (ADS)

    Liu, H. P.; Liu, D. Y.; Liu, H.

    2010-03-01

    The flow behavior of clusters in the riser of a two-dimensional (2D) circulating fluidized bed (CFB) was numerically studied based on the Euler-Lagrangian approach. Gas turbulence was modeled by means of Large Eddy Simulation (LES). Particle collision was modeled by means of the direct simulation Monte Carlo (DSMC) method. Clusters' hydrodynamic characteristics are obtained using a cluster identification method proposed by Sharma et al. (2000). The descending clusters near the wall region and the up- and down-flowing clusters in the core were studied separately due to their different flow behaviors. The effects of superficial gas velocity on the cluster behavior were analyzed. Simulated results showed that near-wall clusters flow downward with a descent velocity of about -45 cm/s. The occurrence frequency of up-flowing clusters is higher than that of down-flowing clusters in the core of the riser. With the increase of superficial gas velocity, the solid concentration and occurrence frequency of clusters decrease, while the cluster axial velocity increases. Simulated results were in agreement with experimental data. The stochastic method used in the present paper is feasible for predicting the cluster flow behavior in CFBs.

  12. Investigating the Consistency of Stellar Evolution Models with Globular Cluster Observations via the Red Giant Branch Bump

    NASA Astrophysics Data System (ADS)

    Joyce, Meridith; Chaboyer, Brian

    2016-01-01

    Synthetic Red Giant Branch Bump (RGBB) magnitudes are generated with the most recent theoretical stellar evolution models computed with the Dartmouth Stellar Evolution Program (DSEP) code. They are compared to the observational work of Nataf et al. (2013), who present RGBB magnitudes for 72 globular clusters. A DSEP model using a chemical composition with enhanced α capture [α/Fe] =+0.4 and an age of 13 Gyr shows agreement with observations over metallicities ranging from [Fe/H] = 0 to [Fe/H] ≈-1.5, with discrepancy emerging at lower metallicities. A model-independent, density-based outlier detection routine known as the Local Outlying Factor (LOF) algorithm is applied to the observations in order to identify clusters that deviate most in magnitude-metallicity space from the bulk of the observations. Our model's fit is scrutinized with a series of χ^2 routines performed on subsets of the data from which highly anomalous clusters have been selectively removed based on LOF identification. In particular, NGCs 6254, 6681, 6218, and 1904 are tagged recurrently as outliers. The effects of systematic and non-systematic error in metallicity are assessed, and the robustness of observational error bars is investigated.
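
    The density-based outlier step mentioned in this abstract can be illustrated with a minimal sketch of the standard LOF definition. This is not the authors' routine: the function name `local_outlier_factor`, the neighbourhood size `k`, and the toy points are assumptions made for illustration, and the points are assumed to be distinct so that no k-distance is zero. A score near 1 means a point is about as densely surrounded as its neighbours, while a score well above 1 marks a candidate outlier.

```python
import math


def local_outlier_factor(points, k):
    """Return one LOF score per point (hypothetical helper, standard definition)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance of each point and its neighbourhood (ties at the k-distance included).
    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy data: five tightly grouped points and one far-away point.
    toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
    for point, score in zip(toy, local_outlier_factor(toy, k=3)):
        print(point, round(score, 3))
```

    In the setting of the abstract, each point would presumably be one globular cluster's position in magnitude-metallicity space; how the axes are scaled and which `k` is used are choices the abstract does not state.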

  13. Groundwater source contamination mechanisms: Physicochemical profile clustering, risk factor analysis and multivariate modelling

    NASA Astrophysics Data System (ADS)

    Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.

  14. A novel artificial immune algorithm for spatial clustering with obstacle constraint and its applications.

    PubMed

    Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

    2014-01-01

    An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as the similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and the classical clustering algorithms. Our clustering model based on the artificial immune system is also applied to the public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect.

  15. A hierarchical clustering methodology for the estimation of toxicity.

    PubMed

    Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M

    2008-01-01

    A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.

  16. Core-halo age gradients and star formation in the Orion Nebula and NGC 2024 young stellar clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Getman, Konstantin V.; Feigelson, Eric D.; Kuhn, Michael A.

    2014-06-01

    We analyze age distributions of two nearby rich stellar clusters, the NGC 2024 (Flame Nebula) and Orion Nebula cluster (ONC) in the Orion molecular cloud complex. Our analysis is based on samples from the MYStIX survey and a new estimator of pre-main sequence (PMS) stellar ages, Age_JX, derived from X-ray and near-infrared photometric data. To overcome the problem of uncertain individual ages and large spreads of age distributions for entire clusters, we compute median ages and their confidence intervals of stellar samples within annular subregions of the clusters. We find core-halo age gradients in both the NGC 2024 cluster and ONC: PMS stars in cluster cores appear younger and thus were formed later than PMS stars in cluster peripheries. These findings are further supported by the spatial gradients in the disk fraction and K-band excess frequency. Our age analysis is based on Age_JX estimates for PMS stars and is independent of any consideration of OB stars. The result has important implications for the formation of young stellar clusters. One basic implication is that clusters form slowly and the apparent age spreads in young stellar clusters, which are often controversial, are (at least in part) real. The result further implies that simple models where clusters form inside-out are incorrect and more complex models are needed. We provide several star formation scenarios that alone or in combination may lead to the observed core-halo age gradients.

  17. Cosmological constraints from strong gravitational lensing in clusters of galaxies.

    PubMed

    Jullo, Eric; Natarajan, Priyamvada; Kneib, Jean-Paul; D'Aloisio, Anson; Limousin, Marceau; Richard, Johan; Schimd, Carlo

    2010-08-20

    Current efforts in observational cosmology are focused on characterizing the mass-energy content of the universe. We present results from a geometric test based on strong lensing in galaxy clusters. Based on Hubble Space Telescope images and extensive ground-based spectroscopic follow-up of the massive galaxy cluster Abell 1689, we used a parametric model to simultaneously constrain the cluster mass distribution and dark energy equation of state. Combining our cosmological constraints with those from x-ray clusters and the Wilkinson Microwave Anisotropy Probe 5-year data gives Ω_m = 0.25 ± 0.05 and w_x = -0.97 ± 0.07, which are consistent with results from other methods. Inclusion of our method with all other available techniques brings down the current 2σ contours on the dark energy equation-of-state parameter w_x by approximately 30%.

  18. On the applicability of one- and many-electron quantum chemistry models for hydrated electron clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turi, László, E-mail: turi@chem.elte.hu

    2016-04-21

    We evaluate the applicability of a hierarchy of quantum models in characterizing the binding energy of excess electrons to water clusters. In particular, we calculate the vertical detachment energy of an excess electron from water cluster anions with methods that include one-electron pseudopotential calculations, density functional theory (DFT) based calculations, and ab initio quantum chemistry using MP2 and eom-EA-CCSD levels of theory. The examined clusters range from the smallest cluster size (n = 2) up to nearly nanosize clusters with n = 1000 molecules. The examined cluster configurations are extracted from mixed quantum-classical molecular dynamics trajectories of cluster anions with n = 1000 water molecules using two different one-electron pseudopotential models. We find that while MP2 calculations with a large diffuse basis set provide a reasonable description for the hydrated electron system, DFT methods should be used with caution and only after careful benchmarking. Strictly tested one-electron pseudopotentials can still be considered as reasonable alternatives to DFT methods, especially in large systems. The results of quantum chemistry calculations performed on configurations that represent possible excess electron binding motifs in the clusters appear to be consistent with the results using a cavity structure preferring one-electron pseudopotential for the hydrated electron, while they are in sharp disagreement with the structural predictions of a non-cavity model.

  19. On the applicability of one- and many-electron quantum chemistry models for hydrated electron clusters

    NASA Astrophysics Data System (ADS)

    Turi, László

    2016-04-01

    We evaluate the applicability of a hierarchy of quantum models in characterizing the binding energy of excess electrons to water clusters. In particular, we calculate the vertical detachment energy of an excess electron from water cluster anions with methods that include one-electron pseudopotential calculations, density functional theory (DFT) based calculations, and ab initio quantum chemistry using MP2 and eom-EA-CCSD levels of theory. The examined clusters range from the smallest cluster size (n = 2) up to nearly nanosize clusters with n = 1000 molecules. The examined cluster configurations are extracted from mixed quantum-classical molecular dynamics trajectories of cluster anions with n = 1000 water molecules using two different one-electron pseudopotential models. We find that while MP2 calculations with a large diffuse basis set provide a reasonable description for the hydrated electron system, DFT methods should be used with caution and only after careful benchmarking. Strictly tested one-electron pseudopotentials can still be considered as reasonable alternatives to DFT methods, especially in large systems. The results of quantum chemistry calculations performed on configurations that represent possible excess electron binding motifs in the clusters appear to be consistent with the results using a cavity structure preferring one-electron pseudopotential for the hydrated electron, while they are in sharp disagreement with the structural predictions of a non-cavity model.

  20. Numerical analysis of base flowfield at high altitude for a four-engine clustered nozzle configuration

    NASA Technical Reports Server (NTRS)

    Wang, Ten-See

    1993-01-01

    The objective of this study is to benchmark a four-engine clustered nozzle base flowfield with a computational fluid dynamics (CFD) model. The CFD model is a pressure based, viscous flow formulation. An adaptive upwind scheme is employed for the spatial discretization. The upwind scheme is based on second and fourth order central differencing with adaptive artificial dissipation. Qualitative base flow features such as the reverse jet, wall jet, recompression shock, and plume-plume impingement have been captured. The computed quantitative flow properties such as the radial base pressure distribution, model centerline Mach number and static pressure variation, and base pressure characteristic curve agreed reasonably well with those of the measurement. Parametric study on the effect of grid resolution, turbulence model, inlet boundary condition and difference scheme on convective terms has been performed. The results showed that grid resolution and turbulence model are two primary factors that influence the accuracy of the base flowfield prediction.

  1. Running and rotating: modelling the dynamics of migrating cell clusters

    NASA Astrophysics Data System (ADS)

    Copenhagen, Katherine; Gov, Nir; Gopinathan, Ajay

    Collective motion of cells is a common occurrence in many biological systems, including tissue development and repair, and tumor formation. Recent experiments have shown cells form clusters in a chemical gradient, which display three different phases of motion: translational, rotational, and random. We present a model for cell clusters based loosely on other models seen in the literature that involves a Vicsek-like alignment as well as physical collisions and adhesions between cells. With this model we show that a mechanism for driving rotational motion in this kind of system is an increased motility of rim cells. Further, we examine the details of the relationship between rim and core cells, and find that the phases of the cluster as a whole are correlated with the creation and annihilation of topological defects in the tangential component of the velocity field.

  2. Clustering promotes switching dynamics in networks of noisy neurons

    NASA Astrophysics Data System (ADS)

    Franović, Igor; Klinshov, Vladimir

    2018-02-01

    Macroscopic variability is an emergent property of neural networks, typically manifested in spontaneous switching between the episodes of elevated neuronal activity and the quiescent episodes. We investigate the conditions that facilitate switching dynamics, focusing on the interplay between the different sources of noise and heterogeneity of the network topology. We consider clustered networks of rate-based neurons subjected to external and intrinsic noise and derive an effective model where the network dynamics is described by a set of coupled second-order stochastic mean-field systems representing each of the clusters. The model provides an insight into the different contributions to effective macroscopic noise and qualitatively indicates the parameter domains where switching dynamics may occur. By analyzing the mean-field model in the thermodynamic limit, we demonstrate that clustering promotes multistability, which gives rise to switching dynamics in a considerably wider parameter region compared to the case of a non-clustered network with sparse random connection topology.

  3. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a statistical learning and data mining method that combines unsupervised and supervised learning and is found in a wide range of applications, including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
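
    The alternation between partitioning and regression described above can be sketched for the simplest case of one predictor and a known number of clusters. This is a minimal least-squares illustration only: the paper's robust estimators and its model-selection step for choosing the number of clusters are not reproduced, and the function names and the `init_lines` argument are assumptions made for the example.

```python
def fit_line(points):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def regression_clustering(points, init_lines, n_iter=50):
    """Alternate between partitioning points and refitting one line per cluster.

    points     : list of (x, y) pairs
    init_lines : list of (intercept, slope) starting guesses, one per cluster
    Returns (lines, labels).
    """
    lines = list(init_lines)
    labels = None
    for _ in range(n_iter):
        # Partition step: each point joins the line with the smallest squared residual.
        new_labels = [min(range(len(lines)),
                          key=lambda k: (y - lines[k][0] - lines[k][1] * x) ** 2)
                      for x, y in points]
        if new_labels == labels:
            break  # assignments are stable, so the fits will not change either
        labels = new_labels
        # Regression step: refit each line to its current members.
        for k in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 2:
                lines[k] = fit_line(members)
    return lines, labels
```

    Like k-means, this kind of alternation only finds a local optimum, so in practice it would be restarted from several initial guesses.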

  4. Energy Efficient Cluster Based Scheduling Scheme for Wireless Sensor Networks

    PubMed Central

    Srie Vidhya Janani, E.; Ganesh Kumar, P.

    2015-01-01

    The energy utilization of sensor nodes in large scale wireless sensor network points out the crucial need for scalable and energy efficient clustering protocols. Since sensor nodes usually operate on batteries, the maximum utility of network is greatly dependent on ideal usage of energy leftover in these sensor nodes. In this paper, we propose an Energy Efficient Cluster Based Scheduling Scheme for wireless sensor networks that balances the sensor network lifetime and energy efficiency. In the first phase of our proposed scheme, cluster topology is discovered and cluster head is chosen based on remaining energy level. The cluster head monitors the network energy threshold value to identify the energy drain rate of all its cluster members. In the second phase, scheduling algorithm is presented to allocate time slots to cluster member data packets. Here congestion occurrence is totally avoided. In the third phase, energy consumption model is proposed to maintain maximum residual energy level across the network. Moreover, we also propose a new packet format which is given to all cluster member nodes. The simulation results prove that the proposed scheme greatly contributes to maximum network lifetime, high energy, reduced overhead, and maximum delivery ratio. PMID:26495417

  5. Energy spectra of vibron and cluster models in molecular and nuclear systems

    NASA Astrophysics Data System (ADS)

    Jalili Majarshin, A.; Sabri, H.; Jafarizadeh, M. A.

    2018-03-01

    The relation of the algebraic cluster model, i.e., of the vibron model and its extension, to the collective structure is discussed. In the first section of the paper, we study the energy spectra of the vibron model for a diatomic molecule; we then derive the rotation-vibration spectrum of the 2α, 3α and 4α configurations in the low-lying spectrum of the 8Be, 12C and 16O nuclei. All vibrational and rotational states with ground and excited A, E and F states appear to have been observed. Moreover, the transitional descriptions of the vibron model and the α-cluster model were considered by using an infinite-dimensional algebraic method based on the affine \widehat{SU(1,1)} Lie algebra. The calculated energy spectra are compared with experimental data. Applications to the rotation-vibration spectrum for the diatomic molecule and many-body nuclear clusters indicate that there are solvable models and they can be approximated very well using the transitional theory.

  6. Cluster-based adaptive power control protocol using Hidden Markov Model for Wireless Sensor Networks

    NASA Astrophysics Data System (ADS)

    Vinutha, C. B.; Nalini, N.; Nagaraja, M.

    2017-06-01

    This paper presents strategies for an efficient and dynamic transmission power control technique, in order to reduce packet drop and hence energy consumption of power-hungry sensor nodes operated in highly non-linear channel conditions of Wireless Sensor Networks. We also aim to prolong network lifetime and improve scalability by designing a cluster-based network structure. Specifically, we consider a weight-based clustering approach wherein the minimum significant node is chosen as Cluster Head (CH), computed from the factors of distance, remaining residual battery power, and received signal strength (RSS). Further, transmission power control schemes that fit dynamic channel conditions are implemented using a Hidden Markov Model (HMM), where the probability transition matrix is formulated based on the observed RSS measurements. Typically, the CH estimates the initial transmission power of its cluster members (CMs) from RSS using the HMM and broadcasts this value to its CMs for initialising their power value. Further, if the CH finds that there are variations in link quality and RSS of the CMs, it re-computes and optimises the transmission power level of the nodes using the HMM to avoid packet loss due to noise interference. Our simulation results show that the technique efficiently controls the power levels of sensing nodes and saves a significant quantity of energy for networks of different sizes.

  7. Phylogenetic relationships of chrysanthemums in Korea based on novel SSR markers.

    PubMed

    Khaing, A A; Moe, K T; Hong, W J; Park, C S; Yeon, K H; Park, H S; Kim, D C; Choi, B J; Jung, J Y; Chae, S C; Lee, K M; Park, Y J

    2013-11-07

    Chrysanthemums are well known for their esthetic and medicinal values. Characterization of chrysanthemums is vital for their conservation and management as well as for understanding their genetic relationships. We found 12 simple sequence repeat markers (SSRs) of 100 designed primers to be polymorphic. These novel SSR markers were used to evaluate 95 accessions of chrysanthemums (3 indigenous and 92 cultivated accessions). Two hundred alleles were identified, with an average of 16.7 alleles per locus. KNUCRY-77 gave the highest polymorphic information content value (0.879), while KNUCRY-10 gave the lowest (0.218). Similar patterns of grouping were observed with a distance-based dendrogram developed using PowerMarker and model-based clustering with Structure. Three clusters with some admixtures were identified by model-based clustering. These newly developed SSR markers will be useful for further studies of chrysanthemums, such as taxonomy and marker-assisted selection breeding.
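
    For readers unfamiliar with the polymorphic information content (PIC) values quoted above, the sketch below computes PIC for one locus from its allele frequencies using the standard definition, PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2. The function name is an assumption, and the abstract does not say which software settings produced the reported values.

```python
def polymorphic_information_content(freqs):
    """PIC for one locus from allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in freqs]
    homozygosity = sum(squares)
    # Pair term: sum over unordered allele pairs i < j of 2 * p_i^2 * p_j^2,
    # obtained from (sum p^2)^2 = sum p^4 + 2 * sum_{i<j} p_i^2 * p_j^2.
    pair_term = homozygosity ** 2 - sum(s * s for s in squares)
    return 1.0 - homozygosity - pair_term
```

    A locus with two equally frequent alleles gives PIC = 0.375, and PIC approaches 1 as the number of equally frequent alleles grows, which is why a highly multi-allelic SSR can reach values such as 0.879.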

  8. MASSCLEANage—Stellar Cluster Ages from Integrated Colors

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan; Hanson, M. M.

    2010-11-01

    We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derive their age using the MASSCLEANage with the same photometry with very good agreement. The MASSCLEANage program is freely available under GNU General Public License.

  9. Projections of Temperature-Attributable Premature Deaths in 209 U.S. Cities Using a Cluster-Based Poisson Approach

    NASA Technical Reports Server (NTRS)

    Schwartz, Joel D.; Lee, Mihye; Kinney, Patrick L.; Yang, Suijia; Mills, David; Sarofim, Marcus C.; Jones, Russell; Streeter, Richard; St. Juliana, Alexis; Peers, Jennifer

    2015-01-01

    Background: A warming climate will affect future temperature-attributable premature deaths. This analysis is the first to project these deaths at a near national scale for the United States using city and month-specific temperature-mortality relationships. Methods: We used Poisson regressions to model temperature-attributable premature mortality as a function of daily average temperature in 209 U.S. cities by month. We used climate data to group cities into clusters and applied an Empirical Bayes adjustment to improve model stability and calculate cluster-based month-specific temperature-mortality functions. Using data from two climate models, we calculated future daily average temperatures in each city under Representative Concentration Pathway 6.0. Holding population constant at 2010 levels, we combined the temperature data and cluster-based temperature-mortality functions to project city-specific temperature-attributable premature deaths for multiple future years which correspond to a single reporting year. Results within the reporting periods are then averaged to account for potential climate variability and reported as a change from a 1990 baseline in the future reporting years of 2030, 2050 and 2100. Results: We found temperature-mortality relationships that vary by location and time of year. In general, the largest mortality response during hotter months (April-September) was in July in cities with cooler average conditions. The largest mortality response during colder months (October-March) was at the beginning (October) and end (March) of the period. Using data from two global climate models, we projected a net increase in premature deaths, aggregated across all 209 cities, in all future periods compared to 1990. However, the magnitude and sign of the change varied by cluster and city. Conclusions: We found increasing future premature deaths across the 209 modeled U.S. cities using two climate model projections, based on constant temperature-mortality relationships from 1997 to 2006 without any future adaptation. However, results varied by location, with some locations showing net reductions in premature temperature-attributable deaths with climate change.

  10. Semiempirical limits on the thermal conductivity of intracluster gas

    NASA Technical Reports Server (NTRS)

    David, Laurence P.; Hughes, John P.; Tucker, Wallace H.

    1992-01-01

    A semiempirical method for establishing lower limits on the thermal conductivity of hot gas in clusters of galaxies is described. The method is based on the observation that the X-ray imaging data (e.g., Einstein IPC) for clusters are well described by the hydrostatic-isothermal beta model, even for cooling flow clusters beyond about one core radius. In addition, there are strong indications that noncooling flow clusters (like the Coma Cluster) have a large central region (up to several core radii) of nearly constant gas temperature. This suggests that thermal conduction is an effective means of transporting and redistributing the thermal energy of the gas. This in turn has implications for the extent to which magnetic fields in the cluster are effective in reducing the thermal conductivity of the gas. Time-dependent hydrodynamic simulations for the gas in the Coma Cluster under two separate evolutionary scenarios are presented. One scenario assumes that the cluster potential is static and that the gas has an initial adiabatic distribution. The second scenario uses an evolving cluster potential. These models along with analytic results show that the thermal conductivity of the gas in the Coma Cluster cannot be less than 0.1 of full Spitzer conductivity. These models also show that high gas conductivity assists rather than hinders the development of radiative cooling in the central regions of clusters.

  11. Research on potential user identification model for electric energy substitution

    NASA Astrophysics Data System (ADS)

    Xia, Huaijian; Chen, Meiling; Lin, Haiying; Yang, Shuo; Miao, Bo; Zhu, Xinzhi

    2018-01-01

    The implementation of energy substitution plays an important role in promoting energy conservation and emission reduction in China. Using data from an energy service management platform for alternative energy users, including enterprise production value, product output, and coal and other energy consumption, as potential evaluation indices, a principal component analysis model is used to reduce these to comprehensive characteristic indices that retain the information in the original variables, and a fuzzy clustering model is used for flexible classification of users in the same industry. Based on the comprehensive indices and the user clustering classification, a particle optimization neural network classification model is constructed to predict users' electric energy substitution potential. The results of an example show that the model can effectively predict users' electric energy substitution potential.

  12. Assessment of economic status in trauma registries: A new algorithm for generating population-specific clustering-based models of economic status for time-constrained low-resource settings.

    PubMed

    Eyler, Lauren; Hubbard, Alan; Juillard, Catherine

    2016-10-01

    Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximize average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
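
    The selection loop described above (cluster on every combination of g variables, keep the combination and cluster count with the highest average silhouette width) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the survey weights are ignored, Hamming distance on categorical values stands in for whatever dissimilarity the authors used, the k-medoids routine is a naive alternating version with a deterministic start, and all function names are invented for the example.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_medoids(dist, k, n_iter=100):
    """Naive alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # deterministic start: first k records
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members))
                               if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

def average_silhouette_width(dist, labels):
    """Mean silhouette over all records; singletons score 0 by convention."""
    n = len(labels)
    clusters = {c: [i for i in range(n) if labels[i] == c] for c in set(labels)}
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for c, other in clusters.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n

def select_variables(records, g, k_values):
    """Try every g-variable subset and every k; return (best ASW, variables, k)."""
    best = None
    for cols in combinations(range(len(records[0])), g):
        sub = [tuple(r[c] for c in cols) for r in records]
        dist = [[hamming(a, b) for b in sub] for a in sub]
        for k in k_values:
            asw = average_silhouette_width(dist, k_medoids(dist, k))
            if best is None or asw > best[0]:
                best = (asw, cols, k)
    return best
```

    The exhaustive search grows combinatorially with the number of candidate variables, which is presumably why the abstract restricts g to a limited number of variables.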

  13. Mindfulness-Based Stress Reduction in Post-treatment Breast Cancer Patients: Immediate and Sustained Effects Across Multiple Symptom Clusters.

    PubMed

    Reich, Richard R; Lengacher, Cecile A; Alinat, Carissa B; Kip, Kevin E; Paterson, Carly; Ramesar, Sophia; Han, Heather S; Ismail-Khan, Roohi; Johnson-Mallard, Versie; Moscoso, Manolete; Budhrani-Shani, Pinky; Shivers, Steve; Cox, Charles E; Goodman, Matthew; Park, Jong

    2017-01-01

    Breast cancer survivors (BCS) face adverse physical and psychological symptoms, often co-occurring. Biologic and psychological factors may link symptoms within clusters, distinguishable by prevalence and/or severity. Few studies have examined the effects of behavioral interventions or treatment of symptom clusters. The aim of this study was to identify symptom clusters among post-treatment BCS and determine symptom cluster improvement following the Mindfulness-Based Stress Reduction for Breast Cancer (MBSR(BC)) program. Three hundred twenty-two Stage 0-III post-treatment BCS were randomly assigned to either a six-week MBSR(BC) program or usual care. Psychological (depression, anxiety, stress, and fear of recurrence), physical (fatigue, pain, sleep, and drowsiness), and cognitive symptoms and quality of life were assessed at baseline, six, and 12 weeks, along with demographic and clinical history data at baseline. A three-step analytic process included the error-accounting models of factor analysis and structural equation modeling. Four symptom clusters emerged at baseline: pain, psychological, fatigue, and cognitive. From baseline to six weeks, the model demonstrated evidence of MBSR(BC) effectiveness in both the psychological (anxiety, depression, perceived stress and QOL, emotional well-being) (P = 0.007) and fatigue (fatigue, sleep, and drowsiness) (P < 0.001) clusters. Results between six and 12 weeks showed sustained effects, but further improvement was not observed. Our results provide clinical effectiveness evidence that MBSR(BC) works to improve symptom clusters, particularly for psychological and fatigue symptom clusters, with the greatest improvement occurring during the six-week program with sustained effects for several weeks after MBSR(BC) training. Name and URL of Registry: ClinicalTrials.gov. Registration number: NCT01177124. Copyright © 2016. Published by Elsevier Inc.

  14. Carbon Fibers Conductivity Studies

    NASA Technical Reports Server (NTRS)

    Yang, C. Y.; Butkus, A. M.

    1980-01-01

    In an attempt to understand the process of electrical conduction in polyacrylonitrile (PAN)-based carbon fibers, calculations were carried out on cluster models of the fiber consisting of carbon, nitrogen, and hydrogen atoms using the modified intermediate neglect of differential overlap (MINDO) molecular orbital (MO) method. The models were developed based on the assumption that PAN carbon fibers obtained with heat treatment temperatures (HTT) below 1000 °C retain nitrogen in a graphite-like lattice. For clusters modeling an edge nitrogen site, analysis of the occupied MOs indicated an electron distribution similar to that of graphite. A similar analysis for the somewhat less stable interior nitrogen site revealed a partially localized π electron distribution around the nitrogen atom. The differences in bonding trends and structural stability between edge and interior nitrogen clusters led to a two-step process proposed for nitrogen evolution with increasing HTT.

  15. Long memory and volatility clustering: Is the empirical evidence consistent across stock markets?

    NASA Astrophysics Data System (ADS)

    Bentes, Sónia R.; Menezes, Rui; Mendes, Diana A.

    2008-06-01

    Long memory and volatility clustering are two stylized facts frequently related to financial markets. Traditionally, these phenomena have been studied based on conditionally heteroscedastic models like ARCH, GARCH, IGARCH and FIGARCH, inter alia. One advantage of these models is their ability to capture nonlinear dynamics. Another interesting way to study the volatility phenomenon is by using measures based on the concept of entropy. In this paper we investigate long memory and volatility clustering for the S&P 500, NASDAQ 100 and Stoxx 50 indexes in order to compare the US and European markets. Additionally, we compare the results from conditionally heteroscedastic models with those from the entropy measures. In the latter, we examine Shannon entropy, Renyi entropy and Tsallis entropy. The results corroborate the previous evidence of nonlinear dynamics in the time series considered.
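
    The three entropy measures named above have standard definitions, sketched here for a discrete probability distribution. How the authors estimated the distribution from index returns is not stated in the abstract, so the equal-width histogram helper (`histogram_probabilities`, with its `bins` parameter) is only an assumed, illustrative choice.

```python
import math

def shannon_entropy(p):
    """H = -sum p_i ln p_i (natural log), ignoring zero-probability bins."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi_entropy(p, alpha):
    """H_alpha = ln(sum p_i^alpha) / (1 - alpha), for alpha > 0 and alpha != 1."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def tsallis_entropy(p, q):
    """S_q = (1 - sum p_i^q) / (q - 1), for q != 1."""
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)

def histogram_probabilities(values, bins=10):
    """Relative frequencies of `values` over equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]
```

    Both the Renyi and Tsallis entropies reduce to the Shannon entropy in the limit where their parameter tends to 1, so they can be read as one-parameter families that weight rare or frequent events differently.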

  16. A self-contamination model for the formation of globular star clusters

    NASA Astrophysics Data System (ADS)

    Brown, James Howard

    Described here is a model of globular cluster formation which allows the self contamination of the cluster by an earlier generation of massive stars. It is first shown that such self-contamination naturally produces an Fe/H in the range from -2.5 to -1.0, precisely the same range observed in the metal poor (halo) globular clusters; this also seems to require that the disk clusters started with a substantial initial metallicity. To minimize the problem of creating homogeneous globular clusters, the second (currently observed) generation of stars is assumed to form in the expanding supershell around the first generation stars. Both numerical and analytic models are used to address this problem. The most important result of this investigation was that the late evolution of the supershell is the most important, and that this phase of the evolution is dominated by the external medium in which the cloud is embedded. This result and the requirement that only the most tightly bound systems may become globular clusters lead to the conclusion that a globular cluster with the mass and binding energy typically observed can be formed at star formation efficiencies as low as 10-20 percent. Furthermore, self contamination requires that the typical Fe/H of a bound system be about -1.6, independent of the free parameters of the model, allowing the clusters and field stars to form with different metallicity distributions in spite of their forming at the same time. Since the formation of globular clusters in this model is tied to the external pressure, the halo globular cluster masses and distribution can be used as probes of the early galactic structure. In particular, this model requires an increase in the typical globular cluster mass as one moves out from the galactic center; the masses of the halo clusters are examined, and they show considerable evidence for such a gradient. Based on a pressure distribution derived from this data, the effect of the galactic tidal field on the model is also investigated using an N-body simulation.

  17. Analyzing gene expression time-courses based on multi-resolution shape mixture model.

    PubMed

    Li, Ying; He, Ye; Zhang, Yu

    2016-11-01

    Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biology and disease. Analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore the significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. Multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of change over time of gene expression at different resolutions. Our proposed multi-resolution shape mixture model algorithm is a probabilistic framework which offers a more natural and robust way of clustering time-course gene expression. We assessed the performance of our proposed algorithm using yeast time-course gene expression profiles, compared with several popular clustering methods for gene expression profiles. The grouped genes identified by different methods are evaluated by enrichment analysis of biological pathways and known protein-protein interactions from experimental evidence. The grouped genes identified by our proposed algorithm have stronger biological significance. A novel multi-resolution shape mixture model algorithm based on multi-resolution fractal features is proposed. Our proposed model provides a novel horizon and an alternative tool for visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
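
    To make the idea of a multi-resolution view of a time course concrete, the sketch below performs a full Haar wavelet decomposition of a short profile. The abstract does not say which wavelet the authors used or how the fractal features are derived from the coefficients, so the Haar basis and the function name `haar_decompose` are assumptions chosen for simplicity; the mixture-model clustering step is not shown.

```python
import math

def haar_decompose(signal):
    """Full Haar wavelet decomposition of a sequence whose length is a power of two.

    Returns (approximation, details) where details[0] holds the finest-scale
    coefficients and details[-1] the coarsest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences capture change at this scale; sums carry the smoothed profile on.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details
```

    Each level of `details` summarises how the expression profile changes at one time scale, so two genes with similar coarse-scale coefficients share an overall shape even if their fine-scale fluctuations differ.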

  18. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
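
    The Durbin-Watson statistic used in the abstract above to test regression residuals for autocorrelation has a simple closed form, sketched here in plain Python rather than in the SAS procedures the authors used. The function name is an assumption; the critical values needed for a formal test are not included.

```python
def durbin_watson(residuals):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den
```

    The statistic lies between 0 and 4: values near 2 are consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.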

  19. The Massive Star Content of Circumnuclear Star Clusters in M83

    NASA Astrophysics Data System (ADS)

    Wofford, A.; Chandar, R.; Leitherer, C.

    2011-06-01

    The circumnuclear starburst of M83 (NGC 5236), the nearest such example (4.6 Mpc), constitutes an ideal site for studying the massive star IMF at high metallicity (12+log[O/H]=9.1±0.2, Bresolin & Kennicutt 2002). We analyzed archival HST/STIS FUV imaging and spectroscopy of 13 circumnuclear star clusters in M83. We compared the observed spectra with two types of single stellar population (SSP) models; semi-empirical models, which are based on an empirical library of Galactic O and B stars observed with IUE (Robert et al. 1993), and theoretical models, which are based on a new theoretical UV library of hot massive stars described in Leitherer et al. (2010) and computed with WM-Basic (Pauldrach et al. 2001). The models were generated with Starburst99 (Leitherer & Chen 2009). We derived the reddenings, the ages, and the masses of the clusters from model fits to the FUV spectroscopy, as well as from optical HST/WFC3 photometry.

  20. Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    PubMed Central

    Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257
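
    The assignment subproblem that the abstract says is solved by dynamic programming can be illustrated with a Viterbi-style recursion: given a per-step cost of placing each time point in each cluster, choose labels that minimise total cost plus a penalty for every change of cluster. This is a sketch of that general idea, not the TICC reference implementation; the name `assign_with_switch_penalty` and the penalty symbol `beta` are assumptions, and the ADMM step that re-estimates each cluster's Toeplitz inverse covariance is not shown.

```python
def assign_with_switch_penalty(cost, beta):
    """Minimise sum_t cost[t][k_t] + beta * (number of t with k_t != k_{t-1}).

    cost : list of per-time-step lists; cost[t][k] is the price of putting
           step t in cluster k (for example a negative log-likelihood)
    beta : non-negative penalty paid each time the cluster label changes
    Returns the optimal label sequence.
    """
    n_clusters = len(cost[0])
    best = list(cost[0])          # best[k]: cheapest path ending in cluster k
    back = []                     # back[t-1][k]: predecessor of k at step t
    for row in cost[1:]:
        cheapest = min(range(n_clusters), key=lambda k: best[k])
        prev, new_best = [], []
        for k in range(n_clusters):
            # Either stay in k for free or jump from the cheapest cluster for beta.
            if best[k] <= best[cheapest] + beta:
                prev.append(k)
                new_best.append(best[k] + row[k])
            else:
                prev.append(cheapest)
                new_best.append(best[cheapest] + beta + row[k])
        back.append(prev)
        best = new_best
    # Trace the optimal path backwards from the cheapest final state.
    k = min(range(n_clusters), key=lambda k: best[k])
    path = [k]
    for prev in reversed(back):
        k = prev[k]
        path.append(k)
    return path[::-1]
```

    With `beta = 0` each point simply takes its cheapest cluster; raising `beta` smooths the label sequence into longer segments, which is what turns pointwise clustering into simultaneous segmentation and clustering.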

  1. The Gaia-ESO Survey: open clusters in Gaia-DR1 . A way forward to stellar age calibration

    NASA Astrophysics Data System (ADS)

    Randich, S.; Tognelli, E.; Jackson, R.; Jeffries, R. D.; Degl'Innocenti, S.; Pancino, E.; Re Fiorentin, P.; Spagna, A.; Sacco, G.; Bragaglia, A.; Magrini, L.; Prada Moroni, P. G.; Alfaro, E.; Franciosini, E.; Morbidelli, L.; Roccatagliata, V.; Bouy, H.; Bravi, L.; Jiménez-Esteban, F. M.; Jordi, C.; Zari, E.; Tautvaišiene, G.; Drazdauskas, A.; Mikolaitis, S.; Gilmore, G.; Feltzing, S.; Vallenari, A.; Bensby, T.; Koposov, S.; Korn, A.; Lanzafame, A.; Smiljanic, R.; Bayo, A.; Carraro, G.; Costado, M. T.; Heiter, U.; Hourihane, A.; Jofré, P.; Lewis, J.; Monaco, L.; Prisinzano, L.; Sbordone, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.

    2018-05-01

    Context: Determination and calibration of the ages of stars, which heavily rely on stellar evolutionary models, are very challenging, while representing a crucial aspect in many astrophysical areas. Aims: We describe the methodologies that, taking advantage of Gaia-DR1 and the Gaia-ESO Survey data, enable the comparison of observed open star cluster sequences with stellar evolutionary models. The final, long-term goal is the exploitation of open clusters as age calibrators. Methods: We perform a homogeneous analysis of eight open clusters using the Gaia-DR1 TGAS catalogue for bright members and information from the Gaia-ESO Survey for fainter stars. Cluster membership probabilities for the Gaia-ESO Survey targets are derived based on several spectroscopic tracers. The Gaia-ESO Survey also provides the cluster chemical composition. We obtain cluster parallaxes using two methods. The first one relies on the astrometric selection of a sample of bona fide members, while the other one fits the parallax distribution of a larger sample of TGAS sources. Ages and reddening values are recovered through a Bayesian analysis using the 2MASS magnitudes and three sets of standard models. Lithium depletion boundary (LDB) ages are also determined using literature observations and the same models employed for the Bayesian analysis. Results: For all but one cluster, parallaxes derived by us agree with those presented in Gaia Collaboration (2017, A&A, 601, A19), while a discrepancy is found for NGC 2516; we provide evidence supporting our own determination. Inferred cluster ages are robust against models and are generally consistent with literature values. Conclusions: The systematic parallax errors inherent in the Gaia DR1 data presently limit the precision of our results. Nevertheless, we have been able to place these eight clusters onto the same age scale for the first time, with good agreement between isochronal and LDB ages where there is overlap. Our approach appears promising and demonstrates the potential of combining Gaia and ground-based spectroscopic datasets. Based on observations collected with the FLAMES instrument at VLT/UT2 telescope (Paranal Observatory, ESO, Chile), for the Gaia-ESO Large Public Spectroscopic Survey (188.B-3002, 193.B-0936). Additional tables are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/612/A99

  2. Emergence of clustering in an acquaintance model without homophily

    NASA Astrophysics Data System (ADS)

    Bhat, Uttam; Krapivsky, P. L.; Redner, S.

    2014-11-01

    We introduce an agent-based acquaintance model in which social links are created by processes in which there is no explicit homophily. In spite of the homogeneous nature of the social interactions, highly-clustered social networks can arise. The crucial feature of our model is that of variable transitive interactions. Namely, when an agent introduces two unconnected friends, the rate at which a connection actually occurs between them depends on the number of their mutual acquaintances. As this transitive interaction rate is varied, the social network undergoes a dramatic clustering transition. Close to the transition, the network consists of a collection of well-defined communities. As a function of time, the network can also undergo an incomplete gelation transition, in which the gel, or giant cluster, does not constitute the entire network, even at infinite time. Some of the clustering properties of our model also arise, but in a more gradual manner, in Facebook networks. Finally, we discuss a more realistic variant of our original model in which network realizations can be constructed that quantitatively match Facebook networks.

  3. Functionalizing graphene by embedded boron clusters

    NASA Astrophysics Data System (ADS)

    Quandt, Alexander; Özdoğan, Cem; Kunstmann, Jens; Fehske, Holger

    2008-08-01

    We present a model system that might serve as a blueprint for the controlled layout of graphene-based nanodevices. The system consists of chains of B7 clusters implanted in a graphene matrix, where the boron clusters are not directly connected. We show that the graphene matrix easily accepts these alternating B7-C6 chains and that the implanted boron components may dramatically modify the electronic properties of graphene-based nanomaterials. This suggests a functionalization of graphene nanomaterials, where the semiconducting properties might be supplemented by parts of the graphene matrix itself, but the basic wiring will be provided by alternating chains of implanted boron clusters that connect these areas.

  4. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution methods for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. Source attribution methods can alleviate drawbacks from phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  5. Accelerating Information Retrieval from Profile Hidden Markov Model Databases.

    PubMed

    Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem

    2016-01-01

    Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is increasing interest in improving the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have focused on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for batch query searching, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
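
    The search-space reduction described above can be illustrated with a two-stage lookup: score the cluster representatives first, then score only the members of the best-matching clusters. The sketch below is a generic illustration under stated assumptions, not the authors' method: `score` is a hypothetical stand-in for a profile-HMM alignment score, and `toy_score` exists only to make the example runnable.

```python
def two_stage_search(query, clusters, score, top_clusters=1):
    """Score cluster representatives first, then members of the best clusters.

    clusters: list of (representative, members) pairs; members may overlap
    between clusters. score(query, profile) is a hypothetical stand-in for a
    profile-HMM alignment score (higher is better).
    Returns (best_profile, number_of_score_calls).
    """
    calls = 0
    ranked = []
    for idx, (rep, _members) in enumerate(clusters):
        ranked.append((score(query, rep), idx))
        calls += 1
    ranked.sort(reverse=True)

    # Collect members of the top-ranked clusters, skipping duplicates that
    # arise when clusters overlap.
    candidates = []
    for _s, idx in ranked[:top_clusters]:
        for member in clusters[idx][1]:
            if member not in candidates:
                candidates.append(member)

    best, best_score = None, float("-inf")
    for member in candidates:
        s = score(query, member)
        calls += 1
        if s > best_score:
            best, best_score = member, s
    return best, calls


def toy_score(query, profile):
    """Toy similarity: number of distinct characters shared by two strings."""
    return len(set(query) & set(profile))
```

    In this toy setting a query is compared with every representative plus the members of one cluster, rather than with every profile. Raising `top_clusters` trades speed for recall.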

  6. Hybrid clustering based fuzzy structure for vibration control - Part 1: A novel algorithm for building neuro-fuzzy system

    NASA Astrophysics Data System (ADS)

    Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok

    2015-01-01

    This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. In order to increase the accuracy of the model, the following steps are carried out. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The crucial reason for this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually arise when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data set, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.

  7. Multilevel models for cost-effectiveness analyses that use cluster randomised trial data: An approach to model choice.

    PubMed

    Ng, Edmond S-W; Diaz-Ordaz, Karla; Grieve, Richard; Nixon, Richard M; Thompson, Simon G; Carpenter, James R

    2016-10-01

    Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We find that the most useful criterion for model choice was the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data. © The Author(s) 2013.

  8. Managing distance and covariate information with point-based clustering.

    PubMed

    Whigham, Peter A; de Graaf, Brandon; Srivastava, Rashmi; Glue, Paul

    2016-09-01

    Geographic perspectives of disease and the human condition often involve point-based observations and questions of clustering or dispersion within a spatial context. These problems involve a finite set of point observations and are constrained by a larger, but finite, set of locations where the observations could occur. Developing a rigorous method for pattern analysis in this context requires handling spatial covariates, a method for constrained finite spatial clustering, and addressing bias in geographic distance measures. An approach, based on Ripley's K and applied to the problem of clustering with deliberate self-harm (DSH), is presented. Point-based Monte-Carlo simulation of Ripley's K, accounting for socio-economic deprivation and sources of distance measurement bias, was developed to estimate clustering of DSH at a range of spatial scales. A rotated Minkowski L1 distance metric allowed variation in physical distance and clustering to be assessed. Self-harm data was derived from an audit of 2 years' emergency hospital presentations (n = 136) in a New Zealand town (population ~50,000). The study area was defined by residential (housing) land parcels representing a finite set of possible point addresses. Area-based deprivation was spatially correlated. Accounting for deprivation and distance bias showed evidence for clustering of DSH for spatial scales up to 500 m with a one-sided 95% CI, suggesting that social contagion may be present for this urban cohort. Many problems involve finite locations in geographic space that require estimates of distance-based clustering at many scales. A Monte-Carlo approach to Ripley's K, incorporating covariates and models for distance bias, is crucial when assessing health-related clustering. The case study showed that social network structure defined at the neighbourhood level may account for aspects of neighbourhood clustering of DSH. Accounting for covariate measures that exhibit spatial clustering, such as deprivation, is crucial when assessing point-based clustering.
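
    The core idea of Monte-Carlo testing with Ripley's K over a finite set of candidate locations can be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a naive K estimator without edge correction, an unrotated L1 distance, and uniform sampling of candidate locations with no deprivation weighting. All function names and parameters are assumptions made for the example.

```python
import random


def ripley_k(points, r, area, dist):
    """Naive Ripley's K estimate (no edge correction) at radius r."""
    n = len(points)
    if n < 2:
        return 0.0
    # Count ordered pairs of distinct points lying within distance r.
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and dist(p, q) <= r
    )
    return area * pairs / (n * (n - 1))


def l1(p, q):
    """Minkowski L1 (city-block) distance between two 2D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def monte_carlo_p_value(observed, candidates, r, area, dist, n_sim=199, seed=0):
    """One-sided p-value: share of simulated patterns with K >= observed K.

    Each simulation draws len(observed) locations without replacement from
    the finite candidate set (for example, residential parcels).
    """
    rng = random.Random(seed)
    k_obs = ripley_k(observed, r, area, dist)
    hits = 0
    for _ in range(n_sim):
        sample = rng.sample(candidates, len(observed))
        if ripley_k(sample, r, area, dist) >= k_obs:
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

    A small p-value at radius `r` suggests more clustering than expected if events fell at random among the candidate locations. Covariate adjustment would replace the uniform draw with a weighted one.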

  9. Charging of nanoparticles in stationary plasma in a gas aggregation cluster source

    NASA Astrophysics Data System (ADS)

    Blažek, J.; Kousal, J.; Biederman, H.; Kylián, O.; Hanuš, J.; Slavínská, D.

    2015-10-01

    Clusters that grow into nanoparticles near the magnetron target of the gas aggregation cluster source (GAS) may acquire electric charge by collecting electrons and ions or through other mechanisms like secondary- or photo-electron emissions. The region of the GAS close to the magnetron may be considered as stationary plasma. The steady state charge distribution on nanoparticles can be determined by means of three possible models of cluster charging (a fluid model, a kinetic model, and a model employing Monte Carlo simulations). In the paper the mathematical and numerical aspects of these models are analyzed in detail and close links between them are clarified. Among other things, it is shown that Monte Carlo simulation may be considered as a particular numerical technique for solving kinetic equations. Similarly, the equations of the fluid model result, after some approximation, from averaged kinetic equations. A new algorithm solving an in principle unlimited set of kinetic equations is suggested. Its efficiency is verified on physical models based on experimental input data.

  10. Characteristics of airflow and particle deposition in COPD current smokers

    NASA Astrophysics Data System (ADS)

    Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

    2017-11-01

    A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.

  11. A class of spherical, truncated, anisotropic models for application to globular clusters

    NASA Astrophysics Data System (ADS)

    de Vita, Ruggero; Bertin, Giuseppe; Zocchi, Alice

    2016-05-01

    Recently, a class of non-truncated, radially anisotropic models (the so-called f(ν)-models), originally constructed in the context of violent relaxation and modelling of elliptical galaxies, has been found to possess interesting qualities in relation to observed and simulated globular clusters. In view of new applications to globular clusters, we improve this class of models along two directions. To make them more suitable for the description of small stellar systems hosted by galaxies, we introduce a "tidal" truncation by means of a procedure that guarantees full continuity of the distribution function. The new fT(ν)-models are shown to provide a better fit to the observed photometric and spectroscopic profiles for a sample of 13 globular clusters studied earlier by means of non-truncated models; interestingly, the best-fit models also perform better with respect to the radial-orbit instability. Then, we design a flexible but simple two-component family of truncated models to study the separate issues of mass segregation and multiple populations. We do not aim at a fully realistic description of globular clusters to compete with the description currently obtained by means of dedicated simulations. The goal here is to try to identify the simplest models, that is, those that have the smallest number of free parameters but still have the capacity to provide a reasonable description for clusters that are evidently beyond the reach of one-component models. With this tool, we aim at identifying the key factors that characterize mass segregation or the presence of multiple populations. To reduce the relevant parameter space, we formulate a few physical arguments based on recent observations and simulations. A first application to two well-studied globular clusters is briefly described and discussed.

  12. The Clusters - Collaborative Models of Sustainable Regional Development

    NASA Astrophysics Data System (ADS)

    Mănescu, Gabriel; Kifor, Claudiu

    2014-12-01

    Clusters are the subject of actions and of a whole series of documents issued by national and international organizations, and, based on experience, many authorities promote the idea that, because of clusters, competitiveness increases, the workforce specializes, and regional businesses and economies grow. The present paper is meant to be an insight into the initiatives of forming clusters in Romania. Starting from a comprehensive analysis of the development potential offered by each region of economic development, we present the main types of clusters grouped according to fields of activity and their overall objectives.

  13. Connectionist Interaction Information Retrieval.

    ERIC Educational Resources Information Center

    Dominich, Sandor

    2003-01-01

    Discussion of connectionist views for adaptive clustering in information retrieval focuses on a connectionist clustering technique and activation spreading-based information retrieval model using the interaction information retrieval method. Presents theoretical as well as simulation results as regards computational complexity and includes…

  14. Description of alternating-parity bands within the dinuclear-system model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shneidman, T. M.; Adamian, G. G., E-mail: adamian@theor.jinr.ru; Antonenko, N. V.

    2016-11-15

    A cluster approach is used to describe ground-state-based alternating-parity bands in even–even nuclei and to study the band-termination mechanism. A method is proposed for testing the cluster nature of alternating-parity bands.

  15. Verification of Bayesian Clustering in Travel Behaviour Research – First Step to Macroanalysis of Travel Behaviour

    NASA Astrophysics Data System (ADS)

    Satra, P.; Carsky, J.

    2018-04-01

    Our research looks at travel behaviour from a macroscopic view, taking one municipality as a basic unit. The travel behaviour of one municipality as a whole becomes one piece of data in the research of travel behaviour of a larger area, perhaps a country. Data pre-processing is used to cluster the municipalities into groups that show similarities in their travel behaviour. Such groups can then be researched for the reasons behind their prevailing pattern of travel behaviour without any distortion caused by municipalities with a different pattern. This paper deals with the actual settings of the clustering process, which is based on Bayesian statistics, particularly the mixture model. An optimization of the setting parameters based on the correlation of pointer model parameters and the relative number of data in clusters is a helpful, though not fully reliable, method. Thus, a method for graphic representation of clusters needs to be developed in order to check their quality. Training the setting parameters in 2D has proven to be a beneficial method, because it allows visual control of the produced clusters. The clustering is better applied to separate groups of municipalities in which only identical transport modes compete.

  16. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    PubMed

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from coastal water of the Bohai Sea and North Yellow Sea of China, and apply the clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which is widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
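
    The difference between Mahalanobis and Euclidean distance for correlated variables can be made concrete with a short sketch. It assumes NumPy is available and estimates the covariance from the sample itself; the function name and the choice of the pseudo-inverse are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances between the rows of X.

    X has shape (samples, variables). The covariance is estimated from X
    and inverted with the pseudo-inverse so that a near-singular covariance
    does not raise an error.
    """
    X = np.asarray(X, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    diff = X[:, None, :] - X[None, :, :]  # shape: (n, n, variables)
    # Quadratic form d^T * inv_cov * d for every pair of samples.
    sq = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))
```

    Unlike Euclidean distance, the result does not change when one variable is rescaled (for example, reported in different units), and strongly correlated variables are not counted twice. The resulting matrix could then be passed to any hierarchical clustering routine that accepts precomputed distances.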

  17. Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks.

    PubMed

    Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao

    2017-01-13

    Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs' demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays.
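
    The maximum consensus idea behind the intra-cluster step can be shown in a few lines: in every round each node adopts the largest clock value among itself and its neighbours. This is a minimal sketch of plain max-consensus only. It ignores clock drift, communication delays, and the inter-cluster exchange through overlapping nodes, and the function and parameter names are assumptions made for the example.

```python
def max_consensus(values, neighbors, rounds):
    """Synchronous max-consensus: each node adopts the largest value it hears.

    values: dict mapping node -> initial logical clock reading.
    neighbors: dict mapping node -> iterable of adjacent nodes
               (links are assumed to be bidirectional).
    Returns the clock readings after the given number of rounds.
    """
    current = dict(values)
    for _ in range(rounds):
        # Every node updates from the previous round's values simultaneously.
        current = {
            node: max([current[node]] + [current[n] for n in neighbors[node]])
            for node in current
        }
    return current
```

    On a connected network all nodes hold the global maximum after a number of rounds equal to the network diameter, which is why convergence is fast within small clusters.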

  18. Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks

    PubMed Central

    Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao

    2017-01-01

    Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs’ demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays. PMID:28098750

  19. Discharge-nitrate data clustering for characterizing surface-subsurface flow interaction and calibration of a hydrologic model

    NASA Astrophysics Data System (ADS)

    Shrestha, R. R.; Rode, M.

    2008-12-01

    Concentration of reactive chemicals has different chemical signatures in baseflow and surface runoff. Previous studies on nitrate export from a catchment indicate that the transport processes are driven by subsurface flow. Therefore, the nitrate signature can be used for understanding the event and pre-event contributions to streamflow and surface-subsurface flow interactions. The study uses flow and nitrate concentration time series data for understanding the relationship between these two variables. An unsupervised artificial neural network based learning method called the self-organizing map is used for the identification of clusters in the datasets. Based on the cluster results, five different patterns in the datasets are identified, which correspond to (i) baseflow, (ii) subsurface flow increase, (iii) surface runoff increase, (iv) surface runoff recession, and (v) subsurface flow decrease regions. The cluster results in combination with a hydrologic model are used for discharge separation. For this purpose, a multi-objective optimization tool, NSGA-II, is used, where violation of cluster results is used as one of the objective functions. The results show that the use of cluster results as supplementary information for the calibration of a hydrologic model gives a plausible simulation of subsurface flow as well as total runoff at the catchment outlet. The study is undertaken using data from the Weida catchment in north-eastern Germany, which is a sub-catchment of the Weisse Elster river in the Elbe river basin.

  20. Weak lensing calibration of mass bias in the REFLEX+BCS X-ray galaxy cluster catalogue

    NASA Astrophysics Data System (ADS)

    Simet, Melanie; Battaglia, Nicholas; Mandelbaum, Rachel; Seljak, Uroš

    2017-04-01

    The use of large, X-ray-selected galaxy cluster catalogues for cosmological analyses requires a thorough understanding of the X-ray mass estimates. Weak gravitational lensing is an ideal method to shed light on such issues, due to its insensitivity to the cluster dynamical state. We perform a weak lensing calibration of 166 galaxy clusters from the REFLEX and BCS cluster catalogue and compare our results to the X-ray masses based on scaled luminosities from that catalogue. To interpret the weak lensing signal in terms of cluster masses, we compare the lensing signal to simple theoretical Navarro-Frenk-White models and to simulated cluster lensing profiles, including complications such as cluster substructure, projected large-scale structure and Eddington bias. We find evidence of underestimation in the X-ray masses, as expected, with a mean ratio of X-ray mass to weak lensing mass of 0.75 ± 0.07 (stat.) ± 0.05 (sys.) for our best-fitting model. The biases in cosmological parameters in a typical cluster abundance measurement that ignores this mass bias will typically exceed the statistical errors.

  1. TOPTRAC: Topical Trajectory Pattern Mining

    PubMed Central

    Kim, Younghoon; Han, Jiawei; Yuan, Cangzhou

    2015-01-01

    With the increasing use of GPS-enabled mobile phones, geo-tagging, which refers to adding GPS information to media such as micro-blogging messages or photos, has seen a surge in popularity recently. This enables us to not only browse information based on locations, but also discover patterns in the location-based behaviors of users. Many techniques have been developed to find the patterns of people's movements using GPS data, but latent topics in text messages posted with local contexts have not been utilized effectively. In this paper, we present a latent topic-based clustering algorithm to discover patterns in the trajectories of geo-tagged text messages. We propose a novel probabilistic model to capture the semantic regions where people post messages with a coherent topic as well as the patterns of movement between the semantic regions. Based on the model, we develop an efficient inference algorithm to calculate model parameters. By exploiting the estimated model, we next devise a clustering algorithm to find the significant movement patterns that appear frequently in data. Our experiments on real-life data sets show that the proposed algorithm finds diverse and interesting trajectory patterns and identifies the semantic regions in a finer granularity than the traditional geographical clustering methods. PMID:26709365

  2. Persistent Topology and Metastable State in Conformational Dynamics

    PubMed Central

    Chang, Huang-Wei; Bacallado, Sergio; Pande, Vijay S.; Carlsson, Gunnar E.

    2013-01-01

    The large amount of molecular dynamics simulation data produced by modern computational models brings big opportunities and challenges to researchers. Clustering algorithms play an important role in understanding biomolecular kinetics from the simulation data, especially under the Markov state model framework. However, the ruggedness of the free energy landscape in a biomolecular system makes common clustering algorithms very sensitive to perturbations of the data. Here, we introduce a data-exploratory tool which provides an overview of the clustering structure under different parameters. The proposed Multi-Persistent Clustering analysis combines insights from recent studies on the dynamics of systems with dominant metastable states with the concept of multi-dimensional persistence in computational topology. We propose to explore the clustering structure of the data based on its persistence on scale and density. The analysis provides a systematic way to discover clusters that are robust to perturbations of the data. The dominant states of the system can be chosen with confidence. For the clusters on the borderline, the user can choose to do more simulation or make a decision based on their structural characteristics. Furthermore, our multi-resolution analysis gives users information about the relative potential of the clusters and their hierarchical relationship. The effectiveness of the proposed method is illustrated in three biomolecules: alanine dipeptide, Villin headpiece, and the FiP35 WW domain. PMID:23565139

  3. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    ERIC Educational Resources Information Center

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  4. Shocks and Tides Quantified in the “Sausage” Cluster, CIZA J2242.8+5301 Using N-body/Hydrodynamical Simulations

    NASA Astrophysics Data System (ADS)

    Molnar, S. M.; Broadhurst, T.

    2017-05-01

    The colliding cluster, CIZA J2242.8+5301, displays a spectacular, almost 2 Mpc long shock front with a radio based Mach number M ≃ 5, that is puzzlingly large compared to the X-ray estimate of M ≃ 2.5. The extent to which the X-ray temperature jump is diluted by cooler unshocked gas projected through the cluster currently lacks quantification. Here we apply our self-consistent N-body/hydrodynamical code (based on FLASH) to model this binary cluster encounter. We can account for the location of the shock front and also the elongated X-ray emission by tidal stretching of the gas and dark matter between the two cluster centers. The required total mass is 8.9 × 10^14 M⊙ with a 1.3:1 mass ratio favoring the southern cluster component. The relative velocity we derive is ≃ 2500 km s^-1 initially between the two main cluster components, with an impact parameter of 120 kpc. This solution implies that the shock temperature jump derived from the low angular resolution X-ray satellite Suzaku is underestimated by a factor of two, due to cool gas in projection, bringing the observed X-ray and radio estimates into agreement. Finally, we use our model to generate Compton-y maps to estimate the thermal Sunyaev-Zel’dovich (SZ) effect. At 30 GHz, this amounts to ΔS_n = -0.072 mJy/arcmin^2 and ΔS_s = -0.075 mJy/arcmin^2 at the locations of the northern and southern shock fronts respectively. Our model estimate agrees with previous empirical estimates that have inferred the measured radio spectra of the radio relics can be significantly affected by the SZ effect, with implications for charged particle acceleration models.

  5. A mixture model-based approach to the clustering of microarray expression data.

    PubMed

    McLachlan, G J; Bean, R W; Peel, D

    2002-03-01

    This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/

  6. Rigid-Cluster Models of Conformational Transitions in Macromolecular Machines and Assemblies

    PubMed Central

    Kim, Moon K.; Jernigan, Robert L.; Chirikjian, Gregory S.

    2005-01-01

    We present a rigid-body-based technique (called rigid-cluster elastic network interpolation) to generate feasible transition pathways between two distinct conformations of a macromolecular assembly. Many biological molecules and assemblies consist of domains which act more or less as rigid bodies during large conformational changes. These collective motions are thought to be strongly related with the functions of a system. This fact encourages us to simply model a macromolecule or assembly as a set of rigid bodies which are interconnected with distance constraints. In previous articles, we developed coarse-grained elastic network interpolation (ENI) in which, for example, only Cα atoms are selected as representatives in each residue of a protein. We interpolate distance differences of two conformations in ENI by using a simple quadratic cost function, and the feasible conformations are generated without steric conflicts. Rigid-cluster interpolation is an extension of the ENI method with rigid-clusters replacing point masses. Now the intermediate conformations in an anharmonic pathway can be determined by the translational and rotational displacements of large clusters in such a way that distance constraints are observed. We present the derivation of the rigid-cluster model and apply it to a variety of macromolecular assemblies. Rigid-cluster ENI is then modified for a hybrid model represented by a mixture of rigid clusters and point masses. Simulation results show that both rigid-cluster and hybrid ENI methods generate sterically feasible pathways of large systems in a very short time. For example, the HK97 virus capsid is an icosahedral symmetric assembly composed of 60 identical asymmetric units. Its original Hessian matrix size for a Cα coarse-grained model is >(300,000)^2. However, it reduces to (84)^2 when we apply the rigid-cluster model with icosahedral symmetry constraints. The computational cost of the interpolation no longer scales heavily with the size of structures; instead, it depends strongly on the minimal number of rigid clusters into which the system can be decomposed. PMID:15833998

  7. Modeling and clustering water demand patterns from real-world smart meter data

    NASA Astrophysics Data System (ADS)

    Cheifetz, Nicolas; Noumir, Zineb; Samé, Allou; Sandraz, Anne-Claire; Féliers, Cédric; Heim, Véronique

    2017-08-01

    Nowadays, drinking water utilities need an acute comprehension of the water demand on their distribution network, in order to efficiently operate the optimization of resources, manage billing and propose new customer services. With the emergence of smart grids, based on automated meter reading (AMR), a better understanding of the consumption modes is now accessible for smart cities with more granularities. In this context, this paper evaluates a novel methodology for identifying relevant usage profiles from the water consumption data produced by smart meters. The methodology is fully data-driven using the consumption time series which are seen as functions or curves observed with an hourly time step. First, a Fourier-based additive time series decomposition model is introduced to extract seasonal patterns from time series. These patterns are intended to represent the customer habits in terms of water consumption. Two functional clustering approaches are then used to classify the extracted seasonal patterns: the functional version of K-means, and the Fourier REgression Mixture (FReMix) model. The K-means approach produces a hard segmentation and K representative prototypes. On the other hand, the FReMix is a generative model and also produces K profiles as well as a soft segmentation based on the posterior probabilities. The proposed approach is applied to a smart grid deployed on the largest water distribution network (WDN) in France. The two clustering strategies are evaluated and compared. Finally, a realistic interpretation of the consumption habits is given for each cluster. The extensive experiments and the qualitative interpretation of the resulting clusters allow one to highlight the effectiveness of the proposed methodology.
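
    The contrast drawn above between a hard segmentation (K-means) and a soft segmentation based on posterior probabilities (a mixture model) can be illustrated with a minimal sketch. It is not the FReMix model from the record: it assumes scalar observations and one-dimensional Gaussian components with already-known parameters, and the function names `soft_assignment` and `hard_assignment` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_assignment(x, weights, means, stds):
    """Posterior probability of each mixture component given x (Bayes' rule).

    Assumption: the mixture parameters are already estimated. For x very far
    from every component the densities can underflow to zero; this sketch
    does not guard against that case.
    """
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assignment(x, prototypes):
    """Index of the nearest prototype, as in a K-means assignment step."""
    return min(range(len(prototypes)), key=lambda k: abs(x - prototypes[k]))


if __name__ == "__main__":
    weights, means, stds = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
    for x in (0.4, 1.0, 1.9):
        probs = soft_assignment(x, weights, means, stds)
        print(x, hard_assignment(x, means), [round(p, 3) for p in probs])
```

    An observation halfway between two prototypes still receives a single label under the hard rule, whereas the posterior probabilities make the ambiguity visible.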

  8. Dynamical Organization of Syntaxin-1A at the Presynaptic Active Zone

    PubMed Central

    Ullrich, Alexander; Böhme, Mathias A.; Schöneberg, Johannes; Depner, Harald; Sigrist, Stephan J.; Noé, Frank

    2015-01-01

    Synaptic vesicle fusion is mediated by SNARE proteins forming in between synaptic vesicle (v-SNARE) and plasma membrane (t-SNARE), one of which is Syntaxin-1A. Although exocytosis mainly occurs at active zones, Syntaxin-1A appears to cover the entire neuronal membrane. By using STED super-resolution light microscopy and image analysis of Drosophila neuro-muscular junctions, we show that Syntaxin-1A clusters are more abundant and have an increased size at active zones. A computational particle-based model of syntaxin cluster formation and dynamics is developed. The model is parametrized to reproduce Syntaxin cluster-size distributions found by STED analysis, and successfully reproduces existing FRAP results. The model shows that the neuronal membrane is adjusted in a way to strike a balance between having most syntaxins stored in large clusters, while still keeping a mobile fraction of syntaxins free or in small clusters that can efficiently search the membrane or be traded between clusters. This balance is subtle and can be shifted toward almost no clustering and almost complete clustering by modifying the syntaxin interaction energy on the order of only 1 kBT. This capability appears to be exploited at active zones. The larger active-zone syntaxin clusters are more stable and provide regions of high docking and fusion capability, whereas the smaller clusters outside may serve as flexible reserve pool or sites of spontaneous ectopic release. PMID:26367029

  9. Million-body star cluster simulations: comparisons between Monte Carlo and direct N-body

    NASA Astrophysics Data System (ADS)

    Rodriguez, Carl L.; Morscher, Meagan; Wang, Long; Chatterjee, Sourav; Rasio, Frederic A.; Spurzem, Rainer

    2016-12-01

    We present the first detailed comparison between million-body globular cluster simulations computed with a Hénon-type Monte Carlo code, CMC, and a direct N-body code, NBODY6++GPU. Both simulations start from an identical cluster model with 10^6 particles, and include all of the relevant physics needed to treat the system in a highly realistic way. With the two codes 'frozen' (no fine-tuning of any free parameters or internal algorithms of the codes) we find good agreement in the overall evolution of the two models. Furthermore, we find that in both models, large numbers of stellar-mass black holes (>1000) are retained for 12 Gyr. Thus, the very accurate direct N-body approach confirms recent predictions that black holes can be retained in present-day, old globular clusters. We find only minor disagreements between the two models and attribute these to the small-N dynamics driving the evolution of the cluster core for which the Monte Carlo assumptions are less ideal. Based on the overwhelming general agreement between the two models computed using these vastly different techniques, we conclude that our Monte Carlo approach, which is more approximate, but dramatically faster compared to the direct N-body, is capable of producing an accurate description of the long-term evolution of massive globular clusters even when the clusters contain large populations of stellar-mass black holes.

  10. A segmentation/clustering model for the analysis of array CGH data.

    PubMed

    Picard, F; Robin, S; Lebarbier, E; Daudin, J-J

    2007-09-01

    Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.

  11. Daily life activity routine discovery in hemiparetic rehabilitation patients using topic models.

    PubMed

    Seiter, J; Derungs, A; Schuster-Amft, C; Amft, O; Tröster, G

    2015-01-01

    Monitoring natural behavior and activity routines of hemiparetic rehabilitation patients across the day can provide valuable progress information for therapists and patients and contribute to an optimized rehabilitation process. In particular, continuous patient monitoring could add type, frequency and duration of daily life activity routines and hence complement standard clinical scores that are assessed for particular tasks only. Machine learning methods have been applied to infer activity routines from sensor data. However, supervised methods require activity annotations to build recognition models and thus require extensive patient supervision. Discovery methods, including topic models, could provide patient routine information and deal with variability in activity and movement performance across patients. Topic models have been used to discover characteristic activity routine patterns of healthy individuals using activity primitives recognized from supervised sensor data. Yet, the applicability of topic models for hemiparetic rehabilitation patients and techniques to derive activity primitives without supervision need to be addressed. We (1) investigate whether a topic model-based activity routine discovery framework can infer activity routines of rehabilitation patients from wearable motion sensor data, and (2) compare the performance of our topic model-based activity routine discovery using rule-based and clustering-based activity vocabularies. We analyze the activity routine discovery in a dataset recorded with 11 hemiparetic rehabilitation patients during up to ten full recording days per individual in an ambulatory daycare rehabilitation center using wearable motion sensors attached to both wrists and the non-affected thigh. We introduce and compare rule-based and clustering-based activity vocabularies to process statistical and frequency acceleration features to activity words. Activity words were used for activity routine pattern discovery using topic models based on Latent Dirichlet Allocation. Discovered activity routine patterns were then mapped to six categorized activity routines. Using the rule-based approach, activity routines could be discovered with an average accuracy of 76% across all patients. The rule-based approach outperformed clustering by 10% and showed fewer confusions for predicted activity routines. Topic models are suitable to discover daily life activity routines in hemiparetic rehabilitation patients without trained classifiers and activity annotations. Activity routines show characteristic patterns regarding activity primitives including body and extremity postures and movement. A patient-independent rule set can be derived. Including expert knowledge supports successful activity routine discovery over completely data-driven clustering.

  12. Efficient Deployment of Key Nodes for Optimal Coverage of Industrial Mobile Wireless Networks

    PubMed Central

    Li, Xiaomin; Li, Di; Dong, Zhijie; Hu, Yage; Liu, Chengliang

    2018-01-01

    In recent years, industrial wireless networks (IWNs) have been transformed by the introduction of mobile nodes, and they now offer increased extensibility, mobility, and flexibility. Nevertheless, mobile nodes pose efficiency and reliability challenges. Efficient node deployment and management of channel interference directly affect network system performance, particularly for key node placement in clustered wireless networks. This study analyzes this system model, considering both industrial properties of wireless networks and their mobility. Then, static and mobile node coverage problems are unified and simplified to target coverage problems. We propose a novel strategy for the deployment of clustered heads in grouped industrial mobile wireless networks (IMWNs) based on the improved maximal clique model and the iterative computation of new candidate cluster head positions. The maximal cliques are obtained via a double-layer Tabu search. Each cluster head updates its new position via an improved virtual force while moving with full coverage to find the minimal inter-cluster interference. Finally, we develop a simulation environment. The simulation results, based on a performance comparison, show the efficacy of the proposed strategies and their superiority over current approaches. PMID:29439439

  13. RELICS: Strong Lens Models for Five Galaxy Clusters from the Reionization Lensing Cluster Survey

    NASA Astrophysics Data System (ADS)

    Cerny, Catherine; Sharon, Keren; Andrade-Santos, Felipe; Avila, Roberto J.; Bradač, Maruša; Bradley, Larry D.; Carrasco, Daniela; Coe, Dan; Czakon, Nicole G.; Dawson, William A.; Frye, Brenda L.; Hoag, Austin; Huang, Kuang-Han; Johnson, Traci L.; Jones, Christine; Lam, Daniel; Lovisari, Lorenzo; Mainali, Ramesh; Oesch, Pascal A.; Ogaz, Sara; Past, Matthew; Paterno-Mahler, Rachel; Peterson, Avery; Riess, Adam G.; Rodney, Steven A.; Ryan, Russell E.; Salmon, Brett; Sendra-Server, Irene; Stark, Daniel P.; Strolger, Louis-Gregory; Trenti, Michele; Umetsu, Keiichi; Vulcani, Benedetta; Zitrin, Adi

    2018-06-01

    Strong gravitational lensing by galaxy clusters magnifies background galaxies, enhancing our ability to discover statistically significant samples of galaxies at z > 6, in order to constrain the high-redshift galaxy luminosity functions. Here, we present the first five lens models out of the Reionization Lensing Cluster Survey (RELICS) Hubble Treasury Program, based on new HST WFC3/IR and ACS imaging of the clusters RXC J0142.9+4438, Abell 2537, Abell 2163, RXC J2211.7–0349, and ACT-CLJ0102–49151. The derived lensing magnification is essential for estimating the intrinsic properties of high-redshift galaxy candidates, and properly accounting for the survey volume. We report on new spectroscopic redshifts of multiply imaged lensed galaxies behind these clusters, which are used as constraints, and detail our strategy to reduce systematic uncertainties due to lack of spectroscopic information. In addition, we quantify the uncertainty on the lensing magnification due to statistical and systematic errors related to the lens modeling process, and find that in all but one cluster, the magnification is constrained to better than 20% in at least 80% of the field of view, including statistical and systematic uncertainties. The five clusters presented in this paper span the range of masses and redshifts of the clusters in the RELICS program. We find that they exhibit similar strong lensing efficiencies to the clusters targeted by the Hubble Frontier Fields within the WFC3/IR field of view. Outputs of the lens models are made available to the community through the Mikulski Archive for Space Telescopes.

  14. Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

    PubMed

    Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

    2017-11-01

    The variations found in the elemental composition of ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with the C4.5 decision tree to help us interpret the clustering results. We found that two clusters best describe the data, which may reflect the approximate number of sources of the drug that supply the cities where the seizures were made. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.

  15. An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems

    PubMed Central

    Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.

    2014-01-01

    This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on a local search and a parallelization process, inspired from an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235

  16. MASSCLEANage-STELLAR CLUSTER AGES FROM INTEGRATED COLORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Popescu, Bogdan; Hanson, M. M., E-mail: popescb@mail.uc.ed, E-mail: margaret.hanson@uc.ed

    2010-11-20

    We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derive their age using the MASSCLEANage with the same photometry with very good agreement. The MASSCLEANage program is freely available under GNU General Public License.

  17. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R^2 between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.

  18. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
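
    The Benjamini-Hochberg step mentioned above, which turns a set of p-values into a rejection threshold, can be sketched as follows. This is a generic illustration of the procedure, not the authors' code; the function name `benjamini_hochberg`, the false discovery rate `q`, and the example p-values are assumptions made for the sketch.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q.

    Returns (threshold, rejected): threshold is the largest sorted p-value
    p_(k) with p_(k) <= q * k / m, or None if no such k exists; rejected[i]
    is True when p_values[i] is at or below that threshold.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            threshold = p  # keep the largest p-value that passes its bound
    if threshold is None:
        return None, [False] * m
    return threshold, [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for ten candidate clusters.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    threshold, rejected = benjamini_hochberg(pvals, q=0.05)
    print(threshold, rejected)
```

    Because the procedure is step-up, every p-value at or below the threshold is rejected, even one that failed its own rank-specific bound.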

  19. Scaling Relations from Sunyaev-Zel'dovich Effect and Chandra X-ray Measurements of High-Redshift Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Bonamente, Massimiliano; Joy, Marshall; LaRoque, Samuel J.; Carlstrom, John E.; Nagai, Daisuke; Marrone, Dan

    2007-01-01

    We present Sunyaev-Zel'dovich Effect (SZE) scaling relations for 38 massive galaxy clusters at redshifts 0.14 ≤ z ≤ 0.89, observed with both the Chandra X-ray Observatory and the centimeter-wave SZE imaging system at the BIMA and OVRO interferometric arrays. An isothermal β-model with the central 100 kpc excluded from the X-ray data is used to model the intracluster medium and to measure global cluster properties. For each cluster, we measure the X-ray spectroscopic temperature, SZE gas mass, total mass, and integrated Compton-y parameters within r_2500. Our measurements are in agreement with the expectations based on a simple self-similar model of cluster formation and evolution. We compare the cluster properties derived from our SZE observations with and without Chandra spatial and spectral information and find them to be in good agreement. We compare our results with cosmological numerical simulations, and find that simulations that include radiative cooling, star formation and feedback match well both the slope and normalization of our SZE scaling relations.

  20. Shortest-path constraints for 3D multiobject semiautomatic segmentation via clustering and Graph Cut.

    PubMed

    Kéchichian, Razmig; Valette, Sébastien; Desvignes, Michel; Prost, Rémy

    2013-11-01

    We derive shortest-path constraints from graph models of structure adjacency relations and introduce them in a joint centroidal Voronoi image clustering and Graph Cut multiobject semiautomatic segmentation framework. The vicinity prior model thus defined is a piecewise-constant model incurring multiple levels of penalization capturing the spatial configuration of structures in multiobject segmentation. Qualitative and quantitative analyses and comparison with a Potts prior-based approach and our previous contribution on synthetic, simulated, and real medical images show that the vicinity prior allows for the correct segmentation of distinct structures having identical intensity profiles and improves the precision of segmentation boundary placement while being fairly robust to clustering resolution. The clustering approach we take to simplify images prior to segmentation strikes a good balance between boundary adaptivity and cluster compactness criteria furthermore allowing to control the trade-off. Compared with a direct application of segmentation on voxels, the clustering step improves the overall runtime and memory footprint of the segmentation process up to an order of magnitude without compromising the quality of the result.

  1. Implementation of Self Organizing Map (SOM) as decision support: Indonesian telematics services MSMEs empowerment

    NASA Astrophysics Data System (ADS)

    Tosida, E. T.; Maryana, S.; Thaheer, H.; Hardiani

    2017-01-01

    Information and communication technology (telematics) is one of the most rapidly developing business sectors in Indonesia. It holds a strategic position through its contribution to the planning and implementation of development, economic, social, political and defence strategies in business, communication and education. Aid absorption among national telecommunication SMEs is relatively low; improvement therefore requires an analysis of business support clusters based on type of business. In this study, the business support cluster analysis is implemented specifically for Indonesian telecommunication services. The business data are obtained from the National Economic Census (Susenas 2006). The cluster model is developed with the Self-Organizing Map (SOM) algorithm, an Artificial Neural Network (ANN) method. Based on the Davies-Bouldin Index (IDB), the cluster model scores 0.37, which can be categorized as good. The cluster model is developed to identify telecommunication business clusters that influence the national economy, so that it is easier for the government to supervise the telecommunication business.
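
    The Davies-Bouldin index cited in this abstract can be computed from cluster assignments alone. The following is a minimal sketch of the standard definition, assuming Euclidean distance and NumPy; the function name `davies_bouldin` and the toy data are illustrative and are not taken from the paper. Lower values indicate more compact, better separated clusters.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j) / M_ij ratio."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # S_i: mean Euclidean distance of the members of cluster i to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    n = len(ids)
    worst = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                # M_ij: distance between centroids (assumed non-zero here)
                sep = np.linalg.norm(centroids[i] - centroids[j])
                worst[i] = max(worst[i], (scatter[i] + scatter[j]) / sep)
    return worst.mean()

# Toy example (hypothetical data): two tight, well separated clusters
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
score = davies_bouldin(X, labels)
```

    In this toy case each cluster has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.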

  2. Planck 2015 results. XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Battye, R.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. 
R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Weller, J.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.

    2016-09-01

    We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. Improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  3. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...

    2016-09-20

    In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  4. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.

    In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  5. A Commodity Computing Cluster

    NASA Astrophysics Data System (ADS)

    Teuben, P. J.; Wolfire, M. G.; Pound, M. W.; Mundy, L. G.

    We have assembled a cluster of Intel-Pentium based PCs running Linux to compute a large set of Photodissociation Region (PDR) and Dust Continuum models. For various reasons the cluster is heterogeneous, currently ranging from a single Pentium-II 333 MHz to dual Pentium-III 450 MHz CPU machines. Although this will be sufficient for our "embarrassingly parallelizable problem", it may present some challenges for as yet unplanned future use. In addition, the cluster was used to construct a MIRIAD benchmark, and compared to equivalent Ultra-Sparc based workstations. Currently the cluster consists of 8 machines, 14 CPUs, 50 GB of disk space, and a total peak speed of 5.83 GHz, or about 1.5 Gflops. The total cost of this cluster has been about $12,000, including all cabling, networking equipment, rack, and a CD-R backup system. The URL for this project is http://dustem.astro.umd.edu.

  6. Graph configuration model based evaluation of the education-occupation match

    PubMed Central

    2018-01-01

    To study education—occupation matchings we developed a bipartite network model of education to work transition and a graph configuration model based metric. We studied the career paths of 15 thousand Hungarian students based on the integrated database of the National Tax Administration, the National Health Insurance Fund, and the higher education information system of the Hungarian Government. A brief analysis of gender pay gap and the spatial distribution of over-education is presented to demonstrate the background of the research and the resulted open dataset. We highlighted the hierarchical and clustered structure of the career paths based on the multi-resolution analysis of the graph modularity. The results of the cluster analysis can support policymakers to fine-tune the fragmented program structure of higher education. PMID:29509783

  7. Graph configuration model based evaluation of the education-occupation match.

    PubMed

    Gadar, Laszlo; Abonyi, Janos

    2018-01-01

    To study education-occupation matchings we developed a bipartite network model of education to work transition and a graph configuration model based metric. We studied the career paths of 15 thousand Hungarian students based on the integrated database of the National Tax Administration, the National Health Insurance Fund, and the higher education information system of the Hungarian Government. A brief analysis of gender pay gap and the spatial distribution of over-education is presented to demonstrate the background of the research and the resulted open dataset. We highlighted the hierarchical and clustered structure of the career paths based on the multi-resolution analysis of the graph modularity. The results of the cluster analysis can support policymakers to fine-tune the fragmented program structure of higher education.

  8. Are Binary Separations related to their System Mass?

    NASA Astrophysics Data System (ADS)

    Sterzik, M. F.; Durisen, R. H.

    2004-08-01

    We compile the most recent multiplicity fractions and binary separation distributions for different primary masses, including very low-mass and brown dwarf primaries, and compare them with dynamical decay models of small-N clusters. The model predictions are based on detailed numerical calculations of the internal cluster dynamics, as well as on Monte-Carlo methods. Both observations and models reflect the same trends: (1) The multiplicity fraction is an increasing function of the primary mass. (2) The mean binary separations increase with the system mass, in the sense that very low-mass binaries have average separations around ≈ 4 AU, while the binary separation distribution for solar-type primaries peaks at ≈ 40 AU. M-type binary systems apparently preferentially populate intermediate separations. Similar specific energy at the time of cluster formation for all cluster masses can possibly explain this trend.

  9. Mid-infrared Integrated-light Photometry Of LMC Star Clusters

    NASA Astrophysics Data System (ADS)

    Pessev, Peter; Goudfrooij, P.; Puzia, T.; Chandar, R.

    2008-03-01

    Massive star clusters (Galactic Globular Clusters and Populous Clusters in the Magellanic Clouds) are the best available approximation of Simple Stellar Populations (SSPs). Since the stellar populations in these nearby objects are studied in detail, they provide fundamental age/metallicity templates for the interpretation of galaxy properties and for the testing and calibration of SSP models. Magellanic Cloud clusters are particularly important since they populate a region of the age/metallicity parameter space that is not easily accessible in our Galaxy. We present the first Mid-IR integrated-light measurements for six LMC clusters based on our Spitzer IRAC imaging program. Since we are targeting a specific group of intermediate-age clusters, our imaging goes deeper than the SAGE-LMC survey data. We present a literature compilation of the clusters' properties along with a multi-wavelength integrated-light photometry database spanning from the optical (Johnson U band) to the Mid-IR (IRAC Channel 4). This data provides an important empirical baseline for the interpretation of galaxy colors in the Mid-IR (especially high-z objects whose integrated light is dominated by emission from TP-AGB stars). It is also a valuable tool to check the SSP model predictions in the intermediate-age regime and provides calibration data for the next generation of SSP models.

  10. Classical plasma dynamics of Mie-oscillations in atomic clusters

    NASA Astrophysics Data System (ADS)

    Kull, H.-J.; El-Khawaldeh, A.

    2018-04-01

    Mie plasmons are of basic importance for the absorption of laser light by atomic clusters. In this work we first review the classical Rayleigh theory of a dielectric sphere in an external electric field and Thomson's plum-pudding model applied to atomic clusters. Both approaches allow for elementary discussions of Mie oscillations; however, they also indicate deficiencies in describing the damping mechanisms by electrons crossing the cluster surface. Nonlinear oscillator models have been widely studied to gain an understanding of damping and absorption by outer ionization of the cluster. In the present work, we attempt to address the issue of plasmon relaxation in atomic clusters in more detail based on classical particle simulations. In particular, we wish to study the role of thermal motion on plasmon relaxation, thereby extending nonlinear models of collective single-electron motion. Our simulations are particularly adapted to the regime of classical kinetics in weakly coupled plasmas and to cluster sizes exceeding the Debye screening length. It will be illustrated how surface scattering leads to the relaxation of Mie oscillations in the presence of thermal motion and of electron spill-out at the cluster surface. This work is intended to give, from a classical perspective, further insight into recent work on plasmon relaxation in quantum plasmas [1].

  11. To center or not to center? Investigating inertia with a multilevel autoregressive model.

    PubMed

    Hamaker, Ellen L; Grasman, Raoul P P P

    2014-01-01

    Whether level 1 predictors should be centered per cluster has received considerable attention in the multilevel literature. While most agree that there is no one preferred approach, it has also been argued that cluster mean centering is desirable when the within-cluster slope and the between-cluster slope are expected to deviate, and the main interest is in the within-cluster slope. However, we show in a series of simulations that if one has a multilevel autoregressive model in which the level 1 predictor is the lagged outcome variable (i.e., the outcome variable at the previous occasion), cluster mean centering will in general lead to a downward bias in the parameter estimate of the within-cluster slope (i.e., the autoregressive relationship). This is particularly relevant if the main question is whether there is on average an autoregressive effect. Nonetheless, we show that if the main interest is in estimating the effect of a level 2 predictor on the autoregressive parameter (i.e., a cross-level interaction), cluster mean centering should be preferred over other forms of centering. Hence, researchers should be clear on what is considered the main goal of their study, and base their choice of centering method on this when using a multilevel autoregressive model.
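
    Cluster mean centering of a lagged outcome, as discussed in this abstract, can be illustrated with a short sketch. It assumes NumPy, observations stored in time order within each cluster, and centering on the mean of each cluster's lagged values; the function name `lagged_cluster_centered` and the toy series are hypothetical and do not reproduce the authors' simulation design.

```python
import numpy as np

def lagged_cluster_centered(y, cluster):
    """Return (y_t, centered y_{t-1}); centering uses each cluster's own mean."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    outcome, predictor = [], []
    for c in np.unique(cluster):
        series = y[cluster == c]      # observations assumed to be in time order
        lagged = series[:-1]          # y_{t-1}, the level 1 predictor
        outcome.append(series[1:])    # y_t, the outcome
        predictor.append(lagged - lagged.mean())  # cluster mean centering
    return np.concatenate(outcome), np.concatenate(predictor)

# Toy example (hypothetical data): two clusters with three occasions each
y_t, y_lag_centered = lagged_cluster_centered(
    [1, 2, 3, 10, 20, 30], [0, 0, 0, 1, 1, 1]
)
```

    After centering, the predictor has mean zero within every cluster, so a regression of `y_t` on `y_lag_centered` uses only within-cluster variation. The abstract reports that this choice tends to bias the estimated autoregressive slope downward.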

  12. To center or not to center? Investigating inertia with a multilevel autoregressive model

    PubMed Central

    Hamaker, Ellen L.; Grasman, Raoul P. P. P.

    2015-01-01

    Whether level 1 predictors should be centered per cluster has received considerable attention in the multilevel literature. While most agree that there is no one preferred approach, it has also been argued that cluster mean centering is desirable when the within-cluster slope and the between-cluster slope are expected to deviate, and the main interest is in the within-cluster slope. However, we show in a series of simulations that if one has a multilevel autoregressive model in which the level 1 predictor is the lagged outcome variable (i.e., the outcome variable at the previous occasion), cluster mean centering will in general lead to a downward bias in the parameter estimate of the within-cluster slope (i.e., the autoregressive relationship). This is particularly relevant if the main question is whether there is on average an autoregressive effect. Nonetheless, we show that if the main interest is in estimating the effect of a level 2 predictor on the autoregressive parameter (i.e., a cross-level interaction), cluster mean centering should be preferred over other forms of centering. Hence, researchers should be clear on what is considered the main goal of their study, and base their choice of centering method on this when using a multilevel autoregressive model. PMID:25688215

  13. Identifying optimal threshold statistics for elimination of hookworm using a stochastic simulation model.

    PubMed

    Truscott, James E; Werkman, Marleen; Wright, James E; Farrell, Sam H; Sarkar, Rajiv; Ásbjörnsdóttir, Kristjana; Anderson, Roy M

    2017-06-30

    There is an increased focus on whether mass drug administration (MDA) programmes alone can interrupt the transmission of soil-transmitted helminths (STH). Mathematical models can be used to model these interventions and are increasingly being implemented to inform investigators about expected trial outcome and the choice of optimum study design. One key factor is the choice of threshold for detecting elimination. However, there are currently no thresholds defined for STH regarding breaking transmission. We develop a simulation of an elimination study, based on the DeWorm3 project, using an individual-based stochastic disease transmission model in conjunction with models of MDA, sampling, diagnostics and the construction of study clusters. The simulation is then used to analyse the relationship between the study end-point elimination threshold and whether elimination is achieved in the long term within the model. We analyse the quality of a range of statistics in terms of the positive predictive values (PPV) and how they depend on a range of covariates, including threshold values, baseline prevalence, measurement time point and how clusters are constructed. End-point infection prevalence performs well in discriminating between villages that achieve interruption of transmission and those that do not, although the quality of the threshold is sensitive to baseline prevalence and threshold value. Optimal post-treatment prevalence threshold value for determining elimination is in the range 2% or less when the baseline prevalence range is broad. For multiple clusters of communities, both the probability of elimination and the ability of thresholds to detect it are strongly dependent on the size of the cluster and the size distribution of the constituent communities. Number of communities in a cluster is a key indicator of probability of elimination and PPV. 
Extending the time, post-study endpoint, at which the threshold statistic is measured improves PPV value in discriminating between eliminating clusters and those that bounce back. The probability of elimination and PPV are very sensitive to baseline prevalence for individual communities. However, most studies and programmes are constructed on the basis of clusters. Since elimination occurs within smaller population sub-units, the construction of clusters introduces new sensitivities for elimination threshold values to cluster size and the underlying population structure. Study simulation offers an opportunity to investigate key sources of sensitivity for elimination studies and programme designs in advance and to tailor interventions to prevailing local or national conditions.
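
    The positive predictive value (PPV) of a prevalence threshold, the statistic evaluated in this abstract, can be sketched as follows. The sketch assumes that a cluster counts as a predicted elimination when its end-point prevalence is at or below the threshold; the function names and the toy values are illustrative and are not taken from the study.

```python
def positive_predictive_value(predicted, eliminated):
    """PPV = true positives / (true positives + false positives)."""
    tp = sum(1 for p, e in zip(predicted, eliminated) if p and e)
    fp = sum(1 for p, e in zip(predicted, eliminated) if p and not e)
    if tp + fp == 0:
        return float("nan")  # no cluster fell below the threshold
    return tp / (tp + fp)

def ppv_for_threshold(prevalence, eliminated, threshold):
    """Predict elimination wherever end-point prevalence <= threshold."""
    predicted = [p <= threshold for p in prevalence]
    return positive_predictive_value(predicted, eliminated)

# Toy example (hypothetical data): end-point prevalence per cluster and
# whether transmission was interrupted in the long term
prevalence = [0.01, 0.015, 0.03, 0.005]
eliminated = [True, False, False, True]
ppv = ppv_for_threshold(prevalence, eliminated, threshold=0.02)
```

    Here three clusters fall below the 2% threshold and two of them eliminate, so the PPV is 2/3.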

  14. Understanding the Support Needs of People with Intellectual and Related Developmental Disabilities through Cluster Analysis and Factor Analysis of Statewide Data

    ERIC Educational Resources Information Center

    Viriyangkura, Yuwadee

    2014-01-01

    Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…

  15. A clustering algorithm for sample data based on environmental pollution characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
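
    The clustering process outlined in this abstract (seed a centre from the first unlabelled point, assign points by a user-defined threshold, then refine in a k-Means-like manner) can be sketched roughly as below. This is not the EPC algorithm itself: it assumes Euclidean distance in place of the paper's similarity function, omits outlier detection, and uses the hypothetical names `threshold_cluster` and `n_refine`.

```python
import numpy as np

def threshold_cluster(X, threshold, n_refine=10):
    """Threshold-seeded clustering followed by k-means-like refinement."""
    X = np.asarray(X, dtype=float)
    centres = []
    # Pass 1: a point farther than `threshold` from every centre seeds a new one.
    for x in X:
        if not centres or min(np.linalg.norm(x - c) for c in centres) > threshold:
            centres.append(x.copy())
    centres = np.array(centres)
    # Pass 2: alternate nearest-centre assignment and centre update.
    for _ in range(n_refine):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
            for k in range(len(centres))
        ])
        if np.allclose(updated, centres):
            break
        centres = updated
    # Final assignment against the refined centres
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1), centres

# Toy example (hypothetical data): two tight groups and one isolated sample
X = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [20, 20]]
labels, centres = threshold_cluster(X, threshold=1.0)
```

    The number of clusters is not fixed in advance; it follows from the threshold, which is the main difference from plain k-Means.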

  16. Data-driven modeling and predictive control for boiler-turbine unit using fuzzy clustering and subspace methods.

    PubMed

    Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y

    2014-05-01

    This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of the both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  17. Conjunction of wavelet transform and SOM-mutual information data pre-processing approach for AI-based Multi-Station nitrate modeling of watersheds

    NASA Astrophysics Data System (ADS)

    Nourani, Vahid; Andalib, Gholamreza; Dąbrowska, Dominika

    2017-05-01

    Accurate nitrate load predictions can improve decision making in the water-quality management of watersheds, which affects the environment and drinking water. In this paper, two scenarios were considered for Multi-Station (MS) nitrate load modeling of the Little River watershed. In the first scenario, Markovian characteristics of the streamflow-nitrate time series were proposed for the MS modeling. For this purpose, the Mutual Information (MI) feature extraction criterion was employed for input selection of the artificial intelligence models (Feed Forward Neural Network, FFNN, and least squares support vector machine). In the second scenario, to account for seasonality-based characteristics of the time series, the wavelet transform was used to extract multi-scale features of the streamflow-nitrate time series of the watershed's sub-basins to model MS nitrate loads. The Self-Organizing Map (SOM) clustering technique, which finds homogeneous sub-series clusters, was also linked to MI to choose a proper cluster agent to be imposed on the models for predicting the nitrate loads of the watershed's sub-basins. The proposed MS method not only considers the prediction of the outlet nitrate but also covers predictions of the nitrate load values of interior sub-basins. The results indicated that the proposed FFNN model coupled with SOM-MI improved the performance of MS nitrate predictions by up to 39% compared to the Markovian-based models. Overall, accurate selection of dominant inputs that account for seasonality-based characteristics of the streamflow-nitrate process could enhance the efficiency of nitrate load predictions.

  18. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    PubMed

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  19. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

    PubMed Central

    Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

    2017-01-01

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211

  20. X-Ray Binaries and Star Clusters in the Antennae: Optical Cluster Counterparts

    NASA Astrophysics Data System (ADS)

    Rangelov, Blagoy; Chandar, Rupali; Prestwich, Andrea; Whitmore, Bradley C.

    2012-10-01

    We compare the locations of 82 X-ray binaries (XRBs) detected in the merging Antennae galaxies by Zezas et al., based on observations taken with the Chandra X-Ray Observatory, with a catalog of optically selected star clusters presented by Whitmore et al., based on observations taken with the Hubble Space Telescope. Within the 2σ positional uncertainty of ≈0.8″, we find 22 XRBs are coincident with star clusters, where only two to three chance coincidences are expected. The ages of the clusters were estimated by comparing their UBVI, Hα colors with predictions from stellar evolutionary models. We find that 14 of the 22 coincident XRBs (64%) are hosted by star clusters with ages of ≈6 Myr or less. All of the very young host clusters are fairly massive and have M ≳ 3 × 10⁴ M⊙, with many having masses M ≈ 10⁵ M⊙. Five of the XRBs are hosted by young clusters with ages τ ≈ 10-100 Myr, while three are hosted by intermediate-age clusters with τ ≈ 100-300 Myr. Based on the results from recent N-body simulations, which suggest that black holes are far more likely to be retained within their parent clusters than neutron stars, we suggest that our sample consists primarily of black hole binaries with different ages.

  1. Transport-reaction model for defect and carrier behavior within displacement cascades in gallium arsenide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wampler, William R.; Myers, Samuel M.

    2014-02-01

    A model is presented for recombination of charge carriers at displacement damage in gallium arsenide, which includes clustering of the defects in atomic displacement cascades produced by neutron or ion irradiation. The carrier recombination model is based on an atomistic description of capture and emission of carriers by the defects with time evolution resulting from the migration and reaction of the defects. The physics and equations on which the model is based are presented, along with details of the numerical methods used for their solution. The model uses a continuum description of diffusion, field-drift and reaction of carriers and defects within a representative spherically symmetric cluster. The initial radial defect profiles within the cluster were chosen through pair-correlation-function analysis of the spatial distribution of defects obtained from the binary-collision code MARLOWE, using recoil energies for fission neutrons. Charging of the defects can produce high electric fields within the cluster which may influence transport and reaction of carriers and defects, and which may enhance carrier recombination through band-to-trap tunneling. Properties of the defects are discussed and values for their parameters are given, many of which were obtained from density functional theory. The model provides a basis for predicting the transient response of III-V heterojunction bipolar transistors to pulsed neutron irradiation.

  2. Multi-scale study of condensation in water jets using ellipsoidal-statistical Bhatnagar-Gross-Krook and molecular dynamics modeling

    NASA Astrophysics Data System (ADS)

    Li, Zheng; Borner, Arnaud; Levin, Deborah A.

    2014-06-01

    Homogeneous water condensation and ice formation in supersonic expansions to vacuum for stagnation pressures from 12 to 1000 mbar are studied using the particle-based Ellipsoidal-Statistical Bhatnagar-Gross-Krook (ES-BGK) method. We find that when condensation starts to occur, at a stagnation pressure of 96 mbar, the increase in the degree of condensation causes an increase in the rotational temperature due to the latent heat of vaporization. The simulated rotational temperature profiles along the plume expansion agree well with measurements, confirming the kinetic homogeneous condensation models and the method of simulation. The simulated gas and cluster number densities and cluster sizes along the plume centerline are compared for different stagnation pressures, and the cluster size is found to increase linearly with stagnation pressure, consistent with classical nucleation theory. The sensitivity of our results to the cluster nucleation model and to latent heat values based on bulk water, specific cluster size, or bulk ice is examined. In particular, the ES-BGK simulations are found to be too coarse-grained to provide information on the phase or structure of the clusters formed. For this reason, molecular dynamics simulations of water condensation in a one-dimensional free expansion, which simulate the conditions in the core of a plume, are performed. We find that the internal structure of the clusters formed depends on the stagnation temperature. A larger cluster of average size 21 was tracked down the expansion, and a calculation of its average internal temperature as well as a comparison of its radial distribution functions (RDFs) with values measured for solid amorphous ice clusters lead us to conclude that this cluster is in a solid-like rather than liquid form. In another molecular-dynamics simulation at a much lower stagnation temperature, a larger cluster of size 324 and internal temperature 200 K was extracted from an expansion plume and equilibrated to determine its RDF and self-diffusion coefficient. The value of the latter shows that this cluster is formed in a supercooled liquid state rather than in an amorphous solid state.

  3. Cluster structure in the correlation coefficient matrix can be characterized by abnormal eigenvalues

    NASA Astrophysics Data System (ADS)

    Nie, Chun-Xiao

    2018-02-01

    Many previous studies have found that some eigenvalues of the financial correlation matrix are greater than the values predicted by random matrix theory (RMT). Here, we call these abnormal eigenvalues. To reveal their hidden meaning, we study a toy model with cluster structure and find that these eigenvalues are related to the cluster structure of the correlation coefficient matrix. Model-based experiments show that in most cases, the number of abnormal eigenvalues of the correlation matrix is equal to the number of clusters. In addition, empirical studies show that the sum of the abnormal eigenvalues is related to the clarity of the cluster structure and is negatively correlated with the correlation dimension.
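    The counting rule in this abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy model (three clusters of four series, each driven by one common Gaussian factor with intra-cluster correlation `rho`), the use of the Marchenko-Pastur upper edge (1 + sqrt(N/T))^2 as the RMT prediction, and all function names. Only the Python standard library is used, so the eigenvalues come from a plain Jacobi rotation routine.

```python
import math
import random


def toy_series(n_clusters=3, per_cluster=4, length=600, rho=0.6, seed=0):
    """Hypothetical toy model: each cluster shares one common Gaussian factor."""
    rng = random.Random(seed)
    series = []
    for _ in range(n_clusters):
        factor = [rng.gauss(0, 1) for _ in range(length)]
        for _ in range(per_cluster):
            series.append([math.sqrt(rho) * f + math.sqrt(1 - rho) * rng.gauss(0, 1)
                           for f in factor])
    return series


def sample_correlation(series):
    """Pearson correlation matrix of a list of equal-length series."""
    n, t = len(series), len(series[0])
    z = []
    for s in series:
        mean = sum(s) / t
        sd = math.sqrt(sum((x - mean) ** 2 for x in s) / t)
        z.append([(x - mean) / sd for x in s])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / t for j in range(n)]
            for i in range(n)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


series = toy_series()
n, t = len(series), len(series[0])
lambda_max = (1 + math.sqrt(n / t)) ** 2  # assumed RMT bound (Marchenko-Pastur edge)
eigenvalues = symmetric_eigenvalues(sample_correlation(series))
abnormal = [lam for lam in eigenvalues if lam > lambda_max]
print("abnormal eigenvalues:", len(abnormal), "RMT upper edge:", round(lambda_max, 3))
```

    With strongly correlated clusters such as these, each cluster contributes one eigenvalue near 1 + (per_cluster - 1) * rho, well above the bound, so the count matches the number of clusters. Weaker correlations or shorter series would blur that separation, which fits the abstract's "in most cases" qualifier.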

  4. Novel schemes for measurement-based quantum computation.

    PubMed

    Gross, D; Eisert, J

    2007-06-01

    We establish a framework which allows one to construct novel schemes for measurement-based quantum computation. The technique develops tools from many-body physics, based on finitely correlated or projected entangled pair states, to go beyond the cluster-state based one-way computer. We identify resource states radically different from the cluster state, in that they exhibit nonvanishing correlations, can be prepared using nonmaximally entangling gates, or have very different local entanglement properties. In the computational models, randomness is compensated in a different manner. It is shown that there exist resource states which are locally arbitrarily close to a pure state. We comment on the possibility of tailoring computational models to specific physical systems.

  5. Conjunction of radial basis function interpolator and artificial intelligence models for time-space modeling of contaminant transport in porous media

    NASA Astrophysics Data System (ADS)

    Nourani, Vahid; Mousavi, Shahram; Dabrowska, Dominika; Sadikoglu, Fahreddin

    2017-05-01

    As an innovation, both black box and physically based models were incorporated into simulating groundwater flow and contaminant transport (GFCT). Time series of groundwater level (GL) and chloride concentration (CC) observed at different piezometers of the study plain were first de-noised by a wavelet-based de-noising approach. The effect of the de-noised data on the performance of artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) models was evaluated. Wavelet transform coherence was employed for spatial clustering of the piezometers. Then, for each cluster, ANN and ANFIS models were trained to predict GL and CC values. Finally, taking the predicted water heads of the piezometers as interior conditions, the radial basis function, a meshless method for solving the partial differential equations of GFCT, was used to estimate GL and CC values at any point within the plain where there is no piezometer. Results indicated that the ANFIS-based spatiotemporal model was up to 13% more efficient than the ANN-based model.

  6. Modeling and possible implementation of self-learning equivalence-convolutional neural structures for auto-encoding-decoding and clusterization of images

    NASA Astrophysics Data System (ADS)

    Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.

    2017-08-01

    Self-learning equivalent-convolutional neural structures (SLECNS) for auto-coding-decoding and image clustering are discussed. The SLECNS architectures and their spatially invariant equivalent models (SI EMs), which use the corresponding matrix-matrix procedures with basic operations of continuous logic and non-linear processing, are proposed. These SI EMs have several advantages, such as the ability to recognize image fragments with better efficiency and strong cross correlation. The proposed method of clustering fragments by their structural features is suitable not only for binary but also for color images, and it combines self-learning with the formation of weight-clustered matrix patterns. Its model is constructed on the basis of recursive processing algorithms and the k-means method. The experimental results confirmed that larger images and 2D binary fragments with a large number of elements may be clustered. For the first time, the possibility of generalizing these models to the space-invariant case is shown. An experiment was carried out on an image with dimensions of 256x256 (a reference array) and fragments with dimensions of 7x7 and 21x21 for clustering. The experiments, performed in the Mathcad software environment, showed that the proposed method is universal, converges in a small number of iterations, maps easily onto the matrix structure, and is promising. It is therefore very important to understand the mechanisms of self-learning equivalence-convolutional clustering, the accompanying competitive processes in neurons, and the principles of neural auto-encoding-decoding and recognition with self-learning cluster patterns, which rely on the algorithm and principles of non-linear processing of two-dimensional spatial functions for image comparison. These SI EMs can simply describe the signal processing during all training and recognition stages, and they are suitable for unipolar-coded multilevel signals. We show that the implementation of SLECNS based on known equivalentors or traditional correlators is possible if they are based on the proposed equivalental two-dimensional functions of image similarity. The clustering efficiency in such models and their implementation depend on the discriminant properties of the neural elements of the hidden layers. Therefore, the main parameters and characteristics of the models and architectures depend on the types of non-linear processing applied and on the function used for image comparison or for adaptive-equivalental weighing of input patterns. Model experiments in Mathcad are demonstrated; they confirm that non-linear processing of equivalent functions makes it possible to determine the winning neurons and adjust the weight matrix. Experimental results have shown that such models can be successfully used for auto- and hetero-associative recognition. They can also be used to explain some mechanisms known as "focus" and the "competing gain-inhibition concept". The SLECNS architecture and hardware implementations of its basic nodes, based on multi-channel convolvers and correlators with time integration, are proposed. The parameters and performance of such architectures are estimated.

  7. Assessing the distinguishable cluster approximation based on the triple bond-breaking in the nitrogen molecule

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rishi, Varun; Perera, Ajith; Bartlett, Rodney J., E-mail: bartlett@qtp.ufl.edu

    2016-03-28

    Obtaining the correct potential energy curves for the dissociation of multiple bonds is a challenging problem for ab initio methods which are affected by the choice of a spin-restricted reference function. Coupled cluster (CC) methods such as CCSD (coupled cluster singles and doubles model) and CCSD(T) (CCSD + perturbative triples) correctly predict the geometry and properties at equilibrium but the process of bond dissociation, particularly when more than one bond is simultaneously broken, is much more complicated. New modifications of CC theory suggest that the deleterious role of the reference function can be diminished, provided a particular subset of terms is retained in the CC equations. The Distinguishable Cluster (DC) approach of Kats and Manby [J. Chem. Phys. 139, 021102 (2013)] seemingly overcomes the deficiencies for some bond-dissociation problems and might be of use in quasi-degenerate situations in general. DC along with other approximate coupled cluster methods such as ACCD (approximate coupled cluster doubles), ACP-D45, ACP-D14, 2CC, and pCCSD(α, β) (all defined in text) falls under a category of methods that are basically obtained by the deletion of some quadratic terms in the double excitation amplitude equation for CCD/CCSD (coupled cluster doubles model/coupled cluster singles and doubles model). Here these approximate methods, particularly those based on the DC approach, are studied in detail for the nitrogen molecule bond-breaking. The N₂ problem is further addressed with conventional single reference methods but based on spatial symmetry-broken restricted Hartree–Fock (HF) solutions to assess the use of these references for correlated calculations in the situation where CC methods using fully symmetry adapted SCF solutions fail. The distinguishable cluster method is generalized: 1) to different orbitals for different spins (unrestricted HF based DCD and DCSD), 2) by adding triples correction perturbatively (DCSD(T)) and iteratively (DCSDT-n), and 3) via an excited state approximation through the equation of motion (EOM) approach (EOM-DCD, EOM-DCSD). The EOM-CC method is used to identify lower-energy CC solutions to overcome singularities in the CC potential energy curves. It is also shown that UHF based CC and DC methods behave very similarly in bond-breaking of N₂, and that using spatially broken but spin preserving SCF references makes the CCSD solutions better than those for DCSD.

  8. A dynamical study of Galactic globular clusters under different relaxation conditions

    NASA Astrophysics Data System (ADS)

    Zocchi, A.; Bertin, G.; Varri, A. L.

    2012-03-01

    Aims: We perform a systematic combined photometric and kinematic analysis of a sample of globular clusters under different relaxation conditions, based on their core relaxation time (as listed in available catalogs), by means of two well-known families of spherical stellar dynamical models. Systems characterized by shorter relaxation time scales are expected to be better described by isotropic King models, while less relaxed systems might be interpreted by means of non-truncated, radially-biased anisotropic f(ν) models, originally designed to represent stellar systems produced by a violent relaxation formation process and applied here for the first time to the study of globular clusters. Methods: The comparison between dynamical models and observations is performed by fitting simultaneously surface brightness and velocity dispersion profiles. For each globular cluster, the best-fit model in each family is identified, along with a full error analysis on the relevant parameters. Detailed structural properties and mass-to-light ratios are also explicitly derived. Results: We find that King models usually offer a good representation of the observed photometric profiles, but often lead to less satisfactory fits to the kinematic profiles, independently of the relaxation condition of the systems. For some less relaxed clusters, f(ν) models provide a good description of both observed profiles. Some derived structural characteristics, such as the total mass or the half-mass radius, turn out to be significantly model-dependent. The analysis confirms that, to answer some important dynamical questions that bear on the formation and evolution of globular clusters, it would be highly desirable to acquire larger numbers of accurate kinematic data-points, well distributed over the cluster field. Appendices are available in electronic form at http://www.aanda.org

  9. Chemical evolution of the Magellanic Clouds

    NASA Astrophysics Data System (ADS)

    Barbuy, B.; de Freitas Pacheco, J. A.; Idiart, T.

    We have obtained integrated spectra for 14 clusters in the Magellanic Clouds, on which the spectral indices Hβ, Mg2, Fe5270, Fe5335 were measured. Selecting indices whose behaviour depends essentially on age and metallicity (Hβ and ), together with (B-V) and (V-K) colours, we were able to determine age and metallicities for these clusters, using calibrations based on single stellar population models (Borges et al. 1995). A chemical evolution model which follows a star formation history as indicated by the field population is checked with the age and metallicity data for our sample star clusters.

  10. Intracluster age gradients in numerous young stellar clusters

    NASA Astrophysics Data System (ADS)

    Getman, K. V.; Feigelson, E. D.; Kuhn, M. A.; Bate, M. R.; Broos, P. S.; Garmire, G. P.

    2018-05-01

    The pace and pattern of star formation leading to rich young stellar clusters is quite uncertain. In this context, we analyse the spatial distribution of ages within 19 young (median t ≲ 3 Myr on the Siess et al. time-scale), morphologically simple, isolated, and relatively rich stellar clusters. Our analysis is based on young stellar object (YSO) samples from the Massive Young Star-Forming Complex Study in Infrared and X-ray and Star Formation in Nearby Clouds surveys, and a new estimator of pre-main sequence (PMS) stellar ages, AgeJX, derived from X-ray and near-infrared photometric data. Median cluster ages are computed within four annular subregions of the clusters. We confirm and extend the earlier result of Getman et al. (2014): 80 per cent of the clusters show age trends where stars in cluster cores are younger than in outer regions. Our cluster stacking analyses establish the existence of an age gradient to high statistical significance in several ways. Time-scales vary with the choice of PMS evolutionary model; the inferred median age gradient across the studied clusters ranges from 0.75 to 1.5 Myr pc⁻¹. The empirical finding reported in the present study - late or continuing formation of stars in the cores of star clusters with older stars dispersed in the outer regions - has a strong foundation with other observational studies and with the astrophysical models like the global hierarchical collapse model of Vázquez-Semadeni et al.

  11. An Approach to Cluster EU Member States into Groups According to Pathways of Salmonella in the Farm-to-Consumption Chain for Pork Products.

    PubMed

    Vigre, Håkan; Domingues, Ana Rita Coutinho Calado; Pedersen, Ulrik Bo; Hald, Tine

    2016-03-01

    The cluster analysis was part of a project to develop a generic structured quantitative microbiological risk assessment (QMRA) model of human salmonellosis due to pork consumption in EU member states (MSs). By selecting a case study MS from each cluster, the model was developed to represent different aspects of pig production, pork production, and consumption of pork products across EU states. The objective of the cluster analysis was to aggregate MSs into groups of countries with similar importance of different pathways of Salmonella in the farm-to-consumption chain, using available and, where possible, universal register data related to pork production and consumption in each country. Based on MS-specific information about the distribution of (i) small and large farms, (ii) small and large slaughterhouses, (iii) amount of pork meat consumed, and (iv) amount of sausages consumed, we used nonhierarchical and hierarchical cluster analysis to group the MSs. The cluster solutions were validated internally using statistical measures and externally by comparing the clustered MSs with an estimated human incidence of salmonellosis due to pork products in the MSs. Finally, each cluster was characterized qualitatively using the centroids of the clusters. © 2016 Society for Risk Analysis.

  12. Statistical models for predicting pair dispersion and particle clustering in isotropic turbulence and their applications

    NASA Astrophysics Data System (ADS)

    Zaichik, Leonid I.; Alipchenkov, Vladimir M.

    2009-10-01

    The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15 1776-87; 2007 Phys. Fluids 19 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.

  13. GraphCrunch 2: Software tool for network modeling, alignment and clustering.

    PubMed

    Kuchaiev, Oleksii; Stevanović, Aleksandar; Hayes, Wayne; Pržulj, Nataša

    2011-01-19

    Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution, or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype. We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than any other existing tool. Finally, GraphCrunch 2 implements an algorithm for clustering nodes within a network based solely on their topological similarities. Using GraphCrunch 2, we demonstrate that eukaryotic and viral PPI networks may belong to different graph model families and show that topology-based clustering can reveal important functional similarities between proteins within yeast and human PPI networks. GraphCrunch 2 is a software tool that implements the latest research on biological network analysis. It parallelizes computationally intensive tasks to fully utilize the potential of modern multi-core CPUs. It is open-source and freely available for research use. It runs under the Windows and Linux platforms.

  14. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  15. Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

    PubMed

    Mwangi, Benson; Soares, Jair C; Hasan, Khader M

    2014-10-30

    Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarrays have become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, so the raw measurements within microarray experiments carry inherent "noise". Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  17. STELLAR ENCOUNTER RATE IN GALACTIC GLOBULAR CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahramian, Arash; Heinke, Craig O.; Sivakoff, Gregory R.

    2013-04-01

    The high stellar densities in the cores of globular clusters cause significant stellar interactions. These stellar interactions can produce close binary mass-transferring systems involving compact objects and their progeny, such as X-ray binaries and radio millisecond pulsars. Comparing the numbers of these systems and interaction rates in different clusters drives our understanding of how cluster parameters affect the production of close binaries. In this paper we estimate stellar encounter rates (Γ) for 124 Galactic globular clusters based on observational data as opposed to the methods previously employed, which assumed 'King-model' profiles for all clusters. By deprojecting cluster surface brightness profiles to estimate luminosity density profiles, we treat 'King-model' and 'core-collapsed' clusters in the same way. In addition, we use Monte Carlo simulations to investigate the effects of uncertainties in various observational parameters (distance, reddening, surface brightness) on Γ, producing the first catalog of globular cluster stellar encounter rates with estimated errors. Comparing our results with published observations of likely products of stellar interactions (numbers of X-ray binaries, numbers of radio millisecond pulsars, and γ-ray luminosity) we find both clear correlations and some differences with published results.
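    The abstract does not state how the encounter rate is defined. As a hedged reminder of the definition commonly used in the globular cluster literature (an assumption here, not a quotation from this record), the rate is usually taken to scale with the volume integral of the squared stellar density over the velocity dispersion, and the King-model shortcut replaces the integral with core quantities:

```latex
% Commonly used definition (assumed, not quoted from the abstract):
% rho(r) = stellar (luminosity) density profile, sigma(r) = velocity dispersion
\Gamma \;\propto\; \int \frac{\rho(r)^{2}}{\sigma(r)} \, \mathrm{d}^{3}r
       \;=\; 4\pi \int_{0}^{\infty} \frac{\rho(r)^{2}}{\sigma(r)} \, r^{2} \, \mathrm{d}r

% King-model approximation, using the central density rho_c,
% core radius r_c, and central velocity dispersion sigma_c:
\Gamma \;\propto\; \frac{\rho_{c}^{2} \, r_{c}^{3}}{\sigma_{c}}
```

    Under this reading, deprojecting the observed surface brightness profile supplies ρ(r) for the integral directly, which is why King-model and core-collapsed clusters can be handled in the same way.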

  18. Clustering and classification of infrasonic events at Mount Etna using pattern recognition techniques

    NASA Astrophysics Data System (ADS)

    Cannata, A.; Montalto, P.; Aliotta, M.; Cassisi, C.; Pulvirenti, A.; Privitera, E.; Patanè, D.

    2011-04-01

    Active volcanoes generate sonic and infrasonic signals, whose investigation provides useful information for both monitoring purposes and the study of the dynamics of explosive phenomena. At Mt. Etna volcano (Italy), a pattern recognition system based on infrasonic waveform features has been developed. First, by a parametric power spectrum method, the features describing and characterizing the infrasound events were extracted: peak frequency and quality factor. Then, together with the peak-to-peak amplitude, these features constituted a 3-D ‘feature space’; by Density-Based Spatial Clustering of Applications with Noise algorithm (DBSCAN) three clusters were recognized inside it. After the clustering process, by using a common location method (semblance method) and additional volcanological information concerning the intensity of the explosive activity, we were able to associate each cluster to a particular source vent and/or a kind of volcanic activity. Finally, for automatic event location, clusters were used to train a model based on Support Vector Machine, calculating optimal hyperplanes able to maximize the margins of separation among the clusters. After the training phase this system automatically allows recognizing the active vent with no location algorithm and by using only a single station.

  19. Clustering biomolecular complexes by residue contacts similarity.

    PubMed

    Rodrigues, João P G L M; Trellet, Mikaël; Schmitz, Christophe; Kastritis, Panagiotis; Karaca, Ezgi; Melquiond, Adrien S J; Bonvin, Alexandre M J J

    2012-07-01

    Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts--the fraction of common contacts--and compare it with the most used similarity measure of the protein docking community--interface backbone RMSD. We show that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors. Copyright © 2012 Wiley Periodicals, Inc.
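    The fraction of common contacts described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the normalization, so dividing by the number of contacts in the reference model is an assumption, and the contact tuples, the names `fraction_common_contacts`, `model_1`, and `model_2` are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical contact encoding: (chain A, residue A, chain B, residue B).
Contact = Tuple[str, int, str, int]


def fraction_common_contacts(ref: Set[Contact], other: Set[Contact]) -> float:
    """Fraction of the contacts in `ref` that also occur in `other`.

    Assumption: the score is normalized by the size of `ref`, which makes
    it asymmetric (swapping the arguments can change the value).
    """
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


# Two made-up docking models described only by their interface contacts.
model_1 = {("A", 10, "B", 55), ("A", 11, "B", 55), ("A", 14, "B", 60), ("A", 15, "B", 61)}
model_2 = {("A", 10, "B", 55), ("A", 11, "B", 55), ("A", 14, "B", 60)}

fcc_12 = fraction_common_contacts(model_1, model_2)  # 3 of 4 contacts shared
fcc_21 = fraction_common_contacts(model_2, model_1)  # 3 of 3 contacts shared
```

    Because the score needs only set intersections, no structural superposition or choice of fitting region is involved, which is consistent with the speed advantage the abstract reports over RMSD-based comparison.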

  20. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
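    As a rough illustration of the model-based clustering idea in this abstract, the sketch below fits a two-component Poisson mixture by expectation-maximization and compares its Bayesian Information Criterion with that of a single Poisson. It is a simplification, not the authors' method: it has no regression covariates, no concomitant variables, and no zero inflation, and the count data and all function names are made up for the example.

```python
import math


def log_poisson(k: int, lam: float) -> float:
    """Log of the Poisson probability mass function."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def fit_two_poisson(counts, n_iter=200):
    """EM for a two-component Poisson mixture (no covariates)."""
    lam = [min(counts) + 0.5, max(counts) + 0.5]  # crude starting rates
    weight = [0.5, 0.5]
    n = len(counts)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count.
        resp = []
        for k in counts:
            p = [weight[j] * math.exp(log_poisson(k, lam[j])) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: update mixing weights and component rates.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weight[j] = nj / n
            lam[j] = sum(r[j] * k for r, k in zip(resp, counts)) / nj
    loglik = sum(
        math.log(sum(weight[j] * math.exp(log_poisson(k, lam[j])) for j in range(2)))
        for k in counts
    )
    bic = -2.0 * loglik + 3 * math.log(n)  # 2 rates + 1 free weight
    return weight, lam, bic


def bic_one_poisson(counts):
    """BIC of a single Poisson whose rate is the sample mean."""
    lam = sum(counts) / len(counts)
    loglik = sum(log_poisson(k, lam) for k in counts)
    return -2.0 * loglik + 1 * math.log(len(counts))


# Made-up event counts with a low-rate and a high-rate group.
counts = [0, 1, 1, 2, 0, 1, 2, 1, 9, 10, 11, 10, 12, 9, 11, 10]
weight, lam, bic_two = fit_two_poisson(counts)
bic_one = bic_one_poisson(counts)
```

    On data like these, the two-component fit should recover one low and one high rate and obtain the lower BIC, which mirrors the selection criterion the abstract mentions.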

  1. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  2. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

    NASA Astrophysics Data System (ADS)

    Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard

    2014-09-01

    Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
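    The Markov state model step described above can be sketched as follows, assuming each trajectory frame has already been assigned to a single cluster. This uses plain hard assignment and transition counting at a fixed lag; it does not implement the fuzzy C-means or transition-based state assignment from the abstract, and the label sequence and function name are hypothetical.

```python
from collections import Counter


def transition_matrix(labels, n_states, lag=1):
    """Row-normalized transition counts between cluster labels at a given lag."""
    counts = Counter(zip(labels[:-lag], labels[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State i never appears as an origin; leave the row empty.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Made-up cluster labels for ten consecutive trajectory frames.
trajectory = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
T = transition_matrix(trajectory, n_states=2)
```

    Each row of `T` is the estimated probability of moving from one cluster to every cluster after `lag` frames. Rapid recrossings between neighbouring clusters inflate the off-diagonal entries, which is the short-time memory effect the abstract says its state-assignment procedure suppresses.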

  3. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G., E-mail: yannis@princeton.edu, E-mail: gerhard.hummer@biophys.mpg.de

    Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small but nontrivial differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.

  4. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

    PubMed Central

    Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard

    2014-01-01

    Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space. PMID:25240340

  5. NASA Radiation Track Image GUI for Assessing Space Radiation Biological Effects

    NASA Technical Reports Server (NTRS)

    Ponomarev, Artem L.; Cucinotta, Francis A.

    2006-01-01

    The high-charge high-energy (HZE) ion components of the galactic cosmic rays when compared to terrestrial forms of radiations present unique challenges to biological systems. In this paper we present a deoxyribonucleic acid (DNA) breakage model to visualize and analyze the impact of chromatin domains and DNA loops on clustering of DNA damage from X rays, protons, and HZE ions. Our model of DNA breakage is based on a stochastic process of DNA double-strand break (DSB) formulation that includes the amorphous model of the radiation track and a polymer model of DNA packed in the cell nucleus. Our model is a Monte-Carlo simulation based on a randomly located DSB cluster formulation that accommodates both high- and low-linear energy transfer radiations. We demonstrate that HZE ions have a strong impact on DSB clustering, both along the chromosome length and in the nucleus volume. The effects of chromosomal domains and DNA loops on the DSB fragment-size distribution and the spatial distribution of DSB in the nucleus were studied. We compare our model predictions with the spatial distribution of DSB obtained from experiments. The implications of our model predictions for radiation protection are discussed.

  6. Correspondence between ion-cluster and bulk thermodynamics: on the validity of the cluster pair approximation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vlcek, Lukas; Chialvo, Ariel; Simonson, J Michael

    2013-01-01

    Molecular models and experimental estimates based on the cluster pair approximation (CPA) provide inconsistent predictions of absolute single-ion hydration properties. To understand the origin of this discrepancy we used molecular simulations to study the transition between hydration of alkali metal and halide ions in small aqueous clusters and bulk water. The results demonstrate that the assumptions underlying the CPA are not generally valid as a result of a significant shift in the ion hydration free energies (~15 kJ/mol) and enthalpies (~47 kJ/mol) in the intermediate range of cluster sizes. When this effect is accounted for, the systematic differences between models and experimental predictions disappear, and the value of absolute proton hydration enthalpy based on the CPA gets in closer agreement with other estimates.

  7. Simulation modeling for stratified breast cancer screening - a systematic review of cost and quality of life assumptions.

    PubMed

    Arnold, Matthias

    2017-12-02

    The economic evaluation of stratified breast cancer screening gains momentum, but produces also very diverse results. Systematic reviews so far focused on modeling techniques and epidemiologic assumptions. However, cost and utility parameters received only little attention. This systematic review assesses simulation models for stratified breast cancer screening based on their cost and utility parameters in each phase of breast cancer screening and care. A literature review was conducted to compare economic evaluations with simulation models of personalized breast cancer screening. Study quality was assessed using reporting guidelines. Cost and utility inputs were extracted, standardized and structured using a care delivery framework. Studies were then clustered according to their study aim and parameters were compared within the clusters. Eighteen studies were identified within three study clusters. Reporting quality was very diverse in all three clusters. Only two studies in cluster 1, four studies in cluster 2 and one study in cluster 3 scored high in the quality appraisal. In addition to the quality appraisal, this review assessed if the simulation models were consistent in integrating all relevant phases of care, if utility parameters were consistent and methodological sound and if cost were compatible and consistent in the actual parameters used for screening, diagnostic work up and treatment. Of 18 studies, only three studies did not show signs of potential bias. This systematic review shows that a closer look into the cost and utility parameter can help to identify potential bias. Future simulation models should focus on integrating all relevant phases of care, using methodologically sound utility parameters and avoiding inconsistent cost parameters.

  8. Cluster-guided imaging-based CFD analysis of airflow and particle deposition in asthmatic human lungs

    NASA Astrophysics Data System (ADS)

    Choi, Jiwoong; Leblanc, Lawrence; Choi, Sanghun; Haghighi, Babak; Hoffman, Eric; Lin, Ching-Long

    2017-11-01

    The goal of this study is to assess inter-subject variability in delivery of orally inhaled drug products to small airways in asthmatic lungs. A recent multiscale imaging-based cluster analysis (MICA) of computed tomography (CT) lung images in an asthmatic cohort identified four clusters with statistically distinct structural and functional phenotypes associating with unique clinical biomarkers. Thus, we aimed to address inter-subject variability via inter-cluster variability. We selected a representative subject from each of the 4 asthma clusters as well as 1 male and 1 female healthy controls, and performed computational fluid and particle simulations on CT-based airway models of these subjects. The results from one severe and one non-severe asthmatic cluster subjects characterized by segmental airway constriction had increased particle deposition efficiency, as compared with the other two cluster subjects (one non-severe and one severe asthmatics) without airway constriction. Constriction-induced jets impinging on distal bifurcations led to excessive particle deposition. The results emphasize the impact of airway constriction on regional particle deposition rather than disease severity, demonstrating the potential of using cluster membership to tailor drug delivery. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837. XSEDE.

  9. Fragment-based 13C nuclear magnetic resonance chemical shift predictions in molecular crystals: An alternative to planewave methods

    NASA Astrophysics Data System (ADS)

    Hartman, Joshua D.; Monaco, Stephen; Schatschneider, Bohdan; Beran, Gregory J. O.

    2015-09-01

    We assess the quality of fragment-based ab initio isotropic 13C chemical shift predictions for a collection of 25 molecular crystals with eight different density functionals. We explore the relative performance of cluster, two-body fragment, combined cluster/fragment, and the planewave gauge-including projector augmented wave (GIPAW) models relative to experiment. When electrostatic embedding is employed to capture many-body polarization effects, the simple and computationally inexpensive two-body fragment model predicts both isotropic 13C chemical shifts and the chemical shielding tensors as well as both cluster models and the GIPAW approach. Unlike the GIPAW approach, hybrid density functionals can be used readily in a fragment model, and all four hybrid functionals tested here (PBE0, B3LYP, B3PW91, and B97-2) predict chemical shifts in noticeably better agreement with experiment than the four generalized gradient approximation (GGA) functionals considered (PBE, OPBE, BLYP, and BP86). A set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts are provided based on these benchmark calculations. Statistical cross-validation procedures are used to demonstrate the robustness of these fits.

  10. Fragment-based (13)C nuclear magnetic resonance chemical shift predictions in molecular crystals: An alternative to planewave methods.

    PubMed

    Hartman, Joshua D; Monaco, Stephen; Schatschneider, Bohdan; Beran, Gregory J O

    2015-09-14

    We assess the quality of fragment-based ab initio isotropic (13)C chemical shift predictions for a collection of 25 molecular crystals with eight different density functionals. We explore the relative performance of cluster, two-body fragment, combined cluster/fragment, and the planewave gauge-including projector augmented wave (GIPAW) models relative to experiment. When electrostatic embedding is employed to capture many-body polarization effects, the simple and computationally inexpensive two-body fragment model predicts both isotropic (13)C chemical shifts and the chemical shielding tensors as well as both cluster models and the GIPAW approach. Unlike the GIPAW approach, hybrid density functionals can be used readily in a fragment model, and all four hybrid functionals tested here (PBE0, B3LYP, B3PW91, and B97-2) predict chemical shifts in noticeably better agreement with experiment than the four generalized gradient approximation (GGA) functionals considered (PBE, OPBE, BLYP, and BP86). A set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts are provided based on these benchmark calculations. Statistical cross-validation procedures are used to demonstrate the robustness of these fits.

  11. Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies.

    PubMed

    Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam

    2017-10-27

    Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately.

  12. IoT Service Clustering for Dynamic Service Matchmaking.

    PubMed

    Zhao, Shuai; Yu, Le; Cheng, Bo; Chen, Junliang

    2017-07-27

    With the adoption of service-oriented paradigms in the IoT (Internet of Things) environment, real-world devices will open their capabilities through service interfaces, which enable other functional entities to interact with them. In an IoT application, it is indispensable to find suitable services for satisfying users' requirements or replacing the unavailable services. However, from the perspective of performance, it is inappropriate to find desired services from the service repository online directly. Instead, clustering services offline according to their similarity and matchmaking or discovering service online in limited clusters is necessary. This paper proposes a multidimensional model-based approach to measure the similarity between IoT services. Then, density-peaks-based clustering is employed to gather similar services together according to the result of similarity measurement. Based on the service clustering, the algorithms of dynamic service matchmaking, discovery, and replacement will be performed efficiently. Evaluating experiments are conducted to validate the performance of proposed approaches, and the results are promising.

  13. IoT Service Clustering for Dynamic Service Matchmaking

    PubMed Central

    Yu, Le; Cheng, Bo; Chen, Junliang

    2017-01-01

    With the adoption of service-oriented paradigms in the IoT (Internet of Things) environment, real-world devices will open their capabilities through service interfaces, which enable other functional entities to interact with them. In an IoT application, it is indispensable to find suitable services for satisfying users’ requirements or replacing the unavailable services. However, from the perspective of performance, it is inappropriate to find desired services from the service repository online directly. Instead, clustering services offline according to their similarity and matchmaking or discovering service online in limited clusters is necessary. This paper proposes a multidimensional model-based approach to measure the similarity between IoT services. Then, density-peaks-based clustering is employed to gather similar services together according to the result of similarity measurement. Based on the service clustering, the algorithms of dynamic service matchmaking, discovery, and replacement will be performed efficiently. Evaluating experiments are conducted to validate the performance of proposed approaches, and the results are promising. PMID:28749431

  14. Data-driven process decomposition and robust online distributed modelling for large-scale processes

    NASA Astrophysics Data System (ADS)

    Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou

    2018-02-01

    With the increasing attention of networked control, system decomposition and distributed models show significant importance in the implementation of model-based control strategy. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm was proposed for large-scale chemical processes. The key controlled variables are first partitioned by affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, the online subsystem modelling can be carried out by recursively block-wise renewing the samples. The proposed algorithm was applied in the Tennessee Eastman process and the validity was verified.

  15. A comprehensive comparative test of seven widely used spectral synthesis models against multi-band photometry of young massive-star clusters

    NASA Astrophysics Data System (ADS)

    Wofford, A.; Charlot, S.; Bruzual, G.; Eldridge, J. J.; Calzetti, D.; Adamo, A.; Cignoni, M.; de Mink, S. E.; Gouliermis, D. A.; Grasha, K.; Grebel, E. K.; Lee, J. C.; Östlin, G.; Smith, L. J.; Ubeda, L.; Zackrisson, E.

    2016-04-01

    We test the predictions of spectral synthesis models based on seven different massive-star prescriptions against Legacy ExtraGalactic UV Survey (LEGUS) observations of eight young massive clusters in two local galaxies, NGC 1566 and NGC 5253, chosen because predictions of all seven models are available at the published galactic metallicities. The high angular resolution, extensive cluster inventory, and full near-ultraviolet to near-infrared photometric coverage make the LEGUS data set excellent for this study. We account for both stellar and nebular emission in the models and try two different prescriptions for attenuation by dust. From Bayesian fits of model libraries to the observations, we find remarkably low dispersion in the median E(B - V) (~0.03 mag), stellar masses (~10^4 M⊙), and ages (~1 Myr) derived for individual clusters using different models, although maximum discrepancies in these quantities can reach 0.09 mag and factors of 2.8 and 2.5, respectively. This is for ranges in median properties of 0.05-0.54 mag, 1.8-10 × 10^4 M⊙, and 1.6-40 Myr spanned by the clusters in our sample. In terms of best fit, the observations are slightly better reproduced by models with interacting binaries and least well reproduced by models with single rotating stars. Our study provides a first quantitative estimate of the accuracies and uncertainties of the most recent spectral synthesis models of young stellar populations, demonstrates the good progress of models in fitting high-quality observations, and highlights the needs for a larger cluster sample and more extensive tests of the model parameter space.

  16. Avulsion Clusters in Alluvial Systems: An Example of Large-Scale Self-Organization in Ancient and Experimental Basins

    NASA Astrophysics Data System (ADS)

    Hajek, E.; Heller, P.; Huzurbazar, S.; Sheets, B.; Paola, C.

    2006-12-01

    The stratigraphic record of at least some alluvial basins exhibits a spatial structure that may reflect long time-scale (10^3-10^5 yr in natural basins) autogenic organization of river avulsions. Current models of avulsion-dominated alluvial sequences emphasize the spatial and temporal distribution of coarse-grained channel-belt deposits amid fine-grained floodplain materials. These models typically assume that individual avulsions move, either randomly or deterministically, to low spots distributed throughout the model space. However, our observations of ancient deposits and experimental stratigraphy indicate a previously unrecognized pattern of channel-belt organization, where clusters of closely-spaced channel-belt deposits are separated from each other by extensive intervals of overbank deposits. We explore potential causes of and controls on avulsion clustering with outcrop and subsurface data from Late Cretaceous/Early Paleogene fluvial deposits in the Rocky Mountains (including the Ferris, Lance, and Fort Union formations of Wyoming) and results of physical stratigraphy experiments from the St. Anthony Falls Lab, University of Minnesota. We use Ripley's K-function to determine the degree and scales of clustering in these basins with results that show moderate statistical clustering in experimental deposits and strong clustering in the Ferris Formation (Hanna Basin, Wyoming). External controls (base level, subsidence rate, and sediment/water supplies) were not varied during the experiment, and therefore not factors in cluster formation. Likewise, the stratigraphic context of the ancient system (including the absence of incised valleys and lack of faulting) suggests that obvious extrinsic controls, such as base level change and local tectonics, were not major influences on the development of clusters. We propose that avulsion clusters, as seen in this study, reflect a scale of self-organization in alluvial basins that is not usually recognized in stratigraphy. However, cursory examination of other ancient systems suggests that such structure may be common in the rock record. Understanding mechanisms driving avulsion clustering will shed light on the dominant processes in alluvial basins over long time scales. Furthermore, characterizing autogenic avulsion clusters will be an important factor to consider when interpreting allogenic signals in ancient basin fills.
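    Ripley's K-function, which this abstract uses to quantify clustering, can be illustrated with a naive estimator. The sketch below applies no edge correction, and the point coordinates, the unit-square study area, and the function name are assumptions made for the example; they are not taken from the study.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r, with no edge correction."""
    n = len(points)
    ordered_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                ordered_pairs += 1
    return area * ordered_pairs / (n * (n - 1))


# Made-up channel-belt positions in a unit-square cross-section: two tight groups.
points = [
    (0.10, 0.10), (0.12, 0.11), (0.11, 0.13),
    (0.80, 0.80), (0.82, 0.81), (0.81, 0.83),
]
radius = 0.1
k_observed = ripley_k(points, radius, area=1.0)
k_random = math.pi * radius ** 2  # expectation under complete spatial randomness
```

    A value of `k_observed` well above `k_random` at a given radius indicates clustering at that scale. Evaluating the estimator over a range of radii gives the scales of clustering that the abstract refers to.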

  17. Critical thinking in higher education: The influence of teaching styles and peer collaboration on science and math learning

    NASA Astrophysics Data System (ADS)

    Quitadamo, Ian Joseph

    Many higher education faculty perceive a deficiency in students' ability to reason, evaluate, and make informed judgments, skills that are deemed necessary for academic and job success in science and math. These skills, often collected within a domain called critical thinking (CT), have been studied and are thought to be influenced by teaching styles (the combination of beliefs, behavior, and attitudes used when teaching) and small group collaborative learning (SGCL). However, no existing studies show teaching styles and SGCL cause changes in student CT performance. This study determined how combinations of teaching styles called clusters and peer-facilitated SGCL (a specific form of SGCL) affect changes in undergraduate student CT performance using a quasi-experimental pre-test/post-test research design and valid and reliable CT performance indicators. Quantitative analyses of three teaching style cluster models (Grasha's cluster model, a weighted cluster model, and a student-centered/teacher-centered cluster model) and peer-facilitated SGCL were performed to evaluate their ability to cause measurable changes in student CT skills. Based on results that indicated weighted teaching style clusters and peer-facilitated SGCL are associated with significant changes in student CT, we conclude that teaching styles and peer-facilitated SGCL influence the development of undergraduate CT in higher education science and math.

  18. Mechanism of cell alignment in groups of Myxococcus xanthus bacteria

    NASA Astrophysics Data System (ADS)

    Balgam, Rajesh; Igoshin, Oleg

    2015-03-01

    Myxococcus xanthus is a model for studying self-organization in bacteria. These flexible cylindrical bacteria move along. In groups, M. xanthus cells align themselves into dynamic cell clusters but the mechanism underlying their formation is unknown. It has been shown that steric interactions can cause alignment in self-propelled hard rods but it is not clear how flexibility and reversals affect the alignment and cluster formation. We have investigated the cell alignment process using our biophysical model of M. xanthus cell in an agent-based simulation framework under realistic cell flexibility values. We observed that flexible model cells can form aligned cell clusters when reversals are suppressed but these clusters disappeared when the reversal frequency becomes similar to the observed value. However, M. xanthus cells follow slime (polysaccharide gel like material) trails left by other cells and we show that implementing this into our model rescues cell clustering for reversing cells. Our results show that slime following along with periodic cell reversals act as positive feedback to reinforce existing slime trails and recruit more cells. Furthermore, we have observed that mechanical cell alignment combined with slime following is sufficient to explain the distinct clustering patterns of reversing and non-reversing cells as observed in recent experiments. This work is supported by NSF MCB 0845919 and 1411780.

  19. Prediction of settled water turbidity and optimal coagulant dosage in drinking water treatment plant using a hybrid model of k-means clustering and adaptive neuro-fuzzy inference system

    NASA Astrophysics Data System (ADS)

    Kim, Chan Moon; Parnichkun, Manukid

    2017-11-01

    Coagulation is an important process in drinking water treatment to attain acceptable treated water quality. However, the determination of coagulant dosage is still a challenging task for operators, because coagulation is a nonlinear and complicated process. Feedback control to achieve the desired treated water quality is difficult due to the lengthy process time. In this research, a hybrid of k-means clustering and adaptive neuro-fuzzy inference system (k-means-ANFIS) is proposed for the settled water turbidity prediction and the optimal coagulant dosage determination using full-scale historical data. To build a model that adapts well to different process states of the influent water, raw water quality data are classified into four clusters according to their properties by a k-means clustering technique. The sub-models are developed individually on the basis of each clustered data set. Results reveal that the sub-models constructed by a hybrid k-means-ANFIS perform better than not only a single ANFIS model, but also seasonal models by artificial neural network (ANN). The completed model consisting of sub-models shows more accurate and consistent prediction ability than a single model of ANFIS and a single model of ANN based on all five evaluation indices. Therefore, the hybrid model of k-means-ANFIS can be employed as a robust tool for managing both treated water quality and production costs simultaneously.
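
    The cluster-then-model idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a least-squares line stands in for each ANFIS sub-model, the toy data use one feature and two clusters rather than four, and all names (`kmeans`, `fit_line`, `train_submodels`, `predict`) and values are hypothetical.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in for an ANFIS sub-model)."""
    if not xs:
        return 0.0, 0.0
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def train_submodels(features, targets, k):
    """Cluster the inputs, then fit one sub-model per cluster."""
    centroids, labels = kmeans(features, k)
    models = []
    for c in range(k):
        xs = [f[0] for f, l in zip(features, labels) if l == c]
        ys = [t for t, l in zip(targets, labels) if l == c]
        models.append(fit_line(xs, ys))
    return centroids, models

def predict(x, centroids, models):
    """Route a new sample to its nearest cluster and apply that cluster's sub-model."""
    c = min(range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])))
    slope, intercept = models[c]
    return slope * x[0] + intercept

# Hypothetical data: raw-water turbidity (feature) and coagulant dosage (target).
turbidity = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
dosage = [2.0, 4.0, 6.0, 20.0, 19.0, 18.0]
centroids, models = train_submodels(turbidity, dosage, k=2)
print(predict([2.5], centroids, models))
```

    Routing each new sample to the sub-model of its nearest centroid is one simple way to combine the sub-models; the abstract does not say how the authors combine theirs.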

  20. Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes

    PubMed Central

    Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli

    2014-01-01

    Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
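
    The classification step described above (assigning a new instance to its closest cluster in each collection) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: each collection holds one centroid per activity, the per-collection decisions are combined by majority vote, and the sensor data, feature subsets and function names are hypothetical.

```python
from collections import Counter

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def build_collection(X, y, feature_idx):
    """One collection: a labelled centroid per activity, using only the given features.
    Assumption: one cluster per activity; the paper's clusters need not match classes."""
    clusters = []
    for label in sorted(set(y)):
        rows = [[x[i] for i in feature_idx] for x, l in zip(X, y) if l == label]
        clusters.append((centroid(rows), label))
    return feature_idx, clusters

def classify(collections, x):
    """Find the closest cluster in every collection, then take a majority vote."""
    votes = []
    for feature_idx, clusters in collections:
        proj = [x[i] for i in feature_idx]
        _, label = min(clusters,
                       key=lambda c: sum((a - b) ** 2 for a, b in zip(proj, c[0])))
        votes.append(label)
    return Counter(votes).most_common(1)[0][0]

# Hypothetical binary sensor readings (1 = sensor fired during the activity).
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
y = ["cook", "cook", "sleep", "sleep"]
collections = [build_collection(X, y, idx) for idx in ([0, 1], [2, 3], [0, 3])]
print(classify(collections, [1, 1, 0, 0]))
```

    Building each collection on a different feature subset is what gives the ensemble its diversity; the voting rule used here is only one possible way to combine the collections.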

  1. Clustering-based ensemble learning for activity recognition in smart homes.

    PubMed

    Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli

    2014-07-10

    Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.

  2. Efficient ensemble forecasting of marine ecology with clustered 1D models and statistical lateral exchange: application to the Red Sea

    NASA Astrophysics Data System (ADS)

    Dreano, Denis; Tsiaras, Kostas; Triantafyllou, George; Hoteit, Ibrahim

    2017-07-01

    Forecasting the state of large marine ecosystems is important for many economic and public health applications. However, advanced three-dimensional (3D) ecosystem models, such as the European Regional Seas Ecosystem Model (ERSEM), are computationally expensive, especially when implemented within an ensemble data assimilation system requiring several parallel integrations. As an alternative to 3D ecological forecasting systems, we propose to implement a set of regional one-dimensional (1D) water-column ecological models that run at a fraction of the computational cost. The 1D model domains are determined using a Gaussian mixture model (GMM)-based clustering method and satellite chlorophyll-a (Chl-a) data. Regionally averaged Chl-a data are assimilated into the 1D models using the singular evolutive interpolated Kalman (SEIK) filter. To laterally exchange information between subregions and improve the forecasting skill, we introduce a new correction step to the assimilation scheme, in which we assimilate a statistical forecast of future Chl-a observations based on information from neighbouring regions. We apply this approach to the Red Sea and show that the assimilative 1D ecological models can forecast surface Chl-a concentration with high accuracy. The statistical assimilation step further improves the forecasting skill by as much as 50%. This general approach of clustering large marine areas and running several interacting 1D ecological models is very flexible. It allows many combinations of clustering, filtering and regression techniques to be used and can be applied to build efficient forecasting systems in other large marine ecosystems.
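
    As a rough illustration of GMM-based clustering, the sketch below fits a two-component, one-dimensional Gaussian mixture by expectation-maximization and assigns each value to its most probable component. The assumptions are ours, not the paper's: the Chl-a values are invented, the data are scalar rather than satellite fields, the number of components is fixed at two, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, iters=100):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    overall_mean = sum(data) / len(data)
    var0 = sum((x - overall_mean) ** 2 for x in data) / len(data)
    weights, means, variances = [0.5, 0.5], [min(data), max(data)], [var0, var0]
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk, 1e-6
            )
    return weights, means, variances

def assign(x, weights, means, variances):
    """Index of the component with the highest weighted density at x."""
    scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))

# Hypothetical surface Chl-a values from a low and a high productivity region.
chl = [0.10, 0.12, 0.09, 0.11, 0.90, 1.00, 0.95, 1.05]
weights, means, variances = fit_gmm_1d(chl)
print([assign(x, weights, means, variances) for x in chl])
```

    In practice a library implementation with multivariate components and a model-selection criterion for the number of clusters would be the more usual choice; the abstract does not state which the authors used.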

  3. Triggering active galactic nuclei in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Marshall, Madeline A.; Shabala, Stanislav S.; Krause, Martin G. H.; Pimbblet, Kevin A.; Croton, Darren J.; Owers, Matt S.

    2018-03-01

    We model the triggering of active galactic nuclei (AGN) in galaxy clusters using the semi-analytic galaxy formation model SAGE. We prescribe triggering methods based on the ram pressure galaxies experience as they move throughout the intracluster medium, which is hypothesized to trigger star formation and AGN activity. The clustercentric radius and velocity distribution of the simulated active galaxies produced by these models are compared with those of AGN and galaxies with intense star formation from a sample of low-redshift relaxed clusters from the Sloan Digital Sky Survey. The ram pressure triggering model that best explains the clustercentric radius and velocity distribution of these observed galaxies has AGN and star formation triggered if $2.5 \times 10^{-14}$ Pa $< P_{\rm ram} < 2.5 \times 10^{-13}$ Pa and $P_{\rm ram} > 2P_{\rm internal}$; this is consistent with expectations from hydrodynamical simulations of ram-pressure-induced star formation. Our results show that ram pressure is likely to be an important mechanism for triggering star formation and AGN activity in clusters.
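
    The best-fitting triggering condition quoted above (ram pressure between 2.5e-14 Pa and 2.5e-13 Pa, and more than twice the internal pressure) can be written as a simple predicate. This is a minimal sketch: the function and constant names are hypothetical, and how the ram and internal pressures are computed within SAGE is not covered here.

```python
# Thresholds quoted in the abstract, in pascals.
P_RAM_MIN = 2.5e-14
P_RAM_MAX = 2.5e-13

def is_triggered(p_ram, p_internal):
    """True if the ram pressure lies in the quoted window and exceeds
    twice the galaxy's internal pressure (both pressures in Pa)."""
    return P_RAM_MIN < p_ram < P_RAM_MAX and p_ram > 2 * p_internal

# Hypothetical galaxies: (ram pressure, internal pressure) in Pa.
for p_ram, p_int in [(1e-13, 4e-14), (1e-13, 6e-14), (1e-12, 1e-14)]:
    print(p_ram, p_int, is_triggered(p_ram, p_int))
```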

  4. Surface properties for α-cluster nuclear matter

    NASA Astrophysics Data System (ADS)

    Castro, J. J.; Soto, J. R.; Yépez, E.

    2013-03-01

    We introduce a new microscopic model for α-cluster matter, which simulates the properties of ordinary nuclear matter and α-clustering in a curved surface of a large but finite nucleus. The model is based on a nested icosahedral fullerene-like multiple-shell structure, where each vertex is occupied by a microscopic α-particle. The novel aspect of this model is that it allows a consistent description of nuclear surface properties from microscopic parameters to be made without using the leptodermous expansion. In particular, we show that the calculated surface energy is in excellent agreement with the corresponding coefficient of the Bethe-Weizsäcker semi-empirical mass formula. We discuss the properties of the surface α-cluster state, which resembles an ultracold bosonic quantum gas trapped in an optical lattice. By comparing the surface and interior states we are able to estimate the α preformation probability. Possible extensions of this model to study nuclear dynamics through surface vibrations and departures from approximate sphericity are mentioned.

  5. Control of clustered action potential firing in a mathematical model of entorhinal cortex stellate cells.

    PubMed

    Tait, Luke; Wedgwood, Kyle; Tsaneva-Atanasova, Krasimira; Brown, Jon T; Goodfellow, Marc

    2018-07-14

    The entorhinal cortex is a crucial component of our memory and spatial navigation systems and is one of the first areas to be affected in dementias featuring tau pathology, such as Alzheimer's disease and frontotemporal dementia. Electrophysiological recordings from principal cells of medial entorhinal cortex (layer II stellate cells, mEC-SCs) demonstrate a number of key identifying properties including subthreshold oscillations in the theta (4-12 Hz) range and clustered action potential firing. These single cell properties are correlated with network activity such as grid firing and coupling between theta and gamma rhythms, suggesting they are important for spatial memory. As such, experimental models of dementia have revealed disruption of organised dorsoventral gradients in clustered action potential firing. To better understand the mechanisms underpinning these different dynamics, we study a conductance based model of mEC-SCs. We demonstrate that the model, driven by extrinsic noise, can capture quantitative differences in clustered action potential firing patterns recorded from experimental models of tau pathology and healthy animals. The differential equation formulation of our model allows us to perform numerical bifurcation analyses in order to uncover the dynamic mechanisms underlying these patterns. We show that clustered dynamics can be understood as subcritical Hopf/homoclinic bursting in a fast-slow system where the slow sub-system is governed by activation of the persistent sodium current and inactivation of the slow A-type potassium current. In the full system, we demonstrate that clustered firing arises via flip bifurcations as conductance parameters are varied. Our model analyses confirm the experimentally suggested hypothesis that the breakdown of clustered dynamics in disease occurs via increases in AHP conductance. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  6. Identifying Two Groups of Entitled Individuals: Cluster Analysis Reveals Emotional Stability and Self-Esteem Distinction.

    PubMed

    Crowe, Michael L; LoPilato, Alexander C; Campbell, W Keith; Miller, Joshua D

    2016-12-01

    The present study hypothesized that there exist two distinct groups of entitled individuals: grandiose-entitled and vulnerable-entitled. Self-report scores of entitlement were collected for 916 individuals using an online platform. Model-based cluster analyses were conducted on the individuals with scores one standard deviation above the mean (n = 159) using the five-factor model dimensions as clustering variables. The results support the existence of two groups of entitled individuals categorized as emotionally stable and emotionally vulnerable. The emotionally stable cluster reported emotional stability, high self-esteem, more positive affect, and antisocial behavior. The emotionally vulnerable cluster reported low self-esteem and high levels of neuroticism, disinhibition, conventionality, psychopathy, negative affect, childhood abuse, intrusive parenting, and attachment difficulties. Compared to the control group, both clusters reported being more antagonistic, extraverted, Machiavellian, and narcissistic. These results suggest important differences are missed when simply examining the linear relationships between entitlement and various aspects of its nomological network.

  7. [Application of Kohonen Self-Organizing Feature Maps in QSAR of human ADMET and kinase data sets].

    PubMed

    Hegymegi-Barakonyi, Bálint; Orfi, László; Kéri, György; Kövesdi, István

    2013-01-01

    QSAR predictions have proven very useful in a large number of studies for drug design, such as kinase inhibitor design as targets for cancer therapy; however, the overall predictability often remains unsatisfactory. To improve predictability of ADMET features and kinase inhibitory data, we present a new method using Kohonen's Self-Organizing Feature Map (SOFM) to cluster molecules based on explanatory variables (X) and separate dissimilar ones. We calculated SOFM clusters for a large number of molecules with human ADMET and kinase inhibitory data, and we showed that chemically similar molecules were in the same SOFM cluster, and within such clusters the QSAR models had significantly better predictability. We also used target variables (Y, e.g. ADMET) jointly with X variables to create a novel type of clustering. With our method, cells of loosely coupled XY data could be identified and separated into different model building sets.

  8. Discrete bivariate population balance modelling of heteroaggregation processes.

    PubMed

    Rollié, Sascha; Briesen, Heiko; Sundmacher, Kai

    2009-08-15

    Heteroaggregation in binary particle mixtures was simulated with a discrete population balance model in terms of two internal coordinates describing the particle properties. The considered particle species are of different size and zeta-potential. Property space is reduced with a semi-heuristic approach to enable an efficient solution. Aggregation rates are based on deterministic models for Brownian motion and stability, under consideration of DLVO interaction potentials. A charge-balance kernel is presented, relating the electrostatic surface potential to the property space by a simple charge balance. Parameter sensitivity with respect to the fractal dimension, aggregate size, hydrodynamic correction, ionic strength and absolute particle concentration was assessed. Results were compared to simulations with the literature kernel based on geometric coverage effects for clusters with heterogeneous surface properties. In both cases electrostatic phenomena, which dominate the aggregation process, show identical trends: impeded cluster-cluster aggregation at low particle mixing ratio (1:1), restabilisation at high mixing ratios (100:1) and formation of complex clusters for intermediate ratios (10:1). The particle mixing ratio controls the surface coverage extent of the larger particle species. Simulation results are compared to experimental flow cytometric data and show very satisfactory agreement.

  9. Community trait overdispersion due to trophic interactions: concerns for assembly process inference

    PubMed Central

    Petchey, Owen L.

    2016-01-01

    The expected link between competitive exclusion and community trait overdispersion has been used to infer competition in local communities, and trait clustering has been interpreted as habitat filtering. Such community assembly process inference has received criticism for ignoring trophic interactions, as competition and trophic interactions might create similar trait patterns. While other theoretical studies have generally demonstrated the importance of predation for coexistence, ours provides the first quantitative demonstration of such effects on assembly process inference, using a trait-based ecological model to simulate the assembly of a competitive primary consumer community with and without the influence of trophic interactions. We quantified and contrasted trait dispersion/clustering of the competitive communities in the absence and presence of secondary consumers. Trophic interactions most often decreased trait clustering (i.e. increased dispersion) in the competitive communities due to evenly distributed invasions of secondary consumers and subsequent competitor extinctions over trait space. Furthermore, effects of trophic interactions were somewhat dependent on model parameters and clustering metric. These effects create considerable problems for process inference from trait distributions; one potential solution is to use more process-based and inclusive models in inference. PMID:27733548

  10. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
    For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767

  11. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

    PubMed

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
    For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

  12. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected Via the Sunyaev-Zel'dovich Effect

    NASA Technical Reports Server (NTRS)

    Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard

    2010-01-01

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zel'dovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives $\sigma_8 = 0.851 \pm 0.115$ and $w = -1.14 \pm 0.35$ for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find $\sigma_8 = 0.821 \pm 0.044$ and $w = -1.05 \pm 0.20$. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernovae, which give $\sigma_8 = 0.802 \pm 0.038$ and $w = -0.98 \pm 0.053$. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  13. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, these methods have several shortcomings, including strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.

  14. Model-based document categorization employing semantic pattern analysis and local structure clustering

    NASA Astrophysics Data System (ADS)

    Fume, Kosei; Ishitani, Yasuto

    2008-01-01

    We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category in accordance with the similarity of the model. The main feature of the proposed method consists of two aspects of semantics extraction from an input document. The semantics of terms are extracted by the semantic pattern analysis and implicit meanings of document substructure are specified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.

  15. Systematic Study on the Self-Assembled Hexagonal Au Voids, Nano-Clusters and Nanoparticles on GaN (0001).

    PubMed

    Pandey, Puran; Sui, Mao; Li, Ming-Yu; Zhang, Quanzhen; Kim, Eun-Soo; Lee, Jihoon

    2015-01-01

    Au nano-clusters and nanoparticles (NPs) have been widely utilized in various electronic, optoelectronic, and bio-medical applications due to their great potential. The size, density and configuration of Au NPs play a vital role in the performance of these devices. In this paper, we present a systematic study on the self-assembled hexagonal Au voids, nano-clusters and NPs fabricated on GaN (0001) by the variation of annealing temperature and deposition amount. At relatively low annealing temperatures between 400 and 600°C, the fabrication of hexagonal-shaped Au voids and Au nano-clusters is observed and discussed based on the diffusion limited aggregation model. The size and density of voids and nano-clusters can systematically be controlled. The self-assembled Au NPs are fabricated at comparatively high temperatures from 650 to 800°C based on the Volmer-Weber growth model, and their size and density can also be tuned accordingly. The results are symmetrically analyzed and discussed in conjunction with the diffusion theory and thermodynamics by utilizing AFM and SEM images, EDS maps and spectra, FFT power spectra, cross-sectional line-profiles and size and density plots.

  16. Systematic Study on the Self-Assembled Hexagonal Au Voids, Nano-Clusters and Nanoparticles on GaN (0001)

    PubMed Central

    Pandey, Puran; Sui, Mao; Li, Ming-Yu; Zhang, Quanzhen; Kim, Eun-Soo; Lee, Jihoon

    2015-01-01

    Au nano-clusters and nanoparticles (NPs) have been widely utilized in various electronic, optoelectronic, and bio-medical applications due to their great potential. The size, density and configuration of Au NPs play a vital role in the performance of these devices. In this paper, we present a systematic study on the self-assembled hexagonal Au voids, nano-clusters and NPs fabricated on GaN (0001) by the variation of annealing temperature and deposition amount. At relatively low annealing temperatures between 400 and 600°C, the fabrication of hexagonal-shaped Au voids and Au nano-clusters is observed and discussed based on the diffusion limited aggregation model. The size and density of voids and nano-clusters can systematically be controlled. The self-assembled Au NPs are fabricated at comparatively high temperatures from 650 to 800°C based on the Volmer-Weber growth model, and their size and density can also be tuned accordingly. The results are symmetrically analyzed and discussed in conjunction with the diffusion theory and thermodynamics by utilizing AFM and SEM images, EDS maps and spectra, FFT power spectra, cross-sectional line-profiles and size and density plots. PMID:26285135

  17. Clustering Multivariate Time Series Using Hidden Markov Models

    PubMed Central

    Ghassempour, Shima; Girosi, Federico; Maeder, Anthony

    2014-01-01

    In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers. PMID:24662996

  18. Collaborative filtering recommendation model based on fuzzy clustering algorithm

    NASA Astrophysics Data System (ADS)

    Yang, Ye; Zhang, Yunhua

    2018-05-01

    As one of the most widely used algorithms in recommender systems, the collaborative filtering algorithm faces two serious problems: data sparsity and poor recommendation quality in big data environments. In traditional clustering analysis, each object is strictly assigned to one of several classes, and the boundaries between classes are sharp. However, most real-life objects have no strict definition of their form or class attributes. To address these problems, this paper proposes improving the traditional collaborative filtering model through a hybrid optimization that combines an implicit semantic algorithm and a fuzzy clustering algorithm with collaborative filtering. The fuzzy clustering algorithm is applied to the project attribute information, so that each project belongs to different project categories with different membership degrees; this increases the density of the data, effectively reduces its sparsity, and addresses the low accuracy that results from inaccurate similarity calculation. Finally, the paper carries out an empirical analysis on the MovieLens dataset and compares the proposed algorithm with the traditional user-based collaborative filtering algorithm. The proposed algorithm greatly improves the recommendation accuracy.
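
    To make "different membership degrees" concrete, here is a minimal sketch of the membership rule from standard fuzzy c-means. The abstract does not state which fuzzy clustering variant the authors used, so this rule, the function name `fuzzy_memberships`, and the fuzzifier default `m=2.0` are all assumptions for illustration.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Membership degrees of one point in each cluster (standard fuzzy c-means rule).

    u_j = d_j^(-2/(m-1)) / sum_k d_k^(-2/(m-1)), where d_j is the Euclidean
    distance from the point to center j. The degrees sum to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully (equally) to them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [w / total for w in inv]
```

    Unlike a hard assignment, every item receives a nonzero weight in every category it is not infinitely far from, which is one way soft clustering can densify a sparse item-category representation.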

  19. An alternative validation strategy for the Planck cluster catalogue and y-distortion maps

    NASA Astrophysics Data System (ADS)

    Khatri, Rishi

    2016-07-01

    We present an all-sky map of the y-type distortion calculated from the full mission Planck High Frequency Instrument (HFI) data using the recently proposed approach to component separation, which is based on parametric model fitting and model selection. This simple model-selection approach enables us to distinguish between carbon monoxide (CO) line emission and y-type distortion, something that is not possible using the internal linear combination based methods. We create a mask to cover the regions of significant CO emission relying on the information in the χ² map that was obtained when fitting for the y-distortion and CO emission to the lowest four HFI channels. We revisit the second Planck cluster catalogue and try to quantify the quality of the cluster candidates in an approach that is similar in spirit to Aghanim et al. (2015, A&A, 580, A138). We find that at least 93% of the clusters in the cosmology sample are free of CO contamination. We also find that 59% of unconfirmed candidates may have significant contamination from molecular clouds. We agree with Planck Collaboration XXVII (2016, A&A, in press) on the worst offenders. We suggest an alternative validation strategy of measuring and subtracting the CO emission from the Planck cluster candidates using radio telescopes, thus improving the reliability of the catalogue. Our CO mask and annotations to the Planck cluster catalogue, identifying cluster candidates with possible CO contamination, are made publicly available. The full Tables 1-3 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/592/A48

  20. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

    PubMed

    Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin

    2015-12-01

    One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of the high-dimensional cancer multi-omics data. In this study, we proposed a novel low-rank approximation based integrative probabilistic model to quickly find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering of hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performances than the existing method. Then, we applied LRAcluster on large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that the cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas. The single-cancer-type analysis, in turn, suggests that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .

  1. Modeling of the HiPco process for carbon nanotube production. II. Reactor-scale analysis

    NASA Technical Reports Server (NTRS)

    Gokcen, Tahir; Dateo, Christopher E.; Meyyappan, M.

    2002-01-01

    The high-pressure carbon monoxide (HiPco) process, developed at Rice University, has been reported to produce single-walled carbon nanotubes from gas-phase reactions of iron carbonyl in carbon monoxide at high pressures (10-100 atm). Computational modeling is used here to develop an understanding of the HiPco process. A detailed kinetic model of the HiPco process that includes decomposition of the precursor, metal cluster formation and growth, and carbon nanotube growth was developed in the previous article (Part I). Decomposition of precursor molecules is necessary to initiate metal cluster formation. The metal clusters serve as catalysts for carbon nanotube growth. The diameter of metal clusters and the number of atoms in these clusters are essential information for predicting carbon nanotube formation and growth, which is then modeled by the Boudouard reaction with metal catalysts. Based on the detailed model simulations, a reduced kinetic model was also developed in Part I for use in reactor-scale flowfield calculations. Here this reduced kinetic model is integrated with a two-dimensional axisymmetric reactor flow model to predict reactor performance. Carbon nanotube growth is examined with respect to several process variables (peripheral jet temperature, reactor pressure, and Fe(CO)5 concentration) with the use of the axisymmetric model, and the computed results are compared with existing experimental data. The model yields most of the qualitative trends observed in the experiments and helps in understanding the fundamental processes in HiPco carbon nanotube production.

  2. X-RAY BINARIES AND STAR CLUSTERS IN THE ANTENNAE: OPTICAL CLUSTER COUNTERPARTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rangelov, Blagoy; Chandar, Rupali; Prestwich, Andrea

    2012-10-20

    We compare the locations of 82 X-ray binaries (XRBs) detected in the merging Antennae galaxies by Zezas et al., based on observations taken with the Chandra X-Ray Observatory, with a catalog of optically selected star clusters presented by Whitmore et al., based on observations taken with the Hubble Space Telescope. Within the 2σ positional uncertainty of ≈0.″8, we find 22 XRBs are coincident with star clusters, where only two to three chance coincidences are expected. The ages of the clusters were estimated by comparing their UBVI, Hα colors with predictions from stellar evolutionary models. We find that 14 of the 22 coincident XRBs (64%) are hosted by star clusters with ages of ≈6 Myr or less. All of the very young host clusters are fairly massive and have M ≳ 3 × 10⁴ M☉, with many having masses M ≈ 10⁵ M☉. Five of the XRBs are hosted by young clusters with ages τ ≈ 10-100 Myr, while three are hosted by intermediate-age clusters with τ ≈ 100-300 Myr. Based on the results from recent N-body simulations, which suggest that black holes are far more likely to be retained within their parent clusters than neutron stars, we suggest that our sample consists primarily of black hole binaries with different ages.

  3. Qualitative mechanism models and the rationalization of procedures

    NASA Technical Reports Server (NTRS)

    Farley, Arthur M.

    1989-01-01

    A qualitative, cluster-based approach to the representation of hydraulic systems is described and its potential for generating and explaining procedures is demonstrated. Many ideas are formalized and implemented as part of an interactive, computer-based system. The system allows for designing, displaying, and reasoning about hydraulic systems. The interactive system has an interface consisting of three windows: a design/control window, a cluster window, and a diagnosis/plan window. A qualitative mechanism model for the ORS (Orbital Refueling System) is presented to coordinate with ongoing research on this system being conducted at NASA Ames Research Center.

  4. A first packet processing subdomain cluster model based on SDN

    NASA Astrophysics Data System (ADS)

    Chen, Mingyong; Wu, Weimin

    2017-08-01

    To address the packet-processing performance bottlenecks and controller downtime problems of current controller clusters, a subdomain cluster model based on SDN (Software Defined Network) is proposed. An SDN controller allocates the priority of each device in the network; each domain contains several network devices and a controller, which is responsible for managing the network equipment within the domain; and the switch performs data delivery based on the load of the controller, which processes the network equipment data. The experimental results show that the model can effectively mitigate the risk of single-point failure of the controller and relieve the performance bottleneck of first-packet processing.

  5. A Hierarchical Framework for State-Space Matrix Inference and Clustering.

    PubMed

    Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz

    2016-09-01

    In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criteria. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus by utilizing transcription factor occupancy data and illustrated applicability of MBASIC in a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.

  6. Model for transport and reaction of defects and carriers within displacement cascades in gallium arsenide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wampler, William R., E-mail: wrwampl@sandia.gov; Myers, Samuel M.

    A model is presented for recombination of charge carriers at evolving displacement damage in gallium arsenide, which includes clustering of the defects in atomic displacement cascades produced by neutron or ion irradiation. The carrier recombination model is based on an atomistic description of capture and emission of carriers by the defects with time evolution resulting from the migration and reaction of the defects. The physics and equations on which the model is based are presented, along with the details of the numerical methods used for their solution. The model uses a continuum description of diffusion, field-drift and reaction of carriers and defects within a representative spherically symmetric cluster of defects. The initial radial defect profiles within the cluster were determined through pair-correlation-function analysis of the spatial distribution of defects obtained from the binary-collision code MARLOWE, using recoil energies for fission neutrons. Properties of the defects are discussed and values for their parameters are given, many of which were obtained from density functional theory. The model provides a basis for predicting the transient response of III-V heterojunction bipolar transistors to displacement damage from energetic particle irradiation.

  7. Influence of Aromatic Molecules on the Structure and Spectroscopy of Water Clusters

    NASA Astrophysics Data System (ADS)

    Tabor, Daniel P.; Sibert, Edwin; Walsh, Patrick S.; Zwier, Timothy S.

    2016-06-01

    Isomer-specific resonant ion-dip infrared spectra are presented for benzene-(water)_n, 1,2-diphenoxyethane-(water)_n, and tricyclophane-(water)_n clusters. The IR spectra are modeled with a local mode Hamiltonian that was originally formulated for the analysis of benzene-(water)_n clusters with up to seven waters. The model accounts for stretch-bend Fermi coupling, which can complicate the IR spectra in the 3150-3300 cm⁻¹ region. When the water clusters interact with each of the solutes, the hydrogen bond lengths between the water molecules change in a characteristic way, reflecting the strength of the solute-water interaction. These structural effects are also reflected spectroscopically in the shifts of the local mode OH stretch frequencies. When diphenoxyethane is the solute, the water clusters distort more significantly than when bound to benzene. Tricyclophane's structure provides an aromatic-rich binding pocket for the water clusters. The local mode model is used to extract Hamiltonians for individual water molecules. These monomer Hamiltonians divide into groups based on their local H-bonding architecture, allowing for further classification of the wide variety of water environments encountered in this study.

  8. Synthesis, Characterization, and Reactivity of Functionalized Trinuclear Iron–Sulfur Clusters – A New Class of Bioinspired Hydrogenase Models

    PubMed Central

    Kaiser, Manuel; Knör, Günther

    2015-01-01

    The air- and moisture-stable iron–sulfur carbonyl clusters Fe3S2(CO)7(dppm) (1) and Fe3S2(CO)7(dppf) (2) carrying the bisphosphine ligands bis(diphenylphosphanyl)methane (dppm) and 1,1′-bis(diphenylphosphanyl)ferrocene (dppf) were prepared and fully characterized. Two alternative synthetic routes based on different thionation reactions of triiron dodecacarbonyl were tested. The molecular structures of the methylene-bridged compound 1 and the ferrocene-functionalized derivative 2 were determined by single-crystal X-ray diffraction. The catalytic reactivity of the trinuclear iron–sulfur cluster core for proton reduction in solution at low overpotential was demonstrated. These deeply colored bisphosphine-bridged sulfur-capped iron carbonyl systems are discussed as promising candidates for the development of new bioinspired model compounds of iron-based hydrogenases. PMID:26512211

  9. Weakly supervised image semantic segmentation based on clustering superpixels

    NASA Astrophysics Data System (ADS)

    Yan, Xiong; Liu, Xiaohua

    2018-04-01

    In this paper, we propose an image semantic segmentation model that is trained on images with image-level labels. The proposed model starts with superpixel segmentation, and features of the superpixels are extracted by a trained CNN. We introduce a superpixel-based graph and then apply a graph partition method to group correlated superpixels into clusters. To acquire inter-label correlations between the image-level labels in the dataset, we utilize label co-occurrence statistics and simultaneously exploit visual contextual cues. Finally, we formulate the task of mapping appropriate image-level labels to the detected clusters as a convex minimization problem. Experimental results on the MSRC-21 and LabelMe datasets show that the proposed method performs better than most weakly supervised methods and is even comparable to fully supervised methods.

  10. NASA Software Cost Estimation Model: An Analogy Based Estimation Model

    NASA Technical Reports Server (NTRS)

    Hihn, Jairus; Juster, Leora; Menzies, Tim; Mathew, George; Johnson, James

    2015-01-01

    The cost estimation of software development activities is increasingly critical for large scale integrated projects such as those at DOD and NASA, especially as the software systems become larger and more complex. As an example, MSL (Mars Science Laboratory), developed at the Jet Propulsion Laboratory, launched with over 2 million lines of code, making it the largest robotic spacecraft ever flown (based on the size of the software). Software development activities are also notorious for their cost growth, with NASA flight software averaging over 50% cost growth. All across the agency, estimators and analysts are increasingly being tasked to develop reliable cost estimates in support of program planning and execution. While there has been extensive work on improving parametric methods, there is very little focus on the use of models based on analogy and clustering algorithms. In this paper we summarize our findings on effort/cost model estimation and model development based on ten years of software effort estimation research using data mining and machine learning methods to develop estimation models based on analogy and clustering. The NASA Software Cost Model performance is evaluated by comparing it to COCOMO II, linear regression, and K-nearest neighbor prediction model performance on the same data set.
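
    As a rough illustration of what "estimation by analogy" means, the sketch below predicts effort as the mean effort of the k most similar past projects. This is a generic K-nearest-neighbor baseline of the kind the abstract lists as a comparison, not the NASA Software Cost Model itself; the function name `estimate_by_analogy`, the feature encoding, and the default `k=3` are hypothetical.

```python
import math

def estimate_by_analogy(new_features, past_projects, k=3):
    """Mean effort of the k past projects closest to new_features.

    past_projects is a list of (feature_tuple, effort) pairs; similarity is
    plain Euclidean distance on the (assumed already normalized) features.
    """
    if not past_projects:
        raise ValueError("need at least one past project")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank historical projects by distance to the project being estimated.
    ranked = sorted(past_projects, key=lambda p: math.dist(new_features, p[0]))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)
```

    In practice the features would need scaling before distances are meaningful, and a clustering step could restrict the neighbor search to projects in the same cluster.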

  11. Development of New Open-Shell Perturbation and Coupled-Cluster Theories Based on Symmetric Spin Orbitals

    NASA Technical Reports Server (NTRS)

    Lee, Timothy J.; Arnold, James O. (Technical Monitor)

    1994-01-01

    A new spin orbital basis is employed in the development of efficient open-shell coupled-cluster and perturbation theories that are based on a restricted Hartree-Fock (RHF) reference function. The spin orbital basis differs from the standard one in the spin functions that are associated with the singly occupied spatial orbital. The occupied orbital (in the spin orbital basis) is assigned the $\delta_{+} = \frac{1}{\sqrt{2}}(\alpha + \beta)$ spin function while the unoccupied orbital is assigned the $\delta_{-} = \frac{1}{\sqrt{2}}(\alpha - \beta)$ spin function. The doubly occupied and unoccupied orbitals (in the reference function) are assigned the standard $\alpha$ and $\beta$ spin functions. The coupled-cluster and perturbation theory wave functions based on this set of "symmetric spin orbitals" exhibit much more symmetry than those based on the standard spin orbital basis. This, together with interacting space arguments, leads to a dramatic reduction in the computational cost for both coupled-cluster and perturbation theory. Additionally, perturbation theory based on "symmetric spin orbitals" obeys Brillouin's theorem provided that spin and spatial excitations are both considered. Other properties of the coupled-cluster and perturbation theory wave functions and models will be discussed.
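
    A short check, not taken from the report itself, shows that the two symmetric spin functions (the normalized sum and difference of the alpha and beta spin functions, written here as $\delta_{\pm}$) form an orthonormal pair. It assumes only the usual orthonormality of the one-electron spin functions, $\langle\alpha|\alpha\rangle = \langle\beta|\beta\rangle = 1$ and $\langle\alpha|\beta\rangle = 0$.

```latex
\begin{align*}
\langle \delta_{\pm} | \delta_{\pm} \rangle
  &= \tfrac{1}{2}\bigl(\langle\alpha|\alpha\rangle \pm \langle\alpha|\beta\rangle
     \pm \langle\beta|\alpha\rangle + \langle\beta|\beta\rangle\bigr)
   = \tfrac{1}{2}(1 + 0 + 0 + 1) = 1, \\
\langle \delta_{+} | \delta_{-} \rangle
  &= \tfrac{1}{2}\bigl(\langle\alpha|\alpha\rangle - \langle\alpha|\beta\rangle
     + \langle\beta|\alpha\rangle - \langle\beta|\beta\rangle\bigr)
   = \tfrac{1}{2}(1 - 0 + 0 - 1) = 0.
\end{align*}
```

    The pair therefore spans the same one-electron spin space as the standard alpha and beta functions, so the change of basis loses nothing.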

  12. Stabilization of sulfuric acid dimers by ammonia, methylamine, dimethylamine, and trimethylamine

    NASA Astrophysics Data System (ADS)

    Jen, Coty N.; McMurry, Peter H.; Hanson, David R.

    2014-06-01

    This study experimentally explores how ammonia (NH3), methylamine (MA), dimethylamine (DMA), and trimethylamine (TMA) affect the chemical formation mechanisms of electrically neutral clusters that contain two sulfuric acid molecules (dimers). Dimers may also contain undetectable compounds, such as water or bases, that evaporate upon ionization and sampling. Measurements were conducted using a glass flow reactor which contained a steady flow of humidified nitrogen with sulfuric acid concentrations of 10⁷ to 10⁹ cm⁻³. A known molar flow rate of a basic gas was injected into the flow reactor. The University of Minnesota Cluster Chemical Ionization Mass Spectrometer was used to measure the resulting sulfuric acid vapor and cluster concentrations. It was found that, for a given concentration of sulfuric acid vapor, the dimer concentration increases with increasing concentration of the basic gas, eventually reaching a plateau. The base concentrations at which the dimer concentrations saturate suggest NH3 < MA < TMA ≲ DMA in forming stabilized sulfuric acid dimers. Two heuristic models for cluster formation by acid-base reactions are developed to interpret the data. The models provide ranges of evaporation rate constants that are consistent with observations and lead to an analytic expression for nucleation rates that is consistent with atmospheric observations.

  13. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    PubMed

    Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B

    2016-01-01

    Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.

  14. Simultaneous Co-Clustering and Classification in Customers Insight

    NASA Astrophysics Data System (ADS)

    Anggistia, M.; Saefuddin, A.; Sartono, B.

    2017-04-01

    Building a predictive model on a heterogeneous dataset may cause problems such as imprecise parameter estimates and reduced prediction accuracy. Such problems can be addressed by segmenting the data into relatively homogeneous groups and then building a predictive model for each cluster. This strategy usually yields models that are simpler, more interpretable, and more actionable, without any loss in accuracy or reliability. This work concerns a marketing data set that records customer behaviour across products, with variables describing customers and products as attributes. The basic idea of the approach is to perform co-clustering and classification simultaneously. The objective of this research is to analyse customers across product characteristics so that the marketing strategy can be implemented precisely.

  15. Conduction band fluctuation scattering due to alloy clustering in barrier layers in InAlN/GaN heterostructures

    NASA Astrophysics Data System (ADS)

    Li, Qun; Chen, Qian; Chong, Jing

    2017-12-01

    In InAlN/GaN heterostructures, alloy clustering-induced InAlN conduction band fluctuations interact with electrons penetrating into the barrier layers and thus affect the electron transport. Based on the statistical description of InAlN compositional distribution, a theoretical model of the conduction band fluctuation scattering (CBFS) is presented. The model calculations show that the CBFS-limited mobility decreases with increasing two-dimensional electron gas sheet density and is inversely proportional to the squared standard deviation of In distribution. The AlN interfacial layer can effectively suppress the CBFS via decreasing the penetration probability. This model is directed towards understanding the transport properties in heterostructure materials with columnar clusters.

  16. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    ERIC Educational Resources Information Center

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  17. A Cluster Analytic Study of Clinical Orientations among Chemical Dependency Counselors.

    ERIC Educational Resources Information Center

    Thombs, Dennis L.; Osborn, Cynthia J.

    2001-01-01

    Three distinct clinical orientations were identified in a sample of chemical dependency counselors (N=406). Based on cluster analysis, the largest group, identified and labeled as "uniform counselors," endorsed a simple, moral-disease model with little interest in psychosocial interventions. (Contains 50 references and 4 tables.) (GCP)

  18. Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

    PubMed Central

    Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.

    2003-01-01

    Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r² test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
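
    Of the agreement measures named above, the Adjusted Rand Index is easy to state in code. The sketch below is a plain-Python version of the standard pair-counting formula; it is an illustration, not the authors' implementation, and the function name `adjusted_rand_index` is our own.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    n = len(labels_a)
    # Contingency counts: how many items fall in each (cluster_a, cluster_b) cell.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs placed together under both, under a, and under b.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    total = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable index.
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial): treat as perfect agreement.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

    The index is 1 for identical partitions regardless of how the cluster labels are named, is close to 0 for chance-level agreement, and can be negative when two partitions agree less than chance would predict.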

  19. Clustering autism: using neuroanatomical differences in 26 mouse models to gain insight into the heterogeneity.

    PubMed

    Ellegood, J; Anagnostou, E; Babineau, B A; Crawley, J N; Lin, L; Genestine, M; DiCicco-Bloom, E; Lai, J K Y; Foster, J A; Peñagarikano, O; Geschwind, D H; Pacey, L K; Hampson, D R; Laliberté, C L; Mills, A A; Tam, E; Osborne, L R; Kouser, M; Espinosa-Becerra, F; Xuan, Z; Powell, C M; Raznahan, A; Robins, D M; Nakai, N; Nakatani, J; Takumi, T; van Eede, M C; Kerr, T M; Muller, C; Blakely, R D; Veenstra-VanderWeele, J; Henkelman, R M; Lerch, J P

    2015-02-01

    Autism is a heritable disorder, with over 250 associated genes identified to date, yet no single gene accounts for >1-2% of cases. The clinical presentation, behavioural symptoms, imaging and histopathology findings are strikingly heterogeneous. A more complete understanding of autism can be obtained by examining multiple genetic or behavioural mouse models of autism using magnetic resonance imaging (MRI)-based neuroanatomical phenotyping. Twenty-six different mouse models were examined and the consistently found abnormal brain regions across models were parieto-temporal lobe, cerebellar cortex, frontal lobe, hypothalamus and striatum. These models separated into three distinct clusters, two of which can be linked to the under and over-connectivity found in autism. These clusters also identified previously unknown connections between Nrxn1α, En2 and Fmr1; Nlgn3, BTBR and Slc6A4; and also between X monosomy and Mecp2. With no single treatment for autism found, clustering autism using neuroanatomy and identifying these strong connections may prove to be a crucial step in predicting treatment response.

  20. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean matching. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369

  1. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    PubMed

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.

  2. Perspective: Size selected clusters for catalysis and electrochemistry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halder, Avik; Curtiss, Larry A.; Fortunelli, Alessandro

    We report that size-selected clusters containing a handful of atoms may possess noble catalytic properties different from nano-sized or bulk catalysts. Size- and composition-selected clusters can also serve as models of the catalytic active site, where an addition or removal of a single atom can have a dramatic effect on their activity and selectivity. In this Perspective, we provide an overview of studies performed under both ultra-high vacuum and realistic reaction conditions aimed at the interrogation, characterization and understanding of the performance of supported size-selected clusters in heterogeneous and electrochemical reactions, which address the effects of cluster size, cluster composition, cluster-support interactions and reaction conditions, the key parameters for the understanding and control of catalyst functionality. Computational modelling based on density functional theory sampling of local minima and energy barriers or ab initio Molecular Dynamics simulations is an integral part of this research by providing fundamental understanding of the catalytic processes at the atomic level, as well as by predicting new materials compositions which can be validated in experiments. Lastly, we discuss approaches which aim at the scale up of the production of well-defined clusters for use in real world applications.

  3. Perspective: Size selected clusters for catalysis and electrochemistry

    DOE PAGES

    Halder, Avik; Curtiss, Larry A.; Fortunelli, Alessandro; ...

    2018-03-15

    We report that size-selected clusters containing a handful of atoms may possess noble catalytic properties different from nano-sized or bulk catalysts. Size- and composition-selected clusters can also serve as models of the catalytic active site, where an addition or removal of a single atom can have a dramatic effect on their activity and selectivity. In this Perspective, we provide an overview of studies performed under both ultra-high vacuum and realistic reaction conditions aimed at the interrogation, characterization and understanding of the performance of supported size-selected clusters in heterogeneous and electrochemical reactions, which address the effects of cluster size, cluster composition, cluster-support interactions and reaction conditions, the key parameters for the understanding and control of catalyst functionality. Computational modelling based on density functional theory sampling of local minima and energy barriers or ab initio Molecular Dynamics simulations is an integral part of this research by providing fundamental understanding of the catalytic processes at the atomic level, as well as by predicting new materials compositions which can be validated in experiments. Lastly, we discuss approaches which aim at the scale up of the production of well-defined clusters for use in real world applications.

  4. Perspective: Size selected clusters for catalysis and electrochemistry

    NASA Astrophysics Data System (ADS)

    Halder, Avik; Curtiss, Larry A.; Fortunelli, Alessandro; Vajda, Stefan

    2018-03-01

    Size-selected clusters containing a handful of atoms may possess noble catalytic properties different from nano-sized or bulk catalysts. Size- and composition-selected clusters can also serve as models of the catalytic active site, where an addition or removal of a single atom can have a dramatic effect on their activity and selectivity. In this perspective, we provide an overview of studies performed under both ultra-high vacuum and realistic reaction conditions aimed at the interrogation, characterization, and understanding of the performance of supported size-selected clusters in heterogeneous and electrochemical reactions, which address the effects of cluster size, cluster composition, cluster-support interactions, and reaction conditions, the key parameters for the understanding and control of catalyst functionality. Computational modeling based on density functional theory sampling of local minima and energy barriers or ab initio molecular dynamics simulations is an integral part of this research by providing fundamental understanding of the catalytic processes at the atomic level, as well as by predicting new materials compositions which can be validated in experiments. Finally, we discuss approaches which aim at the scale up of the production of well-defined clusters for use in real world applications.

  5. Computational Design of Clusters for Catalysis

    NASA Astrophysics Data System (ADS)

    Jimenez-Izal, Elisa; Alexandrova, Anastassia N.

    2018-04-01

    When small clusters are studied in chemical physics or physical chemistry, one perhaps thinks of the fundamental aspects of cluster electronic structure, or precision spectroscopy in ultracold molecular beams. However, small clusters are also of interest in catalysis, where the cold ground state or an isolated cluster may not even be the right starting point. Instead, the big question is: What happens to cluster-based catalysts under real conditions of catalysis, such as high temperature and coverage with reagents? Myriads of metastable cluster states become accessible, the entire system is dynamic, and catalysis may be driven by rare sites present only under those conditions. Activity, selectivity, and stability are highly dependent on size, composition, shape, support, and environment. To probe and master cluster catalysis, sophisticated tools are being developed for precision synthesis, operando measurements, and multiscale modeling. This review intends to tell the messy story of clusters in catalysis.

  6. Quark cluster model for deep-inelastic lepton-deuteron scattering

    NASA Astrophysics Data System (ADS)

    Yen, G.; Vary, J. P.; Harindranath, A.; Pirner, H. J.

    1990-10-01

    We evaluate the contribution of quasifree nucleon knockout and of inelastic lepton-nucleon scattering in inclusive electron-deuteron reactions at large momentum transfer. We examine the degree of quantitative agreement with deuteron wave functions from the Reid soft-core and Bonn realistic nucleon-nucleon interactions. For the range of data available there is strong sensitivity to the tensor correlations which are distinctively different in these two deuteron models. At this stage of the analyses the Reid soft-core wave function provides a reasonable description of the data while the Bonn wave function does not. We then include a six-quark cluster component whose relative contribution is based on an overlap criterion and obtain a good description of all the data with both interactions. The critical separation at which overlap occurs (formation of six-quark clusters) is taken to be 1.0 fm and the six-quark cluster probability is 4.7% for Reid and 5.4% for Bonn. As a consequence the quark cluster model with either the Reid or the Bonn wave function describes the SLAC inclusive electron-deuteron scattering data equally well. We then show how additional data would be decisive in resolving which model is ultimately more correct.

  7. Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Ling; Lee, Doris; Sim, Alex

    Current practice in whole time series clustering of residential meter data focuses on aggregated or subsampled load data at the customer level, which ignores day-to-day differences within customers. This information is critical to determine each customer’s suitability to various demand side management strategies that support intelligent power grids and smart energy management. Clustering daily load shapes provides fine-grained information on customer attributes and sources of variation for subsequent models and customer segmentation. In this paper, we apply 11 clustering methods to daily residential meter data. We evaluate their parameter settings and suitability based on 6 generic performance metrics and post-checking of resulting clusters. Finally, we recommend suitable techniques and parameters based on the goal of discovering diverse daily load patterns among residential customers. To the authors’ knowledge, this paper is the first robust comparative review of clustering techniques applied to daily residential load shape time series in the power systems’ literature.

  8. Functionalizing graphene by embedded boron clusters

    NASA Astrophysics Data System (ADS)

    Quandt, Alexander; Kunstmann, Jens; Ozdogan, Cem; Fehske, Holger

    2010-03-01

    We present results from an ab initio study of B7 clusters implanted into graphene [1,2]. Our model system consists of an alternating chain of quasiplanar B7 clusters. We show that graphene easily accepts these alternating B7-C6 chains and that the implanted boron components may dramatically modify the electronic properties. This suggests that our model system might serve as a blueprint for the controlled layout of graphene based nanodevices, where the semiconducting properties are supplemented by parts of the graphene matrix itself, and the basic metallic wiring is provided by alternating chains of implanted boron clusters. [1] A. Quandt, C. Özdogan, J. Kunstmann, and H. Fehske, Nanotechnology 19, 335707 (2008). [2] A. Quandt, C. Özdogan, J. Kunstmann, and H. Fehske, phys. stat. solidi (b) 245, 2077 (2008).

  9. Mapping the Indonesian territory, based on pollution, social demography and geographical data, using self organizing feature map

    NASA Astrophysics Data System (ADS)

    Hernawati, Kuswari; Insani, Nur; Bambang S. H., M.; Nur Hadi, W.; Sahid

    2017-08-01

    This research aims to map the 33 (thirty-three) provinces in Indonesia, based on data on air, water and soil pollution, as well as social demography and geography data, into a clustered model. The method used in this study was an unsupervised method based on the Kohonen network, or Self-Organizing Feature Map (SOFM). The method is carried out by providing the design parameters for the model based on data related directly or indirectly to pollution, namely the demographic and social data, pollution levels of air, water and soil, as well as the geographical situation of each province. The parameters used consist of 19 features/characteristics, including the human development index, the number of vehicles, the availability of the plant's water absorption and flood prevention, as well as the geographic and demographic situation. The data used were secondary data from the Central Statistics Agency (BPS), Indonesia. The data are mapped by the SOFM from a high-dimensional vector space into a two-dimensional vector space according to closeness in terms of Euclidean distance. The resulting outputs are represented as clustered groupings. The thirty-three provinces are grouped into five clusters, where each cluster has different features/characteristics and a different level of pollution. The result can be used to support efforts on prevention and resolution of pollution problems in each cluster in an effective and efficient way.

  10. Deletion Diagnostics for Alternating Logistic Regressions

    PubMed Central

    Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.

    2013-01-01

    Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960

  11. Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models.

    PubMed

    Liu, Jingxia; Colditz, Graham A

    2018-05-01

    There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyze a set of correlated data is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which the "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the estimator of the treatment effect for equal to that for unequal cluster sizes. We discuss a structure commonly used in CRTs, the exchangeable structure, and derive a simpler formula for RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size due to efficiency loss. Additionally, we also propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
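
    The RE definition in the abstract above can be illustrated with a minimal sketch. It assumes the textbook result for a continuous outcome under an exchangeable working correlation, in which a cluster of size n contributes information n / (1 + (n - 1)ρ) to the treatment-effect estimate. This is an illustration under that assumption, not necessarily the formula derived in the paper, and the function names are hypothetical.

```python
def cluster_information(size, rho):
    """Information contributed by one cluster of the given size.

    Assumes a continuous outcome and an exchangeable correlation rho
    (textbook approximation, not taken from the paper).
    """
    return size / (1.0 + (size - 1.0) * rho)


def relative_efficiency(cluster_sizes, rho):
    """RE = Var(equal sizes) / Var(unequal sizes).

    Variance is inversely proportional to total information, so the
    ratio of variances equals info(unequal) / info(equal), where the
    equal-size design uses the same number of clusters and the same
    mean cluster size.
    """
    k = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / k
    info_unequal = sum(cluster_information(n, rho) for n in cluster_sizes)
    info_equal = k * cluster_information(mean_size, rho)
    return info_unequal / info_equal


def adjusted_cluster_count(planned_clusters, cluster_sizes, rho):
    """Inflate a planned number of clusters by 1/RE (hypothetical helper)."""
    return planned_clusters / relative_efficiency(cluster_sizes, rho)
```

    Under this assumption RE equals 1 when all clusters have the same size or when ρ = 0, and falls below 1 as the cluster sizes become more variable, which is the efficiency loss that an adjusted sample size would compensate for.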

  12. Helium segregation on surfaces of plasma-exposed tungsten

    DOE PAGES

    Maroudas, Dimitrios; Blondel, Sophie; Hu, Lin; ...

    2016-01-21

    Here we report a hierarchical multi-scale modeling study of implanted helium segregation on surfaces of tungsten, considered as a plasma facing component in nuclear fusion reactors. We employ a hierarchy of atomic-scale simulations based on a reliable interatomic interaction potential, including molecular-statics simulations to understand the origin of helium surface segregation, targeted molecular-dynamics (MD) simulations of near-surface cluster reactions, and large-scale MD simulations of implanted helium evolution in plasma-exposed tungsten. We find that small, mobile He_n (1 ≤ n ≤ 7) clusters in the near-surface region are attracted to the surface due to an elastic interaction force that provides the thermodynamic driving force for surface segregation. Elastic interaction force induces drift fluxes of these mobile He_n clusters, which increase substantially as the migrating clusters approach the surface, facilitating helium segregation on the surface. Moreover, the clusters' drift toward the surface enables cluster reactions, most importantly trap mutation, in the near-surface region at rates much higher than in the bulk material. Moreover, these near-surface cluster dynamics have significant effects on the surface morphology, near-surface defect structures, and the amount of helium retained in the material upon plasma exposure. We integrate the findings of such atomic-scale simulations into a properly parameterized and validated spatially dependent, continuum-scale reaction-diffusion cluster dynamics model, capable of predicting implanted helium evolution, surface segregation, and its near-surface effects in tungsten. This cluster-dynamics model sets the stage for development of fully atomistically informed coarse-grained models for computationally efficient simulation predictions of helium surface segregation, as well as helium retention and surface morphological evolution, toward optimal design of plasma facing components.

  13. Lens models under the microscope: comparison of Hubble Frontier Field cluster magnification maps

    NASA Astrophysics Data System (ADS)

    Priewe, Jett; Williams, Liliya L. R.; Liesenborgs, Jori; Coe, Dan; Rodney, Steven A.

    2017-02-01

    Using the power of gravitational lensing magnification by massive galaxy clusters, the Hubble Frontier Fields provide deep views of six patches of the high-redshift Universe. The combination of deep Hubble imaging and exceptional lensing strength has revealed the greatest numbers of multiply-imaged galaxies available to constrain models of cluster mass distributions. However, even with O(100) images per cluster, the uncertainties associated with the reconstructions are not negligible. The goal of this paper is to show the diversity of model magnification predictions. We examine seven and nine mass models of Abell 2744 and MACS J0416, respectively, submitted to the Mikulski Archive for Space Telescopes for public distribution in 2015 September. The dispersion between model predictions increases from 30 per cent at common low magnifications (μ ∼ 2) to 70 per cent at rare high magnifications (μ ∼ 40). MACS J0416 exhibits smaller dispersions than Abell 2744 for 2 < μ < 10. We show that magnification maps based on different lens inversion techniques typically differ from each other by more than their quoted statistical errors. This suggests that some models underestimate the true uncertainties, which are primarily due to various lensing degeneracies. Though the exact mass sheet degeneracy is broken, its generalized counterpart is not broken at least in Abell 2744. Other local degeneracies are also present in both clusters. Our comparison of models is complementary to the comparison of reconstructions of known synthetic mass distributions. By focusing on observed clusters, we can identify those that are best constrained, and therefore provide the clearest view of the distant Universe.

  14. Rotational symmetry breaking toward a string-valence bond solid phase in frustrated J1 -J2 transverse field Ising model

    NASA Astrophysics Data System (ADS)

    Sadrzadeh, M.; Langari, A.

    2018-06-01

    We study the effect of quantum fluctuations by means of a transverse magnetic field (Γ) on the highly degenerate ground state of the antiferromagnetic J1-J2 Ising model on the square lattice, in the limit J2/J1 = 0.5. We show that harmonic quantum fluctuations based on single spin flips cannot lift this degeneracy; however, anharmonic quantum fluctuations based on multi-spin cluster flip excitations lift the degeneracy toward a unique ground state of string-valence bond solid (VBS) nature. A cluster operator formalism has been implemented to incorporate anharmonic quantum fluctuations. We show that cluster-type excitations of the model not only lower the excitation energy compared with a single-spin flip but also lift the extensive degeneracy in favor of a string-VBS state, which breaks lattice rotational symmetry with only two-fold degeneracy. The tendency toward the broken symmetry state is justified by numerical exact diagonalization. Moreover, we introduce a map to find the relation between the present model on the checkerboard and square lattices.

  15. Model-Based Clustering of Regression Time Series Data via APECM -- An AECM Algorithm Sung to an Even Faster Beat

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Wei-Chen; Maitra, Ranjan

    2011-01-01

    We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
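
    For readers unfamiliar with the EM baseline that AECM and APECM accelerate, the following is a minimal sketch of plain EM for a one-dimensional Gaussian mixture. It is a deliberately simplified stand-in: the paper's components are Gaussian autoregressive regression models and its algorithm is APECM, neither of which is reproduced here. All function names and default values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, initial_means, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture with a fixed number of groups."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances


def simulate_two_groups(n_per_group=200, seed=0):
    """Toy data: two well-separated groups centred at 0 and 10 (illustrative only)."""
    rng = random.Random(seed)
    low = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    high = [rng.gauss(10.0, 1.0) for _ in range(n_per_group)]
    return low + high
```

    Each iteration recomputes every responsibility before updating any parameter. ECM-type variants instead split the M-step into conditional maximizations, and that cycle is what the abstract describes speeding up.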

  16. Autonomous mental development with selective attention, object perception, and knowledge representation

    NASA Astrophysics Data System (ADS)

    Ban, Sang-Woo; Lee, Minho

    2008-04-01

    Knowledge-based clustering and autonomous mental development remain high-priority research topics, in which the learning techniques of neural networks are used to achieve optimal performance. In this paper, we present a new framework that can automatically generate a relevance map from sensory data that can represent knowledge regarding objects and infer new knowledge about novel objects. The proposed model is based on an understanding of the visual "what" pathway in our brain. A stereo saliency map model can selectively decide salient object areas by additionally considering a local symmetry feature. The incremental object perception model makes clusters for the construction of an ontology map in the color and form domains in order to perceive an arbitrary object, which is implemented by the growing fuzzy topology adaptive resonant theory (GFTART) network. Log-polar transformed color and form features for a selected object are used as inputs of the GFTART. The clustered information is relevant to describe specific objects, and the proposed model can automatically infer an unknown object by using the learned information. Experimental results with real data have demonstrated the validity of this approach.

  17. Exploring multicollinearity using a random matrix theory approach.

    PubMed

    Feher, Kristen; Whelan, James; Müller, Samuel

    2012-01-01

    Clustering of gene expression data is often done with the latent aim of dimension reduction, by finding groups of genes that have a common response to potentially unknown stimuli. However, what is poorly understood to date is the behaviour of a low dimensional signal embedded in high dimensions. This paper introduces a multicollinear model which is based on random matrix theory results, and shows potential for the characterisation of a gene cluster's correlation matrix. This model projects a one dimensional signal into many dimensions and is based on the spiked covariance model, but rather characterises the behaviour of the corresponding correlation matrix. The eigenspectrum of the correlation matrix is empirically examined by simulation, under the addition of noise to the original signal. The simulation results are then used to propose a dimension estimation procedure of clusters from data. Moreover, the simulation results warn against considering pairwise correlations in isolation, as the model provides a mechanism whereby a pair of genes with 'low' correlation may simply be due to the interaction of high dimension and noise. Instead, collective information about all the variables is given by the eigenspectrum.
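
    The warning about pairwise correlations can be illustrated with a small simulation. The sketch assumes a simplified single-signal model in which every variable equals one shared signal plus independent noise with equal loadings, which is not the paper's exact multicollinear model. Under that assumption the population pairwise correlation is ρ = 1 / (1 + σ²) and the leading eigenvalue of the correlation matrix is 1 + (p - 1)ρ. Function names and parameter values are hypothetical.

```python
import math
import random


def simulate(n_samples, n_vars, noise_sd, seed=0):
    """Rows of n_vars variables, each equal to a shared signal plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        signal = rng.gauss(0.0, 1.0)
        rows.append([signal + noise_sd * rng.gauss(0.0, 1.0) for _ in range(n_vars)])
    return rows


def correlation_matrix(rows):
    """Sample correlation matrix computed from standardised columns."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / n for j in range(p)] for i in range(p)]


def top_eigenvalue(matrix, n_iter=200):
    """Largest eigenvalue of a positive semi-definite matrix by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam
```

    With noise_sd = 2 the population pairwise correlation is only 0.2, yet with 20 variables the population leading eigenvalue is 1 + 19 × 0.2 = 4.8, far above the value of 1 expected for uncorrelated variables. The shared signal is therefore visible in the eigenspectrum even though each pair looks weakly correlated.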

  18. Young LMC clusters: the role of red supergiants and multiple stellar populations in their integrated light and CMDs

    NASA Astrophysics Data System (ADS)

    Asa'd, Randa S.; Vazdekis, Alexandre; Cerviño, Miguel; Noël, Noelia E. D.; Beasley, Michael A.; Kassab, Mahmoud

    2017-11-01

    The optical integrated spectra of three Large Magellanic Cloud young stellar clusters (NGC 1984, NGC 1994 and NGC 2011) exhibit concave continua and prominent molecular bands which deviate significantly from the predictions of single stellar population (SSP) models. In order to understand the appearance of these spectra, we create a set of young stellar population (MILES) models, which we make available to the community. We use archival International Ultraviolet Explorer integrated UV spectra to independently constrain the cluster masses and extinction, and rule out strong stochastic effects in the optical spectra. In addition, we also analyse deep colour-magnitude diagrams of the clusters to provide independent age determinations based on isochrone fitting. We explore hypotheses, including age spreads in the clusters, a top-heavy initial mass function, different SSP models and the role of red supergiant stars (RSG). We find that the strong molecular features in the optical spectra can be only reproduced by modelling an increased fraction of about 20 per cent by luminosity of RSG above what is predicted by canonical stellar evolution models. Given the uncertainties in stellar evolution at Myr ages, we cannot presently rule out the presence of Myr age spreads in these clusters. Our work combines different wavelengths as well as different approaches (resolved data as well as integrated spectra for the same sample) in order to reveal the complete picture. We show that each approach provides important information but in combination we can better understand the cluster stellar populations.

  19. Microstructure-based modelling of arbitrary deformation histories of filler-reinforced elastomers

    NASA Astrophysics Data System (ADS)

    Lorenz, H.; Klüppel, M.

    2012-11-01

    A physically motivated theory of rubber reinforcement based on filler cluster mechanics is presented considering the mechanical behaviour of quasi-statically loaded elastomeric materials subjected to arbitrary deformation histories. This represents an extension of a previously introduced model describing filler induced stress softening and hysteresis of highly strained elastomers. These effects are attributed to the hydrodynamic reinforcement of rubber elasticity due to strain amplification by stiff filler clusters and cyclic breakdown and re-aggregation (healing) of softer, already damaged filler clusters. The theory is first developed for the special case of outer stress-strain cycles with successively increasing maximum strain. In this simpler case, all soft clusters are broken at the turning points of the cycle and the mechanical energy stored in the strained clusters is completely dissipated, i.e. only irreversible stress contributions result. Nevertheless, the description of outer cycles already involves all material parameters of the theory and hence they can be used for a fitting procedure. In the general case of an arbitrary deformation history, the cluster mechanics of the material is complicated by the fact that not all soft clusters are broken at the turning points of a cycle. For that reason additional reversible stress contributions considering the relaxation of clusters upon retraction have to be taken into account for the description of inner cycles. A special recursive algorithm is developed constituting a frame of the mechanical response of encapsulated inner cycles. Simulation and measurement are found to be in fair agreement for CB and silica filled SBR/BR and EPDM samples, loaded in compression and tension along various deformation histories.

  20. New particle formation from sulfuric acid and amines: Comparison of monomethylamine, dimethylamine, and trimethylamine

    NASA Astrophysics Data System (ADS)

    Olenius, Tinja; Halonen, Roope; Kurtén, Theo; Henschel, Henning; Kupiainen-Määttä, Oona; Ortega, Ismael K.; Jen, Coty N.; Vehkamäki, Hanna; Riipinen, Ilona

    2017-07-01

    Amines are bases that originate from both anthropogenic and natural sources, and they are recognized as candidates to participate in atmospheric aerosol particle formation together with sulfuric acid. Monomethylamine, dimethylamine, and trimethylamine (MMA, DMA, and TMA, respectively) have been shown to enhance sulfuric acid-driven particle formation more efficiently than ammonia, but both theory and laboratory experiments suggest that there are differences in their enhancing potentials. However, as quantitative concentrations and thermochemical properties of different amines remain relatively uncertain, and also for computational reasons, the compounds have been treated as a single surrogate amine species in large-scale modeling studies. In this work, the differences and similarities of MMA, DMA, and TMA are studied by simulations of molecular cluster formation from sulfuric acid, water, and each of the three amines. Quantum chemistry-based cluster evaporation rate constants are applied in a cluster population dynamics model to yield cluster concentrations and formation rates at boundary layer conditions. While there are differences, for instance, in the clustering mechanisms and cluster hygroscopicity for the three amines, DMA and TMA can be approximated as a lumped species. Formation of nanometer-sized particles and its dependence on ambient conditions is roughly similar for these two: both efficiently form clusters with sulfuric acid, and cluster formation is rather insensitive to changes in temperature and relative humidity. Particle formation from sulfuric acid and MMA is weaker and significantly more sensitive to ambient conditions. Therefore, merging MMA together with DMA and TMA introduces inaccuracies in sulfuric acid-amine particle formation schemes.

  1. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    PubMed

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.
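
    The analytical comparison of partitions mentioned above rests on Dirichlet-multinomial conjugacy: with a Dirichlet prior on each row of the transition matrix, the transition probabilities integrate out in closed form. The sketch below scores a cluster of DNA sequences under a first-order Markov chain and computes the change in log marginal likelihood from merging two clusters. It assumes a symmetric Dirichlet prior with parameter alpha = 1, sequences over the alphabet ACGT only, and it ignores both the initial-symbol distribution and the prior over partitions, so it is not the paper's full model. All names are hypothetical.

```python
from collections import Counter
from math import lgamma

ALPHABET = "ACGT"


def transition_counts(sequences, order=1):
    """Count (context, next symbol) pairs over all sequences in a cluster."""
    counts = Counter()
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[(seq[i - order:i], seq[i])] += 1
    return counts


def log_marginal(sequences, order=1, alpha=1.0):
    """Log marginal likelihood of a cluster with transition probabilities integrated out."""
    counts = transition_counts(sequences, order)
    contexts = {ctx for ctx, _ in counts}
    a0 = alpha * len(ALPHABET)
    total = 0.0
    # Unobserved contexts contribute exactly zero, so only observed ones are visited.
    for ctx in contexts:
        row = [counts[(ctx, a)] for a in ALPHABET]
        total += lgamma(a0) - lgamma(a0 + sum(row))
        total += sum(lgamma(alpha + n) - lgamma(alpha) for n in row)
    return total


def merge_gain(cluster_a, cluster_b, order=1, alpha=1.0):
    """Positive when one shared chain explains both clusters better than two separate chains."""
    merged = log_marginal(cluster_a + cluster_b, order, alpha)
    separate = log_marginal(cluster_a, order, alpha) + log_marginal(cluster_b, order, alpha)
    return merged - separate
```

    A greedy search of the kind the abstract mentions could repeatedly apply the merge or move with the largest positive gain. For example, merge_gain(["ACACACACAC"], ["ACACACACAC"]) is positive, while merge_gain(["ACACACACAC"], ["AGAGAGAGAG"]) is negative because the two sequences disagree about what follows an A.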

  2. Patterns of breast cancer mortality trends in Europe.

    PubMed

    Amaro, Joana; Severo, Milton; Vilela, Sofia; Fonseca, Sérgio; Fontes, Filipa; La Vecchia, Carlo; Lunet, Nuno

    2013-06-01

    To identify patterns of variation in breast cancer mortality in Europe (1980-2010), using a model-based approach. Mortality data were obtained from the World Health Organization database and mixed models were used to describe the time trends in the age-standardized mortality rates (ASMR). Model-based clustering was used to identify clusters of countries with homogeneous variation in ASMR. Three patterns were identified. Patterns 1 and 2 are characterized by stable or slightly increasing trends in ASMR in the first half of the period analysed, and a clear decline is observed thereafter; in pattern 1 the median of the ASMR is higher, and the highest rates were achieved sooner. Pattern 3 is characterized by a rapid increase in mortality until 1999, declining slowly thereafter. This study provides a general model for the description and interpretation of the variation in breast cancer mortality in Europe, based on three main patterns. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. Self-organization and positioning of bacterial protein clusters

    NASA Astrophysics Data System (ADS)

    Murray, Seán M.; Sourjik, Victor

    2017-10-01

    Many cellular processes require proteins to be precisely positioned within the cell. In some cases this can be attributed to passive mechanisms such as recruitment by other proteins in the cell or by exploiting the curvature of the membrane. However, in bacteria, active self-positioning is likely to play a role in multiple processes, including the positioning of the future site of cell division and cytoplasmic protein clusters. How can such dynamic clusters be formed and positioned? Here, we present a model for the self-organization and positioning of dynamic protein clusters into regularly repeating patterns based on a phase-locked Turing pattern. A single peak in the concentration is always positioned at the midpoint of the model cell, and two peaks are positioned at the midpoint of each half. Furthermore, domain growth results in peak splitting and pattern doubling. We argue that the model may explain the regular positioning of the highly conserved structural maintenance of chromosomes complexes on the bacterial nucleoid and that it provides an attractive mechanism for the self-positioning of dynamic protein clusters in other systems.

  4. Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things

    PubMed Central

    Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao

    2015-01-01

    Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices’ service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes’ life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN. PMID:26703619

  5. Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things.

    PubMed

    Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao

    2015-12-23

    Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices' service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes' life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN.

  6. Interactive classification and content-based retrieval of tissue images

    NASA Astrophysics Data System (ADS)

    Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof

    2002-11-01

    We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues in pixel, region and image levels. Pixel level features are generated using unsupervised clustering of color and texture values. Region level features include shape information and statistics of pixel level feature values. Image level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.

  7. The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

    NASA Astrophysics Data System (ADS)

    Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

    2012-07-01

    We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from ab initio cosmological modelling. We illustrate the methodology for the case of a 100 deg^2 XMM survey having a sensitivity of ~10^-14 erg s^-1 cm^-2 and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps consisting in deriving cluster masses from X-ray temperature measurements.

  8. A hybrid intelligent method for three-dimensional short-term prediction of dissolved oxygen content in aquaculture.

    PubMed

    Chen, Yingyi; Yu, Huihui; Cheng, Yanjun; Cheng, Qianqian; Li, Daoliang

    2018-01-01

    A precise predictive model is important for obtaining a clear understanding of the changes in dissolved oxygen content in crab ponds. Highly accurate interval forecasting of dissolved oxygen content is fundamental to reduce risk, and three-dimensional prediction can provide more accurate results and overall guidance. In this study, a hybrid three-dimensional (3D) dissolved oxygen content prediction model based on a radial basis function (RBF) neural network, K-means and subtractive clustering was developed and named the subtractive clustering (SC)-K-means-RBF model. In this modeling process, K-means and subtractive clustering methods were employed to enhance the hyperparameters required in the RBF neural network model. The comparison of the predicted results of different traditional models validated the effectiveness and accuracy of the proposed hybrid SC-K-means-RBF model for three-dimensional prediction of dissolved oxygen content. Consequently, the proposed model can effectively display the three-dimensional distribution of dissolved oxygen content and serve as a guide for feeding and future studies.

  9. Adding-point strategy for reduced-order hypersonic aerothermodynamics modeling based on fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Chen, Xin; Liu, Li; Zhou, Sida; Yue, Zhenjiang

    2016-09-01

    Reduced-order models (ROMs) based on snapshots of high-fidelity CFD simulations have received considerable attention recently because they can capture the features of complex geometries and flow configurations. To improve the efficiency and precision of the ROMs, it is indispensable to add extra sampling points to the initial snapshots, since the number of sampling points needed to achieve an adequately accurate ROM is generally unknown a priori, while a large number of initial sampling points reduces the parsimony of the ROMs. A fuzzy-clustering-based adding-point strategy is proposed, in which the fuzzy clustering acts as an indicator of the region where the precision of the ROMs is relatively low. The proposed method is applied to construct ROMs for benchmark mathematical examples and for a numerical example of hypersonic aerothermodynamics prediction for a typical control surface. The proposed method achieves a 34.5% improvement in efficiency over the estimated mean squared error prediction algorithm while showing the same level of prediction accuracy.

  10. Mass functions for globular cluster main sequences based on CCD photometry and stellar models

    NASA Astrophysics Data System (ADS)

    McClure, Robert D.; Vandenberg, Don A.; Smith, Graeme H.; Fahlman, Gregory G.; Richer, Harvey B.; Hesser, James E.; Harris, William E.; Stetson, Peter B.; Bell, R. A.

    1986-08-01

    Main-sequence luminosity functions constructed from CCD observations of globular clusters reveal a strong trend in slope with metal abundance. Theoretical luminosity functions constructed from VandenBerg and Bell's (1985) isochrones have been fitted to the observations and reveal a trend between x, the power-law index of the mass function, and metal abundance. The most metal-poor clusters require an index of about x = 2.5, whereas the most metal-rich clusters exhibit an index of x of roughly -0.5. The luminosity functions for two sparse clusters, E3 and Pal 5, are distinct from those of the more massive clusters, in that they show a turndown which is possibly a result of mass loss or tidal disruption.

  11. An effective trust-based recommendation method using a novel graph clustering algorithm

    NASA Astrophysics Data System (ADS)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  12. Ontology-based structured cosine similarity in document summarization: with applications to mobile audio-based knowledge management.

    PubMed

    Yuan, Soe-Tsyr; Sun, Jerry

    2005-10-01

    Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most of the text-clustering methods were grounded in the term-based measurement of distance or similarity, ignoring the structure of the documents. In this paper, we present a novel method named structured cosine similarity (SCS) that furnishes document clustering with a new way of modeling on document summarization, considering the structure of the documents so as to improve the performance of document clustering in terms of quality, stability, and efficiency. This study was motivated by the problem of clustering speech documents (of no rich document features) attained from the wireless experience oral sharing conducted by mobile workforce of enterprises, fulfilling audio-based knowledge management. In other words, this problem aims to facilitate knowledge acquisition and sharing by speech. The evaluations also show fairly promising results on our method of structured cosine similarity.

  13. Superresolution Modeling of Calcium Release in the Heart

    PubMed Central

    Walker, Mark A.; Williams, George S.B.; Kohl, Tobias; Lehnart, Stephan E.; Jafri, M. Saleet; Greenstein, Joseph L.; Lederer, W.J.; Winslow, Raimond L.

    2014-01-01

    Stable calcium-induced calcium release (CICR) is critical for maintaining normal cellular contraction during cardiac excitation-contraction coupling. The fundamental element of CICR in the heart is the calcium (Ca2+) spark, which arises from a cluster of ryanodine receptors (RyR). Opening of these RyR clusters is triggered to produce a local, regenerative release of Ca2+ from the sarcoplasmic reticulum (SR). The Ca2+ leak out of the SR is an important process for cellular Ca2+ management, and it is critically influenced by spark fidelity, i.e., the probability that a spontaneous RyR opening triggers a Ca2+ spark. Here, we present a detailed, three-dimensional model of a cardiac Ca2+ release unit that incorporates diffusion, intracellular buffering systems, and stochastically gated ion channels. The model exhibits realistic Ca2+ sparks and robust Ca2+ spark termination across a wide range of geometries and conditions. Furthermore, the model captures the details of Ca2+ spark and nonspark-based SR Ca2+ leak, and it produces normal excitation-contraction coupling gain. We show that SR luminal Ca2+-dependent regulation of the RyR is not critical for spark termination, but it can explain the exponential rise in the SR Ca2+ leak-load relationship demonstrated in previous experimental work. Perturbations to subspace dimensions, which have been observed in experimental models of disease, strongly alter Ca2+ spark dynamics. In addition, we find that the structure of RyR clusters also influences Ca2+ release properties due to variations in inter-RyR coupling via local subspace Ca2+ concentration ([Ca2+]ss). These results are illustrated for RyR clusters based on super-resolution stimulated emission depletion microscopy. Finally, we present what we believe to be a novel approach by which the spark fidelity of a RyR cluster can be predicted from structural information of the cluster using the maximum eigenvalue of its adjacency matrix. These results provide critical insights into CICR dynamics in the heart, under normal and pathological conditions. PMID:25517166
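
    The structural quantity named at the end of the abstract above, the maximum eigenvalue of a cluster's adjacency matrix, can be computed with a few lines of power iteration. The sketch below is illustrative only: it assumes channels sit on a unit square lattice and are linked when they are nearest neighbours, and both function names are hypothetical. How the eigenvalue is mapped to spark fidelity is not stated in the abstract and is not attempted here.

```python
from math import sqrt


def lattice_adjacency(sites):
    """Adjacency matrix linking sites that are nearest neighbours on a unit grid."""
    n = len(sites)
    adj = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(sites):
        for j, (xj, yj) in enumerate(sites):
            if abs(xi - xj) + abs(yi - yj) == 1:
                adj[i][j] = 1
    return adj


def max_eigenvalue(adj, iters=500):
    """Largest eigenvalue of a symmetric 0/1 adjacency matrix by power iteration.

    The iteration multiplies by (A + I) rather than A so that it does not
    oscillate on bipartite graphs; the eigenvalue of A itself is read off
    with a Rayleigh quotient.
    """
    n = len(adj)
    if n == 0:
        return 0.0
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, av)) / sum(b * b for b in v)
        w = [a + b for a, b in zip(av, v)]  # (A + I) v
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return lam
```

    For example, a compact 2 x 2 block of sites yields a larger value than three sites in a row, reflecting the denser coupling of the square arrangement.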

  14. Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres.

    PubMed

    Banerjee, Arindam; Ghosh, Joydeep

    2004-05-01

    Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms: balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
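
    As a point of reference for the abstract above, plain spherical kmeans can be sketched in a few lines: inputs and centers are kept at unit length and points are assigned by largest cosine similarity. This is a minimal sketch of the baseline spkmeans only, not of the frequency-sensitive variants the paper proposes; the function names and the caller-supplied initial centers are assumptions made for illustration.

```python
from math import sqrt


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spkmeans(points, centers, iters=20):
    """Spherical kmeans: assign by cosine similarity, renormalize the centers."""
    X = [normalize(p) for p in points]
    C = [normalize(c) for c in centers]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest center in cosine similarity.
        labels = [max(range(len(C)), key=lambda k: dot(x, C[k])) for x in X]
        # Update step: mean direction of the members, projected back to the sphere.
        for k in range(len(C)):
            members = [x for x, l in zip(X, labels) if l == k]
            if members:
                C[k] = normalize([sum(col) for col in zip(*members)])
    return labels, C
```

    Each iteration touches every point once per center, which matches the linear per-iteration cost mentioned in the abstract.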

  15. Analysis of EEG-fMRI data in focal epilepsy based on automated spike classification and Signal Space Projection.

    PubMed

    Liston, Adam D; De Munck, Jan C; Hamandi, Khalid; Laufs, Helmut; Ossenblok, Pauly; Duncan, John S; Lemieux, Louis

    2006-07-01

    Simultaneous acquisition of EEG and fMRI data enables the investigation of the hemodynamic correlates of interictal epileptiform discharges (IEDs) during the resting state in patients with epilepsy. This paper addresses two issues: (1) the semi-automation of IED classification in statistical modelling for fMRI analysis and (2) the improvement of IED detection to increase experimental fMRI efficiency. For patients with multiple IED generators, sensitivity to IED-correlated BOLD signal changes can be improved when the fMRI analysis model distinguishes between IEDs of differing morphology and field. In an attempt to reduce the subjectivity of visual IED classification, we implemented a semi-automated system, based on the spatio-temporal clustering of EEG events. We illustrate the technique's usefulness using EEG-fMRI data from a subject with focal epilepsy in whom 202 IEDs were visually identified and then clustered semi-automatically into four clusters. Each cluster of IEDs was modelled separately for the purpose of fMRI analysis. This revealed IED-correlated BOLD activations in distinct regions corresponding to three different IED categories. In a second step, Signal Space Projection (SSP) was used to project the scalp EEG onto the dipoles corresponding to each IED cluster. This resulted in 123 previously unrecognised IEDs, the inclusion of which, in the General Linear Model (GLM), increased the experimental efficiency as reflected by significant BOLD activations. We have also shown that the detection of extra IEDs is robust in the face of fluctuations in the set of visually detected IEDs. We conclude that automated IED classification can result in more objective fMRI models of IEDs and significantly increased sensitivity.

  16. Consensus-Based Sorting of Neuronal Spike Waveforms

    PubMed Central

    Fournier, Julien; Mueller, Christian M.; Shein-Idelson, Mark; Hemberger, Mike

    2016-01-01

    Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained “ground truth” data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data. PMID:27536990
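
    The consensus idea in the abstract above, estimating how often two spikes are clustered together across repeated runs, can be illustrated with a co-association matrix. The sketch below is a simplified stand-in under stated assumptions: the label vectors are taken as given, the fixed `threshold` replaces the statistical criterion used in the paper (which the abstract does not specify), and the function names are hypothetical.

```python
def coassociation(runs):
    """Fraction of runs in which each pair of items shares a cluster label."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


def consensus_groups(P, threshold=0.5):
    """Group items linked by co-association >= threshold (connected components)."""
    n = len(P)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and P[i][j] >= threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group
```

    With a strict threshold, an item that changes partners from run to run ends up in its own group, while pairs that always co-occur stay together.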

  17. Consensus-Based Sorting of Neuronal Spike Waveforms.

    PubMed

    Fournier, Julien; Mueller, Christian M; Shein-Idelson, Mark; Hemberger, Mike; Laurent, Gilles

    2016-01-01

    Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained "ground truth" data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data.

  18. Comparison of nano-sized Mn oxides with the Mn cluster of photosystem II as catalysts for water oxidation.

    PubMed

    Najafpour, Mohammad Mahdi; Ghobadi, Mohadeseh Zarei; Haghighi, Behzad; Tomo, Tatsuya; Shen, Jian-Ren; Allakhverdiev, Suleyman I

    2015-02-01

    "Back to Nature" is a promising way to solve the problems that we face today, such as air pollution and shortage of energy supply based on conventional fossil fuels. A Mn cluster inside photosystem II catalyzes light-induced water-splitting leading to the generation of protons, electrons and oxygen in photosynthetic organisms, and has been considered as a good model for the synthesis of new artificial water-oxidizing catalysts. Herein, we survey the structural and functional details of this cluster and its surrounding environment. Then, we review the mechanistic findings concerning the cluster and compare this biological catalyst with nano-sized Mn oxides, which are among the best artificial Mn-based water-oxidizing catalysts. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    DTIC Science & Technology

    2015-01-01

    ...algorithms we proposed improve the time efficiency significantly for large scale datasets. In the last chapter, we also propose an incremental reseeding...plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large

  20. Age and Mass for 920 Large Magellanic Cloud Clusters Derived from 100 Million Monte Carlo Simulations

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan; Hanson, M. M.; Elmegreen, Bruce G.

    2012-06-01

    We present new age and mass estimates for 920 stellar clusters in the Large Magellanic Cloud (LMC) based on previously published broadband photometry and the stellar cluster analysis package, MASSCLEANage. Expressed in the generic fitting formula, d^2N/(dM dt) ∝ M^α t^β, the distribution of observed clusters is described by α = -1.5 to -1.6 and β = -2.1 to -2.2. For 288 of these clusters, ages have recently been determined based on stellar photometric color-magnitude diagrams, allowing us to gauge the confidence of our ages. The results look very promising, opening up the possibility that this sample of 920 clusters, with reliable and consistent age, mass, and photometric measures, might be used to constrain important characteristics about the stellar cluster population in the LMC. We also investigate a traditional age determination method that uses a χ^2 minimization routine to fit observed cluster colors to standard infinite-mass limit simple stellar population models. This reveals serious defects in the derived cluster age distribution using this method. The traditional χ^2 minimization method, due to the variation of U, B, V, R colors, will always produce an overdensity of younger and older clusters, with an underdensity of clusters in the log (age/yr) = [7.0, 7.5] range. Finally, we present a unique simulation aimed at illustrating and constraining the fading limit in observed cluster distributions that includes the complex effects of stochastic variations in the observed properties of stellar clusters.

  1. Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images

    NASA Astrophysics Data System (ADS)

    Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang

    2016-10-01

    Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.

  2. Discrete Wavelet Transform-Based Whole-Spectral and Subspectral Analysis for Improved Brain Tumor Clustering Using Single Voxel MR Spectroscopy.

    PubMed

    Yang, Guang; Nawaz, Tahir; Barrick, Thomas R; Howe, Franklyn A; Slabaugh, Greg

    2015-12-01

    Many approaches have been considered for automatic grading of brain tumors by means of pattern recognition with magnetic resonance spectroscopy (MRS). Providing an improved technique which can assist clinicians in accurately identifying brain tumor grades is our main objective. The proposed technique, which is based on the discrete wavelet transform (DWT) of whole-spectral or subspectral information of key metabolites, combined with unsupervised learning, inspects the separability of the extracted wavelet features from the MRS signal to aid the clustering. In total, we included 134 short echo time single voxel MRS spectra (SV MRS) in our study that cover normal controls, low grade and high grade tumors. The combination of DWT-based whole-spectral or subspectral analysis and unsupervised clustering achieved an overall clustering accuracy of 94.8% and a balanced error rate of 7.8%. To the best of our knowledge, it is the first study using DWT combined with unsupervised learning to cluster brain SV MRS. Instead of dimensionality reduction on SV MRS or feature selection using model fitting, our study provides an alternative method of extracting features to obtain promising clustering results.
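
    The two figures of merit quoted in the abstract above can be made concrete with a short sketch. It assumes the common definition of balanced error rate as the mean of the per-class error rates, and that cluster labels have already been mapped to tumor classes; the abstract does not spell out either step, and the function names are hypothetical.

```python
def accuracy(true, pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(1 for t, p in zip(true, pred) if t == p) / len(true)


def balanced_error_rate(true, pred):
    """Mean of the per-class error rates, so small classes count equally."""
    rates = []
    for c in sorted(set(true)):
        idx = [i for i, t in enumerate(true) if t == c]
        wrong = sum(1 for i in idx if pred[i] != c)
        rates.append(wrong / len(idx))
    return sum(rates) / len(rates)
```

    Because each class contributes equally, the balanced error rate can differ noticeably from one minus the overall accuracy when class sizes are unequal, as they are in a mix of controls, low grade and high grade tumors.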

  3. Revealing common disease mechanisms shared by tumors of different tissues of origin through semantic representation of genomic alterations and topic modeling.

    PubMed

    Chen, Vicky; Paisley, John; Lu, Xinghua

    2017-03-14

    Cancer is a complex disease driven by somatic genomic alterations (SGAs) that perturb signaling pathways and consequently cellular function. Identifying patterns of pathway perturbations would provide insights into common disease mechanisms shared among tumors, which is important for guiding treatment and predicting outcome. However, identifying perturbed pathways is challenging, because different tumors can have the same perturbed pathways that are perturbed by different SGAs. Here, we designed novel semantic representations that capture the functional similarity of distinct SGAs perturbing a common pathway in different tumors. Combining this representation with topic modeling would allow us to identify patterns in altered signaling pathways. We represented each gene with a vector of words describing its function, and we represented the SGAs of a tumor as a text document by pooling the words representing individual SGAs. We applied the nested hierarchical Dirichlet process (nHDP) model to a collection of tumors of 5 cancer types from TCGA. We identified topics (consisting of co-occurring words) representing the common functional themes of different SGAs. Tumors were clustered based on their topic associations, such that each cluster consists of tumors sharing common functional themes. The resulting clusters contained mixtures of cancer types, which indicates that different cancer types can share disease mechanisms. Survival analysis based on the clusters revealed significant differences in survival among the tumors of the same cancer type that were assigned to different clusters. The results indicate that applying topic modeling to semantic representations of tumors identifies patterns in the combinations of altered functional pathways in cancer.

  4. The old open cluster NGC 2112: updated estimates of fundamental parameters based on a membership analysis

    NASA Astrophysics Data System (ADS)

    Carraro, G.; Villanova, S.; Demarque, P.; Moni Bidin, C.; McSwain, M. V.

    2008-05-01

    We report on a new, wide-field (20 × 20 arcmin^2), multicolour (UBVI), photometric campaign in the area of the nearby old open cluster NGC 2112. At the same time, we provide medium-resolution spectroscopy of 35 (and high-resolution spectroscopy of an additional 5) red giant and turn-off stars. This material is analysed with the aim of updating the fundamental parameters of this traditionally difficult cluster, which is very sparse and suffers from heavy field star contamination. Among the 40 stars with spectra, we identified 21 bona fide radial velocity members which allow us to put more solid constraints on the cluster's metal abundance, long suggested to be as low as the metallicity of globulars. As indicated earlier by us on a purely photometric basis, the cluster [Fe/H] abundance is slightly supersolar ([Fe/H] = 0.16 ± 0.03) and close to the Hyades value, as inferred from a detailed abundance analysis of three of the five stars with higher resolution spectra. Abundance ratios are also marginally supersolar. Based on this result, we revise the properties of NGC 2112 using stellar models from the Padova and Yale-Yonsei groups. For this metal abundance, we find that the cluster's age, reddening and distance values are 1.8 Gyr, 0.60 mag and 940 pc, respectively. Both the Yale-Yonsei and Padova models predict the same values for the fundamental parameters within the errors. Overall, NGC 2112 is a typical solar neighbourhood, thin-disc star cluster, sharing the same chemical properties of F-G stars and open clusters close to the Sun. This investigation outlines the importance of a detailed membership analysis in the study of disc star clusters. This paper includes data gathered with the 6.5-m Magellan Telescopes, located at Las Campanas Observatory, Chile. The data discussed in this paper will be made available at the WEBDA open cluster data base http://www.univie.ac.at/webda, which is maintained by E. Paunzen and J.-C. Mermilliod.

  5. Para-hydrogen and helium cluster size distributions in free jet expansions based on Smoluchowski theory with kernel scaling.

    PubMed

    Kornilov, Oleg; Toennies, J Peter

    2015-02-21

    The size distribution of para-H2 (pH2) clusters produced in free jet expansions at a source temperature of T_0 = 29.5 K and pressures of P_0 = 0.9-1.96 bars is reported and analyzed according to a cluster growth model based on the Smoluchowski theory with kernel scaling. Good overall agreement is found between the measured and predicted shape of the distribution, N_k = A k^{a} e^{-bk}. The fit yields values for A and b for values of a derived from simple collision models. The small remaining deviations between measured abundances and theory imply a (pH2)_k magic number cluster of k = 13, as has been observed previously by Raman spectroscopy. The predicted linear dependence of b^{-(a+1)} on source gas pressure was verified and used to determine the value of the basic effective agglomeration reaction rate constant. A comparison of the corresponding effective growth cross sections σ_{11} with results from a similar analysis of He cluster size distributions indicates that the latter are much larger by a factor 6-10. An analysis of the three body recombination rates, the geometric sizes and the fact that the He clusters are liquid independent of their size can explain the larger cross sections found for He.
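
    As an illustration of the distribution shape quoted in the abstract above, N_k = A k^a e^(-bk), the following is a minimal sketch of how A and b could be estimated for a fixed exponent a. It assumes a log-linear least-squares fit, which is not necessarily the fitting procedure the authors used; the function names and the synthetic values A = 100, a = 1.5, b = 0.2 are hypothetical and are not taken from the paper.

```python
import math


def size_distribution(k, A, a, b):
    """Evaluate N_k = A * k**a * exp(-b * k)."""
    return A * k ** a * math.exp(-b * k)


def fit_A_b(ks, Ns, a):
    """Fit A and b for a fixed exponent a.

    Uses the linearised form ln(N_k) - a*ln(k) = ln(A) - b*k and
    ordinary least squares on (k, y) pairs.
    """
    ys = [math.log(N) - a * math.log(k) for k, N in zip(ks, Ns)]
    n = len(ks)
    k_mean = sum(ks) / n
    y_mean = sum(ys) / n
    sxx = sum((k - k_mean) ** 2 for k in ks)
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * k_mean
    return math.exp(intercept), -slope


# Synthetic, noise-free abundances (hypothetical parameter values).
ks = list(range(2, 40))
Ns = [size_distribution(k, A=100.0, a=1.5, b=0.2) for k in ks]
A_fit, b_fit = fit_A_b(ks, Ns, a=1.5)
```

    With measured abundances, the residuals of such a fit are where deviations like the k = 13 feature mentioned in the abstract would show up.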

  6. Whole-Volume Clustering of Time Series Data from Zebrafish Brain Calcium Images via Mixture Modeling.

    PubMed

    Nguyen, Hien D; Ullmann, Jeremy F P; McLachlan, Geoffrey J; Voleti, Venkatakaushik; Li, Wenze; Hillman, Elizabeth M C; Reutens, David C; Janke, Andrew L

    2018-02-01

    Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques are enabling visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via Gaussian mixtures is proposed for the clustering of data from such visualizations. The methodology is theoretically justified and a computationally efficient approach to estimation is suggested. An example analysis of a zebrafish imaging experiment is presented.

  7. Object Tracking Using Adaptive Covariance Descriptor and Clustering-Based Model Updating for Visual Surveillance

    PubMed Central

    Qin, Lei; Snoussi, Hichem; Abdallah, Fahed

    2014-01-01

    We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that is able to extract compact discriminative features from a feature pool before computing the region covariance descriptor. As the feature extraction method is adaptive to a specific object of interest, we refer to the region covariance descriptor computed using the extracted features as the adaptive covariance descriptor. The second contribution is a weakly supervised method for updating the object appearance model during tracking. The method performs a mean-shift clustering procedure among the tracking result samples accumulated during a period of time and selects a group of reliable samples for updating the object appearance model. As such, the object appearance model is kept up-to-date and is prevented from contamination even in case of tracking mistakes. We conducted comparative experiments on real-world video sequences, which confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method accomplished stable object tracking on challenging video sequences. PMID:24865883

  8. An application of seasonal ARIMA models on group commodities to forecast Philippine merchandise exports performance

    NASA Astrophysics Data System (ADS)

    Natividad, Gina May R.; Cawiding, Olive R.; Addawe, Rizavel C.

    2017-11-01

    The increase in the merchandise exports of the country offers information about the Philippines' trading role within the global economy. Merchandise exports statistics are used to monitor the country's overall production that is consumed overseas. This paper compares two models obtained by a) clustering the commodity groups into two based on their proportional contribution to the total exports, and b) treating only the total exports. Different seasonal autoregressive integrated moving average (SARIMA) models were then developed for the clustered commodities and for the total exports based on the monthly merchandise exports of the Philippines from 2011 to 2016. The data set used in this study was retrieved from the Philippine Statistics Authority (PSA), which is the central statistical authority in the country responsible for primary data collection. A test for significance of the difference between means at the 0.05 level of significance was then performed on the forecasts produced. The result indicates that there is a significant difference between the means of the forecasts of the two models. Moreover, upon a comparison of the root mean square error (RMSE) and mean absolute error (MAE) of the models, it was found that the models used for the clustered groups outperform the model for the total exports.
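
    The two error measures named in the abstract above, RMSE and MAE, can be sketched as follows. This is only an illustration of the standard definitions; the function names and the four-point series are made up for the example and are not data from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean square error: sqrt(mean((actual - forecast)^2))."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute error: mean(|actual - forecast|)."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


# Hypothetical monthly values, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 11.0, 15.0, 18.0]

rmse_value = rmse(actual, forecast)  # sqrt(7 / 4)
mae_value = mae(actual, forecast)    # 5 / 4
```

    RMSE penalises large individual errors more heavily than MAE, so reporting both, as the abstract does, gives a fuller picture when two forecasting models are compared.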

  9. A unifying model for adsorption and nucleation of vapors on solid surfaces.

    PubMed

    Laaksonen, Ari

    2015-04-23

    Vapor interaction with solid surfaces is traditionally described with adsorption isotherms in the undersaturated regime and with heterogeneous nucleation theory in the supersaturated regime. A class of adsorption isotherms is based on the idea of vapor molecule clustering around so-called active sites. However, as the isotherms do not account for the surface curvature effects of the clusters, they predict an infinitely thick adsorption layer at saturation and do not recognize the existence of the supersaturated regime. The classical heterogeneous nucleation theory also builds on the idea of cluster formation, but describes the interactions between the surface and the cluster with a single parameter, the contact angle, which provides limited information compared with adsorption isotherms. Here, a new model of vapor adsorption on nonporous solid surfaces is derived. The basic assumption is that adsorption proceeds via formation of molecular clusters, modeled as liquid caps. The equilibrium of the individual clusters with the vapor phase is described with the Frenkel-Halsey-Hill (FHH) adsorption theory modified with the Kelvin equation that corrects for the curvature effect on vapor pressure. The new model extends the FHH adsorption isotherm to be applicable both at submonolayer surface coverages and at supersaturated conditions. It shows good agreement with experimental adsorption data from 12 different adsorbent-adsorbate systems. The model predictions are also compared against heterogeneous nucleation data, and they show much better agreement than predictions of the classical heterogeneous nucleation theory.

  10. The Next Generation Virgo Cluster Survey (NGVS). XXV. Fiducial Panchromatic Colors of Virgo Core Globular Clusters and Their Comparison to Model Predictions

    NASA Astrophysics Data System (ADS)

    Powalka, Mathieu; Lançon, Ariane; Puzia, Thomas H.; Peng, Eric W.; Liu, Chengze; Muñoz, Roberto P.; Blakeslee, John P.; Côté, Patrick; Ferrarese, Laura; Roediger, Joel; Sánchez-Janssen, Rúben; Zhang, Hongxin; Durrell, Patrick R.; Cuillandre, Jean-Charles; Duc, Pierre-Alain; Guhathakurta, Puragra; Gwyn, S. D. J.; Hudelot, Patrick; Mei, Simona; Toloba, Elisa

    2016-11-01

    The central region of the Virgo Cluster of galaxies contains thousands of globular clusters (GCs), an order of magnitude more than the number of clusters found in the Local Group. Relics of early star formation epochs in the universe, these GCs also provide ideal targets to test our understanding of the spectral energy distributions (SEDs) of old stellar populations. Based on photometric data from the Next Generation Virgo Cluster Survey (NGVS) and its near-infrared counterpart NGVS-IR, we select a robust sample of ≈ 2000 GCs with excellent photometry that span the full range of colors present in the Virgo core. The selection exploits the well-defined locus of GCs in the uiK diagram and the fact that the GCs are marginally resolved in the images. We show that the GCs define a narrow sequence in five-dimensional color space, with limited but real dispersion around the mean sequence. The comparison of these SEDs with the predictions of 11 widely used population synthesis models highlights differences between the models and also shows that no single model adequately matches the data in all colors. We discuss possible causes for some of these discrepancies. Forthcoming papers of this series will examine how best to estimate photometric metallicities in this context, and compare the Virgo GC colors with those in other environments.

  11. Who Visits a National Park and What do They Get Out of It?: A Joint Visitor Cluster Analysis and Travel Cost Model for Yellowstone National Park

    NASA Astrophysics Data System (ADS)

    Benson, Charles; Watson, Philip; Taylor, Garth; Cook, Philip; Hollenhorst, Steve

    2013-10-01

    Yellowstone National Park visitor data were obtained from a survey collected for the National Park Service by the Park Studies Unit at the University of Idaho. Travel cost models have been conducted for national parks in the United States; however, this study builds on these studies and investigates how benefits vary by types of visitors who participate in different activities while at the park. Visitor clusters were developed based on activities in which a visitor participated while at the park. The clusters were analyzed and then incorporated into a travel cost model to determine the economic value (consumer surplus) that the different visitor groups received from visiting the park. The model was estimated using a zero-truncated negative binomial regression corrected for endogenous stratification. The travel cost price variable was estimated using both 1/3 and 1/4 the wage rate to test for sensitivity to opportunity cost specification. The average benefit across all visitor cluster groups was estimated at between 235 and 276 per person per trip. However, per trip benefits varied substantially across clusters; from 90 to 103 for the "value picnickers," to 185-263 for the "backcountry enthusiasts," 189-278 for the "do it all adventurists," 204-303 for the "windshield tourists," and 323-714 for the "creature comfort" cluster group.

  12. Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms I: Revisiting Cluster-Based Inferences.

    PubMed

    Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Sathian, K

    2018-02-01

    In a recent study, Eklund et al. employed resting-state functional magnetic resonance imaging data as a surrogate for null functional magnetic resonance imaging (fMRI) datasets and posited that cluster-wise family-wise error (FWE) rate-corrected inferences made by using parametric statistical methods in fMRI studies over the past two decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; this was principally because the spatial autocorrelation functions (sACF) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggested otherwise. Here, we show that accounting for non-Gaussian signal components such as those arising from resting-state neural activity as well as physiological responses and motion artifacts in the null fMRI datasets yields first- and second-level general linear model analysis residuals with nearly uniform and Gaussian sACF. Further comparison with nonparametric permutation tests indicates that cluster-based FWE corrected inferences made with Gaussian spatial noise approximations are valid.

  13. Stability and mobility of Cu-vacancy clusters in Fe-Cu alloys: A computational study based on the use of artificial neural networks for energy barrier calculations

    NASA Astrophysics Data System (ADS)

    Pascuet, M. I.; Castin, N.; Becquart, C. S.; Malerba, L.

    2011-05-01

    An atomistic kinetic Monte Carlo (AKMC) method has been applied to study the stability and mobility of copper-vacancy clusters in Fe. This information, which cannot be obtained directly from experimental measurements, is needed to parameterise models describing the nanostructure evolution under irradiation of Fe alloys (e.g. model alloys for reactor pressure vessel steels). The physical reliability of the AKMC method has been improved by employing artificial intelligence techniques for the regression of the activation energies required by the model as input. These energies are calculated allowing for the effects of local chemistry and relaxation, using an interatomic potential fitted to reproduce them as accurately as possible and the nudged-elastic-band method. The model validation was based on comparison with available ab initio calculations for verification of the used cohesive model, as well as with other models and theories.

  14. Study of ^{14}C Cluster Decay Half-Lives of Heavy Deformed Nuclei

    NASA Astrophysics Data System (ADS)

    Shamami, S. Rahimi; Pahlavani, M. R.

    2018-01-01

    A theoretical model based on deformed Woods-Saxon, Coulomb and centrifugal terms is constructed to evaluate the half-lives for the cluster radioactivity of various super heavy nuclei. Deformation has been applied to all parts of the potential, including the nuclear barrier for cluster decay. Both parent and daughter nuclei are considered to be deformed. The calculated ^{14}C cluster radioactivity half-lives are compared with available experimental data. A satisfactory agreement between theoretical and measured data is achieved. In addition, the half-lives obtained for each decay family agree with the Geiger-Nuttall law.

  15. A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set

    PubMed Central

    Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong

    2012-01-01

    Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes a multiple criteria decision making (MCDM)-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined by an experimental study using three MCDM methods, the well-known clustering algorithm–k-means, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
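
    The idea described in the abstract above, treating candidate numbers of clusters as alternatives and validity measures as criteria, can be sketched with a simple weighted-sum model. This is an assumption made for illustration: the weighted-sum model is one of the simplest MCDM methods and is not claimed to be one of the three methods used in the paper. The function name, the two criteria (a higher-is-better score and a lower-is-better score) and all numbers are hypothetical.

```python
def weighted_sum_rank(alternatives, scores, benefit, weights=None):
    """Rank alternatives by a weighted sum of min-max normalised criteria.

    alternatives: candidate numbers of clusters
    scores:       dict mapping each alternative to its list of criterion values
    benefit:      list of bools, True if higher is better for that criterion
    weights:      optional criterion weights (equal weights by default)
    """
    m = len(benefit)
    if weights is None:
        weights = [1.0 / m] * m
    totals = {alt: 0.0 for alt in alternatives}
    for j in range(m):
        column = [scores[alt][j] for alt in alternatives]
        lo, hi = min(column), max(column)
        span = hi - lo
        for alt in alternatives:
            if span == 0:
                norm = 0.5  # criterion does not discriminate
            else:
                norm = (scores[alt][j] - lo) / span
                if not benefit[j]:
                    norm = 1.0 - norm  # flip cost criteria
            totals[alt] += weights[j] * norm
    return sorted(alternatives, key=lambda alt: totals[alt], reverse=True)


# Hypothetical validity scores for k = 2..5:
# [higher-is-better index, lower-is-better index]
scores = {
    2: [0.41, 1.30],
    3: [0.62, 0.80],
    4: [0.55, 0.95],
    5: [0.48, 1.10],
}
ranking = weighted_sum_rank(list(scores), scores, benefit=[True, False])
best_k = ranking[0]
```

    In this made-up example both criteria favour k = 3, so it is ranked first; the interesting cases in practice are those where the criteria disagree and the aggregation rule decides.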

  16. Analytical network process based optimum cluster head selection in wireless sensor network.

    PubMed

    Farman, Haleem; Javed, Huma; Jan, Bilal; Ahmad, Jamil; Ali, Shaukat; Khalil, Falak Naz; Khan, Murad

    2017-01-01

    Wireless Sensor Networks (WSNs) are becoming ubiquitous in everyday life due to their applications in weather forecasting, surveillance, implantable sensors for health monitoring and a plethora of other applications. A WSN is equipped with hundreds or thousands of small sensor nodes. As the size of a sensor node decreases, critical issues such as limited energy, computation time and limited memory become even more pronounced. In such a case, network lifetime mainly depends on efficient use of available resources. Organizing nearby nodes into clusters makes it convenient to efficiently manage each cluster as well as the overall network. In this paper, we extend our previous work on a grid-based hybrid network deployment approach, in which a merge and split technique was proposed to construct the network topology. Having constructed the topology through our proposed technique, in this paper we use an analytical network process (ANP) model for cluster head (CH) selection in WSNs. Five distinct parameters are considered for CH selection: distance from nodes (DistNode), residual energy level (REL), distance from centroid (DistCent), number of times the node has been selected as cluster head (TCH) and merged node (MN). The problem of CH selection based on these parameters is treated as a multi-criteria decision system, for which the ANP method is used for optimum cluster head selection. The main contribution of this work is to check the applicability of the ANP model for cluster head selection in WSNs. In addition, sensitivity analysis is carried out to check the stability of alternatives (available candidate nodes) and their ranking for different scenarios. The simulation results show that the proposed method outperforms existing energy-efficient clustering protocols in terms of optimum CH selection and minimizing the CH reselection process, which results in extending overall network lifetime. This paper analyzes the ANP method used for CH selection with a better understanding of the dependencies of the different components involved in the evaluation process.

  17. Analytical network process based optimum cluster head selection in wireless sensor network

    PubMed Central

    Javed, Huma; Jan, Bilal; Ahmad, Jamil; Ali, Shaukat; Khalil, Falak Naz; Khan, Murad

    2017-01-01

    Wireless Sensor Networks (WSNs) are becoming ubiquitous in everyday life due to their applications in weather forecasting, surveillance, implantable sensors for health monitoring and a plethora of other applications. A WSN is equipped with hundreds or thousands of small sensor nodes. As the size of a sensor node decreases, critical issues such as limited energy, computation time and limited memory become even more pronounced. In such a case, network lifetime mainly depends on efficient use of available resources. Organizing nearby nodes into clusters makes it convenient to efficiently manage each cluster as well as the overall network. In this paper, we extend our previous work on a grid-based hybrid network deployment approach, in which a merge and split technique was proposed to construct the network topology. Having constructed the topology through our proposed technique, in this paper we use an analytical network process (ANP) model for cluster head (CH) selection in WSNs. Five distinct parameters are considered for CH selection: distance from nodes (DistNode), residual energy level (REL), distance from centroid (DistCent), number of times the node has been selected as cluster head (TCH) and merged node (MN). The problem of CH selection based on these parameters is treated as a multi-criteria decision system, for which the ANP method is used for optimum cluster head selection. The main contribution of this work is to check the applicability of the ANP model for cluster head selection in WSNs. In addition, sensitivity analysis is carried out to check the stability of alternatives (available candidate nodes) and their ranking for different scenarios. The simulation results show that the proposed method outperforms existing energy-efficient clustering protocols in terms of optimum CH selection and minimizing the CH reselection process, which results in extending overall network lifetime. This paper analyzes the ANP method used for CH selection with a better understanding of the dependencies of the different components involved in the evaluation process. PMID:28719616

  18. Image texture segmentation using a neural network

    NASA Astrophysics Data System (ADS)

    Sayeh, Mohammed R.; Athinarayanan, Ragu; Dhali, Pushpuak

    1992-09-01

    In this paper we use a neural network called the Lyapunov associative memory (LYAM) system to segment image texture into different categories or clusters. The LYAM system is constructed by a set of ordinary differential equations which are simulated on a digital computer. The clustering can be achieved by using a single tuning parameter in the simplest model. Pattern classes are represented by the stable equilibrium states of the system. Design of the system is based on synthesizing two local energy functions, namely, the learning and recall energy functions. Before the implementation of the segmentation process, a Gauss-Markov random field (GMRF) model is applied to the raw image. This application suitably reduces the image data and prepares the texture information for the neural network process. We give a simple image example illustrating the capability of the technique. The GMRF-generated features are also used for a clustering, based on the Euclidean distance.

  19. On selecting a prior for the precision parameter of Dirichlet process mixture models

    USGS Publications Warehouse

    Dorazio, R.M.

    2009-01-01

    In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G0. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
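
    To illustrate why inferences about the level of clustering depend on the precision parameter (written alpha below), the following sketch evaluates the standard expression for the prior expected number of clusters among n observations under a Dirichlet process, E[K_n] = sum over i = 1..n of alpha / (alpha + i - 1). This is a well-known property of the Dirichlet process and is shown here only as background; it is not the prior construction developed in the paper, and the function name is hypothetical.

```python
def expected_clusters(alpha, n):
    """Prior expected number of clusters for n observations.

    Under a Dirichlet process with precision alpha, observation i
    (1-based) starts a new cluster with probability alpha / (alpha + i - 1),
    so the expectation is the sum of these probabilities.
    """
    return sum(alpha / (alpha + i) for i in range(n))


# Hypothetical settings: the same sample size under three precision values.
n_obs = 100
for alpha in (0.1, 1.0, 10.0):
    print(alpha, round(expected_clusters(alpha, n_obs), 2))
```

    Because the expected number of clusters grows with alpha for fixed n, a prior placed on alpha is implicitly a prior on the level of clustering, which is the sensitivity the abstract refers to.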

  20. Modeling sports highlights using a time-series clustering framework and model interpretation

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Regunathan; Otsuka, Isao; Xiong, Ziyou; Divakaran, Ajay

    2005-01-01

    In our past work on sports highlights extraction, we have shown the utility of detecting audience reaction using an audio classification framework. The audio classes in the framework were chosen based on intuition. In this paper, we present a systematic way of identifying the key audio classes for sports highlights extraction using a time series clustering framework. We treat the low-level audio features as a time series and model the highlight segments as "unusual" events in a background of a "usual" process. The set of audio classes to characterize the sports domain is then identified by analyzing the consistent patterns in each of the clusters output from the time series clustering framework. The distribution of features from the training data so obtained for each of the key audio classes is parameterized by a Minimum Description Length Gaussian Mixture Model (MDL-GMM). We also interpret the meaning of each of the mixture components of the MDL-GMM for the key audio class (the "highlight" class) that is correlated with highlight moments. Our results show that the "highlight" class is a mixture of audience cheering and commentator's excited speech. Furthermore, we show that the precision-recall performance for highlights extraction based on this "highlight" class is better than that of our previous approach, which uses only audience cheering as the key highlight class.

  1. Fragment-based ^{13}C nuclear magnetic resonance chemical shift predictions in molecular crystals: An alternative to planewave methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hartman, Joshua D.; Beran, Gregory J. O.; Monaco, Stephen

    2015-09-14

    We assess the quality of fragment-based ab initio isotropic ^{13}C chemical shift predictions for a collection of 25 molecular crystals with eight different density functionals. We explore the relative performance of cluster, two-body fragment, combined cluster/fragment, and the planewave gauge-including projector augmented wave (GIPAW) models relative to experiment. When electrostatic embedding is employed to capture many-body polarization effects, the simple and computationally inexpensive two-body fragment model predicts both isotropic ^{13}C chemical shifts and the chemical shielding tensors as well as both cluster models and the GIPAW approach. Unlike the GIPAW approach, hybrid density functionals can be used readily in a fragment model, and all four hybrid functionals tested here (PBE0, B3LYP, B3PW91, and B97-2) predict chemical shifts in noticeably better agreement with experiment than the four generalized gradient approximation (GGA) functionals considered (PBE, OPBE, BLYP, and BP86). A set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts is provided based on these benchmark calculations. Statistical cross-validation procedures are used to demonstrate the robustness of these fits.

  2. Inferring HIV-1 Transmission Dynamics in Germany From Recently Transmitted Viruses.

    PubMed

    Pouran Yousef, Kaveh; Meixenberger, Karolin; Smith, Maureen R; Somogyi, Sybille; Gromöller, Silvana; Schmidt, Daniel; Gunsenheimer-Bartmeyer, Barbara; Hamouda, Osamah; Kücherer, Claudia; von Kleist, Max

    2016-11-01

    Although HIV continues to spread globally, novel intervention strategies such as treatment as prevention (TasP) may bring the epidemic to a halt. However, their effective implementation requires a profound understanding of the underlying transmission dynamics. We analyzed parameters of the German HIV epidemic based on phylogenetic clustering of viral sequences from recently infected seroconverters with known infection dates. Viral baseline and follow-up pol sequences (n = 1943) from 1159 drug-naïve individuals were selected from a nationwide long-term observational study initiated in 1997. Putative transmission clusters were computed based on a maximum likelihood phylogeny. Using individual follow-up sequences, we optimized our clustering threshold to maximize the likelihood of co-clustering individuals connected by direct transmission. The sizes of putative transmission clusters scaled inversely with their abundance and their distribution exhibited a heavy tail. Clusters based on the optimal clustering threshold were significantly more likely to contain members of the same or bordering German federal states. Interinfection times between co-clustered individuals were significantly shorter (26 weeks; interquartile range: 13-83) than in a null model. Viral intraindividual evolution may be used to select criteria that maximize co-clustering of transmission pairs in the absence of strong adaptive selection pressure. Interinfection times of co-clustered individuals may then be an indicator of the typical time to onward transmission. Our analysis suggests that onward transmission may have occurred early after infection, when individuals are typically unaware of their serological status. The latter argues that TasP should be combined with HIV testing campaigns to reduce the possibility of transmission before TasP initiation.

  3. Inductive Approaches to Improving Diagnosis and Design for Diagnosability

    NASA Technical Reports Server (NTRS)

    Fisher, Douglas H. (Principal Investigator)

    1995-01-01

    The first research area under this grant addresses the problem of classifying time series according to their morphological features in the time domain. A supervised learning system called CALCHAS, which induces a classification procedure for signatures from preclassified examples, was developed. For each of several signature classes, the system infers a model that captures the class's morphological features using Bayesian model induction and the minimum message length approach to assign priors. After induction, a time series (signature) is classified in one of the classes when there is enough evidence to support that decision. Time series with sufficiently novel features, belonging to classes not present in the training set, are recognized as such. A second area of research assumes two sources of information about a system: a model or domain theory that encodes aspects of the system under study and data from actual system operations over time. A model, when it exists, represents strong prior expectations about how a system will perform. Our work with a diagnostic model of the RCS (Reaction Control System) of the Space Shuttle motivated the development of SIG, a system which combines information from a model (or domain theory) and data. As it tracks RCS behavior, the model computes quantitative and qualitative values. Induction is then performed over the data represented by both the 'raw' features and the model-computed high-level features. Finally, work on clustering for operating mode discovery motivated some important extensions to the clustering strategy we had used. One modification appends an iterative optimization technique onto the clustering system; this optimization strategy appears to be novel in the clustering literature. A second modification improves the noise tolerance of the clustering system. In particular, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus making post-clustering analysis easier.

  4. Applying Model Analysis to a Resource-Based Analysis of the Force and Motion Conceptual Evaluation

    ERIC Educational Resources Information Center

    Smith, Trevor I.; Wittmann, Michael C.; Carter, Tom

    2014-01-01

    Previously, we analyzed the Force and Motion Conceptual Evaluation in terms of a resources-based model that allows for clustering of questions so as to provide useful information on how students correctly or incorrectly reason about physics. In this paper, we apply model analysis to show that the associated model plots provide more information…

  5. Activity of a social dynamics model

    NASA Astrophysics Data System (ADS)

    Reia, Sandro M.; Neves, Ubiraci P. C.

    2015-10-01

    Axelrod's model was proposed to study interactions between agents and the formation of cultural domains. It presents a transition from a monocultural to a multicultural steady state which has been studied in the literature by evaluation of the relative size of the largest cluster. In this article, we propose new measurements based on the concept of activity per agent to study Axelrod's model on the square lattice. We show that the variance of system activity can be used to indicate the critical points of the transition. Furthermore, the frequency distribution of the system activity is able to show a coexistence of phases typical of a first-order phase transition. Finally, we verify a power law dependence between cluster activity and cluster size for multicultural steady state configurations at the critical point.
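
    For readers unfamiliar with the dynamics, the following is a minimal sketch of Axelrod's model on a square lattice. The abstract does not define "activity per agent", so the quantity returned here (trait changes per agent during one sweep) is an assumed stand-in, and the open boundaries, lattice size, and parameter values are likewise illustrative choices rather than the authors' setup.

```python
import random


def axelrod_sweep(grid, rng):
    """Run one sweep (L*L interaction attempts) of Axelrod's model.

    grid: L x L list of lists; each cell is a list of F integer traits.
    Returns the number of trait changes per agent during the sweep,
    used here as an assumed stand-in for "activity per agent".
    """
    size = len(grid)
    changes = 0
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        di, dj = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        ni, nj = i + di, j + dj
        if not (0 <= ni < size and 0 <= nj < size):
            continue  # open boundaries (an assumption): no neighbor here
        agent, neighbor = grid[i][j], grid[ni][nj]
        differing = [k for k in range(len(agent)) if agent[k] != neighbor[k]]
        overlap = 1 - len(differing) / len(agent)
        # Interact with probability equal to the cultural overlap.
        if differing and rng.random() < overlap:
            k = rng.choice(differing)
            agent[k] = neighbor[k]  # copy one differing trait
            changes += 1
    return changes / (size * size)


def random_grid(size, n_features, n_traits, rng):
    """Build a lattice of agents with uniformly random traits."""
    return [[[rng.randrange(n_traits) for _ in range(n_features)]
             for _ in range(size)] for _ in range(size)]


rng = random.Random(0)
grid = random_grid(size=10, n_features=3, n_traits=2, rng=rng)
activity = [axelrod_sweep(grid, rng) for _ in range(50)]
```

    Recording `activity` over many sweeps and several values of `n_traits` would give a time series whose variance can be inspected, in the spirit of the measurements the abstract proposes.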

  6. A two-stage model of fracture of rocks

    USGS Publications Warehouse

    Kuksenko, V.; Tomilin, N.; Damaskinskaya, E.; Lockner, D.

    1996-01-01

    In this paper we propose a two-stage model of rock fracture. In the first stage, cracks or local regions of failure are uncorrelated and occur randomly throughout the rock in response to loading of pre-existing flaws. As damage accumulates in the rock, there is a gradual increase in the probability that large clusters of closely spaced cracks or local failure sites will develop. Based on statistical arguments, a critical density of damage will occur where clusters of flaws become large enough to lead to larger-scale failure of the rock (stage two). While crack interaction and cooperative failure are expected to occur within clusters of closely spaced cracks, the initial development of clusters is predicted based on the random variation in pre-existing flaw populations. Thus the onset of the unstable second stage in the model can be computed from the generation of random, uncorrelated damage. The proposed model incorporates notions of the kinetic (and therefore time-dependent) nature of the strength of solids as well as the discrete hierarchic structure of rocks and the flaw populations that lead to damage accumulation. The advantage offered by this model is that its salient features are valid for fracture processes occurring over a wide range of scales including earthquake processes. A notion of the rank of fracture (fracture size) is introduced, and criteria are presented for both fracture nucleation and the transition of the failure process from one scale to another.

  7. Determination of Fundamental Properties of an M31 Globular Cluster from Main-Sequence Photometry

    NASA Astrophysics Data System (ADS)

    Ma, Jun; Wu, Zhenyu; Wang, Song; Fan, Zhou; Zhou, Xu; Wu, Jianghua; Jiang, Zhaoji; Chen, Jiansheng

    2010-10-01

    M31 globular cluster B379 is the first extragalactic cluster whose age was determined by main-sequence photometry. In the main-sequence photometric method, the age of a cluster is obtained by fitting its color-magnitude diagram (CMD) with stellar evolutionary models. However, different stellar evolutionary models use different parameters of stellar evolution, such as range of stellar masses, different opacities and equations of state, different recipes, and so on. So, it is interesting to check whether different stellar evolutionary models can give consistent results for the same cluster. Brown et al. constrained the age of B379 by comparing its CMD with isochrones of the 2006 VandenBerg models. Using SSP models of Bruzual & Charlot and its multiphotometry, Ma et al. independently determined the age of B379, which is in good agreement with the determination of Brown et al. The models of Bruzual & Charlot are calculated based on the Padova evolutionary tracks. It is necessary to check whether the age of B379 as determined based on the Padova evolutionary tracks is in agreement with the determination of Brown et al. In this article, we redetermine the age of B379 using isochrones of the Padova stellar evolutionary models. In addition, the metal abundance, the distance modulus, and the reddening value for B379 are reported. The results obtained are consistent with the previous determinations, which include the age obtained by Brown et al. This article thus confirms the consistency of the age scale of B379 between the Padova isochrones and the 2006 VandenBerg isochrones; i.e., the comparison between the results of Brown et al. and Ma et al. is meaningful. The values found for B379 in this article are: metallicity [M/H] = log(Z/Z⊙) = -0.325, age τ = 11.0 ± 1.5 Gyr, reddening E(B - V) = 0.08, and distance modulus (m - M)_0 = 24.44 ± 0.10.

  8. Limits on turbulent propagation of energy in cool-core clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Bambic, C. J.; Pinto, C.; Fabian, A. C.; Sanders, J.; Reynolds, C. S.

    2018-07-01

    We place constraints on the propagation velocity of bulk turbulence within the intracluster medium of three clusters and an elliptical galaxy. Using Reflection Grating Spectrometer measurements of turbulent line broadening, we show that for these clusters, the 90 per cent upper limit on turbulent velocities when accounting for instrumental broadening is too low to propagate energy radially to the cooling radius of the clusters within the required cooling time. In this way, we extend previous Hitomi-based analysis on the Perseus cluster to more clusters, with the intention of applying these results to a future, more extensive catalogue. These results constrain models of turbulent heating in active galactic nucleus feedback by requiring a mechanism which can not only provide sufficient energy to offset radiative cooling but also resupply that energy rapidly enough to balance cooling at each cluster radius.

  9. Limits on turbulent propagation of energy in cool-core clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Bambic, C. J.; Pinto, C.; Fabian, A. C.; Sanders, J.; Reynolds, C. S.

    2018-04-01

    We place constraints on the propagation velocity of bulk turbulence within the intracluster medium of three clusters and an elliptical galaxy. Using Reflection Grating Spectrometer measurements of turbulent line broadening, we show that for these clusters, the 90% upper limit on turbulent velocities when accounting for instrumental broadening is too low to propagate energy radially to the cooling radius of the clusters within the required cooling time. In this way, we extend previous Hitomi-based analysis on the Perseus cluster to more clusters, with the intention of applying these results to a future, more extensive catalog. These results constrain models of turbulent heating in AGN feedback by requiring a mechanism which can not only provide sufficient energy to offset radiative cooling, but resupply that energy rapidly enough to balance cooling at each cluster radius.

  10. Graph-Based Object Class Discovery

    NASA Astrophysics Data System (ADS)

    Xia, Shengping; Hancock, Edwin R.

    We are interested in the problem of discovering the set of object classes present in a database of images using a weakly supervised graph-based framework. Rather than making use of the “Bag-of-Features (BoF)” approach widely used in current work on object recognition, we represent each image by a graph using a group of selected local invariant features. Using local feature matching and iterative Procrustes alignment, we perform graph matching and compute a similarity measure. Borrowing the idea of query expansion, we develop a similarity propagation based graph clustering (SPGC) method. Using this method, class-specific clusters of the graphs can be obtained. Such a cluster can generally be represented by a higher-level graph model whose vertices are the clustered graphs and whose edge weights are determined by the pairwise similarity measure. Experiments are performed on a dataset in which the number of images increases from 1 to 50K and the number of objects increases from 1 to over 500. Some objects have been discovered with total recall and a precision of 1 in a single cluster.

  11. Ozone levels in the Empty Quarter of Saudi Arabia--application of adaptive neuro-fuzzy model.

    PubMed

    Rahman, Syed Masiur; Khondaker, A N; Khan, Rouf Ahmad

    2013-05-01

    In arid regions, primary pollutants may contribute to the increase of ozone levels and cause negative effects on biotic health. This study investigates the use of adaptive neuro-fuzzy inference system (ANFIS) for ozone prediction. The initial fuzzy inference system is developed by using fuzzy C-means (FCM) and subtractive clustering (SC) algorithms, which determines the important rules, increases generalization capability of the fuzzy inference system, reduces computational needs, and ensures speedy model development. The study area is located in the Empty Quarter of Saudi Arabia, which is considered as a source of huge potential for oil and gas field development. The developed clustering algorithm-based ANFIS model used meteorological data and derived meteorological data, along with NO and NO₂ concentrations and their transformations, as inputs. The root mean square error and Willmott's index of agreement of the FCM- and SC-based ANFIS models are 3.5 ppbv and 0.99, and 8.9 ppbv and 0.95, respectively. Based on the analysis of the performance measures and regression error characteristic curves, it is concluded that the FCM-based ANFIS model outperforms the SC-based ANFIS model.
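
    The two performance measures quoted in this abstract can be computed as in the following sketch. It uses the standard definitions of root mean square error and of Willmott's index of agreement, d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², where O are observations, P are predictions, and Ō is the observed mean. The sample values are invented and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def willmott_d(observed, predicted):
    """Willmott's index of agreement (1 means perfect agreement)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


# Invented ozone concentrations in ppbv, for illustration only.
observed = [30.0, 35.0, 40.0, 45.0]
predicted = [32.0, 34.0, 41.0, 43.0]

error = rmse(observed, predicted)
agreement = willmott_d(observed, predicted)
```

    A lower RMSE together with an index of agreement closer to 1 is the pattern the abstract reports for the FCM-based model relative to the SC-based one.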

  12. Ab initio molecular dynamics simulation of binary Cu64Zr36 bulk metallic glass: Validation of the cluster-plus-glue-atom model

    NASA Astrophysics Data System (ADS)

    Tian, Hua; Zhang, Chong; Wang, Lu; Zhao, JiJun; Dong, Chuang; Wen, Bin; Wang, Qing

    2011-06-01

    We have performed ab initio molecular dynamics simulation of Cu64Zr36 alloy at descending temperatures (from 2000 K to 400 K) and discussed the evolution of short-range order with temperature. The pair-correlation functions, coordination numbers, and chemical compositions of the most abundant local clusters have been analyzed. We found that icosahedral short-range order exists in the liquid, undercooled, and glass states, and it becomes dominant in the glass states. Moreover, we demonstrated the existence of Cu-centered Cu8Zr5 icosahedral clusters as the major local structural unit in the Cu64Zr36 amorphous alloy. This finding agrees well with our previous cluster model of Cu-Zr-based BMG as well as experimental evidence from synchrotron x-ray and neutron diffraction measurements.

  13. Alignment and integration of complex networks by hypergraph-based spectral clustering

    NASA Astrophysics Data System (ADS)

    Michoel, Tom; Nachtergaele, Bruno

    2012-11-01

    Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.

  14. Alignment and integration of complex networks by hypergraph-based spectral clustering.

    PubMed

    Michoel, Tom; Nachtergaele, Bruno

    2012-11-01

    Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.

  15. Application of Hermitian time-dependent coupled-cluster response Ansätze of second order to excitation energies and frequency-dependent dipole polarizabilities

    NASA Astrophysics Data System (ADS)

    Wälz, Gero; Kats, Daniel; Usvyat, Denis; Korona, Tatiana; Schütz, Martin

    2012-11-01

    Linear-response methods, based on the time-dependent variational coupled-cluster or the unitary coupled-cluster model, and truncated at the second order according to the Møller-Plesset partitioning, i.e., the TD-VCC[2] and TD-UCC[2] linear-response methods, are presented and compared. For both of these methods a Hermitian eigenvalue problem has to be solved to obtain excitation energies and state eigenvectors. The excitation energies thus are guaranteed always to be real valued, and the eigenvectors are mutually orthogonal, in contrast to response theories based on “traditional” coupled-cluster models. It turned out that the TD-UCC[2] working equations for excitation energies and polarizabilities are equivalent to those of the second-order algebraic diagrammatic construction scheme ADC(2). Numerical tests are carried out by calculating TD-VCC[2] and TD-UCC[2] excitation energies and frequency-dependent dipole polarizabilities for several test systems and by comparing them to the corresponding values obtained from other second- and higher-order methods. It turns out that the TD-VCC[2] polarizabilities in the frequency regions away from the poles are of a similar accuracy as for other second-order methods, as expected from the perturbative analysis of the TD-VCC[2] polarizability expression. On the other hand, the TD-VCC[2] excitation energies are systematically too low relative to other second-order methods (including TD-UCC[2]). On the basis of these results and an analysis presented in this work, we conjecture that the perturbative expansion of the Jacobian converges more slowly for the TD-VCC formalism than for TD-UCC or for response theories based on traditional coupled-cluster models.

  16. Model-based recursive partitioning to identify risk clusters for metabolic syndrome and its components: findings from the International Mobility in Aging Study

    PubMed Central

    Pirkle, Catherine M; Wu, Yan Yan; Zunzunegui, Maria-Victoria; Gómez, José Fernando

    2018-01-01

    Objective: Conceptual models underpinning much epidemiological research on ageing acknowledge that environmental, social and biological systems interact to influence health outcomes. Recursive partitioning is a data-driven approach that allows for concurrent exploration of distinct mixtures, or clusters, of individuals that have a particular outcome. Our aim is to use recursive partitioning to examine risk clusters for metabolic syndrome (MetS) and its components, in order to identify vulnerable populations. Study design: Cross-sectional analysis of baseline data from a prospective longitudinal cohort called the International Mobility in Aging Study (IMIAS). Setting: IMIAS includes sites from three middle-income countries—Tirana (Albania), Natal (Brazil) and Manizales (Colombia)—and two from Canada—Kingston (Ontario) and Saint-Hyacinthe (Quebec). Participants: Community-dwelling male and female adults, aged 64–75 years (n=2002). Primary and secondary outcome measures: We apply recursive partitioning to investigate social and behavioural risk factors for MetS and its components. Model-based recursive partitioning (MOB) was used to cluster participants into age-adjusted risk groups based on variabilities in: study site, sex, education, living arrangements, childhood adversities, adult occupation, current employment status, income, perceived income sufficiency, smoking status and weekly minutes of physical activity. Results: 43% of participants had MetS. Using MOB, the primary partitioning variable was participant sex. Among women from middle-income sites, the predicted proportion with MetS ranged from 58% to 68%. Canadian women with limited physical activity had elevated predicted proportions of MetS (49%, 95% CI 39% to 58%). Among men, MetS ranged from 26% to 41% depending on childhood social adversity and education. Clustering for MetS components differed from the syndrome and across components. Study site was a primary partitioning variable for all components except HDL cholesterol. Sex was important for most components. Conclusion: MOB is a promising technique for identifying disease risk clusters (eg, vulnerable populations) in modestly sized samples. PMID:29500203

  17. Sunyaev-Zel'dovich Effect and X-ray Scaling Relations from Weak-Lensing Mass Calibration of 32 SPT Selected Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dietrich, J.P.; et al.

    Uncertainty in the mass-observable scaling relations is currently the limiting factor for galaxy cluster based cosmology. Weak gravitational lensing can provide a direct mass calibration and reduce the mass uncertainty. We present new ground-based weak lensing observations of 19 South Pole Telescope (SPT) selected clusters and combine them with previously reported space-based observations of 13 galaxy clusters to constrain the cluster mass scaling relations with the Sunyaev-Zel'dovich effect (SZE), the cluster gas mass $M_\mathrm{gas}$, and $Y_\mathrm{X}$, the product of $M_\mathrm{gas}$ and X-ray temperature. We extend a previously used framework for the analysis of scaling relations and cosmological constraints obtained from SPT-selected clusters to make use of weak lensing information. We introduce a new approach to estimate the effective average redshift distribution of background galaxies and quantify a number of systematic errors affecting the weak lensing modelling. These errors include a calibration of the bias incurred by fitting a Navarro-Frenk-White profile to the reduced shear using $N$-body simulations. We blind the analysis to avoid confirmation bias. We are able to limit the systematic uncertainties to 6.4% in cluster mass (68% confidence). Our constraints on the mass-X-ray observable scaling relations parameters are consistent with those obtained by earlier studies, and our constraints for the mass-SZE scaling relation are consistent with the simulation-based prior used in the most recent SPT-SZ cosmology analysis. We can now replace the external mass calibration priors used in previous SPT-SZ cosmology studies with a direct, internal calibration obtained on the same clusters.

  18. Orbits of Selected Globular Clusters in the Galactic Bulge

    NASA Astrophysics Data System (ADS)

    Pérez-Villegas, A.; Rossi, L.; Ortolani, S.; Casotto, S.; Barbuy, B.; Bica, E.

    2018-05-01

    We present orbit analysis for a sample of eight inner bulge globular clusters, together with one reference halo object. We used proper motion values derived from long time base CCD data. Orbits are integrated in both an axisymmetric model and a model including the Galactic bar potential. The inclusion of the bar proved to be essential for the description of the dynamical behaviour of the clusters. We use the Monte Carlo scheme to construct the initial conditions for each cluster, taking into account the uncertainties in the kinematical data and distances. The sample clusters typically show a maximum height from the Galactic plane below 1.5 kpc and develop rather eccentric orbits. Seven of the bulge sample clusters share the orbital properties of the bar/bulge, having perigalactic and apogalactic distances, and maximum vertical excursion from the Galactic plane, inside the bar region. NGC 6540 instead shows a completely different orbital behaviour, having a dynamical signature of the thick disc. Both prograde and prograde-retrograde orbits with respect to the direction of the Galactic rotation were revealed, which might characterise a chaotic behaviour.

  19. Statistical uncertainty of extreme wind storms over Europe derived from a probabilistic clustering technique

    NASA Astrophysics Data System (ADS)

    Walz, Michael; Leckebusch, Gregor C.

    2016-04-01

    Extratropical wind storms pose one of the most dangerous and loss-intensive natural hazards for Europe. However, because only 50 years of high quality observational data are available, it is difficult to assess the statistical uncertainty of these sparse events based on observations alone. Over the last decade seasonal ensemble forecasts have become indispensable in quantifying the uncertainty of weather prediction on seasonal timescales. In this study seasonal forecasts are used in a climatological context: by making use of the up to 51 ensemble members, a broad and physically consistent statistical base can be created. This base can then be used to assess the statistical uncertainty of extreme wind storm occurrence more accurately. In order to determine the statistical uncertainty of storms with different paths of progression, a probabilistic clustering approach using regression mixture models is used to objectively assign storm tracks (either based on core pressure or on extreme wind speeds) to different clusters. The advantage of this technique is that the entire lifetime of a storm is considered for the clustering algorithm. Quadratic curves are found to describe the storm tracks most accurately. Three main clusters (diagonal, horizontal or vertical progression of the storm track) can be identified, each of which has its own particular features. Basic storm features like average velocity and duration are calculated and compared for each cluster. The main benefit of this clustering technique, however, is to evaluate if the clusters show different degrees of uncertainty, e.g. more (less) spread for tracks approaching Europe horizontally (diagonally). This statistical uncertainty is compared for different seasonal forecast products.

  20. Robust Bayesian clustering.

    PubMed

    Archambeau, Cédric; Verleysen, Michel

    2007-01-01

    A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.
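
    The robustness argument in this abstract can be made concrete with a small one-dimensional sketch. It compares Gaussian and Student-t log-densities at an outlying point and computes the posterior mean of the latent scale variable, (ν + 1) / (ν + z²), which arises when the Student-t is written as a Gaussian scale mixture. This is the standard textbook quantity, not the variational algorithm introduced in the paper; the function names and the choice ν = 3 are assumptions made for illustration.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a univariate Student-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - 0.5 * (nu + 1) * math.log1p(z * z / nu))


def expected_scale(x, mu=0.0, sigma=1.0, nu=3.0):
    """Posterior mean of the latent scale variable for a 1-D point.

    Values well below 1 mean the point is treated as an outlier and
    contributes little to the estimates of mu and sigma.
    """
    z = (x - mu) / sigma
    return (nu + 1) / (nu + z * z)


outlier = 6.0
gauss_lp = gaussian_logpdf(outlier)
t_lp = student_t_logpdf(outlier)
weight_center = expected_scale(0.0)
weight_outlier = expected_scale(outlier)
```

    The heavier tail shows up as a much larger log-density at the outlier under the Student-t, and the small scale weight shows how such a point is downweighted when component parameters are updated.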

  1. Brownian model of transcriptome evolution and phylogenetic network visualization between tissues.

    PubMed

    Gu, Xun; Ruan, Hang; Su, Zhixi; Zou, Yangyun

    2017-09-01

    While phylogenetic analysis of transcriptomes of the same tissue is usually congruent with the species tree, controversy emerges when multiple tissues are included, that is, whether species from the same tissue are clustered together, or different tissues from the same species are clustered together. Recent studies have suggested that a phylogenetic network approach may shed some light on our understanding of multi-tissue transcriptome evolution; yet the underlying evolutionary mechanism remains unclear. In this paper we develop a Brownian-based model of transcriptome evolution under the phylogenetic network that can statistically distinguish between the patterns of species-clustering and tissue-clustering. Our model can be used as a null hypothesis (neutral transcriptome evolution) for testing any correlation in tissue evolution, can be applied to cancer transcriptome evolution to study whether two tumors of an individual appeared independently or via metastasis, and can be useful to detect convergent evolution at the transcriptional level. Copyright © 2017. Published by Elsevier Inc.

  2. Modeling the Dark Matter of Galaxy Clusters Using the Tensor-Vector-Scalar Theory of Alternate Gravity

    NASA Astrophysics Data System (ADS)

    Ragozzine, Brett

    The invocation of dark matter in the universe is predicated upon gravitational observations that cannot be explained by the amount of luminous matter that we detect. There is an ongoing debate over which gravitational model is correct. The work herein tests a prescription of gravity theory known as Tensor-Vector-Scalar and is based upon the work of Angus et al. (2007). We add to this work by extending the sample of galaxy clusters to five and testing the accepted Navarro, Frenk & White (NFW) dark matter potential (Navarro et al., 1996). Our independent implementation of this method includes weak gravitational lensing analysis to determine the amount of dark matter in these galaxy clusters by calculating the gas fraction f_gas = M_gas/M_tot. The ability of the Tensor-Vector-Scalar theory to predict a consistent f_gas across all galaxy clusters is a measure of its likelihood of being the correct gravity model.

  3. The velocity field of clusters of galaxies within 100 megaparsecs. II - Northern clusters

    NASA Technical Reports Server (NTRS)

    Mould, J. R.; Akeson, R. L.; Bothun, G. D.; Han, M.; Huchra, J. P.; Roth, J.; Schommer, R. A.

    1993-01-01

    Distances and peculiar velocities for galaxies in eight clusters and groups have been determined by means of the near-infrared Tully-Fisher relation. With the possible exception of a group halfway between us and the Hercules Cluster, we observe peculiar velocities of the same order as the measuring errors of about 400 km/s. The present sample is drawn from the northern Galactic hemisphere and delineates a quiet region in the Hubble flow. This contrasts with the large-scale flows seen in the Hydra-Centaurus and Perseus-Pisces regions. We compare the observed peculiar velocities with predictions based upon the gravity field inferred from the IRAS redshift survey. The differences between the observed and predicted peculiar motions are generally small, except near dense structures, where the observed motions exceed the predictions by significant amounts. Kinematic models of the velocity field are also compared with the data. We cannot distinguish between parameterized models with a great attractor or models with a bulk flow.

  4. Critical exponents of the explosive percolation transition

    NASA Astrophysics Data System (ADS)

    da Costa, R. A.; Dorogovtsev, S. N.; Goltsev, A. V.; Mendes, J. F. F.

    2014-04-01

    In a new type of percolation phase transition, which was observed in a set of nonequilibrium models, each new connection between vertices is chosen from a number of possibilities by an Achlioptas-like algorithm. This causes preferential merging of small components and delays the emergence of the percolation cluster. First simulations led to a conclusion that a percolation cluster in this irreversible process is born discontinuously, by a discontinuous phase transition, which results in the term "explosive percolation transition." We have shown that this transition is actually continuous (second order) though with an anomalously small critical exponent of the percolation cluster. Here we propose an efficient numerical method enabling us to find the critical exponents and other characteristics of this second-order transition for a representative set of explosive percolation models with different number of choices. The method is based on gluing together the numerical solutions of evolution equations for the cluster size distribution and power-law asymptotics. For each of the models, with high precision, we obtain critical exponents and the critical point.
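
    The delayed emergence of the percolation cluster can be seen in a small simulation. The abstract does not spell out the selection rule, so this sketch assumes one Achlioptas-like variant: each end of a new link is chosen as the vertex in the smallest cluster among `m` random candidates. It is a direct simulation with a union-find structure, not the authors' method of solving evolution equations, and the function names and parameter values are illustrative.

```python
import random


class DisjointSet:
    """Union-find with cluster sizes stored at the roots."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def cluster_size(self, x):
        return self.size[self.find(x)]


def largest_cluster_fraction(n, n_edges, m, rng):
    """Add n_edges links and return the largest cluster size divided by n.

    With m = 1 both ends are uniformly random (ordinary percolation);
    with m > 1 each end is the candidate lying in the smallest cluster.
    """
    ds = DisjointSet(n)
    largest = 1
    for _ in range(n_edges):
        ends = []
        for _ in range(2):
            candidates = [rng.randrange(n) for _ in range(m)]
            ends.append(min(candidates, key=ds.cluster_size))
        ds.union(ends[0], ends[1])
        largest = max(largest, ds.cluster_size(ends[0]))
    return largest / n


# Same number of links, with and without the choice rule.
plain = largest_cluster_fraction(2000, 1400, 1, random.Random(1))
delayed = largest_cluster_fraction(2000, 1400, 2, random.Random(1))
```

    Comparing `plain` and `delayed` at the same link density illustrates how preferential merging of small components postpones the appearance of a large cluster.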

  5. Object-Oriented Image Clustering Method Using UAS Photogrammetric Imagery

    NASA Astrophysics Data System (ADS)

    Lin, Y.; Larson, A.; Schultz-Fellenz, E. S.; Sussman, A. J.; Swanson, E.; Coppersmith, R.

    2016-12-01

    Unmanned Aerial Systems (UAS) have been used widely as an imaging modality to obtain remotely sensed multi-band surface imagery, and are growing in popularity due to their efficiency, ease of use, and affordability. Los Alamos National Laboratory (LANL) has employed UAS for geologic site characterization and change detection studies at a variety of field sites. The deployed UAS are equipped with a standard visible band camera to collect imagery datasets. Based on the imagery collected, we use deep sparse algorithmic processing to detect and discriminate subtle topographic features created or impacted by subsurface activities. In this work, we develop an object-oriented remote sensing imagery clustering method for land cover classification. To improve the clustering and segmentation accuracy, instead of using conventional pixel-based clustering methods, we integrate the spatial information from neighboring regions to create super-pixels to avoid salt-and-pepper noise and subsequent over-segmentation. To further improve robustness of our clustering method, we also incorporate a custom digital elevation model (DEM) dataset generated using a structure-from-motion (SfM) algorithm together with the red, green, and blue (RGB) band data for clustering. In particular, we first employ an agglomerative clustering to create an initial segmentation map, in which every object is treated as a single (new) pixel. Based on the new pixels obtained, we generate new features to implement another level of clustering. We apply our clustering method to the RGB+DEM datasets collected at the field site. Through binary clustering and multi-object clustering tests, we verify that our method can accurately separate vegetation from non-vegetation regions, and is also able to differentiate object features on the surface.

  6. A multi-point perspective on the formation of polar cap arcs: kinetic modeling and observations by Cluster and TIMED

    NASA Astrophysics Data System (ADS)

    de Keyser, J. M.; Maggiolo, R.; Echim, M.; Simon, C.; Zhang, Y.; Trotignon, J.

    2010-12-01

    On April 1st, 2004 the GUVI imager onboard the TIMED spacecraft spots an isolated and elongated polar cap arc. Simultaneously, the Cluster spacecraft detects an isolated upflowing ion beam above the polar cap. Cluster observations show that the ions are accelerated upward by a quasi-stationary electric field. The field-aligned potential drop is estimated at about 600 V and the upflowing ions are accompanied by a tenuous population of isotropic protons with a temperature of about 300 eV. The footprint of the magnetic field line on which the Cluster spacecraft are situated is located just outside the GUVI field of view in the prolongation of the polar cap arc. This suggests that the upflowing ion beam and the polar cap arc may be different signatures of the same phenomenon, as suggested by a recent statistical study of polar cap ion beams using Cluster data. We use Cluster observations at high altitude as input to a quasi-stationary magnetosphere-ionosphere (MI) coupling model. Using a Knight-type current-voltage relationship and the current continuity at the topside ionosphere, the model computes the energy spectrum of precipitating electrons at ionospheric altitudes corresponding to the generator electric field observed by Cluster. The MI coupling model provides a field-aligned potential drop in agreement with Cluster observations of upflowing ions and a spatial scale of the polar cap arc consistent with the optical observations by TIMED. The energy spectrum of the precipitating electrons provided by the model is introduced as input to the Trans4 ionospheric transport code. This 1-D model, based on Boltzmann's kinetic formalism, takes into account ionospheric processes like photoionisation and electron/proton precipitation, and computes the optical and UV emissions due to precipitating electrons. The emission rates provided by the Trans4 code are then compared to the optical observations by TIMED. Data and modeling results are consistent with quasi-static acceleration of precipitating magnetospheric electrons. We also discuss possible implications of our modeling results for optical observations of polar cap arcs.

  7. Helium segregation on surfaces of plasma-exposed tungsten

    NASA Astrophysics Data System (ADS)

    Maroudas, Dimitrios; Blondel, Sophie; Hu, Lin; Hammond, Karl D.; Wirth, Brian D.

    2016-02-01

    We report a hierarchical multi-scale modeling study of implanted helium segregation on surfaces of tungsten, considered as a plasma facing component in nuclear fusion reactors. We employ a hierarchy of atomic-scale simulations based on a reliable interatomic interaction potential, including molecular-statics simulations to understand the origin of helium surface segregation, targeted molecular-dynamics (MD) simulations of near-surface cluster reactions, and large-scale MD simulations of implanted helium evolution in plasma-exposed tungsten. We find that small, mobile He_n (1 ≤ n ≤ 7) clusters in the near-surface region are attracted to the surface due to an elastic interaction force that provides the thermodynamic driving force for surface segregation. This elastic interaction force induces drift fluxes of these mobile He_n clusters, which increase substantially as the migrating clusters approach the surface, facilitating helium segregation on the surface. Moreover, the clusters' drift toward the surface enables cluster reactions, most importantly trap mutation, in the near-surface region at rates much higher than in the bulk material. These near-surface cluster dynamics have significant effects on the surface morphology, near-surface defect structures, and the amount of helium retained in the material upon plasma exposure. We integrate the findings of such atomic-scale simulations into a properly parameterized and validated spatially dependent, continuum-scale reaction-diffusion cluster dynamics model, capable of predicting implanted helium evolution, surface segregation, and its near-surface effects in tungsten. This cluster-dynamics model sets the stage for development of fully atomistically informed coarse-grained models for computationally efficient simulation predictions of helium surface segregation, as well as helium retention and surface morphological evolution, toward optimal design of plasma facing components.

  8. A comparison of fuzzy logic and cluster renewal approaches for heat transfer modeling in a 1296 t/h CFB boiler with low level of flue gas recirculation

    NASA Astrophysics Data System (ADS)

    Błaszczuk, Artur; Krzywański, Jarosław

    2017-03-01

    The interrelation between fuzzy logic and cluster renewal approaches for heat transfer modeling in a circulating fluidized bed (CFB) has been established based on local furnace data. The furnace data were measured in a 1296 t/h CFB boiler with a low level of flue gas recirculation. In the present study, the bed temperature and suspension density were treated as experimental variables along the furnace height. The measured bed temperature and suspension density varied in the ranges of 1131-1156 K and 1.93-6.32 kg/m³, respectively. Using the heat transfer coefficient for a commercial CFB combustor, two empirical heat transfer correlations were developed in terms of important operating parameters, including bed temperature and suspension density. The fuzzy logic results were found to be in good agreement with the corresponding experimental heat transfer data obtained with the cluster renewal approach. The predicted bed-to-wall heat transfer coefficient covered ranges of 109-241 W/(m²K) and 111-240 W/(m²K) for the fuzzy logic and cluster renewal approaches, respectively. The divergence in calculated heat flux recovery along the furnace height between the fuzzy logic and cluster renewal approaches did not exceed ±2%.

  9. Using Design-Based Latent Growth Curve Modeling with Cluster-Level Predictor to Address Dependency

    ERIC Educational Resources Information Center

    Wu, Jiun-Yu; Kwok, Oi-Man; Willson, Victor L.

    2014-01-01

    The authors compared the effects of using the true Multilevel Latent Growth Curve Model (MLGCM) with single-level regular and design-based Latent Growth Curve Models (LGCM) with or without the higher-level predictor on various criterion variables for multilevel longitudinal data. They found that random effect estimates were biased when the…

  10. Clustering Of Left Ventricular Wall Motion Patterns

    NASA Astrophysics Data System (ADS)

    Bjelogrlic, Z.; Jakopin, J.; Gyergyek, L.

    1982-11-01

    A method for detection of wall regions with similar motion was presented. A model based on local direction information was used to measure the left ventricular wall motion from a cineangiographic sequence. Three time functions were used to define segmental motion patterns: the distance of a ventricular contour segment from the mean contour, the velocity of a segment, and its acceleration. Motion patterns were clustered by the UPGMA algorithm and by an algorithm based on the K-nearest neighbor classification rule.
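
    As a rough illustration of the UPGMA step mentioned in this abstract, the following sketch implements average-linkage agglomeration on a precomputed dissimilarity matrix. How the authors turned the three time functions into dissimilarities is not stated in the abstract, so the input matrix and the function name `upgma` are assumptions made for the example.

```python
def upgma(dist, n_clusters):
    """Agglomerate items with average linkage until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    Assumes 1 <= n_clusters <= len(dist).
    """
    clusters = {i: [i] for i in range(len(dist))}
    # d[(i, j)] with i < j holds the current average-linkage distance.
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(dist)
    while len(clusters) > n_clusters:
        a, b = min(d, key=d.get)  # closest pair of clusters
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        # UPGMA update: size-weighted mean of the two old distances.
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())
```

    For example, with four hypothetical scalar features `[0, 1, 10, 11]` and absolute differences as dissimilarities, `upgma(dist, 2)` groups items 0 and 1 together and items 2 and 3 together.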

  11. Atomistic modeling of dropwise condensation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sikarwar, B. S., E-mail: bssikarwar@amity.edu; Singh, P. L.; Muralidhar, K.

    The basic aim of the atomistic modeling of condensation of water is to determine the size of the stable cluster and connect phenomena occurring at atomic scale to the macroscale. In this paper, a population balance model is described in terms of the rate equations to obtain the number density distribution of the resulting clusters. The residence time is taken to be large enough so that sufficient time is available for all the adatoms existing in vapor-phase to lose their latent heat and get condensed. The simulation assumes clusters of a given size to be formed from clusters of smaller sizes, but not by the disintegration of the larger clusters. The largest stable cluster size in the number density distribution is taken to be representative of the minimum drop radius formed in a dropwise condensation process. A numerical confirmation of this result against predictions based on a thermodynamic model has been obtained. Results show that the number density distribution is sensitive to the surface diffusion coefficient and the rate of vapor flux impinging on the substrate. The minimum drop radius increases with the diffusion coefficient and the impinging vapor flux; however, the dependence is weak. The minimum drop radius predicted from thermodynamic considerations matches the prediction of the cluster model, though the former does not take into account the effect of the surface properties on the nucleation phenomena. For a chemically passive surface, the diffusion coefficient and the residence time are dependent on the surface texture via the coefficient of friction. Thus, physical texturing provides a means of changing, within limits, the minimum drop radius. The study reveals that surface texturing at the scale of the minimum drop radius does not provide controllability of the macro-scale dropwise condensation at large timescales when a dynamic steady-state is reached.

  12. A General Class of Signed Rank Tests for Clustered Data when the Cluster Size is Potentially Informative

    PubMed Central

    Datta, Somnath; Nevalainen, Jaakko; Oja, Hannu

    2012-01-01

    SUMMARY Rank based tests are alternatives to likelihood based tests popularized by their relative robustness and underlying elegant mathematical theory. There has been a surge in research activities in this area in recent years since a number of researchers are working to develop and extend rank based procedures to clustered dependent data which include situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. The purpose of this paper is to test the symmetry of a marginal distribution under clustered data. However, unlike most other papers in the area, we consider the possibility that the cluster size is a random variable whose distribution is dependent on the distribution of the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not controlled by the experimenter or statistician) and in which the size of the cluster may carry information about the distribution of data values within a cluster. Under the scenario of an informative cluster size, attempts to use some form of variance adjusted sign or signed rank tests would fail since they would not maintain the correct size under the distribution of marginal symmetry. To overcome this difficulty Datta and Satten (2008; Biometrics, 64, 501–507) proposed a Wilcoxon type signed rank test based on the principle of within cluster resampling. In this paper we study this problem in more generality by introducing a class of valid tests employing a general score function. Asymptotic null distribution of these tests is obtained. A simulation study shows that a more general choice of the score function can sometimes result in greater power than the Datta and Satten test; furthermore, this development offers the user a wider choice. We illustrate our tests using a real data example on spinal cord injury patients. PMID:23074359

  13. A General Class of Signed Rank Tests for Clustered Data when the Cluster Size is Potentially Informative.

    PubMed

    Datta, Somnath; Nevalainen, Jaakko; Oja, Hannu

    2012-09-01

    Rank based tests are alternatives to likelihood based tests popularized by their relative robustness and underlying elegant mathematical theory. There has been a surge in research activities in this area in recent years since a number of researchers are working to develop and extend rank based procedures to clustered dependent data which include situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. The purpose of this paper is to test the symmetry of a marginal distribution under clustered data. However, unlike most other papers in the area, we consider the possibility that the cluster size is a random variable whose distribution is dependent on the distribution of the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not controlled by the experimenter or statistician) and in which the size of the cluster may carry information about the distribution of data values within a cluster. Under the scenario of an informative cluster size, attempts to use some form of variance adjusted sign or signed rank tests would fail since they would not maintain the correct size under the distribution of marginal symmetry. To overcome this difficulty Datta and Satten (2008; Biometrics, 64, 501-507) proposed a Wilcoxon type signed rank test based on the principle of within cluster resampling. In this paper we study this problem in more generality by introducing a class of valid tests employing a general score function. Asymptotic null distribution of these tests is obtained. A simulation study shows that a more general choice of the score function can sometimes result in greater power than the Datta and Satten test; furthermore, this development offers the user a wider choice. We illustrate our tests using a real data example on spinal cord injury patients.

  14. Dynamical Modeling of NGC 6397: Simulated HST Imaging

    NASA Astrophysics Data System (ADS)

    Dull, J. D.; Cohn, H. N.; Lugger, P. M.; Slavin, S. D.; Murphy, B. W.

    1994-12-01

    The proximity of NGC 6397 (2.2 kpc) provides an ideal opportunity to test current dynamical models for globular clusters with the HST Wide-Field/Planetary Camera (WFPC2). We have used a Monte Carlo algorithm to generate ensembles of simulated Planetary Camera (PC) U-band images of NGC 6397 from evolving, multi-mass Fokker-Planck models. These images, which are based on the post-repair HST-PC point-spread function, are used to develop and test analysis methods for recovering structural information from actual HST imaging. We have considered a range of exposure times up to 2.4 × 10^4 s, based on our proposed HST Cycle 5 observations. Our Fokker-Planck models include energy input from dynamically-formed binaries. We have adopted a 20-group mass spectrum extending from 0.16 to 1.4 M_sun. We use theoretical luminosity functions for red giants and main sequence stars. Horizontal branch stars, blue stragglers, white dwarfs, and cataclysmic variables are also included. Simulated images are generated for cluster models at both maximal core collapse and at a post-collapse bounce. We are carrying out stellar photometry on these images using "DAOPHOT-assisted aperture photometry" software that we have developed. We are testing several techniques for analyzing the resulting star counts, to determine the underlying cluster structure, including parametric model fits and nonparametric density estimation methods. Our simulated images also allow us to investigate the accuracy and completeness of methods for carrying out stellar photometry in HST Planetary Camera images of dense cluster cores.

  15. Spatial Clustering of Occupational Injuries in Communities

    PubMed Central

    Friedman, Lee; Chin, Brian; Madigan, Dana

    2015-01-01

    Objectives. Using the social-ecological model, we hypothesized that the home residences of injured workers would be clustered predictably and geographically. Methods. We linked health care and publicly available datasets by home zip code for traumatically injured workers in Illinois from 2000 to 2009. We calculated numbers and rates of injuries, determined the spatial relationships, and developed 3 models. Results. Among the 23 200 occupational injuries, 80% of cases were located in 20% of zip codes and clustered in 10 locations. After component analysis, numbers and clusters of injuries correlated directly with immigrants; injury rates inversely correlated with urban poverty. Conclusions. Traumatic occupational injuries were clustered spatially by home location of the affected workers and in a predictable way. This put an inequitable burden on communities and provided evidence for the possible value of community-based interventions for prevention of occupational injuries. Work should be included in health disparities research. Stakeholders should determine whether and how to intervene at the community level to prevent occupational injuries. PMID:25905838

  16. Global survey of star clusters in the Milky Way. VI. Age distribution and cluster formation history

    NASA Astrophysics Data System (ADS)

    Piskunov, A. E.; Just, A.; Kharchenko, N. V.; Berczik, P.; Scholz, R.-D.; Reffert, S.; Yen, S. X.

    2018-06-01

    Context. The all-sky Milky Way Star Clusters (MWSC) survey provides uniform and precise ages, along with other relevant parameters, for a wide variety of clusters in the extended solar neighbourhood. Aims: In this study we aim to construct the cluster age distribution, investigate its spatial variations, and discuss constraints on cluster formation scenarios of the Galactic disk during the last 5 Gyrs. Methods: Due to the spatial extent of the MWSC, we have considered spatial variations of the age distribution along galactocentric radius RG, and along Z-axis. For the analysis of the age distribution we used 2242 clusters, which all lie within roughly 2.5 kpc of the Sun. To connect the observed age distribution to the cluster formation history we built an analytical model based on simple assumptions on the cluster initial mass function and on the cluster mass-lifetime relation, fit it to the observations, and determined the parameters of the cluster formation law. Results: Comparison with the literature shows that earlier results strongly underestimated the number of evolved clusters with ages t ≳ 100 Myr. Recent studies based on all-sky catalogues agree better with our data, but still lack the oldest clusters with ages t ≳ 1 Gyr. We do not observe a strong variation in the age distribution along RG, though we find an enhanced fraction of older clusters (t > 1 Gyr) in the inner disk. In contrast, the distribution strongly varies along Z. The high altitude distribution practically does not contain clusters with t < 1 Gyr. With simple assumptions on the cluster formation history, the cluster initial mass function and the cluster lifetime we can reproduce the observations. The cluster formation rate and the cluster lifetime are strongly degenerate, which does not allow us to disentangle different formation scenarios. In all cases the cluster formation rate is strongly declining with time, and the cluster initial mass function is very shallow at the high mass end.

  17. Exploring the atomic structure of 1.8nm monolayer-protected gold clusters with aberration-corrected STEM.

    PubMed

    Liu, Jian; Jian, Nan; Ornelas, Isabel; Pattison, Alexander J; Lahtinen, Tanja; Salorinne, Kirsi; Häkkinen, Hannu; Palmer, Richard E

    2017-05-01

    Monolayer-protected (MP) Au clusters present attractive quantum systems with a range of potential applications e.g. in catalysis. Knowledge of the atomic structure is needed to obtain a full understanding of their intriguing physical and chemical properties. Here we employed aberration-corrected scanning transmission electron microscopy (ac-STEM), combined with multislice simulations, to make a round-robin investigation of the atomic structure of chemically synthesised clusters with nominal composition Au144(SCH2CH2Ph)60 provided by two different research groups. The MP Au clusters were "weighed" by the atom counting method, based on their integrated intensities in the high angle annular dark field (HAADF) regime and calibrated exponent of the Z dependence. For atomic structure analysis, we compared experimental images of hundreds of clusters, with atomic resolution, against a variety of structural models. Across the size range 123-151 atoms, only 3% of clusters matched the theoretically predicted Au144(SR)60 structure, while a large proportion of the clusters were amorphous (i.e. did not match any model structure). However, a distinct ring-dot feature, characteristic of local icosahedral symmetry, was observed in about 20% of the clusters. Copyright © 2017. Published by Elsevier B.V.

  18. LoCuSS: connecting the dominance and shape of brightest cluster galaxies with the assembly history of massive clusters

    NASA Astrophysics Data System (ADS)

    Smith, Graham P.; Khosroshahi, Habib G.; Dariush, A.; Sanderson, A. J. R.; Ponman, T. J.; Stott, J. P.; Haines, C. P.; Egami, E.; Stark, D. P.

    2010-11-01

    We study the luminosity gap, Δm12, between the first- and second-ranked galaxies in a sample of 59 massive (~10^15 M_sun) galaxy clusters, using data from the Hale Telescope, the Hubble Space Telescope, Chandra and Spitzer. We find that the Δm12 distribution, p(Δm12), is a declining function of Δm12 to which we fitted a straight line: p(Δm12) ~ -(0.13 ± 0.02)Δm12. The fraction of clusters with 'large' luminosity gaps is p(Δm12 ≥ 1) = 0.37 ± 0.08, which represents a 3σ excess over that obtained from Monte Carlo simulations of a Schechter function that matches the mean cluster galaxy luminosity function. We also identify four clusters with 'extreme' luminosity gaps, Δm12 ≥ 2, giving a fraction of . More generally, large luminosity gap clusters are relatively homogeneous, with elliptical/discy brightest cluster galaxies (BCGs), cuspy gas density profiles (i.e. strong cool cores), high concentrations and low substructure fractions. In contrast, small luminosity gap clusters are heterogeneous, spanning the full range of boxy/elliptical/discy BCG morphologies, the full range of cool core strengths and dark matter concentrations, and have large substructure fractions. Taken together, these results imply that the amplitude of the luminosity gap is a function of both the formation epoch and the recent infall history of the cluster. 'BCG dominance' is therefore a phase that a cluster may evolve through and is not an evolutionary 'cul-de-sac'. We also compare our results with semi-analytic model predictions based on the Millennium Simulation. None of the models is able to reproduce all of the observational results on Δm12, underlining the inability of the current generation of models to match the empirical properties of BCGs. We identify the strength of active galactic nucleus feedback and the efficiency with which cluster galaxies are replenished after they merge with the BCG in each model as possible causes of these discrepancies.

  19. Gravitational redshift of galaxies in clusters as predicted by general relativity.

    PubMed

    Wojtak, Radosław; Hansen, Steen H; Hjorth, Jens

    2011-09-28

    The theoretical framework of cosmology is mainly defined by gravity, of which general relativity is the current model. Recent tests of general relativity within the Lambda Cold Dark Matter (ΛCDM) model have found a concordance between predictions and the observations of the growth rate and clustering of the cosmic web. General relativity has not hitherto been tested on cosmological scales independently of the assumptions of the ΛCDM model. Here we report an observation of the gravitational redshift of light coming from galaxies in clusters at the 99 per cent confidence level, based on archival data. Our measurement agrees with the predictions of general relativity and its modification created to explain cosmic acceleration without the need for dark energy (the f(R) theory), but is inconsistent with alternative models designed to avoid the presence of dark matter. © 2011 Macmillan Publishers Limited. All rights reserved

  20. Fuzzy cluster analysis of high-field functional MRI data.

    PubMed

    Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald

    2003-11-01

    Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast today is an established brain research method and quickly gains acceptance for complementary clinical diagnosis. However, neither the basic mechanisms like coupling between neuronal activation and haemodynamic response are known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts as well as quantification of expected, i.e. stimulus correlated, and novel information on brain activity is important for both new insights in neuroscience and future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiating temporal patterns in MRI using (a) a test object with static and dynamic parts, (b) artifacts due to gross head motion. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) an fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may help to identify artifacts and add novel and unexpected information valuable for interpretation, classification and characterization of functional MRI data which can be used to design new data acquisition schemes, stimulus presentations, neuro(physio)logical paradigms, as well as to improve quantitative biophysical models.
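
    To make the fuzzy clustering idea concrete, here is a minimal sketch of the standard fuzzy c-means iteration applied to a list of equal-length vectors (for instance, voxel time courses). The abstract does not give the authors' exact FCA variant, distance measure, or parameter values, so the Euclidean distance, the fuzziness exponent `m=2.0`, and the function name `fuzzy_c_means` are assumptions for illustration only.

```python
import random


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, seed=0):
    """Return (centers, memberships) for a list of equal-length vectors.

    memberships[i][k] is the degree (0..1) to which item i belongs to
    cluster k; each row sums to 1. Requires m > 1.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in data:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means of the data weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(len(data))]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * data[i][j] for i in range(len(data))) / wsum
                 for j in range(dim)]
            )
        # Memberships are proportional to distance ** (-2 / (m - 1)),
        # computed here from squared distances d2 as d2 ** (-1 / (m - 1)).
        for i, x in enumerate(data):
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) + 1e-12
                  for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u
```

    Unlike a hard clustering, the membership rows let an analyst threshold how strongly a time course belongs to an artifact-like or activation-like cluster, which is the kind of parameter sensitivity the ROC analysis in the abstract examines.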

  1. STAR CLUSTERS IN M33: UPDATED UBVRI PHOTOMETRY, AGES, METALLICITIES, AND MASSES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fan, Zhou; De Grijs, Richard, E-mail: zfan@bao.ac.cn, E-mail: grijs@pku.edu.cn

    2014-04-01

    The photometric characterization of M33 star clusters is far from complete. In this paper, we present homogeneous UBVRI photometry of 708 star clusters and cluster candidates in M33 based on archival images from the Local Group Galaxies Survey, which covers 0.8 deg² along the galaxy's major axis. Our photometry includes 387, 563, 616, 580, and 478 objects in the UBVRI bands, respectively, of which 276, 405, 430, 457, and 363 do not have previously published UBVRI photometry. Our photometry is consistent with previous measurements (where available) in all filters. We adopted Sloan Digital Sky Survey ugriz photometry for complementary purposes, as well as Two Micron All Sky Survey near-infrared JHK photometry where available. We fitted the spectral-energy distributions of 671 star clusters and candidates to derive their ages, metallicities, and masses based on the updated PARSEC simple stellar populations synthesis models. The results of our χ² minimization routines show that only 205 of the 671 clusters (31%) are older than 2 Gyr, which represents a much smaller fraction of the cluster population than that in M31 (56%), suggesting that M33 is dominated by young star clusters (<1 Gyr). We investigate the mass distributions of the star clusters (both open and globular clusters) in M33, M31, the Milky Way, and the Large Magellanic Cloud. Their mean values are log(M_cl/M_☉) = 4.25, 5.43, 2.72, and 4.18, respectively. The fraction of open to globular clusters is highest in the Milky Way and lowest in M31. Our comparisons of the cluster ages, masses, and metallicities show that our results are basically in agreement with previous studies (where objects in common are available); differences can be traced back to differences in the models adopted, the fitting methods used, and stochastic sampling effects.

  2. Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.

    PubMed

    Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib

    2017-03-01

    A video is understood by users in terms of entities present in it. Entity Discovery is the task of building an appearance model for each entity (e.g., a person), and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames, and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at tracklet-level. We extend the Chinese Restaurant Process (CRP) to TC-CRP, and further to the Temporally Coherent Chinese Restaurant Franchise (TC-CRF) to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data like scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. The proposed methods can perform online tracklet clustering on streaming videos unlike existing approaches, and can automatically reject false tracklets. Finally we discuss entity-driven video summarization, where temporal segments of the video are selected based on the discovered entities, to create a semantically meaningful summary.
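
    For readers unfamiliar with the base process that this abstract extends, the sketch below samples cluster ("table") assignments from a plain Chinese Restaurant Process. It does not implement the temporal-coherence extensions (TC-CRP, TC-CRF), whose details are not given in the abstract; the function name `sample_crp` and the concentration value used in any call are illustrative assumptions.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw table assignments from a CRP with concentration alpha > 0.

    Customer i joins existing table k with probability
    counts[k] / (i + alpha) and opens a new table with probability
    alpha / (i + alpha), where i is the number already seated.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of customers at table k
    for i in range(n_customers):
        r = rng.uniform(0, i + alpha)
        table = len(counts)  # default: open a new table
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                table = k
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments
```

    In a tracklet-clustering reading, each customer would be a tracklet and each table an entity; the number of tables is not fixed in advance, which is why such nonparametric priors suit entity discovery where the number of persons is unknown.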

  3. Implicit Priors in Galaxy Cluster Mass and Scaling Relation Determinations

    NASA Technical Reports Server (NTRS)

    Mantz, A.; Allen, S. W.

    2011-01-01

    Deriving the total masses of galaxy clusters from observations of the intracluster medium (ICM) generally requires some prior information, in addition to the assumptions of hydrostatic equilibrium and spherical symmetry. Often, this information takes the form of particular parametrized functions used to describe the cluster gas density and temperature profiles. In this paper, we investigate the implicit priors on hydrostatic masses that result from this fully parametric approach, and the implications of such priors for scaling relations formed from those masses. We show that the application of such fully parametric models of the ICM naturally imposes a prior on the slopes of the derived scaling relations, favoring the self-similar model, and argue that this prior may be influential in practice. In contrast, this bias does not exist for techniques which adopt an explicit prior on the form of the mass profile but describe the ICM non-parametrically. Constraints on the slope of the cluster mass-temperature relation in the literature show a separation based on the approach employed, with the results from fully parametric ICM modeling clustering nearer the self-similar value. Given that a primary goal of scaling relation analyses is to test the self-similar model, the application of methods subject to strong, implicit priors should be avoided. Alternative methods and best practices are discussed.

  4. Modified multidimensional scaling approach to analyze financial markets.

    PubMed

    Yin, Yi; Shang, Pengjian

    2014-06-01

    Detrended cross-correlation coefficient (σDCCA) and dynamic time warping (DTW) are introduced as the dissimilarity measures, respectively, while multidimensional scaling (MDS) is employed to translate the dissimilarities between daily price returns of 24 stock markets. We first propose MDS based on σDCCA dissimilarity and MDS based on DTW dissimilarity, while MDS based on Euclidean dissimilarity is also employed to provide a reference for comparisons. We apply these methods in order to further visualize the clustering between stock markets. Moreover, we compare MDS with an alternative visualization method, the "Unweighted Average" clustering method. The MDS analysis and the "Unweighted Average" clustering method are employed based on the same dissimilarity. Through the results, we find that MDS gives us a more intuitive mapping for observing stable or emerging clusters of stock markets with similar behavior, while the MDS analysis based on σDCCA dissimilarity can provide clearer, more detailed, and more accurate information on the classification of the stock markets than the MDS analysis based on Euclidean dissimilarity. The MDS analysis based on DTW dissimilarity reveals more about the correlations between stock markets. Meanwhile, it reflects more abundant results on the clustering of stock markets and is much more intensive than the MDS analysis based on Euclidean dissimilarity. In addition, the graphs obtained by applying MDS methods based on σDCCA dissimilarity and DTW dissimilarity may also guide the construction of multivariate econometric models.
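
    The DTW dissimilarity named in this abstract can be sketched as follows. This is the textbook dynamic-programming form with an absolute-difference step cost, which is an assumption; the abstract does not say which variant or step cost the authors used. The resulting matrix is the kind of input an MDS routine takes.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def dissimilarity_matrix(series):
    """Symmetric matrix of pairwise DTW costs between the given series."""
    k = len(series)
    return [[dtw_distance(series[i], series[j]) for j in range(k)]
            for i in range(k)]
```

    Because DTW allows one series to be locally stretched against the other, two return series that differ only by a repeated value get zero dissimilarity, which plain Euclidean distance cannot express for series of unequal length.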

  5. Net-zero Building Cluster Simulations and On-line Energy Forecasting for Adaptive and Real-Time Control and Decisions

    NASA Astrophysics Data System (ADS)

    Li, Xiwang

    Buildings consume about 41.1% of primary energy and 74% of the electricity in the U.S. Moreover, it is estimated by the National Energy Technology Laboratory that more than 1/4 of the 713 GW of U.S. electricity demand in 2010 could be dispatchable if only buildings could respond to that dispatch through advanced building energy control and operation strategies and smart grid infrastructure. In this study, it is envisioned that neighboring buildings will have the tendency to form a cluster, an open cyber-physical system to exploit the economic opportunities provided by a smart grid, distributed power generation, and storage devices. Through optimized demand management, these building clusters will then reduce overall primary energy consumption and peak time electricity consumption, and be more resilient to power disruptions. Therefore, this project seeks to develop a Net-zero building cluster simulation testbed and high fidelity energy forecasting models for adaptive and real-time control and decision making strategy development that can be used in a Net-zero building cluster. The following research activities are summarized in this thesis: 1) Development of a building cluster emulator for building cluster control and operation strategy assessment. 2) Development of a novel building energy forecasting methodology using active system identification and data fusion techniques. In this methodology, a systematic approach for building energy system characteristic evaluation, system excitation and model adaptation is included. The developed methodology is compared with other literature-reported building energy forecasting methods. 3) Development of the high fidelity on-line building cluster energy forecasting models, which include energy forecasting models for buildings, PV panels, batteries and ice tank thermal storage systems. 4) Small scale real building validation study to verify the performance of the developed building energy forecasting methodology. The outcomes of this thesis can be used for building cluster energy forecasting model development and model based control and operation optimization. The thesis concludes with a summary of the key outcomes of this research, as well as a list of recommendations for future work.

  6. Nonthermal emission from clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Kushnir, Doron; Waxman, Eli

    2009-08-01

    We show that the spectral and radial distribution of the nonthermal emission of massive, $M \gtrsim 10^{14.5} M_\odot$, galaxy clusters may be approximately described by simple analytic expressions, which depend on the cluster thermal X-ray properties and on two model parameters, $\beta_{\rm core}$ and $\eta_e$. $\beta_{\rm core}$ is the ratio of the cosmic-ray (CR) energy density (within a logarithmic CR energy interval) and the thermal energy density at the cluster core, and $\eta_{e(p)}$ is the fraction of the thermal energy generated in strong collisionless shocks, which is deposited in CR electrons (protons). Using a simple analytic model for the evolution of intra-cluster medium CRs, which are produced by accretion shocks, we find that $\beta_{\rm core} \simeq \eta_p/200$, nearly independent of cluster mass and with a scatter $\Delta\ln\beta_{\rm core} \simeq 1$ between clusters of given mass. We show that the hard X-ray (HXR) and γ-ray luminosities produced by inverse Compton scattering of CMB photons by electrons accelerated in accretion shocks (primary electrons) exceed the luminosities produced by secondary particles (generated in hadronic interactions within the cluster) by factors $\simeq 500(\eta_e/\eta_p)(T/10\,{\rm keV})^{-1/2}$ and $\simeq 150(\eta_e/\eta_p)(T/10\,{\rm keV})^{-1/2}$ respectively, where $T$ is the cluster temperature. Secondary particle emission may dominate at the radio and very high energy ($\gtrsim 1$ TeV) γ-ray bands. Our model predicts, in contrast with some earlier work, that the HXR and γ-ray emission from clusters of galaxies are extended, since the emission is dominated at these energies by primary (rather than by secondary) electrons. Our predictions are consistent with the observed nonthermal emission of the Coma cluster for $\eta_p \sim \eta_e \sim 0.1$. The implications of our predictions to future HXR observations (e.g. by NuStar, Simbol-X) and to (space/ground based) γ-ray observations (e.g. by Fermi, HESS, MAGIC, VERITAS) are discussed. In particular, we identify the clusters which are the best candidates for detection in γ-rays. Finally, we show that our model's results agree with results of detailed numerical calculations, and that discrepancies between the results of various numerical simulations (and between such results and our model) are due to inaccuracies in the numerical calculations.

  7. A computational microscopy study of nanostructural evolution in irradiated pressure vessel steels

    NASA Astrophysics Data System (ADS)

    Odette, G. R.; Wirth, B. D.

    1997-11-01

    Nanostructural features that form in reactor pressure vessel steels under neutron irradiation at around 300°C lead to significant hardening and embrittlement. Continuum thermodynamic-kinetic based rate theories have been very successful in modeling the general characteristics of the copper and manganese nickel rich precipitate evolution, often the dominant source of embrittlement. However, a more detailed atomic scale understanding of these features is needed to interpret experimental measurements and better underpin predictive embrittlement models. Further, other embrittling features, believed to be subnanometer defect (vacancy)-solute complexes and small regions of modest enrichment of solutes are not well understood. A general approach to modeling embrittlement nanostructures, based on the concept of a computational microscope, is described. The objective of the computational microscope is to self-consistently integrate atomic scale simulations with other sources of information, including a wide range of experiments. In this work, lattice Monte Carlo (LMC) simulations are used to resolve the chemically and structurally complex nature of CuMnNiSi precipitates. The LMC simulations unify various nanoscale analytical characterization methods and basic thermodynamics. The LMC simulations also reveal that significant coupled vacancy and solute clustering takes place during cascade aging. The cascade clustering produces the metastable vacancy-cluster solute complexes that mediate flux effects. Cascade solute clustering may also play a role in the formation of dilute atmospheres of solute enrichment and enhance the nucleation of manganese-nickel rich precipitates at low Cu levels. Further, the simulations suggest that complex, highly correlated processes (e.g. cluster diffusion, formation of favored vacancy diffusion paths and solute scavenging vacancy cluster complexes) may lead to anomalous fast thermal aging kinetics at temperatures below about 450°C. The potential technical significance of these phenomena is described.

  8. THE YOUNG OPEN CLUSTERS KING 12, NGC 7788, AND NGC 7790: PRE-MAIN-SEQUENCE STARS AND EXTENDED STELLAR HALOS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davidge, T. J.

    2012-12-20

    The stellar contents of the open clusters King 12, NGC 7788, and NGC 7790 are investigated using MegaCam images. Comparisons with isochrones yield an age <20 Myr for King 12, 20-40 Myr for NGC 7788, and 60-80 Myr for NGC 7790 based on the properties of stars near the main-sequence turnoff (MSTO) in each cluster. The reddening of NGC 7788 is much larger than previously estimated. The luminosity functions (LFs) of King 12 and NGC 7788 show breaks that are attributed to the onset of pre-main-sequence (PMS) objects, and comparisons with models of PMS evolution yield ages that are consistent with those measured from stars near the MSTO. In contrast, the r' LF of main-sequence stars in NGC 7790 is matched to r' = 20 by a model that is based on the solar neighborhood mass function. The structural properties of all three clusters are investigated by examining the two-point angular correlation function of blue main-sequence stars. King 12 and NGC 7788 are each surrounded by a stellar halo that extends out to a radius of 5 arcmin ($\approx$3.4 pc). It is suggested that these halos form in response to large-scale mass ejection early in the evolution of the clusters, as predicted by models. In contrast, blue main-sequence stars in NGC 7790 are traced out to a radius of $\approx$7.5 arcmin ($\approx$5.5 pc), with no evidence of a halo. It is suggested that all three clusters may have originated in the same star-forming complex, but not in the same giant molecular cloud.

  9. Accurate recapture identification for genetic mark–recapture studies with error-tolerant likelihood-based match calling and sample clustering

    USGS Publications Warehouse

    Sethi, Suresh; Linden, Daniel; Wenburg, John; Lewis, Cara; Lemons, Patrick R.; Fuller, Angela K.; Hare, Matthew P.

    2016-01-01

    Error-tolerant likelihood-based match calling presents a promising technique to accurately identify recapture events in genetic mark–recapture studies by combining probabilities of latent genotypes and probabilities of observed genotypes, which may contain genotyping errors. Combined with clustering algorithms to group samples into sets of recaptures based upon pairwise match calls, these tools can be used to reconstruct accurate capture histories for mark–recapture modelling. Here, we assess the performance of a recently introduced error-tolerant likelihood-based match-calling model and sample clustering algorithm for genetic mark–recapture studies. We assessed both biallelic (i.e. single nucleotide polymorphisms; SNP) and multiallelic (i.e. microsatellite; MSAT) markers using a combination of simulation analyses and case study data on Pacific walrus (Odobenus rosmarus divergens) and fishers (Pekania pennanti). A novel two-stage clustering approach is demonstrated for genetic mark–recapture applications. First, repeat captures within a sampling occasion are identified. Subsequently, recaptures across sampling occasions are identified. The likelihood-based matching protocol performed well in simulation trials, demonstrating utility for use in a wide range of genetic mark–recapture studies. Moderately sized SNP (64+) and MSAT (10–15) panels produced accurate match calls for recaptures and accurate non-match calls for samples from closely related individuals in the face of low to moderate genotyping error. Furthermore, matching performance remained stable or increased as the number of genetic markers increased, genotyping error notwithstanding.
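
    The step of grouping samples into sets of recaptures from pairwise match calls can be sketched with a union-find structure. This takes the transitive closure of the match calls (single linkage), which is an assumption: the abstract does not specify the clustering rule, and the function name and input layout are hypothetical.

```python
def cluster_matches(n_samples, match_pairs):
    """Group sample indices 0..n_samples-1 into sets linked by match calls.

    match_pairs is an iterable of (i, j) index pairs that were called a match.
    Returns a sorted list of sorted index lists, one per putative individual.
    """
    parent = list(range(n_samples))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for s in range(n_samples):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

    A two-stage scheme like the one described could, under the same assumption, call this once per sampling occasion and then again on one representative per within-occasion group.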

  10. Mutants of the Base Excision Repair Glycosylase, Endonuclease III: DNA Charge Transport as a First Step in Lesion Detection

    PubMed Central

    Romano, Christine A.; Sontz, Pamela A.; Barton, Jacqueline K.

    2011-01-01

    Endonuclease III (EndoIII) is a base excision repair glycosylase that targets damaged pyrimidines and contains a [4Fe-4S] cluster. We have proposed a model where BER proteins that contain redox-active [4Fe-4S] clusters utilize DNA charge transport (CT) as a first step in the detection of DNA lesions. Here, several mutants of EndoIII were prepared to probe their efficiency of DNA/protein charge transport. Cyclic voltammetry experiments on DNA-modified electrodes show that aromatic residues F30, Y55, Y75 and Y82 help mediate charge transport between DNA and the [4Fe-4S] cluster. Based on circular dichroism studies to measure protein stability, mutations at residues W178 and Y185 are found to destabilize the protein; these residues may function to protect the [4Fe-4S] cluster. Atomic force microscopy studies furthermore reveal a correlation in the ability of mutants to carry out protein/DNA CT and their ability to relocalize onto DNA strands containing a single base mismatch; EndoIII mutants that are defective in carrying out DNA/protein CT do not redistribute onto mismatch-containing strands, consistent with our model. These results demonstrate a link between the ability of the repair protein to carry out DNA CT and its ability to relocalize near lesions, thus pointing to DNA CT as a key first step in the detection of base damage in the genome. PMID:21651304

  11. Modeling spatio-temporal wildfire ignition point patterns

    Treesearch

    Amanda S. Hering; Cynthia L. Bell; Marc G. Genton

    2009-01-01

    We analyze and model the structure of spatio-temporal wildfire ignitions in the St. Johns River Water Management District in northeastern Florida. Previous studies, based on the K-function and an assumption of homogeneity, have shown that wildfire events occur in clusters. We revisit this analysis based on an inhomogeneous K-...

  12. Portfolio Decisions and Brain Reactions via the CEAD method.

    PubMed

    Majer, Piotr; Mohr, Peter N C; Heekeren, Hauke R; Härdle, Wolfgang K

    2016-09-01

    Decision making can be a complex process requiring the integration of several attributes of choice options. Understanding the neural processes underlying (uncertain) investment decisions is an important topic in neuroeconomics. We analyzed functional magnetic resonance imaging (fMRI) data from an investment decision study for stimulus-related effects. We propose a new technique for identifying activated brain regions: cluster, estimation, activation, and decision method. Our analysis is focused on clusters of voxels rather than voxel units. Thus, we achieve a higher signal-to-noise ratio within the unit tested and a smaller number of hypothesis tests compared with the often used General Linear Model (GLM). We propose to first conduct the brain parcellation by applying spatially constrained spectral clustering. The information within each cluster can then be extracted by the flexible dynamic semiparametric factor model (DSFM) dimension reduction technique and finally be tested for differences in activation between conditions. This sequence of Cluster, Estimation, Activation, and Decision admits a model-free analysis of the local fMRI signal. Applying a GLM on the DSFM-based time series resulted in a significant correlation between the risk of choice options and changes in fMRI signal in the anterior insula and dorsomedial prefrontal cortex. Additionally, individual differences in decision-related reactions within the DSFM time series predicted individual differences in risk attitudes as modeled with the framework of the mean-variance model.

  13. Getting Ready for School: Palm Beach County's Early Childhood Cluster Initiative

    ERIC Educational Resources Information Center

    Spielberger, Julie; Baker, Stephen; Winje, Carolyn

    2008-01-01

    This publication reports findings from the second year of an implementation study of the Early Childhood Cluster Initiative (ECCI). ECCI is a prekindergarten program in ten elementary schools and a community child care center in Palm Beach County, based on the design of the High/Scope Perry Preschool model. The initiative is characterized by low…

  14. The PG-TRAK Manual: Using PGCC's Custom Lifestyle Cluster System. Market Analysis MA91-3.

    ERIC Educational Resources Information Center

    Boughan, Karl

    In early 1990, Prince George's Community College (PGCC), in response to declining enrollments, developed an affordable and locally effective geo-demographic cluster system for meeting the college's research and marketing needs. The system, dubbed "PG-TRAK," is based on a model developed 15 years ago as a corporate marketing tool, and involves…

  15. Age determination of 15 old to intermediate-age small Magellanic cloud star clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parisi, M. C.; Clariá, J. J.; Piatti, A. E.

    2014-04-01

    We present color-magnitude diagrams in the V and I bands for 15 star clusters in the Small Magellanic Cloud (SMC) based on data taken with the Very Large Telescope (VLT, Chile). We selected these clusters from our previous work, wherein we derived cluster radial velocities and metallicities from calcium II infrared triplet (CaT) spectra also taken with the VLT. We discovered that the ages of six of our clusters have been appreciably underestimated by previous studies, which used comparatively small telescopes, graphically illustrating the need for large apertures to obtain reliable ages of old and intermediate-age SMC star clusters. In particular, three of these clusters, L4, L6, and L110, turn out to be among the oldest SMC clusters known, with ages of 7.9 ± 1.1, 8.7 ± 1.2, and 7.6 ± 1.0 Gyr, respectively, helping to fill a possible 'SMC cluster age gap'. Using the current ages and metallicities from Parisi et al., we analyze the age distribution, age gradient, and age-metallicity relation (AMR) of a sample of SMC clusters measured homogeneously. There is a suggestion of bimodality in the age distribution but it does not show a constant slope for the first 4 Gyr, and we find no evidence for an age gradient. Due to the improved ages of our cluster sample, we find that our AMR is now better represented in the intermediate/old period than we had derived in Parisi et al., where we simply took ages available in the literature. Additionally, clusters younger than ∼4 Gyr now show better agreement with the bursting model of Pagel and Tautvaišienė, but we confirm that this model is not a good representation of the AMR during the intermediate/old period. A more complicated model is needed to explain the SMC chemical evolution in that period.

  16. Cosmological constraints from X-ray all sky surveys, from CODEX to eROSITA

    NASA Astrophysics Data System (ADS)

    Finoguenov, A.

    2017-10-01

    Large area cluster cosmology has long become a multiwavelength discipline. Understanding the effect of various selections is currently the main path to improving on the validity of cluster cosmological results. Many of these results are based on the large area sample derived from RASS data. We perform wavelet detection of X-ray sources and make extensive simulations of the detection of clusters in the RASS data. We assign an optical richness to each of the 25,000 detected X-ray sources in the 10,000 square degrees of SDSS BOSS area. We show that there is no obvious separation of sources into galaxy clusters and AGN based on the distribution of systems in richness. We conclude that previous catalogs, such as MACS and REFLEX, are all subject to a complex optical selection function, in addition to an X-ray selection. We provide a complete model for the identification of cluster counts as galaxy clusters, which includes chance identification, the effect of the AGN halo occupation distribution, and the thermal emission of the ICM. Finally, we present the cosmological results obtained using this sample.

  17. The formation and evolution of M33 as revealed by its star clusters

    NASA Astrophysics Data System (ADS)

    San Roman, Izaskun

    2012-03-01

    Numerical simulations based on the Lambda-Cold Dark Matter (Λ-CDM) model predict a scenario consistent with observational evidence in terms of the build-up of Milky Way-like halos. Under this scenario, large disk galaxies derive from the merger and accretion of many smaller subsystems. However, it is less clear how low-mass spiral galaxies fit into this picture. The best way to answer this question is to study the nearest example of a dwarf spiral galaxy, M33. We will use star clusters to understand the structure, kinematics and stellar populations of this galaxy. Star clusters provide a unique and powerful tool for studying the star formation histories of galaxies. In particular, the ages and metallicities of star clusters bear the imprint of the galaxy formation process. We have made use of the star clusters to uncover the formation and evolution of M33. In this dissertation, we have carried out a comprehensive study of the M33 star cluster system, including deep photometry as well as high signal-to-noise spectroscopy. In order to mitigate the significant incompleteness present in previous catalogs, we have conducted ground-based and space-based photometric surveys of M33 star clusters. Using archival images, we have analyzed 12 fields using the Advanced Camera for Surveys Wide Field Channel onboard the Hubble Space Telescope (ACS/HST) along the major axis of the galaxy. We present integrated photometry and color-magnitude diagrams for 161 star clusters in M33, of which 115 were previously uncataloged. This survey extends the depth of the existing M33 cluster catalogs by ~1 mag. We have expanded our search through a photometric survey in a 1° x 1° area centered on M33 using the MegaCam camera on the 3.6m Canada-France-Hawaii Telescope (CFHT). In this work we discuss the photometric properties of the sample, including color-color diagrams of 599 new candidate stellar clusters, and 204 confirmed clusters. Comparisons with models of simple stellar populations suggest a large range of ages, some as old as ~10 Gyr. In addition, we find in the color-color diagrams a significant population of very young clusters (< 10 Myr) possessing nebular emission. Analysis of the radial density distribution suggests that the cluster system of M33 has suffered from significant depletion, possibly due to interactions with M31. To further understand the properties of M33 star clusters, we have carried out a morphological study of 161 star clusters in M33 using ACS/HST images. We have obtained, for the first time, ellipticities, position angles, and surface brightness profiles of a statistically significant number of clusters. Ellipticities show that, on average, M33 clusters are more flattened than those of the Milky Way and M31, and more similar to clusters in the Small Magellanic Cloud. The ellipticities do not show any correlation with age or mass, suggesting that rotation is not the main cause of elongation in the M33 clusters. The position angles of the clusters show a bimodality with a strong peak perpendicular to the position angle of the galaxy. These results support the notion that tidal forces are the reason for the cluster flattening. We have fit analytical models to the surface brightness profiles, and derived structural parameters. The overall analysis shows several differences between the structural properties of the M33 cluster system and cluster systems in nearby galaxies. Finally, we have performed a spectroscopic study of star clusters in the above mentioned catalog. We present high-precision velocity measurements of 45 star clusters, based on observations from the 10.4m Gran Telescopio Canarias (GTC) using OSIRIS and the 4.2m William Herschel Telescope (WHT) using WYFFOS. All the clusters have been previously confirmed using HST imaging, and ages and integrated photometry are known. The velocity of the clusters with respect to local disk motion increases with age for young and intermediate clusters. The mean dispersion velocity for the intermediate age clusters in our sample is significantly larger than in previous studies. Analysis of these velocities along the major axis of the galaxy shows no net rotation of the intermediate age subsample. The small number of old clusters in our sample does not allow for any conclusive evidence in that age division.

  18. The Mass Function of Abell Clusters

    NASA Astrophysics Data System (ADS)

    Chen, J.; Huchra, J. P.; McNamara, B. R.; Mader, J.

    1998-12-01

    The velocity dispersion and mass functions for rich clusters of galaxies provide important constraints on models of the formation of Large-Scale Structure (e.g., Frenk et al. 1990). However, prior estimates of the velocity dispersion or mass function for galaxy clusters have been based on either very small samples of clusters (Bahcall and Cen 1993; Zabludoff et al. 1994) or large but incomplete samples (e.g., the Girardi et al. (1998) determination from a sample of clusters with more than 30 measured galaxy redshifts). In contrast, we approach the problem by constructing a volume-limited sample of Abell clusters. We collected individual galaxy redshifts for our sample from two major galaxy velocity databases, the NASA Extragalactic Database, NED, maintained at IPAC, and ZCAT, maintained at SAO. We assembled a database with velocity information for possible cluster members and then selected cluster members based on both spatial and velocity data. Cluster velocity dispersions and masses were calculated following the procedures of Danese, De Zotti, and di Tullio (1980) and Heisler, Tremaine, and Bahcall (1985), respectively. The final velocity dispersion and mass functions were analyzed in order to constrain cosmological parameters by comparison to the results of N-body simulations. Our data for the cluster sample as a whole and for the individual clusters (spatial maps and velocity histograms) in our sample is available on-line at http://cfa-www.harvard.edu/ huchra/clusters. This website will be updated as more data becomes available in the master redshift compilations, and will be expanded to include more clusters and large groups of galaxies.

  19. Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics.

    PubMed

    Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H; Sun, Fengzhu

    2016-04-01

    Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. A plausible model for this underlying distribution of word counts is given through modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate those using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that the clustering results that use an MC of the estimated order give a plausible clustering of the species. Our implementation of the statistics developed here is available as R package 'NGS.MC' at http://www-rcf.usc.edu/∼fsun/Programs/NGS-MC/NGS-MC.html (contact: fsun@usc.edu). Supplementary data are available at Bioinformatics online.
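
    As background for the single-long-sequence case that this abstract contrasts with NGS reads, the following is a minimal sketch of estimating the transition probabilities of an order-k Markov chain from word counts. It is the standard count-ratio estimate, not the read-based estimators or the 'NGS.MC' package of the paper; the function name is an assumption.

```python
from collections import Counter


def transition_probabilities(seq, order):
    """Estimate P(next letter | preceding `order` letters) from one sequence.

    Returns a dict mapping (context, next_letter) to the ratio of the
    (order+1)-word count to the count of its leading context.
    """
    context_counts = Counter()
    word_counts = Counter()
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        word_counts[(context, seq[i + order])] += 1
        context_counts[context] += 1
    return {(ctx, nxt): c / context_counts[ctx]
            for (ctx, nxt), c in word_counts.items()}
```

    With short reads, contexts are cut off at read boundaries and coverage varies along the genome, which is why, as the abstract notes, estimates of this kind cannot be applied to NGS data without modification.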

  20. Impact of a star formation efficiency profile on the evolution of open clusters

    NASA Astrophysics Data System (ADS)

    Shukirgaliyev, B.; Parmentier, G.; Berczik, P.; Just, A.

    2017-09-01

    Aims: We study the effect of the instantaneous expulsion of residual star-forming gas on star clusters in which the residual gas has a density profile that is shallower than that of the embedded cluster. This configuration is expected if star formation proceeds with a given star-formation efficiency per free-fall time in a centrally concentrated molecular gas clump. Methods: We performed direct N-body simulations whose initial conditions were generated by the program "mkhalo" from the package "falcON", adapted for our models. Our model clusters initially had a Plummer profile and were in virial equilibrium with the gravitational potential of the cluster-forming clump. The residual gas contribution was computed based on a local-density driven clustered star formation model. Our simulations included mass loss by stellar evolution and the tidal field of a host galaxy. Results: We find that a star cluster with a minimum global star formation efficiency (SFE) of 15 percent is able to survive instantaneous gas expulsion and to produce a bound cluster. Its violent relaxation lasts no longer than 20 Myr, independently of its global SFE and initial stellar mass. At the end of violent relaxation, the bound fractions of the surviving clusters with the same global SFEs are similar, regardless of their initial stellar mass. Their subsequent lifetime in the gravitational field of the Galaxy depends on their bound stellar masses. Conclusions: We therefore conclude that the critical SFE needed to produce a bound cluster is 15 percent, which is roughly half the earlier estimates of 33 percent. Thus we have improved the survival likelihood of young clusters after instantaneous gas expulsion. Young clusters can now survive instantaneous gas expulsion with global SFEs as low as the SFEs observed for embedded clusters in the solar neighborhood (15-30 percent). The reason is that the star cluster density profile is steeper than that of the residual gas. However, in terms of the effective SFE, measured by the virial ratio of the cluster at gas expulsion, our results are in agreement with previous studies.
