NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but little work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster groups that exceed the stopping rule as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithm with the similarity distances in detecting outliers. The proposed methods are found to perform well and to be applicable to circular regression models.
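The stopping rule above relies on two circular summary statistics. The sketch below shows how the mean direction and the circular standard deviation are commonly computed from a list of angles in radians. It is an illustration only: the function names are hypothetical, and the abstract does not state how the authors combine these quantities into a threshold.

```python
import math

def circular_mean(angles):
    """Mean direction of angles (radians), returned in [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2 * math.pi)

def circular_std(angles):
    """Circular standard deviation sqrt(-2 ln R), with R the mean resultant length."""
    n = len(angles)
    s = sum(math.sin(a) for a in angles) / n
    c = sum(math.cos(a) for a in angles) / n
    r = min(math.hypot(s, c), 1.0)  # clamp rounding error above 1
    if r == 0.0:
        return math.inf  # angles cancel out completely; no preferred direction
    return math.sqrt(-2.0 * math.log(r))
```

Note that the arithmetic mean of 0.1 and 2π − 0.1 would be π, whereas the circular mean correctly lies at the 0/2π wrap-around point, which is why linear statistics are not used on circular data.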
Elastic K-means using posterior probability
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability with soft capability, in which each data point can belong to multiple clusters fractionally, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, pairwise relations (graph information) are available in addition to vector attribute information. We therefore integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and convergence of the EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model. PMID:29240756
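To illustrate what fractional (soft) membership means, the sketch below computes posterior-style weights of one point over a set of cluster centers. It assumes a Gaussian-like weight exp(-d²/(2σ²)) with a hypothetical bandwidth parameter `sigma`; this is a generic soft assignment, not the EKM objective, which the abstract does not spell out.

```python
import math

def soft_assignments(point, centers, sigma=1.0):
    """Return fractional memberships of `point` in each cluster (sums to 1)."""
    # Squared Euclidean distance from the point to every center.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    # Subtract the smallest distance before exponentiating for numerical stability.
    m = min(d2)
    weights = [math.exp(-(d - m) / (2.0 * sigma ** 2)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]
```

Hard K-means corresponds to the limit where all of the weight goes to the nearest center; a larger `sigma` spreads membership more evenly across clusters.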
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. In dealing with such datasets, standard practice is often to use subspace clustering based on vectorizing the multiarray data. However, vectorization of tensorial data does not exploit the complete structure information. In this paper, we propose a subspace clustering algorithm that does not adopt any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model that takes cluster membership information into account. We propose a new clustering algorithm that alternates between the different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.
Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei
2013-05-01
Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong
2015-08-07
Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
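The two resampling schemes described above can be sketched as follows. The data layout (a list of clusters, each a list of individual records) and the function names are assumptions made for illustration; fitting the Cox model on each resample and computing the SEs is not shown.

```python
import random

def cluster_bootstrap(clusters, rng):
    """Resample whole clusters with replacement; members are kept as they are."""
    return [list(rng.choice(clusters)) for _ in clusters]

def two_step_bootstrap(clusters, rng):
    """Resample clusters with replacement, then individuals within each chosen cluster."""
    sample = []
    for _ in clusters:
        chosen = rng.choice(clusters)
        # Second step: draw as many individuals as the cluster holds, with replacement.
        sample.append([rng.choice(chosen) for _ in chosen])
    return sample
```

Passing a seeded generator, for example `random.Random(0)`, makes the resamples reproducible.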
Liu, Xin
2015-10-30
In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists optimal sensing time and clustering number to maximize the spectrum access probability.
Multi-mode clustering model for hierarchical wireless sensor networks
NASA Astrophysics Data System (ADS)
Hu, Xiangdong; Li, Yongfu; Xu, Huifen
2017-03-01
Topology management, i.e., cluster maintenance, of wireless sensor networks (WSNs) is still a challenge due to their numerous nodes, diverse application scenarios, limited resources, and complex dynamics. To address this issue, a multi-mode clustering model (M2CM) is proposed in this study to maintain the clusters of hierarchical WSNs. In particular, unlike the traditional time-triggered model, which operates periodically on the whole network, the M2CM is based on local, event-triggered operations. In addition, an adaptive local maintenance algorithm is designed for broken clusters in the WSNs according to changes in spatial-temporal demand. Numerical experiments are performed using the NS2 network simulation platform. The results validate the effectiveness of the proposed model with respect to network maintenance costs, node energy consumption, transmitted data, and network lifetime.
Clustering of change patterns using Fourier coefficients.
Kim, Jaehee; Kim, Haseong
2008-01-15
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. 
The results showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.
The formation of magnetic silicide Fe3Si clusters during ion implantation
NASA Astrophysics Data System (ADS)
Balakirev, N.; Zhikharev, V.; Gumarov, G.
2014-05-01
A simple two-dimensional model of the formation of magnetic silicide Fe3Si clusters during high-dose Fe ion implantation into silicon has been proposed and the cluster growth process has been computer simulated. The model takes into account the interaction between the cluster magnetization and magnetic moments of Fe atoms random walking in the implanted layer. If the clusters are formed in the presence of the external magnetic field parallel to the implanted layer, the model predicts the elongation of the growing cluster in the field direction. It has been proposed that the cluster elongation results in the uniaxial magnetic anisotropy in the plane of the implanted layer, which is observed in iron silicide films ion-beam synthesized in the external magnetic field.
Markov Chain Model-Based Optimal Cluster Heads Selection for Wireless Sensor Networks
Ahmed, Gulnaz; Zou, Jianhua; Zhao, Xi; Sadiq Fareed, Mian Muhammad
2017-01-01
A longer network lifetime is a goal for Wireless Sensor Networks (WSNs) that is directly related to energy consumption. Energy consumption becomes more challenging when the energy load is not properly distributed in the sensing area. A hierarchical clustering architecture is the best choice for this kind of issue. In this paper, we introduce a novel clustering protocol called Markov chain model-based optimal cluster heads (MOCHs) selection for WSNs. In our proposed model, we introduce a simple strategy for selecting the optimal number of cluster heads to overcome the problem of uneven energy distribution in the network. The attractiveness of our model is that the BS controls the number of cluster heads, while the cluster heads control the cluster members in each cluster in such a restricted manner that a uniform and even load is ensured in each cluster. To analyze the proposed model, we perform an extensive range of simulations using five quality measures: the lifetime of the network, the stable and unstable regions in the lifetime of the network, the throughput of the network, the number of cluster heads in the network, and the transmission time of the network. We compare MOCHs against Sleep-awake Energy Efficient Distributed (SEED) clustering, Artificial Bee Colony (ABC), Zone Based Routing (ZBR), and Centralized Energy Efficient Clustering (CEEC) using these quality metrics and find that the lifetime of the proposed model is almost 1095, 2630, 3599, and 2045 rounds (time steps) greater than that of SEED, ABC, ZBR, and CEEC, respectively. The results demonstrate that MOCHs is better than SEED, ABC, ZBR, and CEEC in terms of energy efficiency and network throughput. PMID:28241492
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Lee, JongHyup; Pak, Dohyun
2016-01-01
For practical deployment, wireless sensor networks (WSNs) construct clusters, in which a sensor node communicates with other nodes in its cluster and a cluster head supports connectivity between the sensor nodes and a sink node. In hybrid WSNs, cluster heads have cellular network interfaces for global connectivity. However, when WSNs are active and the load on cellular networks is high, the optimal assignment of cluster heads to base stations becomes critical. Therefore, in this paper, we propose a game-theoretic model to find the optimal assignment of base stations for hybrid WSNs. Since communication and energy costs differ across cellular systems, we devise two game models, for TDMA/FDMA and CDMA systems, employing power prices to adapt to the varying efficiency of recent wireless technologies. The proposed model is defined under the assumption of an ideal sensing field, but our evaluation shows that it is more adaptive and energy efficient than local selections. PMID:27589743
Chen, Yun; Yang, Hui
2016-01-01
In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information-theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies show that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
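As a concrete reference for the mutual-information measure mentioned above, the sketch below computes the plug-in (empirical frequency) estimate for two discrete variables, in nats. This is a textbook estimator chosen for illustration; the abstract does not say which estimator the authors use, and continuous variables would first need binning or a different estimator.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi
```

Unlike a linear correlation coefficient, this quantity is zero only when the empirical joint distribution factorizes, so it also registers nonlinear dependence.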
Clustering of financial time series
NASA Astrophysics Data System (ADS)
D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo
2013-05-01
This paper addresses the classification of financial time series in a fuzzy framework, proposing two fuzzy clustering models, both based on GARCH models. In general, clustering financial time series requires suitable distance measures because of their peculiar features. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs the classical autoregressive metric within a partitioning around medoids algorithm. The second fuzzy clustering model, also based on a partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances that takes into account information about the volatility structure of the time series. To illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, and the fuzzy models are compared with their crisp versions.
Clustering change patterns using Fourier transformation with time-course gene expression data.
Kim, Jaehee
2011-01-01
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and for genomic network inference. In this article, we propose a clustering algorithm based on hierarchical Dirichlet processes (HDP). HDP clustering introduces a hierarchical structure into the statistical model, which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces unnecessary clustering fragments. PMID:23587447
Possible world based consistency learning model for clustering and classifying uncertain data.
Liu, Han; Zhang, Xianchao; Zhang, Xiaotong
2018-06-01
The possible world model has been shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms based on possible worlds have been proposed. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require an extra post-processing procedure to obtain the final result, so their effectiveness relies heavily on the post-processing method and their efficiency is also limited. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended to both clustering and classifying uncertain data. This model uses the consistency principle to learn a consensus affinity matrix for uncertain data, which makes full use of the information across different possible worlds and thereby improves clustering and classification performance. Meanwhile, the model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be obtained directly without any post-processing procedure. Furthermore, we derive efficient optimization methods to solve the proposed model for the clustering and classification tasks, respectively. Experimental results on real benchmark datasets and real-world uncertain datasets show that the proposed model outperforms state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering and has therefore attracted wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm asymptotically yields similar results as its model size increases; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. In addition, a scalable semisupervised cluster ensemble algorithm is proposed that combines our fast CSC algorithm with dimensionality reduction by random projection in the process of spectral ensemble clustering. We demonstrate through theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved for the consensus clustering stage, also holds for weighted k-means clustering, and thus gives a theoretical guarantee for this special kind of k-means clustering in which each point has its own weight. PMID:29312447
NASA Astrophysics Data System (ADS)
Liu, Fang; Cao, San-xing; Lu, Rui
2012-04-01
This paper proposes a user credit assessment model based on a clustering ensemble, aiming to address the problem of users illegally spreading pirated and pornographic media content on user self-service oriented broadband network new media platforms. The idea is to assess the credit of new media users by establishing an index system based on user credit behaviors, so that illegal users can be identified from the credit assessment results and the transmission of harmful video and audio over the network can be curbed. The proposed model combines two advantages: swarm intelligence clustering is well suited to analyzing user credit behavior, and K-means clustering can eliminate the scattered users left in the result of swarm intelligence clustering. Together they classify the credit of all users automatically. Verification experiments on the model's effectiveness are carried out on the standard credit application dataset in the UCI machine learning repository. The statistical results of a comparative experiment with a single swarm intelligence clustering model indicate that the clustering ensemble model has a stronger ability to distinguish creditworthiness, especially in finding the user clusters with the best and worst credit, which will help operators take incentive or punitive measures accurately. In addition, compared with the experimental results of a logistic regression based model under the same conditions, the clustering ensemble model is robust and has better prediction accuracy.
Kernel spectral clustering with memory effect
NASA Astrophysics Data System (ADS)
Langone, Rocco; Alzate, Carlos; Suykens, Johan A. K.
2013-05-01
Evolving graphs describe many natural phenomena changing over time, such as social relationships, trade markets, metabolic networks etc. In this framework, performing community detection and analyzing the cluster evolution represents a critical task. Here we propose a new model for this purpose, where the smoothness of the clustering results over time can be considered as a valid prior knowledge. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness. The latter allows the model to cluster the current data well and to be consistent with the recent history. We also propose new model selection criteria in order to carefully choose the hyper-parameters of our model, which is a crucial issue to achieve good performances. We successfully test the model on four toy problems and on a real world network. We also compare our model with Evolutionary Spectral Clustering, which is a state-of-the-art algorithm for community detection of evolving networks, illustrating that the kernel spectral clustering with memory effect can achieve better or equal performances.
NASA Astrophysics Data System (ADS)
Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana
2018-01-01
This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations in the Krishna Basin, India. A climatic dataset from NCEP is used to train the proposed models (Jan.'69 to Dec.'94), which are then applied to the corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data are obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using the k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% of the variance for each cluster. A multi-resolution non-linear approach combining the Discrete Wavelet Transform (DWT) and a Second Order Volterra (SoV) model is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better than the traditional Multiple Linear Regression (MLR) and Artificial Neural Network (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA help reduce the dimensionality of the input climatic variables while capturing more variability than stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods, whereas the models with clustering without MWE over-estimate the rainfall during the dry season.
A spatial scan statistic for nonisotropic two-level risk cluster.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2012-01-30
Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic could model a centralized high-risk kernel in the cluster. Because variations in disease risks are anisotropic owing to different social, economical, or transport factors, the real high-risk kernel will not necessarily take the central place in a whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which could be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision with two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using the hand-foot-mouth disease data in Pingdu City, Shandong, China in May 2009, compared with two other methods. In this practical study, the nonisotropic two-level method is the only way to precisely detect a high-risk area in a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.
Network-based spatial clustering technique for exploring features in regional industry
NASA Astrophysics Data System (ADS)
Chou, Tien-Yin; Huang, Pi-Hui; Yang, Lung-Shih; Lin, Wen-Tzu
2008-10-01
Past research on industrial clusters has focused mainly on a single or particular industry, and less on spatial industrial structure and mutual relations. An industrial cluster can generate three kinds of spillover effects: knowledge, labor market pooling, and input sharing. Industrial clusters also benefit industry development. A full grasp of the status and characteristics of a district's industrial clusters can help improve the competitive advantage of district industry. Research on industrial spatial clusters is therefore of great significance for setting industrial policies and promoting district economic development. In this study, an improved model, GeoSOM, that combines DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and SOM (Self-Organizing Map) was developed for analyzing industrial clusters. Unlike former distance-based algorithms for industrial clusters, the proposed GeoSOM model calculates spatial characteristics between firms using the DBSCAN algorithm and evaluates the similarity between firms using SOM clustering analysis. A demonstration data set, the manufacturers around Taichung County in Taiwan, was analyzed to verify the practicability of the proposed model. The results indicate that GeoSOM is suitable for evaluating spatial industrial clusters.
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources to users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt several base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes fuzzy c-means combined with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting resource demands. PMID:25691896
Wu, Jibing; Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta
2017-01-01
Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks follow some simple schema, such as a bi-typed network or star network schema, and they can only cluster one type of object in the network at a time. In this paper, a novel clustering framework based on sparse tensor factorization is proposed for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information network are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets demonstrate that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous networks that is network schema agnostic. PMID:28245222
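For orientation, the Tucker decomposition mentioned in the abstract above can be written in its textbook form as follows. This is a sketch of the standard decomposition only; the symbols (the tensor X, core tensor G, factor matrices A^(n), and cluster counts K_n) are notation assumed here, and the abstract does not state which constraints STFClus places on the factors.

```latex
% Standard Tucker decomposition of an N-th order tensor (assumed notation).
% \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}: relations among N object types
% \mathcal{G} \in \mathbb{R}^{K_1 \times \cdots \times K_N}: core tensor
% A^{(n)} \in \mathbb{R}^{I_n \times K_n}: factor matrix for object type n
\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)}

% Least-squares objective that an ALS scheme minimizes one factor at a time:
\min_{\mathcal{G},\, A^{(1)}, \dots, A^{(N)}}
  \left\lVert \mathcal{X} - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)} \right\rVert_F^2
```

Under a clustering reading, row i of A^(n) can be interpreted as the membership of object i of type n in each of the K_n clusters, which is how one factorization can partition several object types at once.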
Analyzing gene expression time-courses based on multi-resolution shape mixture model.
Li, Ying; He, Ye; Zhang, Yu
2016-11-01
Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biology and disease. Analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. The multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of change over time in gene expression at different resolutions. The proposed algorithm is a probabilistic framework that offers a more natural and robust way of clustering time-course gene expression. We assessed its performance on yeast time-course gene expression profiles in comparison with several popular clustering methods for gene expression profiles. The gene groups identified by the different methods were evaluated by enrichment analysis of biological pathways and of known protein-protein interactions supported by experimental evidence. The gene groups identified by our proposed algorithm have stronger biological significance. A novel multi-resolution shape mixture model algorithm based on multi-resolution fractal features is proposed. The proposed model provides a new perspective and an alternative tool for visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
Energy Efficient Cluster Based Scheduling Scheme for Wireless Sensor Networks
Srie Vidhya Janani, E.; Ganesh Kumar, P.
2015-01-01
The energy utilization of sensor nodes in large scale wireless sensor network points out the crucial need for scalable and energy efficient clustering protocols. Since sensor nodes usually operate on batteries, the maximum utility of network is greatly dependent on ideal usage of energy leftover in these sensor nodes. In this paper, we propose an Energy Efficient Cluster Based Scheduling Scheme for wireless sensor networks that balances the sensor network lifetime and energy efficiency. In the first phase of our proposed scheme, cluster topology is discovered and cluster head is chosen based on remaining energy level. The cluster head monitors the network energy threshold value to identify the energy drain rate of all its cluster members. In the second phase, scheduling algorithm is presented to allocate time slots to cluster member data packets. Here congestion occurrence is totally avoided. In the third phase, energy consumption model is proposed to maintain maximum residual energy level across the network. Moreover, we also propose a new packet format which is given to all cluster member nodes. The simulation results prove that the proposed scheme greatly contributes to maximum network lifetime, high energy, reduced overhead, and maximum delivery ratio. PMID:26495417
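The first two phases described above (choosing a cluster head by remaining energy, then giving each member its own time slot) can be sketched in a few lines. This is a minimal illustration and not the authors' protocol: the function names, the node identifiers, and the energy values are all hypothetical, and the slot allocation is a plain one-slot-per-member scheme.

```python
def choose_cluster_head(members, residual_energy):
    """Return the member with the highest remaining energy (assumed selection rule)."""
    return max(members, key=lambda node: residual_energy[node])


def allocate_time_slots(members, head):
    """Give every non-head member a distinct slot index so transmissions do not overlap."""
    senders = [node for node in members if node != head]
    return {node: slot for slot, node in enumerate(senders)}


# Hypothetical cluster of four nodes with made-up residual energy levels.
members = ["n1", "n2", "n3", "n4"]
residual_energy = {"n1": 0.42, "n2": 0.91, "n3": 0.17, "n4": 0.66}

head = choose_cluster_head(members, residual_energy)
slots = allocate_time_slots(members, head)
print(head, slots)
```

Because each sender owns exactly one slot, two members of the same cluster never transmit at the same time, which is the simplest way to read the abstract's claim that congestion is avoided.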
Model selection for clustering of pharmacokinetic responses.
Guerra, Rui P; Carvalho, Alexandra M; Mateus, Paulo
2018-08-01
Pharmacokinetics comprises the study of drug absorption, distribution, metabolism and excretion over time. Clinical pharmacokinetics, focusing on therapeutic management, offers important insights towards personalised medicine through the study of efficacy and toxicity of drug therapies. This study is hampered by the high variability among subjects in drug blood concentration when starting a therapy with the same drug dosage. Clustering of pharmacokinetics responses has been addressed recently as a way to stratify subjects and provide different drug doses for each stratum. This clustering method, however, is not able to automatically determine the correct number of clusters; it relies on a user-defined parameter for collapsing clusters that are closer than a given heuristic threshold. We aim to use information-theoretical approaches to address parameter-free model selection. We propose two model selection criteria for clustering pharmacokinetics responses, founded on the Minimum Description Length and on the Normalised Maximum Likelihood. Experimental results show the ability of the model selection schemes to unveil the correct number of clusters underlying the mixture of pharmacokinetics responses. In this work we were able to devise two model selection criteria to determine the number of clusters in a mixture of pharmacokinetics curves, advancing over previous works. A cost-efficient parallel implementation in Java of the proposed method is publicly available for the community. Copyright © 2018 Elsevier B.V. All rights reserved.
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
Automatic pole-like object modeling via 3D part-based analysis of point cloud
NASA Astrophysics Data System (ADS)
He, Liu; Yang, Haoxiang; Huang, Yuchun
2016-10-01
Pole-like objects, including trees, lampposts and traffic signs, are an indispensable part of urban infrastructure. With the advance of vehicle-based laser scanning (VLS), massive point clouds of roadside urban areas are increasingly applied in 3D digital city modeling. Based on the property that different pole-like objects have various canopy parts but similar trunk parts, this paper proposes a 3D part-based shape analysis to robustly extract, identify and model pole-like objects. The proposed method includes 3D clustering and recognition of trunks, voxel growing, and part-based 3D modeling. After preprocessing, the trunk center is identified as the point that has a local density peak and the largest minimum inter-cluster distance. Starting from the trunk centers, the remaining points are iteratively assigned to the same center as their nearest point with higher density. To eliminate noisy points, the cluster border is refined by trimming boundary outliers. Then, candidate trunks are extracted from the clustering results in three orthogonal planes by shape analysis. Voxel growing recovers the complete pole-like objects regardless of overlap. Finally, the entire trunk, branch and crown parts are analyzed to obtain seven feature parameters. These parameters are used to model the three parts separately and to obtain a single part-assembled 3D model. The proposed method is tested on a VLS-based point cloud of Wuhan University, China. The point cloud includes many kinds of trees, lampposts and other pole-like posters under different occlusions and overlaps. Experimental results show that the proposed method can extract the exact attributes and model roadside pole-like objects efficiently.
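The clustering step in this abstract reads like density-peak clustering: centers are points with high local density that are also far from any denser point, and every other point inherits the label of its nearest denser neighbour. The sketch below illustrates that general technique under stated assumptions; the function name, the cutoff radius, the density definition (neighbour count), and the center score (density times distance) are choices made here, not details given in the abstract.

```python
import math


def density_peak_clusters(points, cutoff, n_centers):
    """Cluster points by density peaks; returns (center indices, label per point)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff radius.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest point that ranks denser
    parent = [-1] * n    # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # the densest point gets the largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers score high on both density and separation.
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_centers]
    label = [-1] * n
    for cluster_id, c in enumerate(centers):
        label[c] = cluster_id
    # Remaining points take the label of their nearest denser neighbour.
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return centers, label


# Two hypothetical, well-separated groups of points (e.g. two trunk cross-sections).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
centers, label = density_peak_clusters(points, cutoff=0.2, n_centers=2)
print(centers, label)
```

The pairwise distance matrix makes this quadratic in the number of points, so it is only suitable for illustration; a real point cloud would need a spatial index.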
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets. PMID:24982966
Liu, Jingxia; Colditz, Graham A
2018-05-01
There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyzing a set of correlated data is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which the "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the estimator of the treatment effect under equal cluster sizes to that under unequal cluster sizes. We discuss a structure commonly used in CRTs, the exchangeable structure, and derive a simpler formula for RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size to account for the efficiency loss. Additionally, we propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
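As a worked illustration of the RE definition above, the sketch below uses the familiar result for a continuous outcome with an exchangeable working correlation, where a cluster of size n contributes information proportional to n / (1 + (n - 1)ρ). This is an assumption made for illustration, not the formula derived in the paper; the function names and the example cluster sizes are hypothetical, and both trial arms are assumed to share the same cluster-size distribution.

```python
def cluster_information(n, rho):
    """Information contributed by one cluster of size n (exchangeable correlation rho)."""
    return n / (1 + (n - 1) * rho)


def relative_efficiency(cluster_sizes, rho):
    """RE = Var(equal sizes) / Var(unequal sizes), with variance taken as 1 / information."""
    k = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / k
    info_unequal = sum(cluster_information(n, rho) for n in cluster_sizes)
    info_equal = k * cluster_information(mean_size, rho)
    return info_unequal / info_equal


# Hypothetical designs: same total of 40 subjects in 4 clusters.
re_equal = relative_efficiency([10, 10, 10, 10], rho=0.05)
re_unequal = relative_efficiency([2, 6, 14, 18], rho=0.05)
re_independent = relative_efficiency([2, 6, 14, 18], rho=0.0)
print(re_equal, re_unequal, re_independent)
```

Under this formula the RE never exceeds 1, because the per-cluster information is a concave function of cluster size; when ρ is 0 the clustering is irrelevant and unequal sizes cost nothing.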
Probing dark matter physics with galaxy clusters
NASA Astrophysics Data System (ADS)
Dalal, Neal
2016-10-01
We propose a theoretical investigation of the effects of a class of dark matter (DM) self-interactions on the properties of galaxy clusters and their host dark matter halos. Recent work using HST has claimed the detection of a particular form of DM self-interaction, which can lead to observable displacements between satellite galaxies within clusters and the DM subhalos hosting them. This form of self-interaction is highly anisotropic, favoring forward scattering with low momentum transfer, unlike isotropically scattering self-interacting dark matter (SIDM) models. This class of models has not been simulated numerically, clouding the interpretation of the claimed offsets between galaxies and lensing peaks observed by HST. We propose to perform high resolution simulations of cosmological structure formation for this class of SIDM model, focusing on three observables accessible to existing HST observations of clusters. First, we will quantify the extent to which offsets between baryons and DM can arise in these models, as a function of the cross section. Secondly, we will also quantify the effects of this type of DM self-interaction on halo concentrations, to determine the range of cross-sections allowed by existing stringent constraints from HST. Finally we will compute the so-called splashback feature in clusters, specifically focusing on whether SIDM can resolve the current discrepancy between observed values of splashback radii in clusters compared to theoretical predictions for CDM. The proposed investigations will add value to all existing deep HST observations of galaxy clusters by allowing them to probe dark matter physics in three independent ways.
Structure of the starch granule--a curved crystal.
Larsson, K
1991-09-01
A structure model of the molecular arrangement in native starch proposed earlier is further considered, with special regard to the lateral packing of cluster units. The amylopectin molecules are radially distributed, with branches concentrated in clusters. Within each cluster the polyglucan chains form double helices which are hexagonally packed. The clusters form spherically concentric crystalline layers with amylose in an amorphous form acting as a space-filler. A translational mechanism for the change of helical direction at boundaries between clusters is proposed which can account for variations in the curvature of the concentric layers. The model is related to X-ray diffraction data and optical birefringence, considering dissembly at gelatinization. The structure is also discussed in relation to biosynthesis. Some aspects of gelatinization, such as the recent glass-transition approach, are then considered.
Optimizing the maximum reported cluster size in the spatial scan statistic for ordinal data.
Kim, Sehwi; Jung, Inkyung
2017-01-01
The spatial scan statistic is an important tool for spatial cluster detection. There have been numerous studies on scanning window shapes. However, little research has been done on the maximum scanning window size or maximum reported cluster size. Recently, Han et al. proposed to use the Gini coefficient to optimize the maximum reported cluster size. However, the method has been developed and evaluated only for the Poisson model. We adopt the Gini coefficient to be applicable to the spatial scan statistic for ordinal data to determine the optimal maximum reported cluster size. Through a simulation study and application to a real data example, we evaluate the performance of the proposed approach. With some sophisticated modification, the Gini coefficient can be effectively employed for the ordinal model. The Gini coefficient most often picked the optimal maximum reported cluster sizes that were the same as or smaller than the true cluster sizes with very high accuracy. It seems that we can obtain a more refined collection of clusters by using the Gini coefficient. The Gini coefficient developed specifically for the ordinal model can be useful for optimizing the maximum reported cluster size for ordinal data and helpful for properly and informatively discovering cluster patterns. PMID:28753674
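For readers unfamiliar with the Gini coefficient used here, the sketch below computes it from a Lorenz curve of cumulative cases against cumulative population. It is a generic illustration only: the function name, the (population, cases) input layout, and the ordering by case rate are assumptions, and the ordinal-data modification described in the abstract is not reproduced.

```python
def gini_from_groups(groups):
    """Gini coefficient for a list of (population, cases) pairs, one per area or cluster."""
    total_pop = sum(pop for pop, _ in groups)
    total_cases = sum(cases for _, cases in groups)
    # Order groups from lowest to highest case rate to trace the Lorenz curve.
    ordered = sorted(groups, key=lambda g: g[1] / g[0])
    area = 0.0
    x_prev = y_prev = 0.0
    cum_pop = cum_cases = 0
    for pop, cases in ordered:
        cum_pop += pop
        cum_cases += cases
        x, y = cum_pop / total_pop, cum_cases / total_cases
        area += (x - x_prev) * (y + y_prev) / 2  # trapezoid under the Lorenz curve
        x_prev, y_prev = x, y
    return 1 - 2 * area


# Hypothetical inputs: identical rates everywhere, then all cases in half the population.
gini_uniform = gini_from_groups([(10, 1), (20, 2), (30, 3)])
gini_concentrated = gini_from_groups([(50, 0), (50, 10)])
print(gini_uniform, gini_concentrated)
```

A value near 0 means cases are spread in proportion to population, while larger values mean cases are concentrated in a small share of the population, which is why the coefficient can be compared across candidate maximum reported cluster sizes.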
2012-01-01
Background: Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found inadequate to model the complexity of the time-course data, partly because they ignore the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results: We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models and provides more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. Conclusions: Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enable improved modelling and consequent clustering of time-course data. PMID:23151154
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms: Balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
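The plain spherical kmeans step that this abstract starts from can be sketched as below: inputs and centers are both normalized to unit length, points are assigned by cosine similarity, and each center becomes the renormalized sum of its members. This is a minimal pure-Python sketch of the baseline only; the function names, the fixed iteration count, and the toy vectors are assumptions, and the frequency-sensitive balancing variants proposed in the paper are not implemented.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spkmeans(vectors, init_centers, n_iter=20):
    """Spherical k-means: returns (labels, unit-length centers)."""
    data = [normalize(v) for v in vectors]
    centers = [normalize(c) for c in init_centers]
    labels = []
    for _ in range(n_iter):
        # On the unit sphere, the largest dot product is the largest cosine similarity.
        labels = [
            max(range(len(centers)),
                key=lambda k: sum(a * b for a, b in zip(x, centers[k])))
            for x in data
        ]
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if not members:
                continue  # keep the old center if the cluster is empty
            total = [sum(col) for col in zip(*members)]
            if any(total):
                centers[k] = normalize(total)  # renormalize the summed members
    return labels, centers


# Hypothetical 2-D "documents": two point roughly along each axis.
vectors = [[1.0, 0.0], [2.0, 0.1], [0.0, 3.0], [0.2, 5.0]]
labels, centers = spkmeans(vectors, init_centers=[[1.0, 0.0], [0.0, 1.0]])
print(labels, centers)
```

Each iteration touches every point once per center, which matches the abstract's remark that the cost per iteration is linear in the number of data points and in the number of clusters.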
Clustered Multi-Task Learning for Automatic Radar Target Recognition
Li, Cong; Bao, Weimin; Xu, Luping; Zhang, Hua
2017-01-01
Model training is a key technique for radar target recognition. Traditional model training algorithms in the framework of single task learning ignore the relationships among multiple tasks, which degrades the recognition performance. In this paper, we propose a clustered multi-task learning method, which can reveal and share the multi-task relationships for radar target recognition. To make full use of these relationships, the latent multi-task relationships in the projection space are taken into consideration. Specifically, a constraint term in the projection space is proposed, the main idea of which is that multiple tasks within a close cluster should be close to each other in the projection space. In the proposed method, the cluster structures and multi-task relationships can be autonomously learned and utilized in both the original and projected spaces. In view of the nonlinear characteristics of radar targets, the proposed method is extended to a non-linear kernel version and the corresponding non-linear multi-task solving method is proposed. Comprehensive experimental studies on a simulated high-resolution range profile dataset and the MSTAR SAR public database verify the superiority of the proposed method over some related algorithms. PMID:28953267
Industry Cluster's Adaptive Co-competition Behavior Modeling Inspired by Swarm Intelligence
NASA Astrophysics Data System (ADS)
Xiang, Wei; Ye, Feifan
Adaptation helps the individual enterprise adjust its behavior to uncertainties in the environment and hence supports the healthy growth of both the individual enterprises and the industry cluster as a whole. This paper studies the co-competition adaptation behavior of an industry cluster, inspired by swarm intelligence mechanisms. Drawing on ant cooperative transportation and ant foraging behavior and their related swarm intelligence approaches, the cooperative adaptation and competitive adaptation behaviors are studied and relevant models are proposed. These adaptive co-competition behavior models can be integrated into the multi-agent system of an industry cluster to make the industry cluster model more realistic.
MOCCA-SURVEY Database I: Is NGC 6535 a dark star cluster harbouring an IMBH?
NASA Astrophysics Data System (ADS)
Askar, Abbas; Bianchini, Paolo; de Vita, Ruggero; Giersz, Mirek; Hypki, Arkadiusz; Kamann, Sebastian
2017-01-01
We describe the dynamical evolution of a unique type of dark star cluster model in which the majority of the cluster mass at Hubble time is dominated by an intermediate-mass black hole (IMBH). We analysed results from about 2000 star cluster models (Survey Database I) simulated using the Monte Carlo code MOnte Carlo Cluster simulAtor and identified these dark star cluster models. Taking one of these models, we apply the method of simulating realistic `mock observations' by utilizing the Cluster simulatiOn Comparison with ObservAtions (COCOA) and Simulating Stellar Cluster Observation (SISCO) codes to obtain the photometric and kinematic observational properties of the dark star cluster model at 12 Gyr. We find that the perplexing Galactic globular cluster NGC 6535 closely matches the observational photometric and kinematic properties of the dark star cluster model presented in this paper. Based on our analysis and currently observed properties of NGC 6535, we suggest that this globular cluster could potentially harbour an IMBH. If it exists, the presence of this IMBH can be detected robustly with proposed kinematic observations of NGC 6535.
Android Malware Classification Using K-Means Clustering Algorithm
NASA Astrophysics Data System (ADS)
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's notice. Attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the use of the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results with high accuracy when tested using the Random Forest algorithm.
H. Li; X. Deng; Andy Dolloff; E. P. Smith
2015-01-01
A novel clustering method for bivariate functional data is proposed to group streams based on their water-air temperature relationship. A distance measure is developed for bivariate curves by using a time-varying coefficient model and a weighting scheme. This distance is also adjusted by spatial correlation of streams via the variogram. Therefore, the proposed...
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.
Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan
2017-09-27
A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analytical model for a WPSN with cluster cooperation. Through the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the motivation of this design and the correctness of the theoretical analysis, and show how parameters affect system performance. Moreover, the results show that the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis
Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan
2017-01-01
A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analytical model for a WPSN with cluster cooperation. Through the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the motivation of this design and the correctness of the theoretical analysis, and show how parameters affect system performance. Moreover, the results show that the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation. PMID:28953231
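To make the idea of modelling a relay battery as a finite Markov chain concrete, the sketch below builds a small birth-death chain over discrete battery levels and computes its stationary distribution by power iteration. The transition rule (one unit harvested with probability `p_harvest`, one unit spent on relaying with probability `p_relay`) and all parameter values are hypothetical simplifications chosen for illustration; the abstract does not specify the paper's actual state space or transition probabilities.

```python
def battery_transition_matrix(levels, p_harvest, p_relay):
    """Transition matrix over battery levels 0..levels for a hypothetical birth-death model."""
    up = p_harvest * (1 - p_relay)    # harvest one unit and do not relay
    down = p_relay * (1 - p_harvest)  # relay (spend one unit) and do not harvest
    n = levels + 1
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        if s < levels:
            P[s][s + 1] = up
        if s > 0:
            P[s][s - 1] = down
        P[s][s] = 1.0 - sum(P[s])     # remaining probability: the level stays the same
    return P

def stationary_distribution(P, steps=5000):
    """Approximate the stationary distribution by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

P = battery_transition_matrix(levels=4, p_harvest=0.6, p_relay=0.5)
pi = stationary_distribution(P)
prob_empty = pi[0]  # long-run fraction of slots in which the relay has no stored energy
```

In a model of this kind, the long-run probability of an empty battery is one ingredient in how often a relay is unavailable, which is the sort of quantity an outage analysis would build on.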
Hierarchical modeling of cluster size in wildlife surveys
Royle, J. Andrew
2008-01-01
Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
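The cluster-size bias mentioned in this abstract can be illustrated with a small calculation. The sketch assumes, purely for illustration, that the probability of detecting a cluster is exactly proportional to its size; the abstract only says detectability is likely to increase with size. Under that assumption the expected cluster size among detected clusters is the sum of squared sizes divided by the sum of sizes, which is never smaller than the population mean. The cluster sizes below are made up.

```python
def size_biased_mean(sizes):
    """Expected size of a detected cluster when detection probability is proportional to size."""
    return sum(s * s for s in sizes) / sum(sizes)

# Hypothetical population of cluster sizes (e.g. group sizes of animals).
population = [1, 1, 1, 2, 2, 3, 5, 10]

population_mean = sum(population) / len(population)  # true average cluster size
sample_mean = size_biased_mean(population)           # average size expected in the sample

print(f"population mean cluster size: {population_mean:.3f}")
print(f"expected sample mean cluster size: {sample_mean:.3f}")
```

With these numbers the population mean is 3.125 while the expected sample mean is 5.8, so using the sample average as if it were the population average would overstate typical cluster size.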
Spatially Compact Neural Clusters in the Dorsal Striatum Encode Locomotion Relevant Information.
Barbera, Giovanni; Liang, Bo; Zhang, Lifeng; Gerfen, Charles R; Culurciello, Eugenio; Chen, Rong; Li, Yun; Lin, Da-Ting
2016-10-05
An influential striatal model postulates that neural activities in the striatal direct and indirect pathways promote and inhibit movement, respectively. Normal behavior requires coordinated activity in the direct pathway to facilitate intended locomotion and indirect pathway to inhibit unwanted locomotion. In this striatal model, neuronal population activity is assumed to encode locomotion relevant information. Here, we propose a novel encoding mechanism for the dorsal striatum. We identified spatially compact neural clusters in both the direct and indirect pathways. Detailed characterization revealed similar cluster organization between the direct and indirect pathways, and cluster activities from both pathways were correlated with mouse locomotion velocities. Using machine-learning algorithms, cluster activities could be used to decode locomotion relevant behavioral states and locomotion velocity. We propose that neural clusters in the dorsal striatum encode locomotion relevant information and that coordinated activities of direct and indirect pathway neural clusters are required for normal striatal controlled behavior.
A segmentation/clustering model for the analysis of array CGH data.
Picard, F; Robin, S; Lebarbier, E; Daudin, J-J
2007-09-01
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
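The dynamic programming half of such a segmentation approach can be sketched as follows: given a 1-D profile and a fixed number of segments, find the breakpoints that minimize the total within-segment squared error. This is a generic textbook formulation under a constant-mean-per-segment assumption, not the DP-EM algorithm of the paper (the mixture-model step that assigns segments to clusters is omitted), and the toy profile is invented.

```python
def make_segment_cost(signal):
    """Return cost(i, j): squared error of signal[i:j] around its own mean, via prefix sums."""
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        n = j - i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / n

    return cost

def best_segmentation(signal, n_segments):
    """Optimal split of signal into n_segments contiguous pieces (half-open index pairs)."""
    n = len(signal)
    cost = make_segment_cost(signal)
    inf = float("inf")
    # best[k][j]: minimal cost of splitting signal[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # i is the start of the last segment
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i
    # Walk the back-pointers from the end to recover the segment boundaries.
    bounds, j = [], n
    for k in range(n_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1], best[n_segments][n]

# Toy log-ratio profile: a normal region, a gained region, then normal again.
profile = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9, 1.0, 0.0, 0.1]
segments, total_cost = best_segmentation(profile, 3)
```

In a segmentation/clustering setting, the first and third segments here would be natural candidates for sharing one cluster label, which is the information a pure segmentation method does not provide.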
Cluster-cluster correlations and constraints on the correlation hierarchy
NASA Technical Reports Server (NTRS)
Hamilton, A. J. S.; Gott, J. R., III
1988-01-01
The hypothesis that galaxies cluster around clusters at least as strongly as they cluster around galaxies imposes constraints on the hierarchy of correlation amplitudes in hierarchical clustering models. The distributions which saturate these constraints are the Rayleigh-Levy random walk fractals proposed by Mandelbrot; for these fractal distributions cluster-cluster correlations are all identically equal to galaxy-galaxy correlations. If correlation amplitudes exceed the constraints, as is observed, then cluster-cluster correlations must exceed galaxy-galaxy correlations, as is observed.
A Hidden Markov Model for Urban-Scale Traffic Estimation Using Floating Car Data.
Wang, Xiaomeng; Peng, Ling; Chi, Tianhe; Li, Mengzhu; Yao, Xiaojing; Shao, Jing
2015-01-01
Urban-scale traffic monitoring plays a vital role in reducing traffic congestion. Owing to its low cost and wide coverage, floating car data (FCD) serves as a novel approach to collecting traffic data. However, sparse probe data represents the vast majority of the data available on arterial roads in most urban environments. In order to overcome the problem of data sparseness, this paper proposes a hidden Markov model (HMM)-based traffic estimation model, in which the traffic condition on a road segment is considered as a hidden state that can be estimated according to the conditions of road segments having similar traffic characteristics. An algorithm based on clustering and pattern mining rather than on adjacency relationships is proposed to find clusters with road segments having similar traffic characteristics. A multi-clustering strategy is adopted to achieve a trade-off between clustering accuracy and coverage. Finally, the proposed model is designed and implemented on the basis of a real-time algorithm. Results of experiments based on real FCD confirm the applicability, accuracy, and efficiency of the model. In addition, the results indicate that the model is practicable for traffic estimation on urban arterials and works well even when more than 70% of the probe data are missing.
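As a rough illustration of treating the traffic condition as a hidden state and estimating it despite missing probe data, the sketch below runs standard Viterbi decoding on a two-state HMM and simply skips the emission term whenever an observation is missing. The states, observation symbols, and every probability are hypothetical; the paper's actual model links road segments through clustering and pattern mining, which is not reproduced here.

```python
import math

# Hypothetical two-state model; all probabilities are made up for illustration.
STATES = ("free", "congested")
START = {"free": 0.6, "congested": 0.4}
TRANS = {"free": {"free": 0.8, "congested": 0.2},
         "congested": {"free": 0.3, "congested": 0.7}}
EMIT = {"free": {"fast": 0.9, "slow": 0.1},
        "congested": {"fast": 0.2, "slow": 0.8}}

def viterbi(observations):
    """Most likely hidden-state path; None marks a missing observation."""
    def log_emit(state, obs):
        # A missing observation carries no evidence, so it contributes log(1) = 0.
        return 0.0 if obs is None else math.log(EMIT[state][obs])

    score = {s: math.log(START[s]) + log_emit(s, observations[0]) for s in STATES}
    pointers = []
    for obs in observations[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor of state s at this step.
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + log_emit(s, obs)
            ptr[s] = prev
        score = new_score
        pointers.append(ptr)
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    return path[::-1]

path = viterbi(["fast", "fast", None, "slow", "slow"])
```

For the gap in the middle of the sequence, the decoded state is determined entirely by the transition probabilities and the neighbouring observations.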
Rumor Diffusion in an Interests-Based Dynamic Social Network
Mao, Xinjun; Guessoum, Zahia; Zhou, Huiping
2013-01-01
To research rumor diffusion in social friend network, based on interests, a dynamic friend network is proposed, which has the characteristics of clustering and community, and a diffusion model is also proposed. With this friend network and rumor diffusion model, based on the zombie-city model, some simulation experiments to analyze the characteristics of rumor diffusion in social friend networks have been conducted. The results show some interesting observations: (1) positive information may evolve to become a rumor through the diffusion process that people may modify the information by word of mouth; (2) with the same average degree, a random social network has a smaller clustering coefficient and is more beneficial for rumor diffusion than the dynamic friend network; (3) a rumor is spread more widely in a social network with a smaller global clustering coefficient than in a social network with a larger global clustering coefficient; and (4) a network with a smaller clustering coefficient has a larger efficiency. PMID:24453911
Rumor diffusion in an interests-based dynamic social network.
Tang, Mingsheng; Mao, Xinjun; Guessoum, Zahia; Zhou, Huiping
2013-01-01
To research rumor diffusion in social friend network, based on interests, a dynamic friend network is proposed, which has the characteristics of clustering and community, and a diffusion model is also proposed. With this friend network and rumor diffusion model, based on the zombie-city model, some simulation experiments to analyze the characteristics of rumor diffusion in social friend networks have been conducted. The results show some interesting observations: (1) positive information may evolve to become a rumor through the diffusion process that people may modify the information by word of mouth; (2) with the same average degree, a random social network has a smaller clustering coefficient and is more beneficial for rumor diffusion than the dynamic friend network; (3) a rumor is spread more widely in a social network with a smaller global clustering coefficient than in a social network with a larger global clustering coefficient; and (4) a network with a smaller clustering coefficient has a larger efficiency.
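The global clustering coefficient that these observations compare across networks can be computed as the fraction of connected triples that close into triangles. The sketch below uses a plain adjacency-set representation of an undirected graph; the function name and the tiny example network are assumptions for illustration and are unrelated to the networks simulated in the paper.

```python
from itertools import combinations

def global_clustering_coefficient(adjacency):
    """Fraction of connected triples (two neighbours of a common node) that form a triangle."""
    closed = 0
    triples = 0
    for node, neighbours in adjacency.items():
        for a, b in combinations(sorted(neighbours), 2):
            triples += 1
            if b in adjacency[a]:  # the two neighbours are themselves friends
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical friend network: a triangle A-B-C with one extra friend D attached to C.
friends = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
coefficient = global_clustering_coefficient(friends)
```

Each triangle is counted once per corner, so the ratio equals three times the number of triangles divided by the number of connected triples, which is the usual definition of transitivity.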
Review of methods for handling confounding by cluster and informative cluster size in clustered data
Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew
2014-01-01
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
Topic modeling for cluster analysis of large biological and medical datasets
2014-01-01
Background: The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results: In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion: Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106
Topic modeling for cluster analysis of large biological and medical datasets.
Zhao, Weizhong; Zou, Wen; Chen, James J
2014-01-01
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. 
Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.
Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model
NASA Astrophysics Data System (ADS)
Li, X. L.; Zhao, Q. H.; Li, Y.
2017-09-01
Most stochastic fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model is proposed in this paper. First, generating points are initialized randomly on the image, and the image domain is divided into many sub-regions using the Voronoi tessellation technique. Each sub-region is regarded as a homogeneous area in which the pixels share the same cluster label. Then, the probability of each pixel is assumed to follow a Gamma mixture model with parameters corresponding to the cluster to which the pixel belongs. The negative logarithm of the probability represents the dissimilarity measure between the pixel and the cluster. The regional dissimilarity measure of one sub-region is defined as the sum of the measures of the pixels in the region. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to Voronoi sub-regions, and the regional objective function is then established under the framework of fuzzy clustering. The optimal segmentation results can be obtained by solving for the model parameters and generating points. Finally, the effectiveness of the proposed algorithm is demonstrated by qualitative and quantitative analysis of the segmentation results on simulated and real SAR images.
Data-driven process decomposition and robust online distributed modelling for large-scale processes
NASA Astrophysics Data System (ADS)
Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou
2018-02-01
With the increasing attention of networked control, system decomposition and distributed models show significant importance in the implementation of model-based control strategy. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm was proposed for large-scale chemical processes. The key controlled variables are first partitioned by affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, the online subsystem modelling can be carried out by recursively block-wise renewing the samples. The proposed algorithm was applied in the Tennessee Eastman process and the validity was verified.
Persistent Topology and Metastable State in Conformational Dynamics
Chang, Huang-Wei; Bacallado, Sergio; Pande, Vijay S.; Carlsson, Gunnar E.
2013-01-01
The large amount of molecular dynamics simulation data produced by modern computational models brings big opportunities and challenges to researchers. Clustering algorithms play an important role in understanding biomolecular kinetics from the simulation data, especially under the Markov state model framework. However, the ruggedness of the free energy landscape in a biomolecular system makes common clustering algorithms very sensitive to perturbations of the data. Here, we introduce a data-exploratory tool which provides an overview of the clustering structure under different parameters. The proposed Multi-Persistent Clustering analysis combines insights from recent studies on the dynamics of systems with dominant metastable states with the concept of multi-dimensional persistence in computational topology. We propose to explore the clustering structure of the data based on its persistence on scale and density. The analysis provides a systematic way to discover clusters that are robust to perturbations of the data. The dominant states of the system can be chosen with confidence. For the clusters on the borderline, the user can choose to do more simulation or make a decision based on their structural characteristics. Furthermore, our multi-resolution analysis gives users information about the relative potential of the clusters and their hierarchical relationship. The effectiveness of the proposed method is illustrated in three biomolecules: alanine dipeptide, Villin headpiece, and the FiP35 WW domain. PMID:23565139
Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam
2017-10-27
Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately.
Transformation and model choice for RNA-seq co-expression analysis.
Rau, Andrea; Maugis-Rabusseau, Cathy
2018-05-01
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
Al Mamoon, Ishtiak; Muzahidul Islam, A K M; Baharun, Sabariah; Ahmed, Ashir; Komaki, Shozo
2016-08-01
Due to the rapid growth of wireless medical devices in the near future, wireless healthcare services may face some inescapable issues such as medical spectrum scarcity, electromagnetic interference (EMI), bandwidth constraints, security and, finally, the medical data communication model. To mitigate these issues, cognitive radio (CR) or opportunistic radio network enabled wireless technology is suitable for the upcoming wireless healthcare system. Recent research on CR-based healthcare has reported some developments on EMI and spectrum problems. However, investigations and recommendations on system design and network models for CR-enabled hospitals are rare. Thus, this research designs a hierarchy-based hybrid network architecture and network maintenance protocols for a previously proposed CR hospital system, known as CogMed. In the previous study, the detailed architecture of CogMed and its maintenance protocols were not presented. The proposed architecture includes clustering concepts for cognitive base stations and non-medical devices. Two cluster head (CH) selector equations are formulated based on priority of location, device, mobility rate of devices and number of accessible channels. In order to maintain the integrity of the proposed network model, node joining and node leaving protocols are also proposed. Finally, the simulation results show that the proposed network maintenance time is very low for emergency medical devices (average maintenance period 9.5 ms) and the re-clustering effects for different mobility enabled non-medical devices are also balanced.
Jeon, Jihyoun; Hsu, Li; Gorfine, Malka
2012-07-01
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low; however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.
NASA Astrophysics Data System (ADS)
Mitchell, Myles A.; He, Jian-hua; Arnold, Christian; Li, Baojiu
2018-06-01
We propose a new framework for testing gravity using cluster observations, which aims to provide an unbiased constraint on modified gravity models from Sunyaev-Zel'dovich (SZ) and X-ray cluster counts and the cluster gas fraction, among other possible observables. Focusing on a popular f(R) model of gravity, we propose a novel procedure to recalibrate mass scaling relations from Λ cold dark matter (ΛCDM) to f(R) gravity for SZ and X-ray cluster observables. We find that the complicated modified gravity effects can be simply modelled as a dependence on a combination of the background scalar field and redshift, fR(z)/(1 + z), regardless of the f(R) model parameter. By employing a large suite of N-body simulations, we demonstrate that a theoretically derived tanh fitting formula is in excellent agreement with the dynamical mass enhancement of dark matter haloes for a large range of background field parameters and redshifts. Our framework is sufficiently flexible to allow for tests of other models and inclusion of further observables, and the one-parameter description of the dynamical mass enhancement can have important implications on the theoretical modelling of observables and on practical tests of gravity.
Biclustering Models for Two-Mode Ordinal Data.
Matechou, Eleni; Liu, Ivy; Fernández, Daniel; Farias, Miguel; Gjelsvik, Bergljot
2016-09-01
The work in this paper introduces finite mixture models that can be used to simultaneously cluster the rows and columns of two-mode ordinal categorical response data, such as those resulting from Likert scale responses. We use the popular proportional odds parameterisation and propose models which provide insights into major patterns in the data. Model-fitting is performed using the EM algorithm, and a fuzzy allocation of rows and columns to corresponding clusters is obtained. The clustering ability of the models is evaluated in a simulation study and demonstrated using two real data sets.
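The proportional odds parameterisation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common convention logit P(Y <= k) = mu_k - eta, where the cut-points mu_k and the linear predictor eta (for example a row-cluster effect plus a column-cluster effect) are hypothetical names introduced here.

```python
import math


def proportional_odds_probs(cutpoints, eta):
    """Category probabilities for one ordinal response.

    cutpoints: increasing thresholds mu_1 < ... < mu_{q-1} (assumed names)
    eta: linear predictor, e.g. row-cluster plus column-cluster effect
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [logistic(mu - eta) for mu in cutpoints] + [1.0]
    # Differences of consecutive cumulative values give P(Y = k).
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs
```

In a mixture setting, these per-cell probabilities would enter the E-step of an EM algorithm as the likelihood of a response given its row and column cluster.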
Lu, Chi-Jie; Chang, Chi-Chang
2014-01-01
Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738
A model of autophagy size selectivity by receptor clustering on peroxisomes
NASA Astrophysics Data System (ADS)
Brown, Aidan I.; Rutenberg, Andrew D.
2017-05-01
Selective autophagy must not only select the correct type of organelle, but also must discriminate between individual organelles of the same kind so that some but not all of the organelles are removed. We propose that physical clustering of autophagy receptor proteins on the organelle surface can provide an appropriate all-or-none signal for organelle degradation. We explore this proposal using a computational model restricted to peroxisomes and the relatively well characterized pexophagy receptor proteins NBR1 and p62. We find that larger peroxisomes nucleate NBR1 clusters first and lose them last through competitive coarsening. This results in significant size-selectivity that favors large peroxisomes, and can explain the increased catalase signal that results from siRNA inhibition of p62. Excess ubiquitin, resulting from damaged organelles, suppresses size-selectivity but not cluster formation. Our proposed selectivity mechanism thus allows all damaged organelles to be degraded, while otherwise selecting only a portion of organelles for degradation.
NASA Astrophysics Data System (ADS)
Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.
2017-08-01
Self-learning equivalent-convolutional neural structures (SLECNS) for auto-coding-decoding and image clustering are discussed. The SLECNS architectures and their spatially invariant equivalent models (SI EMs) using the corresponding matrix-matrix procedures with basic operations of continuous logic and non-linear processing are proposed. These SI EMs have several advantages, such as the ability to recognize image fragments with better efficiency and strong cross correlation. The proposed method for clustering fragments according to their structural features is suitable not only for binary but also for color images, and combines self-learning with the formation of weight clustered matrix-patterns. Its model is constructed on the basis of recursive processing algorithms and the k-average method. The experimental results confirmed that larger images and 2D binary fragments with a large number of elements may be clustered. For the first time, the possibility of generalizing these models to the space-invariant case is shown. An experiment is carried out on a 256x256 image (a reference array) with 7x7 and 21x21 fragments for clustering. The experiments, run in the Mathcad software environment, showed that the proposed method is universal, converges in a small number of iterations, and maps easily onto a matrix structure, which confirms its prospects. Thus, it is very important to understand the mechanisms of self-learning equivalence-convolutional clustering, the accompanying competitive processes in neurons, and the principles of neural auto-encoding-decoding and recognition with self-learning cluster patterns, which use the algorithm and principles of non-linear processing of two-dimensional spatial image-comparison functions.
These SI EMs can simply describe the signal processing during all training and recognition stages, and they are suitable for unipolar-coding multilevel signals. We show that the implementation of SLECNS based on known equivalentors or traditional correlators is possible if they are based on the proposed equivalental two-dimensional functions of image similarity. The clustering efficiency in such models and their implementation depends on the discriminant properties of the neural elements of the hidden layers. Therefore, the main model and architecture parameters and characteristics depend on the types of non-linear processing applied and on the function used for image comparison or for adaptive-equivalental weighting of input patterns. Model experiments in Mathcad are demonstrated, which confirm that non-linear processing on equivalent functions makes it possible to determine the winning neurons and adjust the weight matrix. Experimental results have shown that such models can be successfully used for auto- and hetero-associative recognition. They can also be used to explain some mechanisms known as "focus" and "competing gain-inhibition concept". The SLECNS architecture and hardware implementations of its basic nodes based on multi-channel convolvers and correlators with time integration are proposed. The parameters and performance of such architectures are estimated.
Euler-Vector Clustering of GPS Velocities Defines Microplate Geometry in Southwest Japan
NASA Astrophysics Data System (ADS)
Savage, J. C.
2018-02-01
I have used Euler-vector clustering to assign 469 GEONET stations in southwest Japan to k clusters (k = 2, 3,..., 9) so that, for any k, the velocities of stations within each cluster are most consistent with rigid-block motion on a sphere. That is, I attempt to explain the raw (i.e., uncorrected for strain accumulation), 1996-2006 velocities of those 469 Global Positioning System stations by rigid motion of k clusters on the surface of a spherical Earth. Because block geometry is maintained as strain accumulates, Euler-vector clustering may better approximate the block geometry than the values of the associated Euler vectors. The microplate solution for each k is constructed by merging contiguous clusters that have closely similar Euler vectors. The best solution consists of three microplates arranged along the Nankaido Trough-Ryukyu Trench between the Amurian and Philippine Sea Plates. One of these microplates, the South Kyushu Microplate (an extension of the Ryukyu forearc into the southeast corner of Kyushu), had previously been identified from paleomagnetic rotations. Relative to ITRF2000 the three microplates rotate at different rates about neighboring poles located close to the northwest corner of Shikoku. The microplate model is identical to that proposed in the block model of Wallace et al. (2009, https://doi.org/10.1130/G2522A.1) except in southernmost Kyushu. On Shikoku and Honshu, but not Kyushu, the microplate model is consistent with that proposed in the block models of Nishimura and Hashimoto (2006, https://doi.org/10.1016/j.tecto.2006.04.017) and Loveless and Meade (2010, https://doi.org/10.1029/2008JB006248) without the low-slip-rate boundaries proposed in the latter.
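The rigid-block assumption behind Euler-vector clustering can be illustrated with the standard relation v = omega x r, where omega is the Euler (rotation) vector of a block and r is the station position on a spherical Earth. The sketch below is only an illustration of that relation: the function names, the Earth radius constant and the units (radians per year, metres) are assumptions made here, and the clustering step itself is not shown.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius in metres (assumed value)


def unit_vector(lat_deg, lon_deg):
    """Geocentric unit vector for a latitude/longitude in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rigid_velocity(pole_lat, pole_lon, rate_rad_per_yr, site_lat, site_lon):
    """Velocity (m/yr, geocentric x, y, z) predicted by rigid rotation."""
    omega = tuple(rate_rad_per_yr * c for c in unit_vector(pole_lat, pole_lon))
    r = tuple(EARTH_RADIUS_M * c for c in unit_vector(site_lat, site_lon))
    return cross(omega, r)
```

Stations whose observed velocities are well fit by the same Euler vector are candidates for the same cluster; the residual between observed and predicted velocity is the natural misfit measure.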
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing tenfold cross-validation methodology. A previously developed B-spline active rays segmentation method was also considered for comparison purposes. Results: Interobserver and intraobserver segmentation agreements (median and [25%, 75%] quartile range) were substantial with respect to the distance metrics HDIST_cluster (2.3 [1.8, 2.9] and 2.5 [2.1, 3.2] pixels) and AMINDIST_cluster (0.8 [0.6, 1.0] and 1.0 [0.8, 1.2] pixels), while moderate with respect to AOM_cluster (0.64 [0.55, 0.71] and 0.59 [0.52, 0.66]). The proposed segmentation method outperformed (0.80 ± 0.04) statistically significantly (Mann-Whitney U-test, p < 0.05) the B-spline active rays segmentation method (0.69 ± 0.04), suggesting the significance of the proposed semiautomated method. Conclusions: Results indicate a reliable semiautomated segmentation method for MC clusters offered by deformable models, which could be utilized in MC cluster quantitative image analysis.
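The three agreement measures named in the abstract can be sketched on pixel coordinates as below. The exact definitions used in the study are not given in the abstract, so this is an assumption-laden illustration: the Hausdorff distance is the standard symmetric one, the average of minimum distance is taken as the mean of the two directed averages, and the area overlap measure is taken as intersection over union of the segmented pixel sets.

```python
import math


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two point lists."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def average_min_distance(a, b):
    """Mean of the two directed average minimum distances (assumed form)."""
    def directed(p, q):
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return 0.5 * (directed(a, b) + directed(b, a))


def area_overlap(a, b):
    """Intersection over union of two pixel sets (assumed form of AOM)."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)
```

For contour-based agreement, `a` and `b` would hold boundary pixels of two observers' segmentations; for the overlap measure they would hold all pixels inside each segmented region.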
Population Structure With Localized Haplotype Clusters
Browning, Sharon R.; Weir, Bruce S.
2010-01-01
We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877
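As a point of reference for the quantities discussed above, the textbook heterozygosity-based form FST = (H_T - H_S) / H_T can be computed from haplotype-cluster frequencies as sketched below. This is not the authors' population-specific multilocus estimator, which the abstract does not spell out; the function names and the equal weighting of populations are assumptions made for illustration.

```python
def heterozygosity(freqs):
    """Expected heterozygosity (diversity) 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """Textbook FST from per-population frequency vectors.

    pop_freqs: one list of haplotype-cluster frequencies per population,
    all of the same length, each summing to 1. Populations are weighted
    equally (an assumption of this sketch).
    """
    n_pops = len(pop_freqs)
    n_types = len(pop_freqs[0])
    # Mean within-population diversity.
    h_s = sum(heterozygosity(f) for f in pop_freqs) / n_pops
    # Diversity of the pooled (average) frequencies.
    pooled = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_types)]
    h_t = heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

Applied at a single locus, each frequency vector would describe the haplotype clusters present at that position; a multilocus summary would combine such values across loci.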
CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS
McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.
2014-01-01
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026
ERIC Educational Resources Information Center
Vera, J. Fernando; Macias, Rodrigo; Heiser, Willem J.
2009-01-01
In this paper, we propose a cluster-MDS model for two-way one-mode continuous rating dissimilarity data. The model aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space. Under the normal distribution assumption, a latent class model is developed in terms of the set of…
Leong, Siow Hoo; Ong, Seng Huat
2017-01-01
This paper considers three crucial issues in processing scaled-down images: the representation of partial images, the similarity measure and domain adaptation. Two Gaussian mixture model based algorithms are proposed to effectively preserve image details and avoid image degradation. Multiple partial images are clustered separately through Gaussian mixture model clustering with a scan and select procedure to enhance the inclusion of small image details. The local image features, represented by maximum likelihood estimates of the mixture components, are classified by using the modified Bayes factor (MBF) as a similarity measure. The detection of novel local features from MBF will suggest domain adaptation, which is changing the number of components of the Gaussian mixture model. The performance of the proposed algorithms is evaluated with simulated data and real images, and they are shown to perform much better than existing Gaussian mixture model based algorithms in reproducing images with higher structural similarity index. PMID:28686634
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis
2015-01-01
...algorithms we proposed improve the time efficiency significantly for large scale datasets. In the last chapter, we also propose an incremental reseeding...plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large
Liu, Yuanchao; Liu, Ming; Wang, Xin
2015-01-01
The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172
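The idea of combining similarity in the traditional feature space with similarity in the extension space can be sketched as a weighted sum of two cosine similarities. The abstract does not state how the two are combined, so the convex combination and the weight `alpha` below are assumptions for illustration only, as are the function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(doc_a, doc_b, ext_a, ext_b, alpha=0.5):
    """Blend term-space and extension-space similarity.

    doc_a, doc_b: term vectors in the traditional feature space
    ext_a, ext_b: vectors in the extension space
    alpha: hypothetical mixing weight in [0, 1]
    """
    return alpha * cosine(doc_a, doc_b) + (1.0 - alpha) * cosine(ext_a, ext_b)
```

Two documents with no shared terms but closely related extension features would then still receive a non-zero similarity, which is the kind of semantic sensitivity the abstract describes.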
Weakly supervised image semantic segmentation based on clustering superpixels
NASA Astrophysics Data System (ADS)
Yan, Xiong; Liu, Xiaohua
2018-04-01
In this paper, we propose an image semantic segmentation model which is trained from image-level labeled images. The proposed model starts with superpixel segmentation, and features of the superpixels are extracted by a trained CNN. We introduce a superpixel-based graph and then apply a graph partition method to group correlated superpixels into clusters. To acquire inter-label correlations between the image-level labels in the dataset, we not only utilize label co-occurrence statistics but also exploit visual contextual cues simultaneously. Finally, we formulate the task of mapping appropriate image-level labels to the detected clusters as a convex minimization problem. Experimental results on the MSRC-21 dataset and the LabelMe dataset show that the proposed method performs better than most weakly supervised methods and is even comparable to fully supervised methods.
Chen, Yingyi; Yu, Huihui; Cheng, Yanjun; Cheng, Qianqian; Li, Daoliang
2018-01-01
A precise predictive model is important for obtaining a clear understanding of the changes in dissolved oxygen content in crab ponds. Highly accurate interval forecasting of dissolved oxygen content is fundamental to reduce risk, and three-dimensional prediction can provide more accurate results and overall guidance. In this study, a hybrid three-dimensional (3D) dissolved oxygen content prediction model based on a radial basis function (RBF) neural network, K-means and subtractive clustering was developed and named the subtractive clustering (SC)-K-means-RBF model. In this modeling process, K-means and subtractive clustering methods were employed to enhance the hyperparameters required in the RBF neural network model. The comparison of the predicted results of different traditional models validated the effectiveness and accuracy of the proposed hybrid SC-K-means-RBF model for three-dimensional prediction of dissolved oxygen content. Consequently, the proposed model can effectively display the three-dimensional distribution of dissolved oxygen content and serve as a guide for feeding and future studies.
Price Formation Based on Particle-Cluster Aggregation
NASA Astrophysics Data System (ADS)
Wang, Shijun; Zhang, Changshui
In the present work, we propose a microscopic model of financial markets based on particle-cluster aggregation on a two-dimensional small-world information network in order to simulate the dynamics of the stock markets. "Stylized facts" of the financial market time series, such as fat-tail distribution of returns, volatility clustering and multifractality, are observed in the model. The results of the model agree with empirical data taken from historical records of the daily closures of the NYSE composite index.
Properties of highly clustered networks
NASA Astrophysics Data System (ADS)
Newman, M. E.
2003-08-01
We propose and solve exactly a model of a network that has both a tunable degree distribution and a tunable clustering coefficient. Among other things, our results indicate that increased clustering leads to a decrease in the size of the giant component of the network. We also study susceptible/infective/recovered type epidemic processes within the model and find that clustering decreases the size of epidemics, but also decreases the epidemic threshold, making it easier for diseases to spread. In addition, clustering causes epidemics to saturate sooner, meaning that they infect a near-maximal fraction of the network for quite low transmission rates.
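The clustering coefficient referred to above is commonly defined as C = 3 x (number of triangles) / (number of connected triples). A small sketch of that definition for an undirected edge list follows; the function name and the edge-list input format are choices made here for illustration.

```python
from itertools import combinations


def clustering_coefficient(edges):
    """Global clustering coefficient of an undirected simple graph."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    triples = 0  # paths of length two, counted at their centre node
    closed = 0   # such paths whose end points are also linked
    for nbrs in neighbors.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    # Each triangle is counted once per corner, so closed = 3 * triangles.
    return closed / triples if triples else 0.0
```

A triangle gives C = 1, a simple path gives C = 0, and a triangle with one pendant vertex gives C = 3/5.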
On the multi-scale description of micro-structured fluids composed of aggregating rods
NASA Astrophysics Data System (ADS)
Perez, Marta; Scheuer, Adrien; Abisset-Chavanne, Emmanuelle; Ammar, Amine; Chinesta, Francisco; Keunings, Roland
2018-05-01
When addressing the flow of concentrated suspensions composed of rods, dense clusters are observed. Thus, the adequate modelling and simulation of such a flow requires addressing the kinematics of these dense clusters and their impact on the flow in which they are immersed. In a former work, we addressed a first modelling framework of these clusters, assumed so dense that they were considered rigid and their kinematics (flow-induced rotation) were totally defined by a symmetric tensor c with unit trace representing the cluster conformation. Then, the rigid nature of the clusters was relaxed, assuming them deformable, and a model giving the evolution of both the cluster shape and its microstructural orientation descriptor (the so-called shape and orientation tensors) was proposed. This paper compares the predictions coming from those models with finer-scale discrete simulations inspired from molecular dynamics modelling.
Activity of a social dynamics model
NASA Astrophysics Data System (ADS)
Reia, Sandro M.; Neves, Ubiraci P. C.
2015-10-01
Axelrod's model was proposed to study interactions between agents and the formation of cultural domains. It presents a transition from a monocultural to a multicultural steady state which has been studied in the literature by evaluation of the relative size of the largest cluster. In this article, we propose new measurements based on the concept of activity per agent to study Axelrod's model on the square lattice. We show that the variance of system activity can be used to indicate the critical points of the transition. Furthermore, the frequency distribution of the system activity is able to show a coexistence of phases typical of a first order phase transition. Finally, we verify a power law dependence between cluster activity and cluster size for multicultural steady state configurations at the critical point.
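A minimal sketch of the standard Axelrod update rule and of the largest-cluster order parameter is given below. The periodic boundaries, the function names, and the choice to return whether a trait changed are assumptions of this sketch; in particular, the abstract does not define "activity per agent", so the boolean returned by `axelrod_step` is only a guess at the kind of event such a measure might count.

```python
import random

NEIGHBOR_OFFSETS = ((0, 1), (1, 0), (0, -1), (-1, 0))


def make_lattice(size, n_features, n_traits, rng):
    """Square lattice of agents, each a list of n_features random traits."""
    return [[[rng.randrange(n_traits) for _ in range(n_features)]
             for _ in range(size)] for _ in range(size)]


def axelrod_step(lattice, rng):
    """One interaction attempt; returns True if a trait was copied."""
    size = len(lattice)
    i, j = rng.randrange(size), rng.randrange(size)
    di, dj = rng.choice(NEIGHBOR_OFFSETS)
    agent = lattice[i][j]
    neighbor = lattice[(i + di) % size][(j + dj) % size]  # periodic (assumed)
    differing = [f for f in range(len(agent)) if agent[f] != neighbor[f]]
    overlap = 1.0 - len(differing) / len(agent)
    # Interact with probability equal to the cultural overlap.
    if differing and rng.random() < overlap:
        f = rng.choice(differing)
        agent[f] = neighbor[f]
        return True
    return False


def largest_cluster_fraction(lattice):
    """Relative size of the largest connected domain of identical cultures."""
    size = len(lattice)
    seen, best = set(), 0
    for i in range(size):
        for j in range(size):
            if (i, j) in seen:
                continue
            seen.add((i, j))
            stack, count = [(i, j)], 0
            while stack:
                a, b = stack.pop()
                count += 1
                for da, db in NEIGHBOR_OFFSETS:
                    c, d = (a + da) % size, (b + db) % size
                    if (c, d) not in seen and lattice[c][d] == lattice[a][b]:
                        seen.add((c, d))
                        stack.append((c, d))
            best = max(best, count)
    return best / (size * size)
```

Repeating `axelrod_step` until no further changes are possible and then calling `largest_cluster_fraction` gives the usual order parameter as a function of the number of traits.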
Text Summarization Model based on Facility Location Problem
NASA Astrophysics Data System (ADS)
Takamura, Hiroya; Okumura, Manabu
We propose a novel multi-document generic summarization model based on the budgeted median problem, which is a facility location problem. The summarization method based on our model is an extractive method, which selects sentences from the given document cluster and generates a summary. Each sentence in the document cluster will be assigned to one of the selected sentences, where the former sentence is supposed to be represented by the latter. Our method selects sentences to generate a summary that yields a good sentence assignment and hence covers the whole content of the document cluster. An advantage of this method is that it can incorporate asymmetric relations between sentences such as textual entailment. Through experiments, we showed that the proposed method yields good summaries on the dataset of DUC'04.
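The budgeted median formulation can be illustrated with a simple greedy heuristic: repeatedly add the sentence that most improves the total coverage per unit of length, as long as the length budget allows. This is a sketch under stated assumptions, not the solver used in the paper; the `coverage` matrix (how well sentence j represents sentence i, possibly asymmetric), the `lengths` list and the greedy rule are all hypothetical choices made here.

```python
def greedy_budgeted_median(coverage, lengths, budget):
    """Greedy sentence selection under a length budget.

    coverage[i][j]: how well sentence j represents sentence i (asymmetric)
    lengths[j]: cost of including sentence j in the summary
    budget: maximum total length of the summary
    Returns the indices of the selected sentences in selection order.
    """
    n = len(lengths)
    selected = []
    best = [0.0] * n  # best coverage of each sentence by the summary so far
    used = 0
    while True:
        best_ratio, choice = 0.0, None
        for j in range(n):
            if j in selected or used + lengths[j] > budget:
                continue
            # Improvement in total coverage if sentence j were added.
            gain = sum(max(coverage[i][j] - best[i], 0.0) for i in range(n))
            ratio = gain / lengths[j]
            if ratio > best_ratio:
                best_ratio, choice = ratio, j
        if choice is None:
            break
        selected.append(choice)
        used += lengths[choice]
        best = [max(best[i], coverage[i][choice]) for i in range(n)]
    return selected
```

Because `coverage` need not be symmetric, a relation such as textual entailment can be encoded by giving a high value to coverage[i][j] when sentence j entails sentence i but not the reverse.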
Percolation of the site random-cluster model by Monte Carlo method
NASA Astrophysics Data System (ADS)
Wang, Songsong; Zhang, Wanzhou; Ding, Chengxiang
2015-08-01
We propose a site random-cluster model by introducing an additional cluster weight in the partition function of the traditional site percolation. To simulate the model on a square lattice, we combine the color-assignation and the Swendsen-Wang methods to design a highly efficient cluster algorithm with a small critical slowing-down phenomenon. To verify whether or not it is consistent with the bond random-cluster model, we measure several quantities, such as the wrapping probability Re, the percolating cluster density P∞, and the magnetic susceptibility per site χp, as well as two exponents, namely the thermal exponent yt and the fractal dimension yh of the percolating cluster. We find that for different values of the cluster weight q = 1.5, 2, 2.5, 3, 3.5, and 4, the numerical estimates of the exponents yt and yh are consistent with the theoretical values. The universality classes of the site random-cluster model and the bond random-cluster model are identical. For larger values of q, we find obvious signatures of the first-order percolation transition in the histograms and the hysteresis loops of the percolating cluster density and the energy per site. Our results are helpful for the understanding of the percolation of traditional statistical models.
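One way to read the "additional cluster weight" is that a site configuration with n occupied sites out of N and N_c clusters of occupied sites receives the weight p^n (1 - p)^(N - n) q^(N_c). The sketch below evaluates that weight on a small square lattice; the exact form of the weight, the open boundaries and the function names are assumptions made here, and the cluster Monte Carlo algorithm described in the abstract is not reproduced.

```python
def count_clusters(occupied):
    """Number of nearest-neighbour clusters of occupied sites (open boundaries)."""
    rows, cols = len(occupied), len(occupied[0])
    parent = list(range(rows * cols))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r in range(rows):
        for c in range(cols):
            if not occupied[r][c]:
                continue
            # Link to the right and lower neighbours if they are occupied.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and occupied[nr][nc]:
                    parent[find(r * cols + c)] = find(nr * cols + nc)

    roots = {find(r * cols + c)
             for r in range(rows) for c in range(cols) if occupied[r][c]}
    return len(roots)


def configuration_weight(occupied, p, q):
    """Unnormalised weight p^n (1-p)^(N-n) q^(N_c) of one configuration."""
    n_sites = sum(len(row) for row in occupied)
    n_occupied = sum(1 for row in occupied for site in row if site)
    return (p ** n_occupied
            * (1.0 - p) ** (n_sites - n_occupied)
            * q ** count_clusters(occupied))
```

Setting q = 1 removes the cluster factor and recovers the weight of ordinary site percolation.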
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-12-23
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices' service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes' life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN. PMID:26703619
A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set
Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong
2012-01-01
Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes a multiple criteria decision making (MCDM)-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined by an experimental study using three MCDM methods, the well-known clustering algorithm–k-means, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Resche-Rigon, Matthieu; White, Ian R
2018-06-01
In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
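The contrast between the cosine and Euclidean distance metrics that this abstract argues for can be seen on toy vectors. The sketch below is only an illustration with made-up three-dimensional vectors; it does not use GMM mean supervectors and is not the LSDA algorithm.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length
c = [3.0, 2.0, 1.0]  # same length as a, different direction

# Euclidean distance ranks c as closer to a; cosine distance ranks b as closer.
euclid_ab, euclid_ac = euclidean_distance(a, b), euclidean_distance(a, c)
cos_ab, cos_ac = cosine_distance(a, b), cosine_distance(a, c)
```

When the information in a representation lies mainly in its direction rather than its magnitude, the two metrics can disagree in this way about which points belong together.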
Kasza, J; Hemming, K; Hooper, R; Matthews, Jns; Forbes, A B
2017-01-01
Stepped wedge and cluster randomised crossover trials are examples of cluster randomised designs conducted over multiple time periods that are being used with increasing frequency in health research. Recent systematic reviews of both of these designs indicate that the within-cluster correlation is typically taken account of in the analysis of data using a random intercept mixed model, implying a constant correlation between any two individuals in the same cluster no matter how far apart in time they are measured: within-period and between-period intra-cluster correlations are assumed to be identical. Recently proposed extensions allow the within- and between-period intra-cluster correlations to differ, although these methods require that all between-period intra-cluster correlations are identical, which may not be appropriate in all situations. Motivated by a proposed intensive care cluster randomised trial, we propose an alternative correlation structure for repeated cross-sectional multiple-period cluster randomised trials in which the between-period intra-cluster correlation is allowed to decay depending on the distance between measurements. We present results for the variance of treatment effect estimators for varying amounts of decay, investigating the consequences of the variation in decay on sample size planning for stepped wedge, cluster crossover and multiple-period parallel-arm cluster randomised trials. We also investigate the impact of assuming constant between-period intra-cluster correlations instead of decaying between-period intra-cluster correlations. Our results indicate that in certain design configurations, including the one corresponding to the proposed trial, a correlation decay can have an important impact on variances of treatment effect estimators, and hence on sample size and power. An R Shiny app allows readers to interactively explore the impact of correlation decay.
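A decaying between-period correlation can be sketched as follows. The exponential form rho * decay^|s - t|, and the names icc_matrix, rho and decay, are assumptions chosen for illustration; the abstract only says that the correlation decays with the distance between measurements.

```python
def icc_matrix(n_periods, rho, decay):
    """Intra-cluster correlation between periods s and t.

    Assumed form: rho * decay ** |s - t|, where rho is the within-period
    correlation and decay (between 0 and 1) is the per-period decay factor.
    """
    return [
        [rho * decay ** abs(s - t) for t in range(n_periods)]
        for s in range(n_periods)
    ]


decaying = icc_matrix(n_periods=4, rho=0.05, decay=0.8)
constant = icc_matrix(n_periods=4, rho=0.05, decay=1.0)
```

Setting decay to 1 recovers the constant-correlation structure implied by a random intercept model, so the two assumptions can be compared by changing a single parameter.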
NASA Astrophysics Data System (ADS)
Ji, Yu; Sheng, Wanxing; Jin, Wei; Wu, Ming; Liu, Haitao; Chen, Feng
2018-02-01
A coordinated optimal control method for the active and reactive power of a distribution network with distributed PV clusters, based on model predictive control, is proposed in this paper. The method divides the control process into long-time-scale optimal control and short-time-scale optimal control with multi-step optimization. Because the optimization models are non-convex and nonlinear, and therefore hard to solve, they are transformed into a second-order cone programming problem. An improved IEEE 33-bus distribution network system is used to analyse the feasibility and effectiveness of the proposed control method.
Cluster kinetics model of particle separation in vibrated granular media.
McCoy, Benjamin J; Madras, Giridhar
2006-01-01
We model the Brazil-nut effect (BNE) by hypothesizing that granules form clusters that fragment and aggregate. This provides a heterogeneous medium in which the immersed intruder particle rises (BNE) or sinks (reverse BNE) according to relative convection currents and buoyant and drag forces. A simple relationship proposed for viscous drag in terms of the vibrational intensity and the particle to grain density ratio allows simulation of published experimental data for rise and sink times as functions of particle radius, initial depth of the particle, and particle-grain density ratio. The proposed model correctly describes the experimentally observed maximum in risetime.
Chaos theory perspective for industry clusters development
NASA Astrophysics Data System (ADS)
Yu, Haiying; Jiang, Minghui; Li, Chengzhang
2016-03-01
Industry clusters have performed strongly in the economic development of most developing countries. Their contributions have been recognized as the promotion of regional business and the alleviation of economic and social costs. There is no doubt that globalization is leading clusters to accelerate the competitiveness of economic activities. Accordingly, many ideas and concepts have been put forward to illustrate evolutionary tendencies, stimulate cluster development and avoid the recession of industrial clusters. Chaos theory is introduced to explain the inherent relationships among features within industry clusters. A preferred life cycle approach is proposed for analysing the recession of industrial clusters. Lyapunov exponents and the Wolf model are presented for identifying and examining chaos. A case study of Tianjin, China, verifies the effectiveness of the model. The investigations indicate that the approaches perform well in explaining the chaotic properties of industrial clusters, which helps to demonstrate the evolution of industrial clusters, resolve empirical issues and generate corresponding strategies.
Effects of cluster-shell competition and BCS-like pairing in 12C
NASA Astrophysics Data System (ADS)
Matsuno, H.; Itagaki, N.
2017-12-01
The antisymmetrized quasi-cluster model (AQCM) was proposed to describe α-cluster and jj-coupling shell models on the same footing. In this model, the cluster-shell transition is characterized by two parameters, R representing the distance between α clusters and Λ describing the breaking of α clusters, and the contribution of the spin-orbit interaction, very important in the jj-coupling shell model, can be taken into account starting with the α-cluster model wave function. Not only the closure configurations of the major shells but also the subclosure configurations of the jj-coupling shell model can be described starting with the α-cluster model wave functions; however, the particle-hole excitations of single particles have not been fully established yet. In this study we show that the framework of AQCM can be extended even to the states with the character of single-particle excitations. For ^{12}C, two-particle-two-hole (2p2h) excitations from the subclosure configuration of 0p_{3/2} corresponding to a BCS-like pairing are described, and these shell model states are coupled with the three α-cluster model wave functions. The correlation energy from the optimal configuration can be estimated not only in the cluster part but also in the shell model part. We try to pave the way to establish a generalized description of the nuclear structure.
Modeling online social signed networks
NASA Astrophysics Data System (ADS)
Li, Le; Gu, Ke; Zeng, An; Fan, Ying; Di, Zengru
2018-04-01
People's online rating behavior can be modeled by user-object bipartite networks directly. However, few works have been devoted to reveal the hidden relations between users, especially from the perspective of signed networks. We analyze the signed monopartite networks projected by the signed user-object bipartite networks, finding that the networks are highly clustered with obvious community structure. Interestingly, the positive clustering coefficient is remarkably higher than the negative clustering coefficient. Then, a Signed Growing Network model (SGN) based on local preferential attachment is proposed to generate a user's signed network that has community structure and high positive clustering coefficient. Other structural properties of the modeled networks are also found to be similar to the empirical networks.
A clustering-based fuzzy wavelet neural network model for short-term load forecasting.
Kodogiannis, Vassilis S; Amina, Mahdi; Petrounias, Ilias
2013-10-01
Load forecasting is a critical element of power system operation, involving prediction of the future level of demand to serve as the basis for supply and demand planning. This paper presents the development of a novel clustering-based fuzzy wavelet neural network (CB-FWNN) model and validates its prediction on the short-term electric load forecasting of the Power System of the Greek Island of Crete. The proposed model is obtained from the traditional Takagi-Sugeno-Kang fuzzy system by replacing the THEN part of fuzzy rules with a "multiplication" wavelet neural network (MWNN). Multidimensional Gaussian-type activation functions are used in the IF part of the fuzzy rules. A Fuzzy Subtractive Clustering scheme is employed as a pre-processing technique to find the initial set and adequate number of clusters, and ultimately the number of multiplication nodes in the MWNN, while Gaussian Mixture Models with the Expectation Maximization algorithm are utilized for the definition of the multidimensional Gaussians. The results corresponding to the minimum and maximum power load indicate that the proposed load forecasting model provides significantly more accurate forecasts than conventional neural network models.
Confidence Intervals for Assessing Heterogeneity in Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Wagler, Amy E.
2014-01-01
Generalized linear mixed models are frequently applied to data with clustered categorical outcomes. The effect of clustering on the response is often difficult to practically assess partly because it is reported on a scale on which comparisons with regression parameters are difficult to make. This article proposes confidence intervals for…
Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello
2018-04-22
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background: A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
Qin, Lei; Snoussi, Hichem; Abdallah, Fahed
2014-01-01
We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that is able to extract compact discriminative features from a feature pool before computing the region covariance descriptor. As the feature extraction method is adaptive to a specific object of interest, we refer to the region covariance descriptor computed using the extracted features as the adaptive covariance descriptor. The second contribution is a weakly supervised method for updating the object appearance model during tracking. The method performs a mean-shift clustering procedure among the tracking result samples accumulated during a period of time and selects a group of reliable samples for updating the object appearance model. As such, the object appearance model is kept up-to-date and is protected from contamination even in the case of tracking mistakes. We conducted comparative experiments on real-world video sequences, which confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method accomplished stable object tracking on challenging video sequences. PMID:24865883
Network Modeling and Energy-Efficiency Optimization for Advanced Machine-to-Machine Sensor Networks
Jung, Sungmo; Kim, Jong Hyun; Kim, Seoksoo
2012-01-01
Wireless machine-to-machine sensor networks with multiple radio interfaces are expected to have several advantages, including high spatial scalability, low event detection latency, and low energy consumption. Here, we propose a network model design method involving network approximation and an optimized multi-tiered clustering algorithm that maximizes node lifespan by minimizing energy consumption in a non-uniformly distributed network. Simulation results show that the cluster scales and network parameters determined with the proposed method facilitate a more efficient performance compared to existing methods. PMID:23202190
Cluster kinetics model for mixtures of glassformers
NASA Astrophysics Data System (ADS)
Brenskelle, Lisa A.; McCoy, Benjamin J.
2007-10-01
For glassformers we propose a binary mixture relation for parameters in a cluster kinetics model previously shown to represent pure compound data for viscosity and dielectric relaxation as functions of either temperature or pressure. The model parameters are based on activation energies and activation volumes for cluster association-dissociation processes. With the mixture parameters, we calculated dielectric relaxation times and compared the results to experimental values for binary mixtures. Mixtures of sorbitol and glycerol (seven compositions), sorbitol and xylitol (three compositions), and polychloroepihydrin and polyvinylmethylether (three compositions) were studied.
An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network
Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian
2015-01-01
Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming at enhancing the detection rate and reducing the false rate. The Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Back Propagation optimized by the Cultural Algorithm and the Artificial Fish Swarm Algorithm is applied to misuse detection at the Sink node. Extensive simulations demonstrate that this integrated model performs strongly in intrusion detection. PMID:26447696
Double Cluster Heads Model for Secure and Accurate Data Fusion in Wireless Sensor Networks
Fu, Jun-Song; Liu, Yun
2015-01-01
Secure and accurate data fusion is an important issue in wireless sensor networks (WSNs) and has been extensively researched in the literature. In this paper, by combining clustering techniques, reputation and trust systems, and data fusion algorithms, we propose a novel cluster-based data fusion model called Double Cluster Heads Model (DCHM) for secure and accurate data fusion in WSNs. Different from traditional clustering models in WSNs, two cluster heads are selected after clustering for each cluster based on the reputation and trust system and they perform data fusion independently of each other. Then, the results are sent to the base station where the dissimilarity coefficient is computed. If the dissimilarity coefficient of the two data fusion results exceeds the threshold preset by the users, the cluster heads will be added to blacklist, and the cluster heads must be reelected by the sensor nodes in a cluster. Meanwhile, feedback is sent from the base station to the reputation and trust system, which can help us to identify and delete the compromised sensor nodes in time. Through a series of extensive simulations, we found that the DCHM performed very well in data fusion security and accuracy. PMID:25608211
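The base-station check described above can be sketched in a few lines. The abstract does not define the fusion rule or the dissimilarity coefficient, so the mean-based fuse function, the relative-difference dissimilarity and the name heads_accepted are all hypothetical stand-ins chosen for illustration.

```python
def fuse(readings):
    """Hypothetical fusion rule: the arithmetic mean of the sensor readings."""
    return sum(readings) / len(readings)


def dissimilarity(result_a, result_b):
    """Hypothetical dissimilarity coefficient: relative absolute difference."""
    scale = max(abs(result_a), abs(result_b))
    return 0.0 if scale == 0 else abs(result_a - result_b) / scale


def heads_accepted(result_a, result_b, threshold):
    """Base-station check: False means both heads are blacklisted and
    the cluster must re-elect its cluster heads."""
    return dissimilarity(result_a, result_b) <= threshold


readings = [20.0, 21.0, 19.0]
honest = fuse(readings)            # result reported by a well-behaved head
tampered = fuse(readings) + 10.0   # result reported by a compromised head

both_honest = heads_accepted(honest, fuse(readings), threshold=0.1)
one_compromised = heads_accepted(honest, tampered, threshold=0.1)
```

Because the two heads fuse the same readings independently, a large disagreement between their results signals that at least one of them is unreliable, which is what triggers the blacklist and re-election step.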
Warming rays in cluster cool cores
NASA Astrophysics Data System (ADS)
Colafrancesco, S.; Marchegiani, P.
2008-06-01
Context: Cosmic rays are confined in the atmospheres of galaxy clusters and, therefore, they can play a crucial role in the heating of their cool cores. Aims: We discuss here the thermal and non-thermal features of a model of cosmic ray heating of cluster cores that can provide a solution to the cooling-flow problems. To this aim, we generalize a model originally proposed by Colafrancesco, Dar & DeRujula (2004) and we show that our model predicts specific correlations between the thermal and non-thermal properties of galaxy clusters and enables various observational tests. Methods: The model reproduces the observed temperature distribution in clusters by using an energy balance condition in which the X-ray energy emitted by clusters is supplied, in a quasi-steady state, by the hadronic cosmic rays, which act as “warming rays” (WRs). The temperature profile of the intracluster (IC) gas is strictly correlated with the pressure distribution of the WRs and, consequently, with the non-thermal emission (radio, hard X-ray and gamma-ray) induced by the interaction of the WRs with the IC gas and the IC magnetic field. Results: The temperature distribution of the IC gas in both cool-core and non-cool-core clusters is successfully predicted from the measured IC plasma density distribution. Under this constraint, the WR model is also able to reproduce the thermal and non-thermal pressure distribution in clusters, as well as their radial entropy distribution, as shown by the analysis of three clusters studied in detail: Perseus, A2199 and Hydra.
The WR model provides other observable features of galaxy clusters: a correlation of the pressure ratio (WRs to thermal IC gas) with the inner cluster temperature (P_WR/P_th) ∼ (kT_inner)^{-2/3}, a correlation of the gamma-ray luminosity with the inner cluster temperature L_γ ∼ (kT_inner)^{4/3}, a substantial number of cool-core clusters observable with the GLAST-LAT experiment, a surface brightness of radio halos in cool-core clusters that recovers the observed one, a hard X-ray ICS emission from cool-core clusters that is systematically lower than the observed limits and yet observable with the next generation high-sensitivity and spatial resolution HXR experiments like Simbol-X. Conclusions: The specific theoretical properties and the multi-frequency distribution of the e.m. signals predicted in the WR model render it quite different from the other models so far proposed for the heating of clusters' cool cores. Such differences make it possible to prove or disprove our model as an explanation for the cooling-flow problems on the basis of multi-frequency observations of galaxy clusters.
Shape and dynamics of thermoregulating honey bee clusters.
Sumpter, D J; Broomhead, D S
2000-05-07
A model of simple algorithmic "agents" acting in a discrete temperature field is used to investigate the movement of individuals in thermoregulating honey bee (Apis mellifera) clusters. Thermoregulation in over-wintering clusters is thought to be the result of individual bees attempting to regulate their own body temperatures. At ambient temperatures above 0 °C, a clustering bee will move relative to its neighbours so as to put its local temperature within some ideal range. The proposed model incorporates this behaviour into an algorithm for bee agents moving on a two-dimensional lattice. Heat transport on the lattice is modelled by a discrete diffusion process. Computer simulation of this model demonstrates qualitative behaviour which agrees with that of real honey bee clusters. In particular, we observe the formation of both disc- and ring-like cluster shapes. The simulation also suggests that at lower ambient temperatures, clusters do not always have a stable shape but can oscillate between insulating rings of different sizes and densities. Copyright 2000 Academic Press.
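A discrete diffusion process on a two-dimensional lattice can be sketched with a single explicit update step. The four-neighbour stencil, the fixed ambient temperature outside the lattice, and the names diffuse, rate and ambient are assumptions for this example; the abstract does not give the authors' update rule, and the bee agents and their heat production are left out.

```python
def diffuse(temp, rate, ambient):
    """One explicit diffusion step on a 2D lattice.

    Each site moves toward the average of its four neighbours; sites
    outside the lattice are held at the ambient temperature. The update
    is numerically stable for 0 <= rate <= 0.25.
    """
    rows, cols = len(temp), len(temp[0])

    def at(r, c):
        inside = 0 <= r < rows and 0 <= c < cols
        return temp[r][c] if inside else ambient

    return [
        [
            temp[r][c]
            + rate * (at(r - 1, c) + at(r + 1, c) + at(r, c - 1) + at(r, c + 1)
                      - 4 * temp[r][c])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A single warm site in a cold field loses heat to its four neighbours.
field = [
    [0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 0.0],
]
after_one_step = diffuse(field, rate=0.1, ambient=0.0)
```

In a fuller simulation, a step like this would alternate with agents adding heat at their own sites and moving toward their preferred temperature range.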
Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.
Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin
2018-05-03
Flying ad-hoc networks (FANETs) are a very vibrant research area nowadays. They have many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. An optimal transmission range will have a minimum packet loss ratio (PLR) and better link quality, which ultimately saves the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm for selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as the Ant Colony Optimization-based clustering algorithm and the Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in terms of number of clusters, cluster building time, cluster lifetime and energy consumption.
Modeling tensional homeostasis in multicellular clusters.
Tam, Sze Nok; Smith, Michael L; Stamenović, Dimitrije
2017-03-01
Homeostasis of mechanical stress in cells, or tensional homeostasis, is essential for normal physiological function of tissues and organs and is protective against disease progression, including atherosclerosis and cancer. Recent experimental studies have shown that isolated cells are not capable of maintaining tensional homeostasis, whereas multicellular clusters are, with stability increasing with the size of the clusters. Here, we proposed simple mathematical models to interpret experimental results and to obtain insight into factors that determine homeostasis. Multicellular clusters were modeled as one-dimensional arrays of linearly elastic blocks that were either jointed or disjointed. Fluctuating forces that mimicked experimentally measured cell-substrate tractions were obtained from Monte Carlo simulations. These forces were applied to the cluster models, and the corresponding stress field in the cluster was calculated by solving the equilibrium equation. It was found that temporal fluctuations of the cluster stress field became attenuated with increasing cluster size, indicating that the cluster approached tensional homeostasis. These results were consistent with previously reported experimental data. Furthermore, the models revealed that key determinants of tensional homeostasis in multicellular clusters included the cluster size, the distribution of traction forces, and mechanical coupling between adjacent cells. Based on these findings, we concluded that tensional homeostasis was a multicellular phenomenon. Copyright © 2016 John Wiley & Sons, Ltd.
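The attenuation of fluctuations with cluster size can be illustrated with a much cruder Monte Carlo sketch than the elastic-block models described above. Here each cell's traction is an independent Gaussian draw and the cluster stress is simply their mean; both choices, the numbers 1.0 and 0.3, and the name stress_cv are assumptions for illustration only. The sketch omits the mechanical coupling and the equilibrium equation that the authors solve.

```python
import random
import statistics


def stress_cv(n_cells, n_steps=2000, seed=0):
    """Coefficient of variation over time of a toy cluster stress.

    At every time step each cell contributes an independent traction
    drawn from a Gaussian (mean 1.0, standard deviation 0.3); the toy
    cluster stress is the mean over the cells.
    """
    rng = random.Random(seed)
    series = [
        statistics.fmean(rng.gauss(1.0, 0.3) for _ in range(n_cells))
        for _ in range(n_steps)
    ]
    return statistics.pstdev(series) / statistics.fmean(series)


# Relative fluctuations shrink as more cells share the load.
cv_by_size = {n: stress_cv(n) for n in (1, 4, 16)}
```

For independent cells the relative fluctuation falls roughly as one over the square root of the number of cells; how coupling between adjacent cells changes that behaviour is the question the full models address.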
Chiara, Matteo; Horner, David S; Spada, Alberto
2013-01-01
De novo transcriptome characterization from Next Generation Sequencing data has become an important approach in the study of non-model plants. Despite notable advances in the assembly of short reads, the clustering of transcripts into unigene-like (locus-specific) clusters remains a somewhat neglected subject. Indeed, closely related paralogous transcripts are often merged into single clusters by current approaches. Here, a novel heuristic method for locus-specific clustering is compared to that implemented in the de novo assembler Oases, using the same initial transcript collections, derived from Arabidopsis thaliana and the developmental model Streptocarpus rexii. We show that the proposed approach improves cluster specificity in the A. thaliana dataset for which the reference genome is available. Furthermore, for the S. rexii data our filtered transcript collection matches a larger number of distinct annotated loci in reference genomes than the Oases set, while containing a reduced overall number of loci. A detailed discussion of advantages and limitations of our approach in processing de novo transcriptome reconstructions is presented. The proposed method should be widely applicable to other organisms, irrespective of the transcript assembly method employed. The S. rexii transcriptome is available as a sophisticated and augmented publicly available online database.
Yu, Huihui; Cheng, Yanjun; Cheng, Qianqian; Li, Daoliang
2018-01-01
A precise predictive model is important for obtaining a clear understanding of the changes in dissolved oxygen content in crab ponds. Highly accurate interval forecasting of dissolved oxygen content is fundamental to reduce risk, and three-dimensional prediction can provide more accurate results and overall guidance. In this study, a hybrid three-dimensional (3D) dissolved oxygen content prediction model based on a radial basis function (RBF) neural network, K-means and subtractive clustering was developed and named the subtractive clustering (SC)-K-means-RBF model. In this modeling process, K-means and subtractive clustering methods were employed to enhance the hyperparameters required in the RBF neural network model. The comparison of the predicted results of different traditional models validated the effectiveness and accuracy of the proposed hybrid SC-K-means-RBF model for three-dimensional prediction of dissolved oxygen content. Consequently, the proposed model can effectively display the three-dimensional distribution of dissolved oxygen content and serve as a guide for feeding and future studies. PMID:29466394
Lo, Kenneth
2011-01-01
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375
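For readers unfamiliar with the transformation named above, the following is a minimal sketch of the standard one-parameter Box-Cox transform for a single positive value. It shows only the transformation itself, not the multivariate t mixture or the Expectation-Maximization procedure of the abstract; the function name `box_cox` is chosen here for illustration.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam approaches 0.
        return math.log(y)
    return (y ** lam - 1.0) / lam


if __name__ == "__main__":
    skewed = [0.5, 1.0, 2.0, 4.0, 8.0]
    for lam in (1.0, 0.5, 0.0):
        print(lam, [round(box_cox(v, lam), 3) for v in skewed])
```

A parameter of 1 leaves the shape of the data unchanged apart from a shift, while smaller values compress the right tail, which is how the transformation handles skewness before a symmetric component is fitted.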
Lo, Kenneth; Gottardo, Raphael
2012-01-01
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
Dynamic Fuzzy Model Development for a Drum-type Boiler-turbine Plant Through GK Clustering
NASA Astrophysics Data System (ADS)
Habbi, Ahcène; Zelmat, Mimoun
2008-10-01
This paper discusses a TS fuzzy model identification method for an industrial drum-type boiler plant using the GK fuzzy clustering approach. The fuzzy model is constructed from a set of input-output data that covers a wide operating range of the physical plant. The reference data is generated using a complex first-principle-based mathematical model that describes the key dynamical properties of the boiler-turbine dynamics. The proposed fuzzy model is derived by means of a fuzzy clustering method with particular attention to structure flexibility and model interpretability issues. This may provide a basis for a new way to design model-based control and diagnosis mechanisms for the complex nonlinear plant.
Description of alternating-parity bands within the dinuclear-system model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shneidman, T. M.; Adamian, G. G., E-mail: adamian@theor.jinr.ru; Antonenko, N. V.
2016-11-15
A cluster approach is used to describe ground-state-based alternating-parity bands in even–even nuclei and to study the band-termination mechanism. A method is proposed for testing the cluster nature of alternating-parity bands.
NASA Astrophysics Data System (ADS)
Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor
2017-05-01
Outlier detection has been used extensively in data analysis to detect anomalous observations in data and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we applied the hierarchical clustering procedure. With the use of a tree diagram, we illustrate the graphical approach to the detection of outliers. A simulation study is done to verify the accuracy of the proposed method. Also, an illustration with a real data set is given to show its practical applicability.
NASA Astrophysics Data System (ADS)
Fume, Kosei; Ishitani, Yasuto
2008-01-01
We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category in accordance with the similarity of the model. The main feature of the proposed method consists of two aspects of semantics extraction from an input document. The semantics of terms are extracted by the semantic pattern analysis and implicit meanings of document substructure are specified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.
NASA Astrophysics Data System (ADS)
Liu, Jianjun; Kan, Jianquan
2018-04-01
In this paper, a new method for identifying genetically modified material from terahertz spectra is proposed, using a support vector machine (SVM) based on affinity propagation clustering. The algorithm uses affinity propagation clustering to cluster and label unlabeled training samples, and the existing SVM training data are continuously updated during the iterative process. Because the identification model is established without manually labeling the training samples, the error caused by human-labeled samples is reduced and the identification accuracy of the model is greatly improved.
Modeling and clustering water demand patterns from real-world smart meter data
NASA Astrophysics Data System (ADS)
Cheifetz, Nicolas; Noumir, Zineb; Samé, Allou; Sandraz, Anne-Claire; Féliers, Cédric; Heim, Véronique
2017-08-01
Nowadays, drinking water utilities need an acute comprehension of the water demand on their distribution network, in order to efficiently operate the optimization of resources, manage billing and propose new customer services. With the emergence of smart grids, based on automated meter reading (AMR), a better understanding of the consumption modes is now accessible for smart cities with more granularities. In this context, this paper evaluates a novel methodology for identifying relevant usage profiles from the water consumption data produced by smart meters. The methodology is fully data-driven using the consumption time series which are seen as functions or curves observed with an hourly time step. First, a Fourier-based additive time series decomposition model is introduced to extract seasonal patterns from time series. These patterns are intended to represent the customer habits in terms of water consumption. Two functional clustering approaches are then used to classify the extracted seasonal patterns: the functional version of K-means, and the Fourier REgression Mixture (FReMix) model. The K-means approach produces a hard segmentation and K representative prototypes. On the other hand, the FReMix is a generative model and also produces K profiles as well as a soft segmentation based on the posterior probabilities. The proposed approach is applied to a smart grid deployed on the largest water distribution network (WDN) in France. The two clustering strategies are evaluated and compared. Finally, a realistic interpretation of the consumption habits is given for each cluster. The extensive experiments and the qualitative interpretation of the resulting clusters allow one to highlight the effectiveness of the proposed methodology.
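The contrast drawn above between a hard segmentation and a soft segmentation based on posterior probabilities can be sketched as follows. This is a simplified illustration, not the FReMix model: it assumes equal mixing weights and isotropic Gaussian components with a shared, hypothetical spread `sigma`, and the prototypes and the observed curve are made-up values.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equally sampled curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(curve, prototypes):
    """Index of the nearest prototype (K-means-style hard segmentation)."""
    dists = [sq_dist(curve, p) for p in prototypes]
    return dists.index(min(dists))


def soft_assign(curve, prototypes, sigma=1.0):
    """Posterior memberships under equal-weight isotropic Gaussian components."""
    logits = [-sq_dist(curve, p) / (2.0 * sigma ** 2) for p in prototypes]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    prototypes = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 2.0, 1.0]]  # made-up profiles
    curve = [0.4, 1.0, 1.1, 0.3]                                # made-up customer
    print(hard_assign(curve, prototypes))
    print([round(p, 3) for p in soft_assign(curve, prototypes)])
```

A curve lying between two profiles gets a single label from the hard rule but nearly balanced probabilities from the soft rule, which is the extra information a generative model provides.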
A modified procedure for mixture-model clustering of regional geochemical data
Ellefsen, Karl J.; Smith, David B.; Horton, John D.
2014-01-01
A modified procedure is proposed for mixture-model clustering of regional-scale geochemical data. The key modification is the robust principal component transformation of the isometric log-ratio transforms of the element concentrations. This principal component transformation and the associated dimension reduction are applied before the data are clustered. The principal advantage of this modification is that it significantly improves the stability of the clustering. The principal disadvantage is that it requires subjective selection of the number of clusters and the number of principal components. To evaluate the efficacy of this modified procedure, it is applied to soil geochemical data that comprise 959 samples from the state of Colorado (USA) for which the concentrations of 44 elements are measured. The distributions of element concentrations that are derived from the mixture model and from the field samples are similar, indicating that the mixture model is a suitable representation of the transformed geochemical data. Each cluster and the associated distributions of the element concentrations are related to specific geologic and anthropogenic features. In this way, mixture model clustering facilitates interpretation of the regional geochemical data.
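As a rough illustration of the isometric log-ratio (ilr) step mentioned above, the sketch below computes ilr coordinates for one strictly positive composition. The choice of basis is an assumption: it uses a common sequential basis in which coordinate i contrasts the geometric mean of the first i parts with part i+1, and the abstract does not say which basis the authors used. The robust principal component step and the mixture-model clustering are not shown.

```python
import math


def ilr(composition):
    """Isometric log-ratio coordinates of a strictly positive composition."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(part) for part in composition]
    coords = []
    running_sum = 0.0
    for i in range(1, len(logs)):
        running_sum += logs[i - 1]   # sum of logs of the first i parts
        mean_log = running_sum / i   # log of their geometric mean
        coords.append(math.sqrt(i / (i + 1)) * (mean_log - logs[i]))
    return coords


if __name__ == "__main__":
    # Made-up three-element concentrations; only their ratios matter.
    print([round(z, 4) for z in ilr([10.0, 20.0, 30.0])])
    print([round(z, 4) for z in ilr([1.0, 2.0, 3.0])])
```

A composition with D parts maps to D-1 unconstrained coordinates, and rescaling all concentrations by a common factor leaves them unchanged, which is why the transform is applied before principal component analysis and clustering.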
Kwekkeboom, Kristine L; Tostrud, Lauren; Costanzo, Erin; Coe, Christopher L; Serlin, Ronald C; Ward, Sandra E; Zhang, Yingzi
2018-05-01
Symptom researchers have proposed a model of inflammatory cytokine activity and dysregulation in cancer to explain co-occurring symptoms including pain, fatigue, and sleep disturbance. We tested the hypothesis that psychological stress accentuates inflammation and that stress and inflammation contribute to one's experience of the pain, fatigue, and sleep disturbance symptom cluster (symptom cluster severity, symptom cluster distress) and its impact (symptom cluster interference with daily life, quality of life). We used baseline data from a symptom cluster management trial. Adult participants (N = 158) receiving chemotherapy for advanced cancer reported pain, fatigue, and sleep disturbance on enrollment. Before intervention, participants completed measures of demographics, perceived stress, symptom cluster severity, symptom cluster distress, symptom cluster interference with daily life, and quality of life and provided a blood sample for four inflammatory biomarkers (interleukin-1β, interleukin-6, tumor necrosis factor-α, and C-reactive protein). Stress was not directly related to any inflammatory biomarker. Stress and tumor necrosis factor-α were positively related to symptom cluster distress, although not symptom cluster severity. Tumor necrosis factor-α was indirectly related to symptom cluster interference with daily life, through its effect on symptom cluster distress. Stress was positively associated with symptom cluster interference with daily life and inversely with quality of life. Stress also had indirect effects on symptom cluster interference with daily life, through its effect on symptom cluster distress. The proposed inflammatory model of symptoms was partially supported. Investigators should test interventions that target stress as a contributing factor in co-occurring pain, fatigue, and sleep disturbance and explore other factors that may influence inflammatory biomarker levels within the context of an advanced cancer diagnosis and treatment. 
Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Portfolio Decisions and Brain Reactions via the CEAD method.
Majer, Piotr; Mohr, Peter N C; Heekeren, Hauke R; Härdle, Wolfgang K
2016-09-01
Decision making can be a complex process requiring the integration of several attributes of choice options. Understanding the neural processes underlying (uncertain) investment decisions is an important topic in neuroeconomics. We analyzed functional magnetic resonance imaging (fMRI) data from an investment decision study for stimulus-related effects. We propose a new technique for identifying activated brain regions: cluster, estimation, activation, and decision method. Our analysis is focused on clusters of voxels rather than voxel units. Thus, we achieve a higher signal-to-noise ratio within the unit tested and a smaller number of hypothesis tests compared with the often used General Linear Model (GLM). We propose to first conduct the brain parcellation by applying spatially constrained spectral clustering. The information within each cluster can then be extracted by the flexible dynamic semiparametric factor model (DSFM) dimension reduction technique and finally be tested for differences in activation between conditions. This sequence of Cluster, Estimation, Activation, and Decision admits a model-free analysis of the local fMRI signal. Applying a GLM on the DSFM-based time series resulted in a significant correlation between the risk of choice options and changes in fMRI signal in the anterior insula and dorsomedial prefrontal cortex. Additionally, individual differences in decision-related reactions within the DSFM time series predicted individual differences in risk attitudes as modeled with the framework of the mean-variance model.
NASA Astrophysics Data System (ADS)
Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.
2018-03-01
Biologically motivated self-learning equivalence-convolutional recurrent-multilayer neural structures (BLM_SL_EC_RMNS) for clustering and recognition of image fragments are discussed. We consider these neural structures and their spatial-invariant equivalental models (SIEMs), which are based on the proposed equivalent two-dimensional functions of image similarity and the corresponding matrix-matrix (or tensor) procedures that use operations of continuous logic and nonlinear processing as basic operations. These SIEMs can simply describe the signal processing during all training and recognition stages, and they are suitable for unipolar-coded multilevel signals. The clustering efficiency of such models and their implementations depends on the discriminant properties of the neural elements of the hidden layers. Therefore, the main model and architecture parameters and characteristics depend on the applied types of non-linear processing and on the function used for image comparison or for adaptive-equivalent weighting of input patterns. We show that these SL_EC_RMNSs have several advantages, such as self-study and self-identification of features and signs of the similarity of fragments, and the ability to cluster and recognize image fragments with best efficiency and strong mutual correlation. The proposed combined learning-recognition clustering method for fragments, which takes their structural features into account, is suitable not only for binary but also for color images, and it combines self-learning with the formation of weight clustered matrix-patterns. Its model is constructed and designed on the basis of recursive continuous logic and nonlinear processing algorithms and of the k-average method or the winner-takes-all (WTA) method. The experimental results confirmed that fragments with a large number of elements may be clustered. For the first time, the possibility of generalizing these models to the space-invariant case is shown.
An experiment is carried out for images of different dimensions (a reference array) and fragments of different dimensions for clustering. The experiments, performed in the Mathcad software environment, showed that the proposed method is universal, converges in a small number of iterations, maps easily onto a matrix structure, and is promising. Thus, it is very important to understand the mechanisms of self-learning equivalence-convolutional clustering, the accompanying competitive processes in neurons, and the principles of neural auto-encoding-decoding and recognition with self-learning cluster patterns, which use the algorithm and the principles of non-linear processing of two-dimensional spatial functions of image comparison. The experimental results show that such models can be successfully used for auto- and hetero-associative recognition. They can also be used to explain some mechanisms known as "the reinforcement-inhibition concept". We also demonstrate real model experiments, which confirm that nonlinear processing by an equivalent function allows the winner neurons to be determined and the weight matrix to be adjusted. At the end of the report, we show how to use the obtained results and propose a new, more efficient hardware architecture of SL_EC_RMNS based on matrix-tensor multipliers. We also estimate the parameters and performance of such architectures.
Cluster state generation in one-dimensional Kitaev honeycomb model via shortcut to adiabaticity
NASA Astrophysics Data System (ADS)
Kyaw, Thi Ha; Kwek, Leong-Chuan
2018-04-01
We propose a means to obtain computationally useful resource states, also known as cluster states, for measurement-based quantum computation via a transitionless quantum driving algorithm. The idea is to cool the system to its unique ground state and tune some control parameters to arrive at a computationally useful resource state, which is in one of the degenerate ground states. Even though there is a set of conserved quantities already present in the model Hamiltonian, which prevents the instantaneous state from going to any other eigenstate subspace, one cannot quench the control parameters to get the desired state. In that case, the state will not evolve. With the involvement of the shortcut Hamiltonian, we obtain cluster states in a fast-forward manner. We elaborate our proposal in the one-dimensional Kitaev honeycomb model and show that the auxiliary Hamiltonian needed for the counterdiabatic driving is of M-body interaction.
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. To increase the accuracy of the model, the following steps are carried out. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The main purpose of this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually occur when building an ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build the ANFIS. Simulations based on a numerical data set, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, the convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
NASA Astrophysics Data System (ADS)
Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long
2017-11-01
Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify structural characteristics of airways in each cluster and use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a greater diameter decrease with increasing generation than the other clusters. Cluster 4 had more rapid decreases of airway diameters in the upper lobes, while cluster 2 had them in the lower lobes. We then used these regression models to estimate airway diameters in CT-unresolved regions to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can be used to provide the patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.
Estimation of homogeneous nucleation flux via a kinetic model
NASA Technical Reports Server (NTRS)
Wilcox, C. F.; Bauer, S. H.
1991-01-01
The proposed kinetic model for condensation under homogeneous conditions, and the onset of unidirectional cluster growth in supersaturated gases, does not suffer from the conceptual flaws that characterize classical nucleation theory. When a full set of simultaneous rate equations is solved, a characteristic time emerges, for each cluster size n, at which its production rate and its rate of conversion to the next size (n + 1) are equal. Procedures for estimating the essential parameters are proposed; condensation fluxes J_kin^ss are evaluated. Since there are practical limits to the cluster size that can be incorporated in the set of simultaneous first-order differential equations, a code was developed for computing an approximate J_th^ss based on estimates of a 'constrained equilibrium' distribution, and identification of its minimum.
NASA Astrophysics Data System (ADS)
Eliçabe, Guillermo E.
2013-09-01
In this work, an exact scattering model for a system of clusters of spherical particles, based on the Rayleigh-Gans approximation, has been parameterized in such a way that it can be solved in inverse form using Tikhonov regularization to obtain the morphological parameters of the clusters: the average number of particles per cluster, the size of the primary spherical units that form the cluster, and the Discrete Distance Distribution Function, from which the z-average square radius of gyration of the system of clusters is obtained. The methodology is validated through a series of simulated and experimental examples of x-ray and light scattering, which show that the proposed methodology works satisfactorily in non-ideal situations such as the presence of error in the measurements, the presence of error in the model, and several types of non-idealities present in the experimental cases.
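The regularization step named above can be sketched for a generic linear inverse problem. The example below is not the scattering model of the abstract: it assumes a small dense forward matrix `A`, measured data `b` and a hand-picked regularization weight `lam` (all hypothetical), and it uses the simplest form of Tikhonov regularization, with an identity penalty, solved through the regularized normal equations.

```python
def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the regularized normal equations."""
    n_rows, n_cols = len(A), len(A[0])
    # Build M = A^T A + lam * I and rhs = A^T b.
    M = [[sum(A[k][i] * A[k][j] for k in range(n_rows)) + (lam if i == j else 0.0)
          for j in range(n_cols)] for i in range(n_cols)]
    rhs = [sum(A[k][i] * b[k] for k in range(n_rows)) for i in range(n_cols)]
    # Gaussian elimination with partial pivoting on the augmented system.
    aug = [row + [r] for row, r in zip(M, rhs)]
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-300:
            raise ValueError("singular system; increase lam")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_cols):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_cols + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n_cols
    for i in range(n_cols - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n_cols))
        x[i] = (aug[i][n_cols] - tail) / aug[i][i]
    return x


if __name__ == "__main__":
    A = [[1.0, 1.0], [1.0, 1.0001]]  # nearly collinear columns: ill-conditioned
    b = [2.0, 2.0002]                # made-up measurements
    for lam in (0.0, 1e-6, 1e-2):
        print(lam, tikhonov_solve(A, b, lam))
```

Increasing `lam` trades fidelity to the data for a smaller, more stable solution. In practice the weight is usually chosen by a criterion such as the L-curve or cross-validation, and the abstract does not say which one the author used.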
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-02-19
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
The stable clustering ansatz, consistency relations and gravity dual of large-scale structure
NASA Astrophysics Data System (ADS)
Munshi, Dipak
2018-02-01
Gravitational clustering in the nonlinear regime remains poorly understood. A gravity dual of gravitational clustering has recently been proposed as a means to study the nonlinear regime. The stable clustering ansatz remains a key ingredient in our understanding of gravitational clustering in the highly nonlinear regime. We study certain aspects of violation of the stable clustering ansatz in the gravity dual of Large Scale Structure (LSS). We extend the recent studies of gravitational clustering using the AdS gravity dual to take into account possible departures from the stable clustering ansatz and to arbitrary dimensions. Next, we extend the recently introduced consistency relations to arbitrary dimensions. We use the consistency relations to test the commonly used models of gravitational clustering, including the halo models and hierarchical ansätze. In particular, we establish a tower of consistency relations for the hierarchical amplitudes Q, Ra, Rb, Sa, Sb, Sc, etc., as functions of the scaled peculiar velocity h. We also study the variants of popular halo models in this context. In contrast to recent claims, none of these models, in their simplest incarnation, seem to satisfy the consistency relations in the soft limit.
Marginal regression approach for additive hazards models with clustered current status data.
Su, Pei-Fang; Chi, Yunchan
2014-01-15
Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n^(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n^(1/2), for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.
Approximate kernel competitive learning.
Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang
2015-03-01
Kernel competitive learning has been successfully used to achieve robust clustering. However, kernel competitive learning (KCL) is not scalable for large-scale data processing, because (1) it has to calculate and store the full kernel matrix, which is too large to be calculated and kept in memory, and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large-scale datasets. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL), which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis on why the proposed approximation modelling would work for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates the approximate kernel competitive learning for large-scale clustering. The empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL can perform comparably to KCL, with a large reduction in computational cost. Also, the proposed methods achieve more effective clustering performance in terms of clustering precision than related approximate clustering approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.
Efficient Deployment of Key Nodes for Optimal Coverage of Industrial Mobile Wireless Networks
Li, Xiaomin; Li, Di; Dong, Zhijie; Hu, Yage; Liu, Chengliang
2018-01-01
In recent years, industrial wireless networks (IWNs) have been transformed by the introduction of mobile nodes, and they now offer increased extensibility, mobility, and flexibility. Nevertheless, mobile nodes pose efficiency and reliability challenges. Efficient node deployment and management of channel interference directly affect network system performance, particularly for key node placement in clustered wireless networks. This study analyzes this system model, considering both industrial properties of wireless networks and their mobility. Then, static and mobile node coverage problems are unified and simplified to target coverage problems. We propose a novel strategy for the deployment of clustered heads in grouped industrial mobile wireless networks (IMWNs) based on the improved maximal clique model and the iterative computation of new candidate cluster head positions. The maximal cliques are obtained via a double-layer Tabu search. Each cluster head updates its new position via an improved virtual force while moving with full coverage to find the minimal inter-cluster interference. Finally, we develop a simulation environment. The simulation results, based on a performance comparison, show the efficacy of the proposed strategies and their superiority over current approaches. PMID:29439439
Vaddypally, Shivaiah; Kondaveeti, Sandeep K; Karki, Santosh; Van Vliet, Megan M; Levis, Robert J; Zdilla, Michael J
2017-04-05
The molecular mechanism of the Oxygen Evolving Center of photosystem II has been under debate for decades. One frequently cited proposal is the nucleophilic attack by water hydroxide on a pendant Mn═O moiety, though no chemical example of this reactivity at a manganese cubane cluster has been reported. We describe here the preparation, characterization, and a reactivity study of a synthetic manganese cubane cluster with a pendant manganese-oxo moiety. Reaction of this cluster with alkenes results in oxygen and hydrogen atom transfer reactions to form alcohol- and ketone-based oxygen-containing products. Nitrene transfer from core imides is negligible. The inorganic product is a cluster identical to the precursor, but with the pendant Mn═O moiety replaced by a hydrogen abstracted from the organic substrate, and is isolated in quantitative yield. ¹⁸O and ²H isotopic labeling studies confirm the transfer of atoms between the cluster and the organic substrate. The results suggest that the core cubane structure of this model compound remains intact, and that the pendant Mn═O moiety is preferentially reactive.
Chen, Ling; Feng, Yanqin; Sun, Jianguo
2017-10-01
This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.
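The effect of using inverse cluster sizes as weights can be seen in a toy setting that is much simpler than the additive transformation models of the paper. The sketch below assumes the target is just a mean, so the weighted estimating equation sum_i (1/n_i) sum_j (y_ij - mu) = 0 has the closed-form solution "average of the cluster means". The function name and the data are hypothetical illustrations, not the authors' procedure.

```python
def inverse_cluster_size_mean(clusters):
    """Solve sum_i (1/n_i) * sum_j (y_ij - mu) = 0 for mu.

    `clusters` is a list of non-empty lists of observations, one list per
    cluster. Each observation is weighted by 1 / (size of its cluster), so
    every cluster contributes equally regardless of how large it is.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for observations in clusters:
        weight = 1.0 / len(observations)
        weighted_sum += weight * sum(observations)
        weight_total += weight * len(observations)  # adds exactly 1 per cluster
    return weighted_sum / weight_total


# One large cluster with small values, one small cluster with a large value.
data = [[1.0, 1.0, 1.0, 1.0], [5.0]]
pooled = sum(sum(c) for c in data) / sum(len(c) for c in data)
print(pooled)                           # unweighted pooled mean
print(inverse_cluster_size_mean(data))  # mean of the cluster means
```

When cluster size is related to the outcome, the pooled mean is dominated by the large cluster, whereas the inverse-size weighting treats each cluster as one unit.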
A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.
Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe
2018-02-01
In this article, a new Python package for nucleotide sequences clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Despite the fact that we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clusters is not required here. For the sake of illustration, this method is applied on a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are coherent too with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.
Luminosity Function of Faint Globular Clusters in M87
NASA Astrophysics Data System (ADS)
Waters, Christopher Z.; Zepf, Stephen E.; Lauer, Tod R.; Baltz, Edward A.; Silk, Joseph
2006-10-01
We present the luminosity function to very faint magnitudes for the globular clusters in M87, based on a 30 orbit Hubble Space Telescope (HST) WFPC2 imaging program. The very deep images and corresponding improved false source rejection allow us to probe the mass function further beyond the turnover than has been done before. We compare our luminosity function to those that have been observed in the past, and confirm the similarity of the turnover luminosity between M87 and the Milky Way. We also find with high statistical significance that the M87 luminosity function is broader than that of the Milky Way. We discuss how determining the mass function of the cluster system to low masses can constrain theoretical models of the dynamical evolution of globular cluster systems. Our mass function is consistent with the dependence of mass loss on the initial cluster mass given by classical evaporation, and somewhat inconsistent with newer proposals that have a shallower mass dependence. In addition, the rate of mass loss is consistent with standard evaporation models, and not with the much higher rates proposed by some recent studies of very young cluster systems. We also find that the mass-size relation has very little slope, indicating that there is almost no increase in the size of a cluster with increasing mass.
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.
Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania
2015-01-01
This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
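The abstract does not spell out the modified row-wise implementation, so the following is only a generic CPU sketch of the idea: the Smith-Waterman scoring matrix is filled one row at a time, keeping just the previous row in memory. The scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration, and nothing here reflects the authors' MPI or CUDA code.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    The dynamic-programming matrix is computed row by row; only the
    previous row and the current row are stored. A linear gap penalty
    is assumed.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i, column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # local alignment floors at 0
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

In a master/worker layout like the one described, each worker could run such a scoring routine on its share of the sequence pairs and return only the scores to the dispatcher.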
NASA Astrophysics Data System (ADS)
Chen, Xin; Liu, Li; Zhou, Sida; Yue, Zhenjiang
2016-09-01
Reduced order models (ROMs) based on snapshots from high-fidelity CFD simulations have received great attention recently due to their capability of capturing the features of complex geometries and flow configurations. To improve the efficiency and precision of the ROMs, it is indispensable to add extra sampling points to the initial snapshots, since the number of sampling points needed to achieve an adequately accurate ROM is generally unknown a priori, while a large number of initial sampling points reduces the parsimony of the ROMs. A fuzzy-clustering-based adding-point strategy is proposed, in which the fuzzy clustering acts as an indicator of the region where the precision of the ROMs is relatively low. The proposed method is applied to construct ROMs for benchmark mathematical examples and a numerical example of hypersonic aerothermodynamics prediction for a typical control surface. The proposed method achieves a 34.5% improvement in efficiency over the estimated mean squared error prediction algorithm while showing the same level of prediction accuracy.
Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf
2017-09-01
Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (i.e., from 1 to 6 clusters). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% more accuracy than without clustering, while the percentage split achieves 2%-5% more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
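The weighted average accuracy used to compare the per-cluster models can be illustrated with a few lines of arithmetic. The abstract does not state the weights, so this sketch assumes each cluster's accuracy is weighted by the number of accidents in that cluster. The function name and the numbers are hypothetical.

```python
def weighted_average_accuracy(cluster_sizes, cluster_accuracies):
    """Combine per-cluster accuracies into one figure.

    Assumption: each cluster's accuracy is weighted by its number of
    records, so large clusters count proportionally more.
    """
    total = sum(cluster_sizes)
    if total == 0:
        raise ValueError("at least one cluster must contain records")
    weighted = sum(n * acc for n, acc in zip(cluster_sizes, cluster_accuracies))
    return weighted / total


# Two clusters: 100 records classified at 90%, 300 records at 70%.
print(weighted_average_accuracy([100, 300], [0.9, 0.7]))
```

With a single cluster the figure reduces to the ordinary accuracy of one global model, which is what makes the different clusterings comparable.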
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
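As a rough illustration of clustering channels into "activated" and "not activated", the sketch below runs k-means with k = 2 on a single number per channel. The per-channel feature (for instance a fitted regression coefficient), the min/max initialisation, and the function name are assumptions made here; the abstract does not specify which features the authors cluster.

```python
def two_means_1d(values, max_iter=100):
    """K-means with k=2 on scalar values (Lloyd's algorithm).

    Returns (labels, (low_center, high_center)), where label 1 marks the
    group around the higher center. Centers start at the min and max of
    the data, which makes the result deterministic.
    """
    lo, hi = min(values), max(values)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to the nearer center.
        new_labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        if new_labels == labels:
            break  # assignments stable, converged
        labels = new_labels
        # Update step: move each center to the mean of its group.
        low_group = [v for v, lab in zip(values, labels) if lab == 0]
        high_group = [v for v, lab in zip(values, labels) if lab == 1]
        if low_group:
            lo = sum(low_group) / len(low_group)
        if high_group:
            hi = sum(high_group) / len(high_group)
    return labels, (lo, hi)


# Hypothetical per-channel activation statistics for six channels.
channel_stats = [0.1, 0.2, 0.15, 2.0, 2.2, 1.9]
labels, centers = two_means_1d(channel_stats)
print(labels)   # 1 = grouped with the higher center ("activated")
print(centers)
```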
Linear regression models and k-means clustering for statistical analysis of fNIRS data
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-01-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nielsen, Michael A.; School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Queensland 4072; Dawson, Christopher M.
The one-way quantum computing model introduced by Raussendorf and Briegel [Phys. Rev. Lett. 86, 5188 (2001)] shows that it is possible to quantum compute using only a fixed entangled resource known as a cluster state, and adaptive single-qubit measurements. This model is the basis for several practical proposals for quantum computation, including a promising proposal for optical quantum computation based on cluster states [M. A. Nielsen, Phys. Rev. Lett. (to be published), quant-ph/0402005]. A significant open question is whether such proposals are scalable in the presence of physically realistic noise. In this paper we prove two threshold theorems which show that scalable fault-tolerant quantum computation may be achieved in implementations based on cluster states, provided the noise in the implementations is below some constant threshold value. Our first threshold theorem applies to a class of implementations in which entangling gates are applied deterministically, but with a small amount of noise. We expect this threshold to be applicable in a wide variety of physical systems. Our second threshold theorem is specifically adapted to proposals such as the optical cluster-state proposal, in which nondeterministic entangling gates are used. A critical technical component of our proofs is two powerful theorems which relate the properties of noisy unitary operations restricted to act on a subspace of state space to extensions of those operations acting on the entire state space. We expect these theorems to have a variety of applications in other areas of quantum-information science.
Modeling solute clustering in the diffusion layer around a growing crystal.
Shiau, Lie-Ding; Lu, Yung-Fang
2009-03-07
The mechanism of crystal growth from solution is often thought to consist of a mass transfer diffusion step followed by a surface reaction step. Solute molecules might form clusters in the diffusion step before incorporating into the crystal lattice. A model is proposed in this work to simulate the evolution of the cluster size distribution due to the simultaneous aggregation and breakage of solute molecules in the diffusion layer around a growing crystal in the stirred solution. The crystallization of KAl(SO4)2·12H2O from aqueous solution is studied to illustrate the effect of supersaturation and diffusion layer thickness on the number-average degree of clustering and the size distribution of solute clusters in the diffusion layer.
A cluster merging method for time series microarray with production values.
Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio
2014-09-01
A challenging task in time-course microarray data analysis is to cluster genes meaningfully combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal obtaining groups with highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured in the same time points) and merging them by taking into account the frequency by which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim to find co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
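The merging idea, grouping genes by how often they are clustered together across the per-replicate clusterings, can be sketched as a co-association count. The abstract does not give the exact merging rule, so the threshold and the union-find grouping below are assumptions for illustration, and the function names and gene identifiers are hypothetical.

```python
from itertools import combinations


def co_association(clusterings):
    """Fraction of clusterings in which each gene pair shares a cluster.

    `clusterings` is a list of dicts mapping gene -> cluster label, one
    dict per replicate time series, all over the same set of genes.
    """
    genes = sorted(clusterings[0])
    freq = {}
    for g1, g2 in combinations(genes, 2):
        together = sum(1 for c in clusterings if c[g1] == c[g2])
        freq[(g1, g2)] = together / len(clusterings)
    return freq


def merge_by_agreement(clusterings, threshold=1.0):
    """Group genes whose co-association frequency reaches `threshold`."""
    parent = {g: g for g in clusterings[0]}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g1, g2), f in co_association(clusterings).items():
        if f >= threshold:
            parent[find(g1)] = find(g2)
    groups = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


replicates = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
print(merge_by_agreement(replicates, threshold=1.0))
print(merge_by_agreement(replicates, threshold=0.6))
```

Lowering the threshold admits groups with weaker agreement, which mirrors the abstract's point that the resulting gene groups can be ranked by the level of agreement among the individual time series.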
Isothermality of the gas in the Coma cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hughes, J.P.; Yamashita, K.; Okumura, Y.
1988-04-01
The high-quality X-ray spectrum of the Coma cluster observed by the Japanese satellite Tenma in conjunction with imaging data from the Einstein Observatory was used to explore the temperature distribution of the cluster gas. It is found that pure polytropic models are inadequate to describe this temperature distribution. Instead, a hybrid model is proposed consisting of a central isothermal region surrounded by a polytropic distribution. It is shown that as much as 75 percent of the global emission may come from the isothermal component. 30 references.
A density-based clustering model for community detection in complex networks
NASA Astrophysics Data System (ADS)
Zhao, Xiang; Li, Yantao; Qu, Zehui
2018-04-01
Network clustering (or graph partitioning) is an important technique for uncovering the underlying community structures in complex networks, which has been widely applied in various fields including astronomy, bioinformatics, sociology, and bibliometrics. In this paper, we propose a density-based clustering model for community detection in complex networks (DCCN). The key idea is to find group centers with a higher density than their neighbors and a relatively large integrated-distance from nodes with higher density. The experimental results indicate that our approach is efficient and effective for community detection of complex networks.
NASA Astrophysics Data System (ADS)
Kim, J.; Park, K.
2016-12-01
To evaluate the performance of the operational forecast models in the Korea Operational Oceanographic System (KOOS), which has been developed by the Korea Institute of Ocean Science and Technology (KIOST), a skill assessment (SA) tool has been developed. It provides multiple skill metrics, including not only correlation and error skills obtained by comparing predictions with observations but also pattern clustering across numerical models, satellite data, and observations. KOOS produces 72-hour forecasts of atmospheric and hydrodynamic variables (wind, pressure, current, tide, wave, temperature, and salinity) every 12 hours by running numerical models such as WRF, ROMS, MOM5, WW-III, and SWAN, and the SA is conducted to evaluate these forecasts. We have been operating several kinds of numerical models operationally, such as WRF, ROMS, MOM5, MOHID, and WW-III. Quantitative assessment of operational ocean forecast models is very important for providing accurate ocean forecast information, not only to the general public but also in support of ocean-related problems. In this work, we propose a pattern clustering method that uses machine learning and GIS-based spatial analytics to evaluate the spatial distribution of numerical model output and spatial observation data such as satellite and HF radar data. For the clustering, we use 10- or 15-year reanalysis data computed by KOOS, ECMWF, and HYCOM to build best-matching clusters, which are classified by physical meaning with time variation, and then compare them with forecast data. Moreover, for evaluating currents, we develop a method for extracting the dominant flow and apply it to hydrodynamic models and HF radar sea surface current data.
Applying the pattern clustering method allows a more accurate and effective assessment of ocean forecast models' performance, by comparing not only specific observation positions determined by observation stations but also the spatio-temporal distribution over the whole model area. We believe that our proposed method will be very useful for examining and evaluating large amounts of numerical modeling data as well as satellite data.
Cluster-based control of a separating flow over a smoothly contoured ramp
NASA Astrophysics Data System (ADS)
Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek
2017-12-01
The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
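The step from cluster labels to a Markov model and its long-term (ergodic) probability distribution can be sketched as follows. This is a simplified illustration for a single fixed control law: the control dependence described in the abstract is left out. The row-stochastic convention (entry [i][j] is the probability of moving from cluster i to cluster j), the power iteration, and the function names are assumptions made here.

```python
def transition_matrix(labels, n_clusters):
    """Estimate P[i][j] = Prob(next cluster = j | current cluster = i)
    from a time-ordered sequence of cluster labels."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Cluster never left in the data: fall back to a uniform row.
            matrix.append([1.0 / n_clusters] * n_clusters)
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Approximate the long-term cluster probabilities by power iteration,
    starting from a uniform distribution."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical sequence of cluster indices assigned to successive snapshots.
snapshot_labels = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
P = transition_matrix(snapshot_labels, 2)
print(P)
print(stationary_distribution(P))
```

In a control setting, one such matrix would be estimated per control law, and the stationary distributions compared against a cost defined on the clusters.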
Fast clustering using adaptive density peak detection.
Wang, Xiao-Feng; Xu, Yifan
2017-12-01
Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration and is thus fast, with great potential for big data analysis. A user-friendly R package ADPclust is developed for public use.
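A minimal sketch of density peak clustering with a kernel density estimate is given below. It makes several simplifications relative to the abstract: the bandwidth is a fixed argument rather than being derived from theory, the number of clusters is given rather than selected by the silhouette index, and centers are picked by the largest product of density and distance to the nearest denser point. It is written in Python for illustration and is not the API of the ADPclust R package.

```python
import math


def density_peak_clusters(points, bandwidth, n_clusters):
    """Cluster points by density peaks; returns (labels, center_indices)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: Gaussian kernel sum (unnormalised; only ranks matter).
    rho = [sum(math.exp(-0.5 * (d / bandwidth) ** 2) for d in row)
           for row in dist]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n           # distance to the nearest denser point
    nearest_higher = [-1] * n   # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    # Centers: high density and far from any denser point.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Single pass in decreasing density: inherit the label of the nearest
    # denser point, which has always been labelled already.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peak_clusters(pts, bandwidth=1.0, n_clusters=2)
print(labels, centers)
```

The assignment step is one pass over the points in order of decreasing density, which is the non-iterative property the abstract highlights.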
Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images
NASA Astrophysics Data System (ADS)
Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang
2016-10-01
Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.
Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine
2018-01-01
Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify the treatment responders and the non-responders. In the context of longitudinal cluster analyses, sample size and variability of the times of measurements are the main issues with the current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the clustering of longitudinal data by using an extended baseline method with the latent-class mixed model. The clustering of longitudinal data by using an extended baseline method with the two model-based algorithms was the more robust model. The clustering of longitudinal data by using an extended baseline method with all the non-parametric algorithms failed when there were unequal variances of treatment effect between clusters or when the subgroups had unbalanced sample sizes. The latent-class mixed model failed when the between-patients slope variability is high. Two real data sets on neurodegenerative disease and on obesity illustrate the clustering of longitudinal data by using an extended baseline method and show how clustering may help to identify the marker(s) of the treatment response. The application of the clustering of longitudinal data by using an extended baseline method in exploratory analysis as the first stage before setting up stratified designs can provide a better estimation of treatment effect in future clinical trials.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
Stochastic fire-diffuse-fire model with realistic cluster dynamics.
Calabrese, Ana; Fraiman, Daniel; Zysman, Daniel; Ponce Dawson, Silvina
2010-09-01
Living organisms use waves that propagate through excitable media to transport information. Ca2+ waves are a paradigmatic example of this type of process. A large hierarchy of Ca2+ signals that range from localized release events to global waves has been observed in Xenopus laevis oocytes. In these cells, Ca2+ release occurs through inositol 1,4,5-trisphosphate receptors (IP3Rs), which are organized in clusters of channels located on the membrane of the endoplasmic reticulum. In this article we construct a stochastic model for a cluster of IP3Rs that replicates the experimental observations reported in [D. Fraiman, Biophys. J. 90, 3897 (2006)]. We then couple this phenomenological cluster model with a reaction-diffusion equation, so as to have a discrete stochastic model for calcium dynamics. The model we propose describes the transition regimes between isolated release and steadily propagating waves as the IP3 concentration is increased.
Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen
2017-09-25
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
Data mining with unsupervised clustering using photonic micro-ring resonators
NASA Astrophysics Data System (ADS)
McAulay, Alastair D.
2013-09-01
Data is commonly moved through optical fiber in modern data centers and may be stored optically. We propose an optical method of data mining for future data centers to enhance performance. For example, in clustering, a form of unsupervised learning, we propose that parameters corresponding to information in a database are converted from analog values to frequencies, as in the brain's neurons, where similar data will have close frequencies. We describe the Wilson-Cowan model for oscillating neurons. In optics we implement the frequencies with micro ring resonators. Due to the influence of weak coupling, a group of resonators will form clusters of similar frequencies that will indicate the desired parameters having close relations. Fewer clusters are formed as clustering proceeds, which allows the creation of a tree showing topics of importance and their relationships in the database. The tree can be used for instance to target advertising and for planning.
NASA Astrophysics Data System (ADS)
Nourani, Vahid; Andalib, Gholamreza; Dąbrowska, Dominika
2017-05-01
Accurate nitrate load predictions can improve decision making for watershed water quality management, which affects the environment and drinking water. In this paper, two scenarios were considered for Multi-Station (MS) nitrate load modeling of the Little River watershed. In the first scenario, Markovian characteristics of streamflow-nitrate time series were proposed for the MS modeling. For this purpose, the feature extraction criterion of Mutual Information (MI) was employed for input selection of artificial intelligence models (Feed Forward Neural Network, FFNN, and least squares support vector machine). In the second scenario, to consider seasonality-based characteristics of the time series, the wavelet transform was used to extract multi-scale features of the streamflow-nitrate time series of the watershed's sub-basins to model MS nitrate loads. The Self-Organizing Map (SOM) clustering technique, which finds homogeneous sub-series clusters, was also linked to MI for proper cluster agent choice to be imposed into the models for predicting the nitrate loads of the watershed's sub-basins. The proposed MS method not only considers the prediction of the outlet nitrate but also covers predictions of interior sub-basin nitrate load values. The results indicated that the proposed FFNN model coupled with the SOM-MI improved the performance of MS nitrate predictions by up to 39% compared to the Markovian-based models. Overall, accurate selection of dominant inputs that consider seasonality-based characteristics of the streamflow-nitrate process could enhance the efficiency of nitrate load predictions.
NASA Astrophysics Data System (ADS)
Zhang, Congyao; Yu, Qingjuan; Lu, Youjun
2018-03-01
The massive galaxy cluster “El Gordo” (ACT-CL J0102–4915) is a rare merging system with a high collision speed suggested by multi-wavelength observations and theoretical modeling. Zhang et al. propose two types of mergers, a nearly head-on merger and an off-axis merger with a large impact parameter, to reproduce most of the observational features of the cluster using numerical simulations. The different merger configurations of the two models result in different gas motion in the simulated clusters. In this paper, we predict the kinetic Sunyaev–Zel’dovich (kSZ) effect, the relativistic correction of the thermal Sunyaev–Zel’dovich (tSZ) effect, and the X-ray spectrum of this cluster, based on the two proposed models. We find that (1) the amplitudes of the kSZ effect resulting from the two models are both on the order of ΔT/T ∼ 10⁻⁵ but their morphologies are different, which trace the different line-of-sight velocity distributions of the systems; (2) the relativistic correction of the tSZ effect around 240 GHz can be possibly used to constrain the temperature of the hot electrons heated by the shocks; and (3) the shift between the X-ray spectral lines emitted from different regions of the cluster can be significantly different in the two models. The shift and the line broadening can be up to ∼25 eV and 50 eV, respectively. We expect that future observations of the kSZ effect and the X-ray spectral lines (e.g., by ALMA, XARM) will provide a strong constraint on the gas motion and the merger configuration of ACT-CL J0102–4915.
Combining Mixture Components for Clustering*
Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël
2010-01-01
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302
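The hierarchical combination of mixture components described above can be illustrated with a small sketch. It assumes that the entropy criterion is the classification entropy of the soft assignments and that, at each step, the pair of components whose merge gives the lowest entropy is combined. The function names and the posterior matrix `post` are hypothetical, and this is not the authors' code.

```python
import math
from itertools import combinations

def entropy(post):
    """Classification entropy: -sum_i sum_k t_ik * log(t_ik)."""
    return -sum(t * math.log(t) for row in post for t in row if t > 0)

def merge(post, a, b):
    """Combine components a and b by adding their posterior probabilities."""
    merged = []
    for row in post:
        kept = [t for k, t in enumerate(row) if k not in (a, b)]
        merged.append(kept + [row[a] + row[b]])
    return merged

def best_merge(post):
    """Return the pair of components whose merge gives the lowest entropy."""
    k = len(post[0])
    return min(combinations(range(k), 2),
               key=lambda ab: entropy(merge(post, *ab)))

# Hypothetical posteriors for 4 points and K = 3 components; components
# 0 and 1 overlap heavily, as when two Gaussians fit one non-Gaussian cluster.
post = [
    [0.50, 0.50, 0.00],
    [0.60, 0.40, 0.00],
    [0.00, 0.00, 1.00],
    [0.05, 0.05, 0.90],
]
pair = best_merge(post)
reduced = merge(post, *pair)
```

Repeating `best_merge` and `merge` until one component remains gives one soft clustering for each number of clusters from K down to 1, which is the sequence the abstract proposes to compare.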
Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets
NASA Astrophysics Data System (ADS)
Tonkova, V.; Paulus, D.; Neeb, H.
2013-02-01
We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprised of a short-range attractive (strong interaction) and a long-range repulsive term (Coulomb force) is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters) whereas single point clusters repel each other. The formation of clusters is completed when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical for MRI derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
Density-Aware Clustering Based on Aggregated Heat Kernel and Its Transformation
Huang, Hao; Yoo, Shinjae; Yu, Dantong; ...
2015-06-01
Current spectral clustering algorithms suffer from the sensitivity to existing noise, and parameter scaling, and may not be aware of different density distributions across clusters. If these problems are left untreated, the consequent clustering results cannot accurately represent true data patterns, in particular, for complex real world datasets with heterogeneous densities. This paper aims to solve these problems by proposing a diffusion-based Aggregated Heat Kernel (AHK) to improve the clustering stability, and a Local Density Affinity Transformation (LDAT) to correct the bias originating from different cluster densities. AHK statistically models the heat diffusion traces along the entire time scale, so it ensures robustness during clustering process, while LDAT probabilistically reveals local density of each instance and suppresses the local density bias in the affinity matrix. Our proposed framework integrates these two techniques systematically. As a result, not only does it provide an advanced noise-resisting and density-aware spectral mapping to the original dataset, but also demonstrates the stability during the processing of tuning the scaling parameter (which usually controls the range of neighborhood). Furthermore, our framework works well with the majority of similarity kernels, which ensures its applicability to many types of data and problem domains. The systematic experiments on different applications show that our proposed algorithms outperform state-of-the-art clustering algorithms for the data with heterogeneous density distributions, and achieve robust clustering performance with respect to tuning the scaling parameter and handling various levels and types of noise.
Machine learning approaches for estimation of prediction interval for the model output.
Shrestha, Durga L; Solomatine, Dimitri P
2006-03-01
A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of empirical distributions of the errors associated with all instances belonging to the cluster under consideration and propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods estimating the prediction interval. A new method for evaluating performance for estimating prediction interval is proposed as well.
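The cluster-to-example propagation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the cluster limits are taken as membership-weighted empirical quantiles of the errors, each example's limits are the membership-weighted average of the cluster limits, and the memberships are given rather than computed by fuzzy c-means. All names and numbers are hypothetical.

```python
def weighted_quantile(errors, weights, p):
    """Smallest error whose cumulative normalized weight reaches p."""
    pairs = sorted(zip(errors, weights))
    total = sum(weights)
    cum = 0.0
    for err, w in pairs:
        cum += w
        if cum >= p * total:
            return err
    return pairs[-1][0]

def prediction_intervals(errors, memberships, lower_p=0.05, upper_p=0.95):
    """Per-example (lower, upper) error limits from fuzzy cluster memberships."""
    n_clusters = len(memberships[0])
    limits = []
    for c in range(n_clusters):
        w = [u[c] for u in memberships]
        limits.append((weighted_quantile(errors, w, lower_p),
                       weighted_quantile(errors, w, upper_p)))
    # Propagate the cluster limits to each example by its membership grades.
    return [(sum(u[c] * limits[c][0] for c in range(n_clusters)),
             sum(u[c] * limits[c][1] for c in range(n_clusters)))
            for u in memberships]

# Hypothetical model errors and two-cluster membership grades.
errors = [-2.0, -1.0, 0.0, 1.0, 4.0, 6.0]
memberships = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
               [0.2, 0.8], [0.1, 0.9], [0.0, 1.0]]
intervals = prediction_intervals(errors, memberships)
```

The resulting limits would then serve as regression targets for the in-sample data, as the abstract describes, so that intervals can be estimated for out-of-sample inputs.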
Clustering Multivariate Time Series Using Hidden Markov Models
Ghassempour, Shima; Girosi, Federico; Maeder, Anthony
2014-01-01
In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers. PMID:24662996
Collaborative filtering recommendation model based on fuzzy clustering algorithm
NASA Astrophysics Data System (ADS)
Yang, Ye; Zhang, Yunhua
2018-05-01
As one of the most widely used algorithms in recommender systems, collaborative filtering faces two serious problems: data sparsity and poor recommendation quality in big data environments. In traditional clustering analysis, each object is strictly assigned to one of several classes and the boundaries of this division are sharp. However, for most objects in real life, there is no strict definition of their form or of the attributes of their class. To address these problems, this paper proposes to improve the traditional collaborative filtering model through a hybrid optimization that combines an implicit semantic algorithm and a fuzzy clustering algorithm with the collaborative filtering algorithm. The fuzzy clustering algorithm is applied to the item attribute information, so that each item belongs to different item categories with different membership degrees. This increases the density of the data, effectively reduces its sparsity, and addresses the low accuracy that results from inaccurate similarity calculation. Finally, this paper carries out an empirical analysis on the MovieLens dataset and compares the proposed algorithm with the traditional user-based collaborative filtering algorithm. The proposed algorithm greatly improves the recommendation accuracy.
Community detection using Kernel Spectral Clustering with memory
NASA Astrophysics Data System (ADS)
Langone, Rocco; Suykens, Johan A. K.
2013-02-01
This work is related to the problem of community detection in dynamic scenarios, which for instance arises in the segmentation of moving objects, clustering of telephone traffic data, time-series micro-array data, etc. A desirable feature of a clustering model which has to capture the evolution of communities over time is the temporal smoothness between clusters in successive time-steps. In this way the model is able to track the long-term trend and at the same time smooth out short-term variation due to noise. We use the Kernel Spectral Clustering with Memory effect (MKSC), which allows cluster memberships of new nodes to be predicted via out-of-sample extension and has a proper model selection scheme. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness as a valid prior knowledge. The latter, in fact, allows the model to cluster the current data well and to be consistent with the recent history. Here we propose a generalization of the MKSC model with an arbitrary memory, not only one time-step in the past. The experiments conducted on toy problems confirm our expectations: the more memory we add to the model, the smoother over time are the clustering results. We also compare with the Evolutionary Spectral Clustering (ESC) algorithm, which is a state-of-the-art method, and we obtain comparable or better results.
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values.
In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
NASA Astrophysics Data System (ADS)
Ban, Sang-Woo; Lee, Minho
2008-04-01
Knowledge-based clustering and autonomous mental development remain high priority research topics, in which the learning techniques of neural networks are used to achieve optimal performance. In this paper, we present a new framework that can automatically generate a relevance map from sensory data that can represent knowledge regarding objects and infer new knowledge about novel objects. The proposed model is based on an understanding of the visual "what" pathway in our brain. A stereo saliency map model can selectively decide salient object areas by additionally considering a local symmetry feature. The incremental object perception model makes clusters for the construction of an ontology map in the color and form domains in order to perceive an arbitrary object, which is implemented by the growing fuzzy topology adaptive resonant theory (GFTART) network. Log-polar transformed color and form features for a selected object are used as inputs of the GFTART. The clustered information is relevant to describe specific objects, and the proposed model can automatically infer an unknown object by using the learned information. Experimental results with real data have demonstrated the validity of this approach.
Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks.
Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao
2017-01-13
Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs' demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays.
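The maximum consensus idea used for intra-cluster synchronization, and its spread between clusters through overlapping nodes, can be sketched in a few lines. This is a deliberately simplified illustration, not the CMTS protocol: it assumes synchronous rounds, ignores clock skew and communication delay, and treats each clock as a single number. The topology, clock values, and function name are hypothetical.

```python
def max_consensus(values, neighbors, rounds):
    """Each round, every node adopts the max over itself and its neighbors."""
    current = list(values)
    for _ in range(rounds):
        current = [max([current[i]] + [current[j] for j in neighbors[i]])
                   for i in range(len(current))]
    return current

# Two hypothetical clusters, {0, 1, 2} and {2, 3, 4}; node 2 belongs to both
# and relays time information between them.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
clocks = [10.0, 12.5, 11.0, 9.0, 15.0]
synced = max_consensus(clocks, neighbors, rounds=2)
```

With ideal links, every node holds the largest initial value after a number of rounds equal to the network diameter (two here), which is why the overlapping node is enough to align both clusters in this toy case.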
Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks †
Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao
2017-01-01
Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs’ demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays. PMID:28098750
ODE, RDE and SDE models of cell cycle dynamics and clustering in yeast.
Boczko, Erik M; Gedeon, Tomas; Stowers, Chris C; Young, Todd R
2010-07-01
Biologists have long observed periodic-like oxygen consumption oscillations in yeast populations under certain conditions, and several unsatisfactory explanations for this phenomenon have been proposed. These ‘autonomous oscillations’ have often appeared with periods that are nearly integer divisors of the calculated doubling time of the culture. We hypothesize that these oscillations could be caused by a form of cell cycle synchronization that we call clustering. We develop some novel ordinary differential equation models of the cell cycle. For these models, and for random and stochastic perturbations, we give both rigorous proofs and simulations showing that both positive and negative growth rate feedback within the cell cycle are possible agents that can cause clustering of populations within the cell cycle. It occurs for a variety of models and for a broad selection of parameter values. These results suggest that the clustering phenomenon is robust and is likely to be observed in nature. Since there are necessarily an integer number of clusters, clustering would lead to periodic-like behaviour with periods that are nearly integer divisors of the period of the cell cycle. Related experiments have shown conclusively that cell cycle clustering occurs in some oscillating yeast cultures.
The Azotobacter vinelandii NifEN complex contains two identical [4Fe-4S] clusters.
Goodwin, P J; Agar, J N; Roll, J T; Roberts, G P; Johnson, M K; Dean, D R
1998-07-21
The nifE and nifN gene products from Azotobacter vinelandii form an alpha2beta2 tetramer (NifEN complex) that is required for the biosynthesis of the nitrogenase FeMo cofactor. In the current model for NifEN complex organization and function, the complex is structurally analogous to the nitrogenase MoFe protein and provides an assembly site for a portion of FeMo cofactor biosynthesis. In this work, gene fusion and immobilized metal-affinity chromatography strategies were used to elevate the in vivo production of the NifEN complex and to facilitate its rapid and efficient purification. The NifEN complex produced and purified in this way exhibits an FeMo cofactor biosynthetic activity similar to that previously described for the NifEN complex purified by traditional chromatography methods. UV-visible, EPR, variable-temperature magnetic circular dichroism, and resonance Raman spectroscopies were used to show that the NifEN complex contains two identical [4Fe-4S]2+ clusters. These clusters have a predominantly S = 1/2 ground state in the reduced form, exhibit a reduction potential of -350 mV, and are likely to be coordinated entirely by cysteinyl residues on the basis of spectroscopic properties and sequence comparisons. A model is proposed where each NifEN complex [4Fe-4S] cluster is bridged between a NifE-NifN subunit interface at a position analogous to that occupied by the P clusters in the nitrogenase MoFe protein. In contrast to the MoFe protein P clusters, the NifEN complex [4Fe-4S] clusters are proposed to be asymmetrically coordinated to the NifEN complex where NifE cysteines-37, -62, and -124 and NifN cysteine-44 are the coordinating ligands. On the basis of a homology model of the three-dimensional structure of the NifEN complex, the [4Fe-4S] cluster sites are likely to be remote from the proposed FeMo cofactor assembly site and are unlikely to become incorporated into the FeMo cofactor during its assembly.
Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan
2015-06-01
Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions.
Study of cluster behavior in the riser of CFB by the DSMC method
NASA Astrophysics Data System (ADS)
Liu, H. P.; Liu, D. Y.; Liu, H.
2010-03-01
The flow behaviors of clusters in the riser of a two-dimensional (2D) circulating fluidized bed were numerically studied based on the Euler-Lagrangian approach. Gas turbulence was modeled by means of Large Eddy Simulation (LES). Particle collision was modeled by means of the direct simulation Monte Carlo (DSMC) method. The clusters' hydrodynamic characteristics were obtained using a cluster identification method proposed by Sharma et al. (2000). The descending clusters near the wall region and the up- and down-flowing clusters in the core were studied separately due to their different flow behaviors. The effects of superficial gas velocity on the cluster behavior were analyzed. Simulated results showed that near-wall clusters flow downward and the descent velocity is about -45 cm/s. The occurrence frequency of the up-flowing clusters is higher than that of the down-flowing clusters in the core of the riser. With the increase of superficial gas velocity, the solid concentration and occurrence frequency of clusters decrease, while the cluster axial velocity increases. Simulated results were in agreement with experimental data. The stochastic method used in the present paper is feasible for predicting the cluster flow behavior in CFBs.
Monthly streamflow forecasting with auto-regressive integrated moving average
NASA Astrophysics Data System (ADS)
Nasir, Najah; Samsudin, Ruhaidah; Shabri, Ani
2017-09-01
Forecasting of streamflow is one of the many ways that can contribute to better decision making for water resource management. The auto-regressive integrated moving average (ARIMA) model was selected in this research for monthly streamflow forecasting with enhancement made by pre-processing the data using singular spectrum analysis (SSA). This study also proposed an extension of the SSA technique to include a step where clustering was performed on the eigenvector pairs before reconstruction of the time series. The monthly streamflow data of Sungai Muda at Jeniang, Sungai Muda at Jambatan Syed Omar and Sungai Ketil at Kuala Pegang was gathered from the Department of Irrigation and Drainage Malaysia. A ratio of 9:1 was used to divide the data into training and testing sets. The ARIMA, SSA-ARIMA and Clustered SSA-ARIMA models were all developed in R software. Results from the proposed model are then compared to a conventional auto-regressive integrated moving average model using the root-mean-square error and mean absolute error values. It was found that the proposed model can outperform the conventional model.
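The evaluation protocol described above (a 9:1 chronological split, then root-mean-square error and mean absolute error on the held-out part) can be sketched in a few lines. This is only an illustration: the flow values are made up, the helper names are hypothetical, and a naive last-value forecast stands in for the ARIMA and SSA-ARIMA models, which the study built in R and which are not reproduced here.

```python
import math

def split_train_test(series, train_parts=9, test_parts=1):
    """Chronological split: the earliest observations form the training set."""
    cut = len(series) * train_parts // (train_parts + test_parts)
    return series[:cut], series[cut:]

def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly flows (illustrative numbers, not data from the study).
flows = [12.0, 15.0, 11.0, 18.0, 20.0, 17.0, 14.0, 13.0, 16.0, 19.0,
         21.0, 18.0, 15.0, 12.0, 14.0, 17.0, 22.0, 20.0, 16.0, 13.0]
train, test = split_train_test(flows)

# Placeholder forecast: repeat the last training value. A fitted model's
# forecasts would be substituted here before comparing error values.
naive_forecast = [train[-1]] * len(test)
print(rmse(test, naive_forecast), mae(test, naive_forecast))
```

Competing models would be scored on the same held-out values, and the one with the lower RMSE and MAE preferred.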
Galaxy evolution in the densest environments: HST imaging
NASA Astrophysics Data System (ADS)
Jorgensen, Inger
2013-10-01
We propose to process in a consistent fashion all available HST/ACS and WFC3 imaging of seven rich clusters of galaxies at z=1.2-1.6. The clusters are part of our larger project aimed at constraining models for galaxy evolution in dense environments from observations of stellar populations in rich z=1.2-2 galaxy clusters. The main objective is to establish the star formation (SF) history and structural evolution over this epoch during which large changes in SF rates and galaxy structure are expected to take place in cluster galaxies. The observational data required to meet our main objective are deep HST imaging and high S/N spectroscopy of individual cluster members. The HST imaging already exists for the seven rich clusters at z=1.2-1.6 included in this archive proposal. However, the data have not been consistently processed to derive colors, magnitudes, sizes and morphological parameters for all potential cluster members bright enough to be suitable for spectroscopic observations with 8-m class telescopes. We propose to carry out this processing and make all derived parameters publicly available. We will use the parameters derived from the HST imaging to (1) study the structural evolution of the galaxies, (2) select clusters and galaxies for spectroscopic observations, and (3) use the photometry and spectroscopy together for a unified analysis aimed at the SF history and structural changes. The analysis will also utilize data from the Gemini/HST Cluster Galaxy Project, which covers rich clusters at z=0.2-1.0 and for which we have similar HST imaging and high S/N spectroscopy available.
Multivalent Cation-Bridged PI(4,5)P2 Clusters Form at Very Low Concentrations.
Wen, Yi; Vogt, Volker M; Feigenson, Gerald W
2018-06-05
Phosphatidylinositol 4,5-bisphosphate (PI(4,5)P2 or PIP2) is a key component of the inner leaflet of the plasma membrane in eukaryotic cells. In model membranes, PIP2 has been reported to form clusters, but whether these locally different conditions could give rise to distinct pools of unclustered and clustered PIP2 is unclear. By use of both fluorescence self-quenching and Förster resonance energy transfer assays, we have discovered that PIP2 self-associates at remarkably low concentrations starting below 0.05 mol% of total lipids. Formation of these clusters was dependent on physiological divalent metal ions, such as Ca2+, Mg2+, Zn2+, or trivalent ions Fe3+ and Al3+. Formation of PIP2 clusters was also headgroup-specific, being largely independent of the type of acyl chain. The similarly labeled phospholipids phosphatidylcholine, phosphatidylethanolamine, phosphatidylserine, and phosphatidylinositol exhibited no such clustering. However, six phosphoinositide species coclustered with PIP2. The degree of PIP2 cation clustering was significantly influenced by the composition of the surrounding lipids, with cholesterol and phosphatidylinositol enhancing this behavior. We propose that PIP2 cation-bridged cluster formation, which might be similar to micelle formation, can be used as a physical model for what could be distinct pools of PIP2 in biological membranes. To our knowledge, this study provides the first evidence of PIP2 forming clusters at such low concentrations. The property of PIP2 to form such clusters at such extremely low concentrations in model membranes reveals, to our knowledge, a new behavior of PIP2 proposed to occur in cells, in which local multivalent metal ions, lipid compositions, and various binding proteins could greatly influence PIP2 properties. In turn, these different pools of PIP2 could further regulate cellular events. Copyright © 2018 Biophysical Society. Published by Elsevier Inc. All rights reserved.
The Chandra Strong Lens Sample: Revealing Baryonic Physics In Strong Lensing Selected Clusters
NASA Astrophysics Data System (ADS)
Bayliss, Matthew
2017-08-01
We propose Chandra imaging of the hot intra-cluster gas in a unique new sample of 29 galaxy clusters selected purely on their strong gravitational lensing signatures. This will be the first program targeting a purely strong lensing selected cluster sample, enabling new comparisons between the ICM properties and scaling relations of strong lensing and mass/ICM selected cluster samples. Chandra imaging, combined with high precision strong lens models, ensures powerful constraints on the distribution and state of matter in the cluster cores. This represents a novel angle from which we can address the role played by baryonic physics -- the infamous "gastrophysics" -- in shaping the cores of massive clusters, and opens up an exciting new galaxy cluster discovery space with Chandra.
Cosmology from galaxy clusters as observed by Planck
NASA Astrophysics Data System (ADS)
Pierpaoli, Elena
We propose to use current all-sky data on galaxy clusters in the radio/infrared bands in order to constrain cosmology. This will be achieved by performing parameter estimation with number counts and power spectra for galaxy clusters detected by Planck through their Sunyaev-Zeldovich signature. The ultimate goal of this proposal is to use clusters as tracers of matter density in order to provide information about fundamental properties of our Universe, such as the law of gravity on large scales, early Universe phenomena, structure formation and the nature of dark matter and dark energy. We will leverage the availability of a larger and deeper cluster catalog from the latest Planck data release in order to include, for the first time, the cluster power spectrum in the cosmological parameter determination analysis. Furthermore, we will extend the cluster analysis to cosmological models not yet investigated by the Planck collaboration. These aims require a diverse set of activities: the characterization of the clusters' selection function, the choice of the cosmological cluster sample to be used for parameter estimation, the construction of mock samples in the various cosmological models with correct correlation properties in order to produce reliable selection functions and noise covariance matrices, and finally the construction of the appropriate likelihood for number counts and power spectra. We plan to make the final code available to the community and compatible with the most widely used cosmological parameter estimation code. This research makes use of data from the NASA satellites Planck and, less directly, Chandra, in order to constrain cosmology, and therefore perfectly fits the NASA objectives and the specifications of this solicitation.
Decentralized cooperative TOA/AOA target tracking for hierarchical wireless sensor networks.
Chen, Ying-Chih; Wen, Chih-Yu
2012-11-08
This paper proposes a distributed method for cooperative target tracking in hierarchical wireless sensor networks. Leader-based information processing is applied to achieve object positioning in a cluster-based network topology. Random timers and local information are used to adaptively select a sub-cluster for the localization task. The proposed energy-efficient tracking algorithm allows each sub-cluster member to locally estimate the target position with a Bayesian filtering framework and a neural network model, and then performs estimation fusion in the leader node with the covariance intersection algorithm. This paper evaluates the merits and trade-offs of the protocol design towards developing more efficient and practical algorithms for object position estimation.
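The fusion step named in this abstract, covariance intersection, combines two estimates whose cross-correlation is unknown by taking a convex combination of their inverse covariances. The sketch below shows the standard two-estimate form for a 2D position; the function names, the fixed weight w and the example numbers are assumptions for illustration and are not taken from the paper, which does not state how its weight is chosen.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def covariance_intersection(x1, p1, x2, p2, w):
    """Fuse estimates (x1, p1) and (x2, p2) with weight w in [0, 1].

    P^-1 = w * P1^-1 + (1 - w) * P2^-1
    x    = P * (w * P1^-1 x1 + (1 - w) * P2^-1 x2)
    """
    i1, i2 = inv2(p1), inv2(p2)
    info = [[w * i1[r][c] + (1 - w) * i2[r][c] for c in range(2)] for r in range(2)]
    p = inv2(info)
    v1, v2 = matvec2(i1, x1), matvec2(i2, x2)
    weighted = [w * v1[k] + (1 - w) * v2[k] for k in range(2)]
    return matvec2(p, weighted), p

# Two hypothetical sub-cluster estimates of the same target position.
x_fused, p_fused = covariance_intersection(
    [1.0, 2.0], [[2.0, 0.0], [0.0, 1.0]],
    [3.0, 2.0], [[1.0, 0.0], [0.0, 2.0]],
    w=0.5,
)
print(x_fused, p_fused)
```

In practice w is often chosen to minimize the trace or determinant of the fused covariance; a fixed value is used here only to keep the example short.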
Model-based Clustering of Categorical Time Series with Multinomial Logit Classification
NASA Astrophysics Data System (ADS)
Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea
2010-09-01
A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership, we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.
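The building block of the mixture model described above is a first-order time-homogeneous Markov chain over the categories of a single series. As a rough illustration, the following sketch estimates such a chain's transition matrix by row-normalizing transition counts. The state labels and the sequence are invented, the function name is hypothetical, and the Bayesian mixture, multinomial logit and MCMC parts of the method are not shown.

```python
from collections import Counter

def transition_matrix(sequence, states):
    """Row-normalized transition counts of one categorical time series.

    Returns a dict of dicts: matrix[s][t] estimates P(next = t | current = s).
    Rows for states that are never left are filled with zeros.
    """
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else 0.0) for t in states
        }
    return matrix

# Hypothetical wage-category path for one individual.
states = ["low", "mid", "high"]
path = ["low", "low", "mid", "high", "mid", "low"]
tm = transition_matrix(path, states)
print(tm)
```

A model-based clustering would group individuals whose series are well described by the same transition matrix, rather than by a distance between raw sequences.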
Design of double fuzzy clustering-driven context neural networks.
Kim, Eun-Hu; Oh, Sung-Kwun; Pedrycz, Witold
2018-08-01
In this study, we introduce a novel category of double fuzzy clustering-driven context neural networks (DFCCNNs). The study is focused on the development of advanced design methodologies for redesigning the structure of conventional fuzzy clustering-based neural networks. The conventional fuzzy clustering-based neural networks typically focus on dividing the input space into several local spaces (implied by clusters). In contrast, the proposed DFCCNNs take into account two distinct local spaces called context and cluster spaces, respectively. Cluster space refers to the local space positioned in the input space whereas context space concerns a local space formed in the output space. Through partitioning the output space into several local spaces, each context space is used as the desired (target) local output to construct local models. To complete this, the proposed network includes a new context layer for reasoning about context space in the output space. In this sense, Fuzzy C-Means (FCM) clustering is useful to form local spaces in both input and output spaces. The first one is used in order to form clusters and train weights positioned between the input and hidden layer, whereas the other one is applied to the output space to form context spaces. The key features of the proposed DFCCNNs can be enumerated as follows: (i) the parameters between the input layer and hidden layer are built through FCM clustering. The connections (weights) are specified as constant terms being in fact the centers of the clusters. The membership functions (represented through the partition matrix) produced by the FCM are used as activation functions located at the hidden layer of the "conventional" neural networks. (ii) Following the hidden layer, a context layer is formed to approximate the context space of the output variable, and each node in the context layer represents an individual local model. The outputs of the context layer are specified as a combination of both weights formed as linear functions and the outputs of the hidden layer. The weights are updated using the least squares estimation (LSE)-based method. (iii) At the output layer, the outputs of the context layer are decoded to produce the corresponding numeric output. At this time, the weighted average is used and the weights are also adjusted with the use of the LSE scheme. From the viewpoint of performance improvement, the proposed design methodologies are discussed and evaluated with the aid of benchmark machine learning datasets. Through the experiments, it is shown that the generalization abilities of the proposed DFCCNNs are better than those of the conventional FCNNs reported in the literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
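Feature (i) above uses the FCM partition matrix as the hidden-layer activation. A minimal sketch of how one row of that matrix is obtained from fixed cluster centers, using the standard FCM membership formula, is given below. The fuzzification coefficient m = 2, the sample points and the function names are assumptions chosen for illustration; the abstract does not state the settings used by the authors.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equally long tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM memberships u[k][i] of point k in cluster i.

    u[k][i] = 1 / sum_j (d(k, i) / d(k, j)) ** (2 / (m - 1))
    A point lying exactly on one or more centers is shared equally among them.
    """
    exponent = 2.0 / (m - 1.0)
    partition = []
    for x in points:
        dists = [euclid(x, c) for c in centers]
        hits = [d == 0.0 for d in dists]
        if any(hits):
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
        partition.append(row)
    return partition

# Two hypothetical cluster centers and three input points.
centers = [(0.0, 0.0), (2.0, 0.0)]
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]
u = fcm_memberships(points, centers)
print(u)
```

Each row sums to one, so the memberships can be read as graded activations of the hidden nodes for that input.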
Lavi, Yael; Gov, Nir; Edidin, Michael; Gheber, Levi A.
2012-01-01
Lateral heterogeneity of cell membranes has been demonstrated in numerous studies showing anomalous diffusion of membrane proteins; it has been explained by models and experiments suggesting dynamic barriers to free diffusion that temporarily confine membrane proteins into microscopic patches. This picture, however, falls short of explaining a steady-state patchy distribution of proteins in the face of the transient opening of the barriers. In our previous work we directly imaged persistent clusters of MHC-I, a type I transmembrane protein, and proposed a model of a dynamic equilibrium between proteins newly delivered to the cell surface by vesicle traffic, temporary confinement by dynamic barriers to lateral diffusion, and dispersion of the clusters by diffusion over the dynamic barriers. Our model predicted that the clusters are dynamic, appearing when an exocytic vesicle fuses with the plasma membrane and dispersing with a typical lifetime that depends on lateral diffusion and the dynamics of barriers. In a subsequent work, we showed this to be the case. Here we test another prediction of the model, and show that changing the stability of actin barriers to lateral diffusion changes cluster lifetimes. We also develop a model for the distribution of cluster lifetimes, consistent with the function of barriers to lateral diffusion in maintaining MHC-I clusters. PMID:22500754
Implicit Regularization for Reconstructing 3D Building Rooftop Models Using Airborne LiDAR Data
Jung, Jaewook; Jwa, Yoonseok; Sohn, Gunho
2017-01-01
With rapid urbanization, highly accurate and semantically rich virtualization of building assets in 3D has become more critical for supporting various applications, including urban planning, emergency response and location-based services. Many research efforts have been conducted to automatically reconstruct building models at city-scale from remotely sensed data. However, developing a fully-automated photogrammetric computer vision system enabling the massive generation of highly accurate building models still remains a challenging task. One of the most challenging tasks in 3D building model reconstruction is to regularize the noise introduced in the boundary of a building object retrieved from raw data without knowledge of its true shape. This paper proposes a data-driven modeling approach to reconstruct 3D rooftop models at city-scale from airborne laser scanning (ALS) data. The focus of the proposed method is to implicitly derive the shape regularity of 3D building rooftops from the given noisy information about the building boundary in a progressive manner. This study covers a full chain of 3D building modeling from low level processing to realistic 3D building rooftop modeling. In the element clustering step, building-labeled point clouds are clustered into homogeneous groups by applying height similarity and plane similarity. Based on segmented clusters, linear modeling cues including outer boundaries, intersection lines, and step lines are extracted. Topology elements among the modeling cues are recovered by the Binary Space Partitioning (BSP) technique. The regularity of the building rooftop model is achieved by an implicit regularization process in the framework of Minimum Description Length (MDL) combined with Hypothesize and Test (HAT). The parameters governing the MDL optimization are automatically estimated based on Min-Max optimization and an entropy-based weighting method. The performance of the proposed method is tested over the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark datasets. The results show that the proposed method can robustly produce accurate regularized 3D building rooftop models. PMID:28335486
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
Di Costanzo, Ezio; Giacomello, Alessandro; Messina, Elisa; Natalini, Roberto; Pontrelli, Giuseppe; Rossi, Fabrizio; Smits, Robert; Twarogowska, Monika
2018-03-14
We propose a discrete-in-continuous mathematical model describing the in vitro growth process of biopsy-derived mammalian cardiac progenitor cells growing as clusters in the form of spheres (Cardiospheres). The approach is hybrid: discrete at the cellular scale and continuous at the molecular level. In the present model, cells are subject to the self-organizing collective dynamics mechanism and, additionally, they can proliferate and differentiate, also depending on stochastic processes. The two latter processes are triggered and regulated by chemical signals present in the environment. Numerical simulations show the structure and the development of the clustered progenitors and are in good agreement with the results obtained from in vitro experiments.
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
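To make the idea of a member node learning cluster costs more concrete, here is a small sketch of a tabular Q-learning update with epsilon-greedy exploration. It is a generic stand-in and not the authors' algorithm: the abstract says only that cluster selection is formulated as an MDP and solved with reinforcement learning, so the state and action labels, the reward (a negated cost), and the values of alpha, gamma and epsilon are all assumptions made for illustration.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q-value."""
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

def choose_cluster(q, state, epsilon, rng):
    """Epsilon-greedy choice among the candidate clusters of a state."""
    actions = list(q[state])
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore a random neighboring cluster
    return max(actions, key=lambda a: q[state][a])  # exploit the best known one

# Hypothetical Q-table: states are node situations, actions are clusters to join.
q = {
    "s0": {"c1": 0.0, "c2": 0.0},
    "s1": {"c1": 1.0, "c2": 2.0},
}
# Joining c1 from s0 incurs a hypothetical energy-plus-sensing cost of 1.0.
new_value = q_update(q, "s0", "c1", reward=-1.0, next_state="s1")
greedy_choice = choose_cluster(q, "s0", epsilon=0.0, rng=random.Random(0))
print(new_value, greedy_choice)
```

With epsilon set above zero the node occasionally tries other clusters, which is one common way to realize the exploration the abstract mentions.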
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. However, these systems face several inherent issues such as data sparsity and cold start problems, caused by having few ratings relative to the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and a sparsest subgraph finding algorithm is then applied to the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
Analytical network process based optimum cluster head selection in wireless sensor network.
Farman, Haleem; Javed, Huma; Jan, Bilal; Ahmad, Jamil; Ali, Shaukat; Khalil, Falak Naz; Khan, Murad
2017-01-01
Wireless Sensor Networks (WSNs) are becoming ubiquitous in everyday life due to their applications in weather forecasting, surveillance, implantable sensors for health monitoring and a plethora of other applications. A WSN is equipped with hundreds or thousands of small sensor nodes. As the size of a sensor node decreases, critical issues such as limited energy, computation time and limited memory become even more pronounced. In such a case, network lifetime mainly depends on efficient use of available resources. Organizing nearby nodes into clusters makes it convenient to efficiently manage each cluster as well as the overall network. In this paper, we extend our previous work on a grid-based hybrid network deployment approach, in which a merge and split technique was proposed to construct the network topology. Having constructed the topology through our proposed technique, in this paper we use the analytical network process (ANP) model for cluster head (CH) selection in WSNs. Five distinct parameters are considered for CH selection: distance from nodes (DistNode), residual energy level (REL), distance from centroid (DistCent), number of times the node has been selected as cluster head (TCH) and merged node (MN). The problem of CH selection based on these parameters is tackled as a multi-criteria decision system, for which the ANP method is used for optimum cluster head selection. The main contribution of this work is to check the applicability of the ANP model for cluster head selection in WSNs. In addition, sensitivity analysis is carried out to check the stability of alternatives (available candidate nodes) and their ranking for different scenarios. The simulation results show that the proposed method outperforms existing energy-efficient clustering protocols in terms of optimum CH selection and minimizing the CH reselection process, which results in extending overall network lifetime. This paper analyzes the ANP method used for CH selection, giving a better understanding of the dependencies among the different components involved in the evaluation process.
Analytical network process based optimum cluster head selection in wireless sensor network
Javed, Huma; Jan, Bilal; Ahmad, Jamil; Ali, Shaukat; Khalil, Falak Naz; Khan, Murad
2017-01-01
Wireless Sensor Networks (WSNs) are becoming ubiquitous in everyday life due to their applications in weather forecasting, surveillance, implantable sensors for health monitoring and other plethora of applications. WSN is equipped with hundreds and thousands of small sensor nodes. As the size of a sensor node decreases, critical issues such as limited energy, computation time and limited memory become even more highlighted. In such a case, network lifetime mainly depends on efficient use of available resources. Organizing nearby nodes into clusters make it convenient to efficiently manage each cluster as well as the overall network. In this paper, we extend our previous work of grid-based hybrid network deployment approach, in which merge and split technique has been proposed to construct network topology. Constructing topology through our proposed technique, in this paper we have used analytical network process (ANP) model for cluster head selection in WSN. Five distinct parameters: distance from nodes (DistNode), residual energy level (REL), distance from centroid (DistCent), number of times the node has been selected as cluster head (TCH) and merged node (MN) are considered for CH selection. The problem of CH selection based on these parameters is tackled as a multi criteria decision system, for which ANP method is used for optimum cluster head selection. Main contribution of this work is to check the applicability of ANP model for cluster head selection in WSN. In addition, sensitivity analysis is carried out to check the stability of alternatives (available candidate nodes) and their ranking for different scenarios. The simulation results show that the proposed method outperforms existing energy efficient clustering protocols in terms of optimum CH selection and minimizing CH reselection process that results in extending overall network lifetime. 
This paper analyzes that ANP method used for CH selection with better understanding of the dependencies of different components involved in the evaluation process. PMID:28719616
Competing risks regression for clustered data
Zhou, Bingqing; Fine, Jason; Latouche, Aurelien; Labopin, Myriam
2012-01-01
A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine–Gray proportional hazards model for the subdistribution to situations, where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study evidences that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry. PMID:22045910
Efficient view based 3-D object retrieval using Hidden Markov Model
NASA Astrophysics Data System (ADS)
Jain, Yogendra Kumar; Singh, Roshan Kumar
2013-12-01
Recent research effort has been dedicated to view-based 3-D object retrieval because of the highly discriminative property of 3-D objects and their multi-view representation. State-of-the-art methods depend heavily on their own camera array settings for capturing views of a 3-D object and use complex Zernike descriptors and HAC for representative view selection, which limits their practical application and makes them inefficient for retrieval. Therefore, an efficient and effective algorithm is required for 3-D object retrieval. To move toward a general framework for efficient 3-D object retrieval that is independent of the camera array setting and avoids representative view selection, we propose an Efficient View Based 3-D Object Retrieval (EVBOR) method using a Hidden Markov Model (HMM). In this framework, each object is represented by an independent set of views, which means views are captured from any direction without any camera array restriction. The views (including the query view) are clustered to generate view clusters, which are then used to build the query model with the HMM. In our proposed method, the HMM is used in two ways: in training (i.e. HMM estimation) and in retrieval (i.e. HMM decoding). The query model is trained using these view clusters. The EVBOR query model works on the basis of the query model combined with the HMM. The proposed approach removes the static camera array setting for view capturing and can be applied to any 3-D object database to retrieve 3-D objects efficiently and effectively. Experimental results demonstrate that the proposed scheme performs better than existing methods.
Conditions for the Evolution of Gene Clusters in Bacterial Genomes
Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.
2010-01-01
Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992
From virtual clustering analysis to self-consistent clustering analysis: a mathematical study
NASA Astrophysics Data System (ADS)
Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam
2018-03-01
In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), and provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates a substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order in numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, these terms were not obtained in the original SCA paper, and we find that they may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.
Practical 3-D Beam Pattern Based Channel Modeling for Multi-Polarized Massive MIMO Systems.
Aghaeinezhadfirouzja, Saeid; Liu, Hui; Balador, Ali
2018-04-12
In this paper, a practical non-stationary three-dimensional (3-D) channel model for massive multiple-input multiple-output (MIMO) systems, considering beam patterns for different antenna elements, is proposed. The beam patterns using dipole antenna elements with different phase excitation toward the different directions of travel (DoTs) contribute various correlation weights for rays related towards/from the cluster, thus providing different elevation angles of arrival (EAoAs) and elevation angles of departure (EAoDs) for each antenna element. These include the movements of the user, which make our channel a non-stationary model of clusters at the receiver (RX) on both the time and array axes. In addition, their impacts on 3-D massive MIMO channels are investigated via statistical properties including received spatial correlation. Additionally, the impact of elevation/azimuth angles of arrival on received spatial correlation is discussed. Furthermore, the proposed 3-D channel models on azimuth and elevation angles of the polarized antenna are specifically evaluated and compared through simulations. The proposed 3-D generic models are verified using relevant measurement data. PMID:29649177
Collaborative Filtering Based on Sequential Extraction of User-Item Clusters
NASA Astrophysics Data System (ADS)
Honda, Katsuhiro; Notsu, Akira; Ichihashi, Hidetomo
Collaborative filtering is a computational realization of “word-of-mouth” in a network community, in which the items preferred by “neighbors” are recommended. This paper proposes a new item-selection model for extracting user-item clusters from rectangular relation matrices, in which mutual relations between users and items are denoted in an alternative process of “liking or not”. A technique for sequential co-cluster extraction from rectangular relational data is given by combining the structural balancing-based user-item clustering method with a sequential fuzzy cluster extraction approach. Then, the technique is applied to the collaborative filtering problem, in which some items may be shared by several user clusters.
Improved community model for social networks based on social mobility
NASA Astrophysics Data System (ADS)
Lu, Zhe-Ming; Wu, Zhen; Luo, Hao; Wang, Hao-Xian
2015-07-01
This paper proposes an improved community model for social networks based on social mobility. The relationship between the group distribution and the community size is investigated in terms of communication rate and turnover rate. The degree distributions, clustering coefficients, average distances and diameters of networks are analyzed. Experimental results demonstrate that the proposed model possesses the small-world property and can reproduce social networks effectively and efficiently.
A Bayesian, generalized frailty model for comet assays.
Ghebretinsae, Aklilu Habteab; Faes, Christel; Molenberghs, Geert; De Boeck, Marlies; Geys, Helena
2013-05-01
This paper proposes a flexible modeling approach for so-called comet assay data regularly encountered in preclinical research. While such data consist of non-Gaussian outcomes in a multilevel hierarchical structure, traditional analyses typically completely or partly ignore this hierarchical nature by summarizing measurements within a cluster. Non-Gaussian outcomes are often modeled using exponential family models. This is true not only for binary and count data, but also for, for example, time-to-event outcomes. Two important reasons for extending this family are (1) the possible occurrence of overdispersion, meaning that the variability in the data may not be adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of a hierarchical structure in the data, owing to clustering in the data. The first issue is dealt with through so-called overdispersion models. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. In the case of time-to-event data, one encounters, for example, the gamma frailty model (Duchateau and Janssen, 2007). While both of these issues may occur simultaneously, models combining both are uncommon. Molenberghs et al. (2010) proposed a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. Here, we use this method to model data from a comet assay with a three-level hierarchical structure. Although a conjugate gamma random effect is used for the overdispersion random effect, both gamma and normal random effects are considered for the hierarchical random effect. Apart from model formulation, we place emphasis on Bayesian estimation.
Our proposed method has an upper hand over the traditional analysis in that it (1) uses the appropriate distribution stipulated in the literature; (2) deals with the complete hierarchical nature; and (3) uses all information instead of summary measures. The fit of the model to the comet assay is compared against the background of more conventional model fits. Results indicate the toxicity of 1,2-dimethylhydrazine dihydrochloride at different dose levels (low, medium, and high).
IoT Service Clustering for Dynamic Service Matchmaking.
Zhao, Shuai; Yu, Le; Cheng, Bo; Chen, Junliang
2017-07-27
With the adoption of service-oriented paradigms in the IoT (Internet of Things) environment, real-world devices will open their capabilities through service interfaces, which enable other functional entities to interact with them. In an IoT application, it is indispensable to find suitable services for satisfying users' requirements or replacing unavailable services. However, from the perspective of performance, it is inappropriate to search for desired services directly in the service repository online. Instead, it is necessary to cluster services offline according to their similarity and then to matchmake or discover services online within a limited number of clusters. This paper proposes a multidimensional model-based approach to measure the similarity between IoT services. Then, density-peaks-based clustering is employed to gather similar services together according to the result of the similarity measurement. Based on the service clustering, the algorithms of dynamic service matchmaking, discovery, and replacement can be performed efficiently. Evaluation experiments are conducted to validate the performance of the proposed approaches, and the results are promising. PMID:28749431
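The density-peaks step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D points stand in for services, Euclidean distance stands in for their multidimensional dissimilarity measure, and the function name `density_peaks` and the cutoff `d_c` are assumptions made for illustration. For each item it computes a local density `rho` and the distance `delta` to the nearest item of higher density; items with both values large are candidate cluster centres.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) per point; d_c is an assumed distance cutoff."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Two well-separated toy groups (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho, delta = density_peaks(points, d_c=0.12)
# Candidate centres: high density and far from any denser point.
centres = [i for i in range(len(points)) if rho[i] >= 2 and delta[i] > 1.0]
```

In this toy data the first point of each group ends up as a candidate centre; the remaining items would then be assigned to the cluster of their nearest denser neighbour.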
NASA Astrophysics Data System (ADS)
Zaichik, Leonid I.; Alipchenkov, Vladimir M.
2009-10-01
The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15 1776-87; 2007 Phys. Fluids 19 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.
A Bimodal Hybrid Model for Time-Dependent Probabilistic Seismic Hazard Analysis
NASA Astrophysics Data System (ADS)
Yaghmaei-Sabegh, Saman; Shoaeifar, Nasser; Shoaeifar, Parva
2018-03-01
The evaluation of evidence provided by geological studies and historical catalogs indicates that in some seismic regions and faults, multiple large earthquakes occur in clusters. These clusters are then followed by periods of quiescence in which only small-to-moderate earthquakes take place. Clustering of large earthquakes is the most distinguishable departure from the assumption of constant hazard of random occurrence of earthquakes in conventional seismic hazard analysis. In the present study, a time-dependent recurrence model is proposed to consider a series of large earthquakes that occur in clusters. The model is flexible enough to better reflect the quasi-periodic behavior of large earthquakes with long-term clustering, and it can be used in time-dependent probabilistic seismic hazard analysis for engineering purposes. In this model, the time-dependent hazard results are estimated by a hazard function which comprises three parts: a decreasing hazard of the last large earthquake cluster, an increasing hazard of the next large earthquake cluster, and a constant hazard of random occurrence of small-to-moderate earthquakes. In the final part of the paper, the time-dependent seismic hazard of the New Madrid Seismic Zone at different time intervals is calculated for illustrative purposes.
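The three-part hazard function described in this abstract can be written schematically as follows. The symbols are assumptions introduced here for illustration, since the abstract does not give the functional forms: $h_{\mathrm{last}}$ is a decreasing function of the time $t$ since the last cluster, $h_{\mathrm{next}}$ is an increasing function of $t$, and $h_{0}$ is a constant background rate for small-to-moderate events.

```latex
h(t) = h_{\mathrm{last}}(t) + h_{\mathrm{next}}(t) + h_{0},
\qquad
\frac{d h_{\mathrm{last}}}{dt} \le 0,
\quad
\frac{d h_{\mathrm{next}}}{dt} \ge 0,
\quad
h_{0} = \text{const.}
```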
Impact-parameter dependence of the energy loss of fast molecular clusters in hydrogen
NASA Astrophysics Data System (ADS)
Fadanelli, R. C.; Grande, P. L.; Schiwietz, G.
2008-03-01
The electronic energy loss of molecular clusters as a function of impact parameter is far less understood than atomic energy losses. For instance, there are no analytical expressions for the energy loss as a function of impact parameter for cluster ions. In this work, we describe two procedures to evaluate the combined energy loss of molecules: Ab initio calculations within the semiclassical approximation and the coupled-channels method using atomic orbitals; and simplified models for the electronic cluster energy loss as a function of the impact parameter, namely the molecular perturbative convolution approximation (MPCA, an extension of the corresponding atomic model PCA) and the molecular unitary convolution approximation (MUCA, a molecular extension of the previous unitary convolution approximation UCA). In this work, an improved ansatz for MPCA is proposed, extending its validity for very compact clusters. For the simplified models, the physical inputs are the oscillators strengths of the target atoms and the target-electron density. The results from these models applied to an atomic hydrogen target yield remarkable agreement with their corresponding ab initio counterparts for different angles between cluster axis and velocity direction at specific energies of 150 and 300 keV/u.
Ng, Edmond S-W; Diaz-Ordaz, Karla; Grieve, Richard; Nixon, Richard M; Thompson, Simon G; Carpenter, James R
2016-10-01
Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We find that the most useful criterion for model choice was the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data. © The Author(s) 2013.
A spatial scan statistic for multiple clusters.
Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie
2011-10-01
Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of the clusters' shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic which can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method, whereas the standard method cannot establish its statistical significance due to another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.
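For orientation, the standard single-cluster scan statistic that this work extends can be sketched as below. This is the usual Poisson log-likelihood ratio for one candidate zone, not the multiple-cluster extension proposed in the paper; the function names, the zone labels and the counts are hypothetical, and expected counts are assumed to be scaled so that they sum to the total number of observed cases.

```python
import math

def poisson_llr(c, e, total):
    """Log-likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only an excess of cases inside the zone is of interest
    inside = c * math.log(c / e)
    rest = total - c
    outside = rest * math.log(rest / (total - e)) if rest > 0 else 0.0
    return inside + outside

def most_likely_cluster(zones, total):
    """zones maps a zone label to (observed, expected); returns the best label."""
    return max(zones, key=lambda z: poisson_llr(zones[z][0], zones[z][1], total))

# Hypothetical candidate zones out of 100 cases in the whole study area.
zones = {"A": (30, 10.0), "B": (12, 10.0), "C": (8, 10.0)}
best = most_likely_cluster(zones, total=100)
```

In practice the significance of the best zone is assessed by Monte Carlo replication under the null hypothesis, which this sketch omits.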
Density-based cluster algorithms for the identification of core sets
NASA Astrophysics Data System (ADS)
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
Estimating and Identifying Unspecified Correlation Structure for Longitudinal Data
Hu, Jianhua; Wang, Peng; Qu, Annie
2014-01-01
Identifying correlation structure is important to achieving estimation efficiency in analyzing longitudinal data, and is also crucial for drawing valid statistical inference for large size clustered data. In this paper, we propose a nonparametric method to estimate the correlation structure, which is applicable for discrete longitudinal data. We utilize eigenvector-based basis matrices to approximate the inverse of the empirical correlation matrix and determine the number of basis matrices via model selection. A penalized objective function based on the difference between the empirical and model approximation of the correlation matrices is adopted to select an informative structure for the correlation matrix. The eigenvector representation of the correlation estimation is capable of reducing the risk of model misspecification, and also provides useful information on the specific within-cluster correlation pattern of the data. We show that the proposed method possesses the oracle property and selects the true correlation structure consistently. The proposed method is illustrated through simulations and two data examples on air pollution and sonar signal studies. PMID:26361433
Coarse cluster enhancing collaborative recommendation for social network systems
NASA Astrophysics Data System (ADS)
Zhao, Yao-Dong; Cai, Shi-Min; Tang, Ming; Shang, Min-Sheng
2017-10-01
Traditional collaborative filtering based recommender systems for social network systems place very high demands on time complexity because they compute similarities of all pairs of users via resource usages and annotation actions, which strongly suppresses recommending speed. In this paper, to overcome this drawback, we propose a novel approach, namely coarse cluster, that partitions similar users and associated items at a high speed to enhance user-based collaborative filtering, and then develop a fast collaborative user model for social tagging systems. The experimental results based on the Delicious dataset show that the proposed model is able to reduce the processing time cost by more than 90% and relatively improve the accuracy in comparison with ordinary user-based collaborative filtering, and is robust to the initial parameter. Most importantly, the proposed model can be conveniently extended by introducing more users' information (e.g., profiles) and practically applied to large-scale social network systems to enhance the recommending speed without accuracy loss.
Nguyen, Hien D; Ullmann, Jeremy F P; McLachlan, Geoffrey J; Voleti, Venkatakaushik; Li, Wenze; Hillman, Elizabeth M C; Reutens, David C; Janke, Andrew L
2018-02-01
Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques are enabling visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via Gaussian mixtures is proposed for the clustering of data from such visualizations. The methodology is theoretically justified and a computationally efficient approach to estimation is suggested. An example analysis of a zebrafish imaging experiment is presented.
Effects of additional data on Bayesian clustering.
Yamazaki, Keisuke
2017-10-01
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.
A clustering algorithm for sample data based on environmental pollution characteristics
NASA Astrophysics Data System (ADS)
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
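The clustering process outlined in this abstract (seed a centre, assign points by a similarity threshold, then refine in a k-means-like way) can be sketched roughly as follows. This is a simplified single-pass variant under stated assumptions, not the published EPC algorithm: the similarity function, the name `threshold_clustering`, the rule that singleton clusters are treated as outliers, and the toy data are all hypothetical.

```python
import math

def similarity(a, b):
    # Assumed similarity: 1 for identical points, decreasing with distance.
    return 1.0 / (1.0 + math.dist(a, b))

def threshold_clustering(data, threshold, refine_steps=5):
    centres, labels = [], []
    # Pass 1: a point joins its most similar centre if the similarity reaches
    # the user-defined threshold; otherwise it becomes a new centre.
    for x in data:
        sims = [similarity(x, c) for c in centres]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)))
        else:
            centres.append(tuple(x))
            labels.append(len(centres) - 1)
    # Pass 2: k-means-like refinement of centres and assignments.
    for _ in range(refine_steps):
        for k in range(len(centres)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centres[k] = tuple(sum(v) / len(members) for v in zip(*members))
        labels = [max(range(len(centres)),
                      key=lambda k: similarity(x, centres[k])) for x in data]
    # Assumption: clusters with a single member are reported as outliers.
    sizes = [labels.count(k) for k in range(len(centres))]
    outliers = [i for i, lab in enumerate(labels) if sizes[lab] == 1]
    return labels, centres, outliers

data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
        (10.0, 10.0), (10.2, 10.0), (50.0, 50.0)]
labels, centres, outliers = threshold_clustering(data, threshold=0.5)
```

With this similarity function a threshold of 0.5 corresponds to a distance of at most 1, so the isolated sample at (50, 50) forms its own cluster and is flagged.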
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising
NASA Astrophysics Data System (ADS)
Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua
2018-04-01
In order to remove noise from three-dimensional scattered point clouds and smooth the data without damaging sharp geometric features, a novel algorithm is proposed in this paper. A feature-preserving weight is added to the fuzzy c-means algorithm, yielding a curvature-weighted fuzzy c-means clustering algorithm. Firstly, large-scale outliers are removed using the statistics of the neighboring points within radius r. Then, the algorithm estimates the curvature of the point cloud data using a conicoid parabolic fitting method and calculates the curvature feature value. Finally, the proposed clustering algorithm is applied to calculate the weighted cluster centers. The cluster centers are regarded as the new points. The experimental results show that this approach is effective for different scales and intensities of noise in point clouds, achieves high precision, and preserves features at the same time. It is also robust to different noise models.
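A weighted fuzzy c-means update of the general kind described in this abstract might look like the sketch below. It is an illustration only: the abstract does not give the form of the curvature weight, so the per-point `weights` are simply supplied by the caller, and the function name `weighted_fcm`, the fuzzifier `m = 2`, and the fixed iteration count are all assumptions.

```python
import math


def weighted_fcm(points, weights, centres, m=2.0, n_iter=50, eps=1e-12):
    """Fuzzy c-means with positive per-point weights.

    Returns (centres, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k.
    """
    centres = [tuple(c) for c in centres]
    n_dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        memberships = []
        for p in points:
            d = [max(math.dist(p, c), eps) for c in centres]
            memberships.append(
                [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]
            )
        # Centre update: mean of the points weighted by w_i * u_ik^m.
        new_centres = []
        for k in range(len(centres)):
            coef = [w * row[k] ** m for w, row in zip(weights, memberships)]
            total = sum(coef)
            new_centres.append(
                tuple(
                    sum(c * p[dim] for c, p in zip(coef, points)) / total
                    for dim in range(n_dim)
                )
            )
        centres = new_centres
    return centres, memberships


cloud = [(0, 0, 0), (0.2, 0, 0), (-0.2, 0, 0), (10, 0, 0), (10.2, 0, 0), (9.8, 0, 0)]
uniform = [1.0] * len(cloud)
fcm_centres, fcm_u = weighted_fcm(cloud, uniform, centres=[cloud[0], cloud[-1]])
```

In the de-noising setting, the returned centres would play the role of the "new points"; raising the weight of high-curvature points pulls the centres toward sharp features instead of averaging them away.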
Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y
2014-05-01
This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for a boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of the boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Simulating the Birth of Massive Star Clusters: Is Destruction Inevitable?
NASA Astrophysics Data System (ADS)
Rosen, Anna
2013-10-01
Very early in its operation, the Hubble Space Telescope (HST) opened an entirely new frontier: study of the demographics and properties of star clusters far beyond the Milky Way. However, interpretation of HST's observations has proven difficult, and has led to the development of two conflicting models. One view is that most massive star clusters are disrupted during their infancy by feedback from newly formed stars (i.e., "infant mortality"), independent of cluster mass or environment. The other model is that most star clusters survive their infancy and are disrupted later by mass-dependent dynamical processes. Since observations at present have failed to discriminate between these views, we propose a theoretical investigation to provide new insight. We will perform radiation-hydrodynamic simulations of the formation of massive star clusters, including for the first time a realistic treatment of the most important stellar feedback processes. These simulations will elucidate the physics of stellar feedback, and allow us to determine whether cluster disruption is mass-dependent or -independent. We will also use our simulations to search for observational diagnostics that can distinguish bound from unbound clusters, and to predict how cluster disruption affects the cluster luminosity function in a variety of galactic environments.
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.
Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray
2016-12-01
In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
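The Benjamini-Hochberg thresholding step mentioned in this abstract can be illustrated with a short sketch. It shows only the standard step-up procedure for a list of p-values; how the authors embed it in their scan-statistic pipeline is not specified here, and the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q.

    Returns (threshold, rejected), where threshold is the largest p-value
    p_(k) satisfying p_(k) <= k * q / m (or None if there is none) and
    rejected[i] says whether hypothesis i is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    threshold = None
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            threshold = p_values[i]  # keep the largest qualifying p-value
    rejected = [threshold is not None and p <= threshold for p in p_values]
    return threshold, rejected


# Hypothetical p-values for ten candidate clusters.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_threshold, bh_rejected = benjamini_hochberg(p_example, q=0.05)

# Step-up behaviour: the smallest p-value misses its own critical value,
# but is still rejected because a larger p-value meets its critical value.
_, step_up_rejected = benjamini_hochberg([0.02, 0.03, 0.04], q=0.05)
```

Because the procedure is step-up, every p-value at or below the threshold is rejected, even one that fails its own rank-specific bound.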
Microbial community pattern detection in human body habitats via ensemble clustering framework.
Yang, Peng; Su, Xiaoquan; Ou-Yang, Le; Chua, Hon-Nian; Li, Xiao-Li; Ning, Kang
2014-01-01
The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of human microbiome is essential in understanding its role in human disease and health. Recent studies on healthy human microbiome focus on particular body habitats, assuming that microbiome develop similar structural patterns to perform similar ecosystem function under same environmental conditions. However, current studies usually overlook a complex and interconnected landscape of human microbiome and limit the ability in particular body habitats with learning models of specific criterion. Therefore, these methods could not capture the real-world underlying microbial patterns effectively. To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community pattern on large-scale metagenomic data. Particularly, we first build a microbial similarity network via integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is proposed and applied onto the network to detect clustering pattern. Extensive experiments are conducted to evaluate the effectiveness of our model on deriving microbial community with respect to body habitat and host gender. From clustering results, we observed that body habitat exhibits a strong bound but non-unique microbial structural pattern. Meanwhile, human microbiome reveals different degree of structural variations over body habitat and host gender. In summary, our ensemble clustering framework could efficiently explore integrated clustering results to accurately identify microbial communities, and provide a comprehensive view for a set of microbial communities. 
The clustering results indicate that structure of human microbiome is varied systematically across body habitats and host genders. Such trends depict an integrated biography of microbial communities, which offer a new insight towards uncovering pathogenic model of human microbiome.
Microbial community pattern detection in human body habitats via ensemble clustering framework
2014-01-01
Background: The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of human microbiome is essential in understanding its role in human disease and health. Recent studies on healthy human microbiome focus on particular body habitats, assuming that microbiome develop similar structural patterns to perform similar ecosystem function under same environmental conditions. However, current studies usually overlook a complex and interconnected landscape of human microbiome and limit the ability in particular body habitats with learning models of specific criterion. Therefore, these methods could not capture the real-world underlying microbial patterns effectively. Results: To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community pattern on large-scale metagenomic data. Particularly, we first build a microbial similarity network via integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is proposed and applied onto the network to detect clustering pattern. Extensive experiments are conducted to evaluate the effectiveness of our model on deriving microbial community with respect to body habitat and host gender. From clustering results, we observed that body habitat exhibits a strong bound but non-unique microbial structural pattern. Meanwhile, human microbiome reveals different degree of structural variations over body habitat and host gender. Conclusions: In summary, our ensemble clustering framework could efficiently explore integrated clustering results to accurately identify microbial communities, and provide a comprehensive view for a set of microbial communities.
The clustering results indicate that structure of human microbiome is varied systematically across body habitats and host genders. Such trends depict an integrated biography of microbial communities, which offer a new insight towards uncovering pathogenic model of human microbiome. PMID:25521415
Robust and fast-converging level set method for side-scan sonar image segmentation
NASA Astrophysics Data System (ADS)
Liu, Yan; Li, Qingwu; Huo, Guanying
2017-11-01
A robust and fast-converging level set method is proposed for side-scan sonar (SSS) image segmentation. First, the noise in each sonar image is removed using the adaptive nonlinear complex diffusion filter. Second, k-means clustering is used to obtain the initial presegmentation image from the denoised image, and then the distance maps of the initial contours are reinitialized to guarantee the accuracy of the numerical calculation used in the level set evolution. Finally, a satisfactory segmentation is achieved using a robust variational level set model, where the evolution control parameters are generated by the presegmentation. The proposed method is successfully applied to both a synthetic image with speckle noise and real SSS images. Experimental results show that the proposed method requires far fewer iterations and is therefore much faster than the fuzzy local information c-means clustering method, the level set method using a gamma observation model, and the enhanced region-scalable fitting method. Moreover, the proposed method usually obtains more accurate segmentation results than the other methods.
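The k-means presegmentation step described in this abstract could be sketched as below. The sketch covers only intensity clustering of an already denoised image; the diffusion filter, the distance-map reinitialization, and the level set evolution are not shown. The function name, the evenly spaced initial centres, and the tiny example image are assumptions made for illustration.

```python
def kmeans_presegment(image, k=2, n_iter=20):
    """Cluster pixel intensities with 1-D k-means; return (label_map, centres).

    `image` is a list of rows of grey values. Initial centres are spread
    evenly over the intensity range, which is an assumed initialisation.
    """
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centres[c]))

    for _ in range(n_iter):
        sums = [0.0] * k
        counts = [0] * k
        for v in pixels:
            j = nearest(v)
            sums[j] += v
            counts[j] += 1
        # Keep the old centre when a cluster receives no pixels.
        new_centres = [
            sums[j] / counts[j] if counts[j] else centres[j] for j in range(k)
        ]
        if new_centres == centres:
            break
        centres = new_centres

    label_map = [[nearest(v) for v in row] for row in image]
    return label_map, centres


# Hypothetical 3x3 image: dark background with a bright target region.
toy_image = [[10, 12, 200], [11, 198, 205], [9, 202, 199]]
seg_labels, seg_centres = kmeans_presegment(toy_image, k=2)
```

The boundaries between label regions in `seg_labels` would serve as the initial contours, and the cluster centres give a rough estimate of region intensities from which evolution parameters might be derived.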
Ontology-based topic clustering for online discussion data
NASA Astrophysics Data System (ADS)
Wang, Yongheng; Cao, Kening; Zhang, Xiaoming
2013-03-01
With the rapid development of online communities, mining and extracting quality knowledge from online discussions becomes very important for the industrial and marketing sector, as well as for e-commerce applications and government. Most of the existing techniques model a discussion as a social network of users represented by a user-based graph without considering the content of the discussion. In this paper we propose a new multilayered model to analyze online discussions. The user-based and message-based representations are combined in this model. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into topic space. Domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large-scale online discussion data.
A two-stage model of fracture of rocks
Kuksenko, V.; Tomilin, N.; Damaskinskaya, E.; Lockner, D.
1996-01-01
In this paper we propose a two-stage model of rock fracture. In the first stage, cracks or local regions of failure are uncorrelated and occur randomly throughout the rock in response to loading of pre-existing flaws. As damage accumulates in the rock, there is a gradual increase in the probability that large clusters of closely spaced cracks or local failure sites will develop. Based on statistical arguments, a critical density of damage will occur where clusters of flaws become large enough to lead to larger-scale failure of the rock (stage two). While crack interaction and cooperative failure are expected to occur within clusters of closely spaced cracks, the initial development of clusters is predicted based on the random variation in pre-existing flaw populations. Thus the onset of the unstable second stage in the model can be computed from the generation of random, uncorrelated damage. The proposed model incorporates notions of the kinetic (and therefore time-dependent) nature of the strength of solids as well as the discrete hierarchic structure of rocks and the flaw populations that lead to damage accumulation. The advantage offered by this model is that its salient features are valid for fracture processes occurring over a wide range of scales including earthquake processes. A notion of the rank of fracture (fracture size) is introduced, and criteria are presented for both fracture nucleation and the transition of the failure process from one scale to another.
Kwong, C. K.; Fung, K. Y.; Jiang, Huimin; Chan, K. Y.
2013-01-01
Affective design is an important aspect of product development to achieve a competitive edge in the marketplace. A neural-fuzzy network approach has been attempted recently to model customer satisfaction for affective design and it has been proved to be an effective one to deal with the fuzziness and non-linearity of the modeling as well as generate explicit customer satisfaction models. However, such an approach to modeling customer satisfaction has two limitations. First, it is not suitable for the modeling problems which involve a large number of inputs. Second, it cannot adapt to new data sets, given that its structure is fixed once it has been developed. In this paper, a modified dynamic evolving neural-fuzzy approach is proposed to address the above mentioned limitations. A case study on the affective design of mobile phones was conducted to illustrate the effectiveness of the proposed methodology. Validation tests were conducted and the test results indicated that: (1) the conventional Adaptive Neuro-Fuzzy Inference System (ANFIS) failed to run due to a large number of inputs; (2) the proposed dynamic neural-fuzzy model outperforms the subtractive clustering-based ANFIS model and fuzzy c-means clustering-based ANFIS model in terms of their modeling accuracy and computational effort. PMID:24385884
Kwong, C K; Fung, K Y; Jiang, Huimin; Chan, K Y; Siu, Kin Wai Michael
2013-01-01
Affective design is an important aspect of product development to achieve a competitive edge in the marketplace. A neural-fuzzy network approach has been attempted recently to model customer satisfaction for affective design and it has been proved to be an effective one to deal with the fuzziness and non-linearity of the modeling as well as generate explicit customer satisfaction models. However, such an approach to modeling customer satisfaction has two limitations. First, it is not suitable for the modeling problems which involve a large number of inputs. Second, it cannot adapt to new data sets, given that its structure is fixed once it has been developed. In this paper, a modified dynamic evolving neural-fuzzy approach is proposed to address the above mentioned limitations. A case study on the affective design of mobile phones was conducted to illustrate the effectiveness of the proposed methodology. Validation tests were conducted and the test results indicated that: (1) the conventional Adaptive Neuro-Fuzzy Inference System (ANFIS) failed to run due to a large number of inputs; (2) the proposed dynamic neural-fuzzy model outperforms the subtractive clustering-based ANFIS model and fuzzy c-means clustering-based ANFIS model in terms of their modeling accuracy and computational effort.
Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold
2014-12-01
In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature. Copyright © 2014 Elsevier Ltd. All rights reserved.
A multimodal detection model of dolphins to estimate abundance validated by field experiments.
Akamatsu, Tomonari; Ura, Tamaki; Sugimatsu, Harumi; Bahl, Rajendar; Behera, Sandeep; Panda, Sudarsan; Khan, Muntaz; Kar, S K; Kar, C S; Kimura, Satoko; Sasaki-Yamamoto, Yukiko
2013-09-01
Abundance estimation of marine mammals requires matching of detections of an animal or a group of animals by two independent means. A multimodal detection model using visual and acoustic cues (surfacing and phonation) that enables abundance estimation of dolphins is proposed. The method does not require a specific time window to match the cues of both means for applying the mark-recapture method. The proposed model was evaluated using data obtained in field observations of Ganges River dolphins and Irrawaddy dolphins, as examples of dispersed and condensed distributions of animals, respectively. The acoustic detection probability was approximately 80%, 20% higher than that of visual detection for both species, regardless of the distribution of the animals in the present study sites. The abundance estimates of Ganges River dolphins and Irrawaddy dolphins agreed fairly well with the numbers reported in previous monitoring studies. The single animal detection probability was smaller than that of larger cluster sizes, as predicted by the model and confirmed by field data. However, dense groups of Irrawaddy dolphins showed differences in the cluster sizes observed by visual and acoustic methods. The lower detection probability of single clusters of this species seemed to be caused by its clumped distribution.
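For orientation, the classical two-sample mark-recapture calculation that underlies this kind of study can be sketched as follows. This is not the multimodal model proposed in the abstract: unlike that model, the classical estimator relies on explicitly matched detections. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def chapman_estimate(n_visual, n_acoustic, n_both):
    """Chapman's bias-corrected form of the Lincoln-Petersen abundance estimate.

    n_visual   -- groups detected visually
    n_acoustic -- groups detected acoustically
    n_both     -- groups detected by both methods (matched detections)
    """
    if n_both > min(n_visual, n_acoustic):
        raise ValueError("matched detections cannot exceed either total")
    return (n_visual + 1) * (n_acoustic + 1) / (n_both + 1) - 1


def detection_probabilities(n_visual, n_acoustic, n_both):
    """Estimate each method's detection probability from the other's detections."""
    p_visual = n_both / n_acoustic    # share of acoustic detections also seen
    p_acoustic = n_both / n_visual    # share of visual detections also heard
    return p_visual, p_acoustic


# Made-up counts, for illustration only.
n_hat = chapman_estimate(n_visual=40, n_acoustic=60, n_both=30)
p_vis, p_aco = detection_probabilities(n_visual=40, n_acoustic=60, n_both=30)
```

Each method acts as the "marking" sample for the other, so the fraction of one method's detections that the other also registers estimates that other method's detection probability.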
TOPTRAC: Topical Trajectory Pattern Mining
Kim, Younghoon; Han, Jiawei; Yuan, Cangzhou
2015-01-01
With the increasing use of GPS-enabled mobile phones, geo-tagging, which refers to adding GPS information to media such as micro-blogging messages or photos, has seen a surge in popularity recently. This enables us to not only browse information based on locations, but also discover patterns in the location-based behaviors of users. Many techniques have been developed to find the patterns of people's movements using GPS data, but latent topics in text messages posted with local contexts have not been utilized effectively. In this paper, we present a latent topic-based clustering algorithm to discover patterns in the trajectories of geo-tagged text messages. We propose a novel probabilistic model to capture the semantic regions where people post messages with a coherent topic as well as the patterns of movement between the semantic regions. Based on the model, we develop an efficient inference algorithm to calculate model parameters. By exploiting the estimated model, we next devise a clustering algorithm to find the significant movement patterns that appear frequently in data. Our experiments on real-life data sets show that the proposed algorithm finds diverse and interesting trajectory patterns and identifies the semantic regions in a finer granularity than the traditional geographical clustering methods. PMID:26709365
A clustering approach applied to time-lapse ERT interpretation - Case study of Lascaux cave
NASA Astrophysics Data System (ADS)
Xu, Shan; Sirieix, Colette; Riss, Joëlle; Malaurent, Philippe
2017-09-01
The Lascaux cave, located in southwest France, is one of the most important prehistoric caves in the world with Paleolithic paintings. This study aims to characterize the structure of the weathered epikarst setting located above the cave using Time-Lapse Electrical Resistivity Tomography (ERT) combined with local hydrogeological and climatic environmental data. Twenty ERT profiles were acquired over two years and allowed us to record the seasonal and spatial variations of the electrical resistivity of the hydraulic upstream area of the Lascaux cave. The 20 interpreted resistivity models were merged into a single synthetic model using a multidimensional statistical method (Hierarchical Agglomerative Clustering). The individual blocks from the synthetic model associated with a similar resistivity variability were gathered into 7 clusters. We combined the resistivity temporal variations with climatic and hydrogeological data to propose a geo-electrical model that relates to a conceptual geological model. We provide a geological interpretation for each cluster regarding epikarst features. The superficial clusters (nos. 1 and 2) are linked to effective rainfall and trees, and probably correspond to a fractured limestone. Another two clusters (nos. 6 and 7) are linked to detrital formations (sand and clay, respectively). Cluster 3 may correspond to a marly limestone that forms a non-permeable horizon. Finally, the electrical behavior of the last two clusters (nos. 4 and 5) is correlated with the variation of flow rate; they may be a privileged feed zone of the flow in the cave.
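The grouping of model blocks by similar resistivity behaviour can be illustrated with a small agglomerative clustering sketch. It is a naive illustration, not the authors' workflow: the abstract does not state the linkage criterion or distance measure, so average linkage with Euclidean distance is assumed, and the function name and the resistivity values are invented for the example.

```python
import math


def agglomerative(vectors, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Each vector holds one block's resistivity values over the time-lapse
    surveys. Clusters are merged pairwise until n_clusters remain; the
    result is a list of clusters, each a list of block indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def average_distance(a, b):
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters


# Hypothetical resistivity series (ohm-m) for five blocks over three surveys.
blocks = [
    (100, 110, 105),
    (102, 108, 107),
    (500, 300, 450),
    (510, 310, 440),
    (50, 50, 50),
]
block_clusters = agglomerative(blocks, n_clusters=3)
```

In the study, the same idea would be applied to the blocks of the synthetic model with 20 survey values each and a target of 7 clusters; the cubic cost of this naive loop would call for a library implementation at that scale.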
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Li; Tunega, Daniel; Xu, Lai
2013-08-29
In a previous study (J. Phys. Chem. C 2011, 115, 12403) cluster models for the TiO2 rutile (110) surface and MP2 calculations were used to develop an analytic potential energy function for dimethyl methylphosphonate (DMMP) interacting with this surface. In the work presented here, this analytic potential and MP2 cluster models are compared with DFT "slab" calculations for DMMP interacting with the TiO2 (110) surface and with DFT cluster models for the TiO2 (110) surface. The DFT slab calculations were performed with the PW91 and PBE functionals. The analytic potential gives DMMP/TiO2 (110) potential energy curves in excellent agreement with those obtained from the slab calculations. The cluster models for the TiO2 (110) surface, used for the MP2 calculations, were extended to DFT calculations with the B3LYP, PW91, and PBE functionals. These DFT calculations do not give DMMP/TiO2 (110) interaction energies which agree with those from the DFT slab calculations. Analyses of the wave functions for these cluster models show that they do not accurately represent the HOMO and LUMO for the surface, which should be 2p and 3d orbitals, respectively, and the models also do not give an accurate band gap. The MP2 cluster models do not accurately represent the LUMO, and the fact that they nevertheless give accurate DMMP/TiO2 (110) interaction energies is apparently fortuitous, arising from their highly inaccurate band gaps. Accurate cluster models, consisting of 7, 10, and 15 Ti-atoms and which have the correct HOMO and LUMO properties, are proposed. The work presented here illustrates the care that must be taken in "constructing" cluster models which accurately model surfaces.
NASA Astrophysics Data System (ADS)
Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María
2009-02-01
Diabetic Retinopathy is one of the leading causes of blindness and vision defects in developed countries. An early detection and diagnosis is crucial to avoid visual complication. Microaneurysms are the first ocular signs of the presence of this ocular disease. Their detection is of paramount importance for the development of a computer-aided diagnosis technique which permits a prompt diagnosis of the disease. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression which is robust to the changes in the appearance of retinal fundus images. The method is evaluated on the public database proposed by the Retinal Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.
Ding, Ming; Zhu, Qianlong
2016-01-01
Hardware protection and control action are two kinds of low voltage ride-through technical proposals widely used in a permanent magnet synchronous generator (PMSG). This paper proposes an innovative clustering concept for the equivalent modeling of a PMSG-based wind power plant (WPP), in which the impacts of both the chopper protection and the coordinated control of active and reactive powers are taken into account. First, the post-fault DC link voltage is selected as a concentrated expression of unit parameters, incoming wind and electrical distance to a fault point to reflect the transient characteristics of PMSGs. Next, we provide an effective method for calculating the post-fault DC link voltage based on the pre-fault wind energy and the terminal voltage dip. Third, PMSGs are divided into groups by analyzing the calculated DC link voltages without any clustering algorithm. Finally, PMSGs of the same group are aggregated into one rescaled PMSG to realize the transient equivalent modeling of the PMSG-based WPP. Using the DIgSILENT PowerFactory simulation platform, the efficiency and accuracy of the proposed equivalent model are tested against the traditional equivalent WPP and the detailed WPP. The simulation results show the proposed equivalent model can be used to analyze the offline electromechanical transients in power systems.
NASA Astrophysics Data System (ADS)
Bruynooghe, Michel M.
1998-04-01
In this paper, we present a robust method for automatic object detection and delineation in noisy complex images. The proposed procedure is a three-stage process that integrates image segmentation by multidimensional pixel clustering and geometrically constrained optimization of deformable contours. The first step is to enhance the original image by nonlinear unsharp masking. The second step is to segment the enhanced image by multidimensional pixel clustering, using our reducible neighborhoods clustering algorithm that has a very interesting theoretical maximal complexity. Then, candidate objects are extracted and initially delineated by an optimized region merging algorithm that is based on ascendant hierarchical clustering with contiguity constraints and on the maximization of average contour gradients. The third step is to optimize the delineation of previously extracted and initially delineated objects. Deformable object contours have been modeled by cubic splines. An affine invariant has been used to control the undesired formation of cusps and loops. Nonlinear constrained optimization has been used to maximize the external energy. This avoids the difficult and non-reproducible choice of regularization parameters that are required by classical snake models. The proposed method has been applied successfully to the detection of fine and subtle microcalcifications in X-ray mammographic images, to defect detection by moire image analysis, and to the analysis of microrugosities of thin metallic films. A later implementation of the proposed method on a digital signal processor associated with a vector coprocessor would allow the design of a real-time object detection and delineation system for applications in medical imaging and in industrial computer vision.
Coupled multipolar interactions in small-particle metallic clusters.
Pustovit, Vitaly N; Sotelo, Juan A; Niklasson, Gunnar A
2002-03-01
We propose a new formalism for computing the optical properties of small clusters of particles. It is a generalization of the coupled dipole-dipole particle-interaction model and allows one in principle to take into account all multipolar interactions in the long-wavelength limit. The method is illustrated by computations of the optical properties of N = 6 particle clusters for different multipolar approximations. We examine the effect of separation between particles and compare the optical spectra with the discrete-dipole approximation and the generalized Mie theory.
Vibration control of a cluster of buildings through the Vibrating Barrier
NASA Astrophysics Data System (ADS)
Tombari, A.; Garcia Espinosa, M.; Alexander, N. A.; Cacciola, P.
2018-02-01
A novel device, called Vibrating Barrier (ViBa), that aims to reduce the vibrations of adjacent structures subjected to ground motion waves has been recently proposed. The ViBa is a structure buried in the soil and detached from surrounding buildings that is able to absorb a significant portion of the dynamic energy arising from the ground motion. The working principle exploits the dynamic interaction among vibrating structures due to the propagation of waves through the soil, namely the structure-soil-structure interaction. In this paper the efficiency of the ViBa is investigated to control the vibrations of a cluster of buildings. To this aim, a discrete model of structures-site interaction involving multiple buildings and the ViBa is developed where the effects of the soil on the structures, i.e. the soil-structure interaction (SSI), the structure-soil-structure interaction (SSSI) as well as the ViBa-soil-structures interaction are taken into account by means of linear elastic springs. Closed-form solutions are derived to design the ViBa in the case of harmonic excitation from the analysis of the discrete model. Advanced finite element numerical simulations are performed in order to assess the efficiency of the ViBa for protecting more than a single building. Parametric studies are also conducted to identify beneficial/adverse effects in the use of the proposed vibration control strategy to protect cluster of buildings. Finally, experimental shake table tests are performed to a prototype of a cluster of two buildings protected by the ViBa device for validating the proposed numerical models.
Distance-Based and Low Energy Adaptive Clustering Protocol for Wireless Sensor Networks
Gani, Abdullah; Anisi, Mohammad Hossein; Ab Hamid, Siti Hafizah; Akhunzada, Adnan; Khan, Muhammad Khurram
2016-01-01
A wireless sensor network (WSN) comprises small sensor nodes with limited energy capabilities. The power constraints of WSNs necessitate efficient energy utilization to extend the overall network lifetime of these networks. We propose a distance-based and low-energy adaptive clustering (DISCPLN) protocol to streamline the green issue of efficient energy utilization in WSNs. We also enhance our proposed protocol into the multi-hop-DISCPLN protocol to increase the lifetime of the network in terms of high throughput with minimum delay time and packet loss. We also propose the mobile-DISCPLN protocol to maintain the stability of the network. The modelling and comparison of these protocols with their corresponding benchmarks exhibit promising results. PMID:27658194
The Scale Sizes of Globular Clusters: Tidal Limits, Evolution, and the Outer Halo
NASA Astrophysics Data System (ADS)
Harris, William
2011-10-01
The physical factors that determine the linear sizes of massive star clusters are not well understood. Their scale sizes were long thought to be governed by the tidal field of the parent galaxy, but major questions are now emerging. Globular clusters, for example, have mean sizes nearly independent of location in the halo. Paradoxically, the recently discovered "anomalous extended clusters" in M31 and elsewhere have scale sizes that fit much better with tidal theory, but they are puzzlingly rare. Lastly, the persistent size difference between metal-poor and metal-rich clusters still lacks a quantitative explanation. Many aspects of these observations call for better modelling of dynamical evolution in the outskirts of clusters, and also their conditions of formation including the early rapid mass loss phase of protoclusters. A new set of accurate measurements of scale sizes and structural parameters, for a large and homogeneous set of globular clusters, would represent a major advance in this subject. We propose to carry out a {WFC3+ACS} imaging survey of the globular clusters in the supergiant Virgo elliptical M87 to cover the complete run of the halo. M87 is an optimum target system because of its huge numbers of clusters and HST's ability to resolve the cluster profiles accurately. We will derive cluster effective radii, central concentrations, luminosities, and colors for more than 4000 clusters using PSF-convolved King-model profile fitting. In parallel, we are developing theoretical tools to model the expected distribution of cluster sizes versus galactocentric distance as functions of cluster mass, concentration, and orbital anisotropy.
Goszczyński, Tomasz M; Kowalski, Konrad; Leśnikowski, Zbigniew J; Boratyński, Janusz
2015-02-01
Boron clusters represent a vast family of boron-rich compounds with extraordinary properties that provide the opportunity of exploitation in different areas of chemistry and biology. In addition, boron clusters are clinically used in boron neutron capture therapy (BNCT) of tumors. In this paper, a novel, in solid state (solvent free), thermal method for protein modification with boron clusters has been proposed. The method is based on a cyclic ether ring opening in oxonium adduct of cyclic ether and a boron cluster with nucleophilic centers of the protein. Lysozyme was used as the model protein, and the physicochemical and biological properties of the obtained conjugates were characterized. The main residues of modification were identified as arginine-128 and threonine-51. No significant changes in the secondary or tertiary structures of the protein after tethering of the boron cluster were found using mass spectrometry and circular dichroism measurements. However, some changes in the intermolecular interactions and hydrodynamic and catalytic properties were observed. To the best of our knowledge, we have described the first example of an application of cyclic ether ring opening in the oxonium adducts of a boron cluster for protein modification. In addition, a distinctive feature of the proposed approach is performing the reaction in solid state and at elevated temperature. The proposed methodology provides a new route to protein modification with boron clusters and extends the range of innovative molecules available for biological and medical testing. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Herega, Alexander; Sukhanov, Volodymyr; Vyrovoy, Valery
2017-12-01
It is known that the multifocal mechanism of structure genesis in heterogeneous materials provokes intensive formation of internal boundaries. In the present paper, the dependence of the structure and properties of a material on the characteristic size and shape, the number and size distribution, and the character of interaction of individual internal boundaries and their clusters is studied. A limitation on the applicability of the material damage coefficient is established, and an effective information descriptor of internal boundaries is proposed. The idea that long-range interaction in irradiated solids affects the realization of the second-order phase transition is introduced, and a phenomenological percolation model of the effect is proposed.
Vera, José Fernando; de Rooij, Mark; Heiser, Willem J
2014-11-01
In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few specific classes and representing these together with the categories of the response variable in a low-dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters and the adjusted Bayesian information criterion statistic is employed to test the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented. © 2014 The British Psychological Society.
Gay, Emilie; Senoussi, Rachid; Barnouin, Jacques
2007-01-01
Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached by continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposed to analyze spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34 142 French dairy herds for the year 2000, and important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with the presence of 3 clusters was highly significant, and the 3 clusters were attractive, i.e. closeness to cluster center increased the occurrence of high SCS. The three localizations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. The semi-parametric method based on spatial hazard modeling applies to continuous variables, and takes account of both risk factors and potential heterogeneity of the background population. This tool allows a quantitative detection but assumes a spatially specified form for clusters.
NASA Astrophysics Data System (ADS)
Li, Hongsong; Lyu, Hang; Liao, Ningfang; Wu, Wenmin
2016-12-01
The bidirectional reflectance distribution function (BRDF) data in the ultraviolet (UV) band are valuable for many applications including cultural heritage, material analysis, surface characterization, and trace detection. We present a BRDF measurement instrument working in the near- and middle-UV spectral range. The instrument includes a collimated UV light source, a rotation stage, a UV imaging spectrometer, and a control computer. The data captured by the proposed instrument describe spatial, spectral, and angular variations of the light scattering from a sample surface. Such a multidimensional dataset of an example sample is captured by the proposed instrument and analyzed by a k-means clustering algorithm to separate surface regions with the same material but different surface roughness. The clustering results show that the angular dimension of the dataset can be exploited for surface roughness characterization. The two clustered BRDFs are fitted to a theoretical BRDF model. The fitting results show good agreement between the measurement data and the theoretical model.
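The clustering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each surface pixel is summarized by a vector of reflectance values sampled over viewing angle; the function names, the synthetic lobe shapes, the farthest-first initialisation and the choice of two clusters are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    points = np.asarray(points, dtype=float)
    # Seed with the first point, then repeatedly add the point farthest
    # from all seeds chosen so far.
    seeds = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        seeds.append(points[d2.argmax()])
    centroids = np.array(seeds)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(points, centroids), centroids

# Synthetic angular profiles (hypothetical): a narrow specular lobe for a
# smooth region and a broad, weaker lobe for a rough region.
angles = np.linspace(-60.0, 60.0, 25)
smooth = np.exp(-(angles / 5.0) ** 2)
rough = 0.4 * np.exp(-(angles / 25.0) ** 2)
rng = np.random.default_rng(1)
pixels = np.vstack([
    smooth + 0.01 * rng.standard_normal((40, angles.size)),
    rough + 0.01 * rng.standard_normal((60, angles.size)),
])
labels, centroids = kmeans(pixels, k=2)
```

In this toy setting the two groups differ only in the angular shape of their scattering lobes, which is the kind of information the abstract says the angular dimension contributes.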
Complex networks as a unified framework for descriptive analysis and predictive modeling in climate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R
The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.
Kee, Kerk F; Sparks, Lisa; Struppa, Daniele C; Mannucci, Mirco A; Damiano, Alberto
2016-01-01
By integrating the simplicial model of social aggregation with existing research on opinion leadership and diffusion networks, this article introduces the constructs of simplicial diffusers (mathematically defined as nodes embedded in simplexes; a simplex is a socially bonded cluster) and simplicial diffusing sets (mathematically defined as minimal covers of a simplicial complex; a simplicial complex is a social aggregation in which socially bonded clusters are embedded) to propose a strategic approach for information diffusion of cancer screenings as a health intervention on Facebook for community cancer prevention and control. This approach is novel in its incorporation of interpersonally bonded clusters, culturally distinct subgroups, and different united social entities that coexist within a larger community into a computational simulation to select sets of simplicial diffusers with the highest degree of information diffusion for health intervention dissemination. The unique contributions of the article also include seven propositions and five algorithmic steps for computationally modeling the simplicial model with Facebook data.
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
Wernisch, Lorenz
2017-01-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.
Gabasova, Evelina; Reid, John; Wernisch, Lorenz
2017-10-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials
Diaz-Ordaz, Karla; Bartlett, Jonathan W
2016-01-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
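As a rough illustration of two of the estimators compared in this abstract, the sketch below computes an unadjusted cluster-level point estimate from observed outcomes and applies cluster mean imputation. The function names and the toy data are hypothetical, and the sketch covers point estimates only (no standard errors, covariate adjustment, or multiple imputation).

```python
import numpy as np

def cluster_means(outcomes, cluster_ids):
    """Mean of the observed (non-NaN) outcomes within each cluster."""
    outcomes = np.asarray(outcomes, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    return {
        c: float(np.nanmean(outcomes[cluster_ids == c]))
        for c in np.unique(cluster_ids).tolist()
    }

def cluster_mean_impute(outcomes, cluster_ids):
    """Replace each missing outcome by the observed mean of its own cluster."""
    filled = np.array(outcomes, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    for c, m in cluster_means(outcomes, cluster_ids).items():
        filled[(cluster_ids == c) & np.isnan(filled)] = m
    return filled

def unadjusted_cluster_level_effect(outcomes, cluster_ids, arm_of_cluster):
    """Difference between arms in the average of the cluster means."""
    means = cluster_means(outcomes, cluster_ids)
    treated = [m for c, m in means.items() if arm_of_cluster[c] == 1]
    control = [m for c, m in means.items() if arm_of_cluster[c] == 0]
    return float(np.mean(treated) - np.mean(control))

# Toy trial (hypothetical): four clusters of three individuals, NaN = missing.
y = [1.0, 2.0, np.nan, 3.0, 5.0, np.nan, 4.0, 6.0, np.nan, 7.0, 9.0, 8.0]
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
arm = {0: 0, 1: 0, 2: 1, 3: 1}  # cluster id -> 0 (control) or 1 (intervention)

effect_observed = unadjusted_cluster_level_effect(y, ids, arm)
effect_imputed = unadjusted_cluster_level_effect(cluster_mean_impute(y, ids), ids, arm)
```

Filling a missing value with its cluster's observed mean leaves that cluster's mean unchanged, so both calls return the same point estimate; any bias in the observed cluster means therefore carries over to the imputed analysis.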
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
Harnessing Sparse and Low-Dimensional Structures for Robust Clustering of Imagery Data
ERIC Educational Resources Information Center
Rao, Shankar Ramamohan
2009-01-01
We propose a robust framework for clustering data. In practice, data obtained from real measurement devices can be incomplete, corrupted by gross errors, or not correspond to any assumed model. We show that, by properly harnessing the intrinsic low-dimensional structure of the data, these kinds of practical problems can be dealt with in a uniform…
RELICS: Reionization Lensing Cluster Survey - Discovering Brightly Lensed Distant Galaxies for JWST
NASA Astrophysics Data System (ADS)
Coe, Dan; Bradley, Larry; Salmon, Brett; Avila, Roberto J.; Ogaz, Sara; Bradac, Marusa; Huang, Kuang-Han; Strait, Victoria; Hoag, Austin; Sharon, Keren q.; Cerny, Catherine; Paterno-Mahler, Rachel; Johnson, Traci Lin; Mahler, Guillaume; Zitrin, Adi; Sendra Server, Irene; Acebron, Ana; Cibirka, Nathália; Rodney, Steven; Strolger, Louis; Riess, Adam; Dawson, William; Jones, Christine; Andrade-Santos, Felipe; Lovisari, Lorenzo; Czakon, Nicole; Umetsu, Keiichi; Trenti, Michele; Vulcani, Benedetta; Carrasco, Daniela; Livermore, Rachael; Stark, Daniel P.; Mainali, Ramesh; Frye, Brenda; Oesch, Pascal; Lam, Daniel; Toft, Sune; Ryan, Russell; Peterson, Avery; Past, Matthew; Kikuchihara, Shotaro; Ouchi, Masami; Oguri, Masamune
2018-01-01
The Reionization Lensing Cluster Survey (RELICS) Hubble Treasury Program has completed observations of 41 massive galaxy clusters with 188 orbits of HST ACS and WFC3/IR imaging and 390 hours of Spitzer IRAC imaging. This poster presents an overview of the program and data releases. Reduced images, catalogs, and lens models for all clusters are now available on MAST. RELICS is studying the clusters, supernovae, and lensed high-redshift galaxies. A companion poster presents our high-redshift results: over 300 lensed z ~ 6 - 10 candidates, including some of the brightest known at these redshifts (Salmon et al. 2018). These will be excellent targets for detailed follow-up study in JWST Cycle 1 GO proposals.
SAR image segmentation using skeleton-based fuzzy clustering
NASA Astrophysics Data System (ADS)
Cao, Yun Yi; Chen, Yan Qiu
2003-06-01
SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering of characteristic descriptors extracted from local patches. A mixture model of the characteristic descriptor, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces that appear in SAR images automatically, without any user input.
Structural and Functional Analyses of the Proteins Involved in the Iron-Sulfur Cluster Biosynthesis
NASA Astrophysics Data System (ADS)
Wada, Kei
The iron-sulfur (Fe-S) clusters are ubiquitous prosthetic groups that are required to maintain such fundamental life processes as respiratory chain, photosynthesis and the regulation of gene expression. Assembly of intracellular Fe-S cluster requires the sophisticated biosynthetic systems called ISC and SUF machineries. To shed light on the molecular mechanism of Fe-S cluster assembly mediated by SUF machinery, several structures of the SUF components and their sub-complex were determined. The structural findings together with biochemical characterization of the core-complex (SufB-SufC-SufD complex) have led me to propose a working model for the cluster biosynthesis in the SUF machinery.
Clustered star formation and the origin of stellar masses.
Pudritz, Ralph E
2002-01-04
Star clusters are ubiquitous in galaxies of all types and at all stages of their evolution. We also observe them to be forming in a wide variety of environments, ranging from nearby giant molecular clouds to the supergiant molecular clouds found in starburst and merging galaxies. The typical star in our galaxy and probably in others formed as a member of a star cluster, so star formation is an intrinsically clustered and not an isolated phenomenon. The greatest challenge regarding clustered star formation is to understand why stars have a mass spectrum that appears to be universal. This review examines the observations and models that have been proposed to explain these fundamental issues in stellar formation.
Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement.
Tang, Jinhui; Shu, Xiangbo; Qi, Guo-Jun; Li, Zechao; Wang, Meng; Yan, Shuicheng; Jain, Ramesh
2017-08-01
Social image tag refinement, which aims to improve tag quality by automatically completing the missing tags and rectifying the noise-corrupted ones, is an essential component for social image search. Conventional approaches mainly focus on exploring the visual and tag information, without considering the user information, which often reveals important hints on the (in)correct tags of social images. Towards this end, we propose a novel tri-clustered tensor completion framework to collaboratively explore these three kinds of information to improve the performance of social image tag refinement. Specifically, the inter-relations among users, images and tags are modeled by a tensor, and the intra-relations between users, images and tags are explored by three regularizations respectively. To address the challenges of the super-sparse and large-scale tensor factorization that demands expensive computing and memory cost, we propose a novel tri-clustering method to divide the tensor into a certain number of sub-tensors by simultaneously clustering users, images and tags into a bunch of tri-clusters. And then we investigate two strategies to complete these sub-tensors by considering (in)dependence between the sub-tensors. Experimental results on a real-world social image database demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
Saeed, Isaam; Tang, Sen-Lin; Halgamuge, Saman K.
2012-01-01
An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis. PMID:22180538
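The two composition features named in this abstract, GC content and tetranucleotide frequency, can be computed as in the sketch below. The function names are hypothetical, reverse-complement k-mers are not merged, and the paper's error-gradient transformation and clustering tiers are not reproduced here.

```python
from itertools import product

def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    n_acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / n_acgt if n_acgt else 0.0

def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible 4-mers in a sequence."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}

example = "ACGTACGTAC"  # made-up sequence for illustration
gc = gc_content(example)
freqs = tetranucleotide_frequencies(example)
```

Each sequence then becomes a 256-dimensional frequency vector (plus its GC content) that a clustering method can group by compositional similarity.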
Dynamic Evolution Model Based on Social Network Services
NASA Astrophysics Data System (ADS)
Xiong, Xi; Gou, Zhi-Jian; Zhang, Shi-Bin; Zhao, Wen
2013-11-01
Based on an analysis of the evolutionary characteristics of public opinion in social networking services (SNS), in this paper we propose a dynamic evolution model in which opinions are coupled with topology. This model shows the clustering phenomenon of opinions in dynamic network evolution. The simulation results show that the model can fit the data from a social network site. The dynamic evolution of networks accelerates opinion separation and aggregation. The scale and the number of clusters are influenced by the confidence limit and the rewiring probability. Dynamic changes of the topology reduce the number of isolated nodes, while an increased confidence limit allows nodes to communicate more fully. The two effects make the distribution of opinion more neutral. The dynamic evolution of networks generates central clusters with high connectivity and high betweenness, which make it difficult to control public opinions in SNS.
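The abstract does not state the update rules, so the sketch below shows one common way to couple opinions with topology: a bounded-confidence update with random rewiring of discordant links. The function name, the averaging rule, the rewiring rule and every parameter value are assumptions made for illustration, not the authors' model.

```python
import random

def simulate(n=50, n_edges=150, confidence=0.3, rewire_p=0.2,
             mu=0.5, steps=20000, seed=0):
    """Bounded-confidence opinion dynamics on a network that rewires itself."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n)]
    # Build a random simple graph with exactly n_edges undirected links.
    edge_set = set()
    while len(edge_set) < n_edges:
        i, j = rng.sample(range(n), 2)
        edge_set.add((min(i, j), max(i, j)))
    edges = sorted(edge_set)
    for _ in range(steps):
        idx = rng.randrange(len(edges))
        i, j = edges[idx]
        diff = opinions[j] - opinions[i]
        if abs(diff) < confidence:
            # Within the confidence limit: both opinions move closer.
            opinions[i] += mu * diff
            opinions[j] -= mu * diff
        elif rng.random() < rewire_p:
            # Too far apart: node i drops j and links to a random node k.
            k = rng.randrange(n)
            new_edge = (min(i, k), max(i, k))
            if k != i and new_edge not in edge_set:
                edge_set.remove((i, j))
                edge_set.add(new_edge)
                edges[idx] = new_edge
    return opinions, edges

initial_opinions, _ = simulate(steps=0)
final_opinions, final_edges = simulate()
```

In this sketch `confidence` plays the role of the confidence limit and `rewire_p` the rewiring probability named in the abstract; varying them changes how many opinion clusters survive.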
Nuclear Rings in the IR: Hidden Super Star Clusters
NASA Astrophysics Data System (ADS)
Maoz, Dan
1997-07-01
We propose NICMOS broad-band {F160W, F187W} and Paschen Alpha {F187N} imaging of nuclear starburst rings in two nearby galaxies. We already have UV {F220W} FOC data, and are scheduled to obtain WFPC2 images in U, V, I, and Halpha+[NII] of these rings. The rings contain large populations of super star clusters similar to those recently discovered in other types of starburst systems. Nuclear rings contain large numbers of these clusters in relatively unobscured starburst environments. Measurement of the age, size, and stellar contents of the clusters can test the hypothesis that super star clusters are young globular clusters. Together with our UV and optical data, NICMOS images will provide the SED of numerous super star clusters over a decade in wavelength. Our already-approved observations will allow us to estimate, by comparison with evolutionary synthesis models, the masses and ages of the clusters. The proposed IR data will be sensitive to the number of supergiants {1.6 micron} and O-stars {Paschen Alpha} in each of the clusters. The observations will provide an independent determination of the reddening, mass, and age of each cluster. We expect to see in the IR numerous clusters that are obscured in the UV and optical. These clusters may be the younger ones, which are still embedded in their molecular clouds. By measuring the mass, age, and size of a large number of clusters, we can actually obtain an evolutionary picture of these objects at different stages in their lives.
Beyond Hydrodynamic Modeling of AGN Heating in Galaxy Clusters
NASA Astrophysics Data System (ADS)
Yang, Hsiang-Yi Karen
Clusters of galaxies hold a unique position in hierarchical structure formation - they are both powerful cosmological probes and excellent astrophysical laboratories. Accurate modeling of the cluster properties is crucial for reducing systematic uncertainties in cluster cosmology. However, theoretical modeling of the intracluster medium (ICM) has long suffered from the "cooling-flow problem" - clusters with short central cooling times or cool cores (CCs) are predicted to host massive inflows of gas that are not observed. Feedback from active galactic nuclei (AGN) is by far the most promising heating mechanism to counteract radiative cooling. Recent hydrodynamic simulations have made remarkable progress reproducing properties of the CCs. However, there remain two major questions that cannot be probed using purely hydrodynamic models: (1) what are the roles of cosmic rays (CRs)? (2) how is the existing picture altered when the ICM is modeled as weakly collisional plasma? We propose to move beyond limitations of pure hydrodynamics and progress toward a complete understanding of how AGN jet-inflated bubbles interact with their surroundings and provide heat to the ICM. Our objectives include: (1) understand how CR-dominated bubbles heat the ICM; (2) understand bubble evolution and sound-wave dissipation in the ICM with different assumptions of plasma properties, e.g., collisionality of the ICM, with or without anisotropic transport processes; (3) develop a subgrid model of AGN heating that can be adopted in cosmological simulations based on state-of-the-art isolated simulations. We will use a combination of analytical calculations and idealized simulations to advance our understanding of each individual physical process. We will then perform the first three-dimensional (3D) magnetohydrodynamic (MHD) simulations of self-regulated AGN feedback with relevant CR and anisotropic transport processes in order to quantify the amount and distribution of heating from the AGN.
Our proposed work will elucidate the poorly understood CR and anisotropic transport processes in the weakly collisional ICM and shed light on the long-standing mystery of AGN heating in CC clusters. Our investigation, which incorporates plasma effects into fluid models and provides physical foundation for cosmological simulations, will serve as an important bridge between physics on both micro and macro scales. This study will enable robust modeling of the radio-mode feedback of AGN in cosmological simulations of cluster and galaxy formation. It will also directly impact observational studies of clusters including NASA missions such as Chandra, XMM-Newton, Astro-H/Hitomi, Fermi, HST, and Planck.
Yi, Chucai; Tian, Yingli
2012-09-01
In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.
Globular clusters and environmental effects in galaxy clusters
NASA Astrophysics Data System (ADS)
Sales, Laura
2016-10-01
Globular clusters are old compact stellar systems orbiting around galaxies of all types. Tens of thousands of them can also be found populating the intra-cluster regions of nearby galaxy clusters like Virgo and Coma. Thanks to the HST Frontier Fields program, GCs are now starting to be detected in intermediate-redshift clusters as well. Yet, despite their ubiquity, a theoretical model for the formation and evolution of GCs is still missing, especially within the cosmological context. Here we propose to use cosmological hydrodynamical simulations of 18 galaxy clusters coupled to a post-processing GC formation model to explore the assembly of galaxies in clusters together with their expected GC population. The method, which has already been implemented and tested, will allow us to characterize for the first time the number, radial distribution and kinematics of GCs in clusters, with products directly comparable to observational maps. We will explore cluster-to-cluster variations and also characterize the build-up of the intra-cluster component of GCs with time. As the method relies on a detailed study of the star-formation history of galaxies, we will jointly constrain the predicted quenching time-scales for satellites and the occurrence of starburst events associated with infall and orbital pericenters of galaxies in massive clusters. This will inform further studies on the distribution, velocity and properties of post-starburst galaxies in past, ongoing and future HST programs.
Hirose, H
1997-01-01
This paper proposes a new treatment for electrical insulation degradation. Some types of insulation which have been used under various circumstances are considered to degrade at various rates in accordance with their stress circumstances. The cross-linked polyethylene (XLPE) insulated cables inspected by major Japanese electric companies clearly indicate such phenomena. By assuming that the inspected specimen is sampled from one of the clustered groups, a mixed degradation model can be constructed. Since the degradation of the insulation under common circumstances is considered to follow a Weibull distribution, a mixture model and a Weibull power law can be combined. This is called the mixture Weibull power law model. By applying maximum likelihood estimation of the newly proposed model to Japanese 22 and 33 kV insulation class cables, the cables are clustered into a certain number of groups using the AIC and the generalized likelihood ratio test. The reliability of the cables at specified years is assessed.
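The group-selection step described above can be illustrated with a minimal sketch. It assumes a plain finite mixture of two-parameter Weibull densities rather than the paper's power-law form, and the failure times and parameter values are hypothetical, chosen by hand instead of being fitted by maximum likelihood. It only shows how the AIC trades goodness of fit against the number of parameters when comparing one group with two.

```python
import math

def weibull_pdf(t, shape, scale):
    """Two-parameter Weibull density f(t) for t > 0."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def mixture_log_likelihood(times, weights, shapes, scales):
    """Log-likelihood of the observed times under a finite Weibull mixture."""
    total = 0.0
    for t in times:
        density = sum(w * weibull_pdf(t, k, s)
                      for w, k, s in zip(weights, shapes, scales))
        total += math.log(density)
    return total

def aic(log_likelihood, n_params):
    """Akaike information criterion; the smaller value is preferred."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical failure times in years (illustration only, not cable data).
times = [4.0, 5.0, 6.0, 18.0, 20.0, 22.0]

# Hand-picked parameters stand in for maximum likelihood estimates.
one_group = mixture_log_likelihood(times, [1.0], [1.5], [14.0])
two_groups = mixture_log_likelihood(times, [0.5, 0.5], [6.0, 10.0], [5.5, 21.0])

# One group: shape + scale = 2 parameters.
# Two groups: 2 shapes + 2 scales + 1 free mixing weight = 5 parameters.
aic_one = aic(one_group, 2)
aic_two = aic(two_groups, 5)
best = "two groups" if aic_two < aic_one else "one group"
print(best, round(aic_one, 2), round(aic_two, 2))
```

In a real analysis the parameters of each candidate model would be estimated from the data before the AIC values are compared.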
Chen, You-Shyang; Cheng, Ching-Hsue; Lai, Chien-Jung; Hsu, Cheng-Yi; Syu, Han-Jhou
2012-02-01
Identifying patients in a Target Customer Segment (TCS) is important to determine the demand for, and to appropriately allocate resources for, health care services. The purpose of this study is to propose a two-stage clustering-classification model through (1) initially integrating the RFM attribute and K-means algorithm for clustering the TCS patients and (2) then integrating the global discretization method and the rough set theory for classifying hospitalized departments and optimizing health care services. To assess the performance of the proposed model, a dataset was used from a representative hospital (termed Hospital-A) that was extracted from a database from an empirical study in Taiwan comprised of 183,947 samples that were characterized by 44 attributes during 2008. The proposed model was compared with three techniques, Decision Tree, Naive Bayes, and Multilayer Perceptron, and the empirical results showed significant promise of its accuracy. The generated knowledge-based rules provide useful information to maximize resource utilization and support the development of a strategy for decision-making in hospitals. From the findings, 75 patients in the TCS, three hospital departments, and specific diagnostic items were discovered in the data for Hospital-A. A potential determinant for gender differences was found, and the age attribute was not significant to the hospital departments. Copyright © 2011 Elsevier Ltd. All rights reserved.
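The first stage of the model (K-means on RFM attributes) can be sketched as follows. RFM is taken here in its usual sense of recency, frequency and monetary value; the 1-5 scores, the choice of k = 2 and the deterministic initialisation are assumptions made for illustration, not details from the study.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (recency, frequency, monetary) scores on a 1-5 scale.
rfm_scores = [(5, 5, 4), (1, 1, 1), (5, 4, 5), (1, 2, 1), (4, 5, 5), (2, 1, 1)]
labels, centroids = kmeans(rfm_scores, k=2)
print(labels)
```

The cluster whose centroid has the highest scores would then be a candidate for the target segment passed on to the classification stage.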
Ahn, Kwang Woo; Kosoy, Michael; Chan, Kung-Sik
2014-06-01
We developed a two-strain susceptible-infected-recovered (SIR) model that provides a framework for inferring the cross-immunity between two strains of a bacterial species in the host population with discretely sampled co-infection time-series data. Moreover, the model accounts for seasonality in host reproduction. We illustrate an approach using a dataset describing co-infections by several strains of bacteria circulating within a population of cotton rats (Sigmodon hispidus). Bartonella strains were clustered into three genetically close groups, between which the divergence is correspondent to the accepted level of separate bacterial species. The proposed approach revealed no cross-immunity between genetic clusters while limited cross-immunity might exist between subgroups within the clusters. Copyright © 2014. Published by Elsevier B.V.
MacGregor, James N
2015-10-01
Research on human performance in solving traveling salesman problems typically uses point sets as stimuli, and most models have proposed a processing stage at which stimulus dots are clustered. However, few empirical studies have investigated the effects of clustering on performance. In one recent study, researchers compared the effects of clustered, random, and regular stimuli, and concluded that clustering facilitates performance (Dry, Preiss, & Wagemans, 2012). Another study suggested that these results may have been influenced by the location rather than the degree of clustering (MacGregor, 2013). Two experiments are reported that mark an attempt to disentangle these factors. The first experiment tested several combinations of degree of clustering and cluster location, and revealed mixed evidence that clustering influences performance. In a second experiment, both factors were varied independently, showing that they interact. The results are discussed in terms of the importance of clustering effects, in particular, and perceptual factors, in general, during performance of the traveling salesman problem.
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean matching. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis but were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369
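The idea of assigning real values from the nearest neighbour can be shown with a much simplified sketch. It performs a single nearest-neighbour imputation over the observed attributes; the repeated hierarchical clustering with randomly selected attributes described in the abstract is not reproduced, and the data rows are hypothetical.

```python
import math

def impute_nearest(rows):
    """Fill each None with the value from the closest fully observed row.

    Distance is Euclidean over the attributes observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    result = []
    for r in rows:
        if None not in r:
            result.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]
        nearest = min(
            complete,
            key=lambda c: math.dist([r[j] for j in observed],
                                    [c[j] for j in observed]),
        )
        result.append([nearest[j] if v is None else v for j, v in enumerate(r)])
    return result

# Hypothetical trait values; the second row is missing its third entry.
rows = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [9.0, 8.0, 7.0]]
filled = impute_nearest(rows)
print(filled[1])
```

Repeating such a procedure over several random attribute subsets would give multiple imputed values per missing entry, which is the sense in which the approach is a multiple imputation.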
Modeling and Testing Dark Energy and Gravity with Galaxy Cluster Data
NASA Astrophysics Data System (ADS)
Rapetti, David; Cataneo, Matteo; Heneka, Caroline; Mantz, Adam; Allen, Steven W.; Von Der Linden, Anja; Schmidt, Fabian; Lombriser, Lucas; Li, Baojiu; Applegate, Douglas; Kelly, Patrick; Morris, Glenn
2018-06-01
The abundance of galaxy clusters is a powerful probe to constrain the properties of dark energy and gravity at large scales. We employed a self-consistent analysis that includes survey, observable-mass scaling relations and weak gravitational lensing data to obtain constraints on f(R) gravity, which are an order of magnitude tighter than the best previously achieved, as well as on cold dark energy of negligible sound speed. The latter implies clustering of the dark energy fluid at all scales, allowing us to measure the effects of dark energy perturbations at cluster scales. For this study, we recalibrated the halo mass function using the following non-linear characteristic quantities: the spherical collapse threshold, the virial overdensity and an additional mass contribution for cold dark energy. We also presented a new modeling of the f(R) gravity halo mass function that incorporates novel corrections to capture key non-linear effects of the Chameleon screening mechanism, as found in high resolution N-body simulations. All these results permit us to predict, as I will also exemplify, and eventually obtain the next generation of cluster constraints on such models, and provide us with frameworks that can also be applied to other proposed dark energy and modified gravity models using cluster abundance observations.
A Cyber-Attack Detection Model Based on Multivariate Analyses
NASA Astrophysics Data System (ADS)
Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi
In the present paper, we propose a novel cyber-attack detection model based on two multivariate-analysis methods applied to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and cluster analysis. We quantify the observed qualitative audit event sequences via quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. It is shown in simulation experiments that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.
A Detailed Study of Chemical Enrichment History of Galaxy Clusters out to Virial Radius
NASA Astrophysics Data System (ADS)
Loewenstein, Michael
The origin of the metal enrichment of the intracluster medium (ICM) represents a fundamental problem in extragalactic astrophysics, with implications for our understanding of how stars and galaxies form, the nature of Type Ia supernova (SNIa) progenitors, and the thermal history of the ICM. These heavy elements are ultimately synthesized by supernova (SN) explosions; however, the details of the sites of metal production and mechanisms that transport metals to the ICM remain unclear. To make progress, accurate abundance profiles for multiple elements extending from the cluster core out to the virial radius (r180) are required for a significant cluster sample. We propose an X-ray spectroscopic study of a carefully-chosen sample of archival Suzaku and XMM-Newton observations of 23 clusters: XMM-Newton data probe the cluster temperature and abundances out to (0.5-1)r500, while Suzaku data probe the cluster outskirts. A method devised by our team to utilize all elements with emission lines in the X-ray bandpass to measure the relative contributions of supernova explosions by direct modeling of their X-ray spectra will be applied in order to constrain the demographics of the enriching supernova population. In addition we will conduct a stacking analysis of our already existing Suzaku and XMM-Newton cluster spectra to search for weak emission lines that are important SN diagnostics, and to look for trends with cluster mass and redshift. The funding we propose here will also support the data analysis of our recent Suzaku observations of the archetypal cluster A3112 (200 ks each on the core and outskirts).
Our data analysis, interpreted using theoretical models we have developed, will enable us to constrain the star formation history, SN demographics, and nature of SNIa progenitors associated with galaxy cluster stellar populations - and, hence, directly addresses NASA's Strategic Objective 2.4.2 in Astrophysics that aims to improve the understanding of how the Universe works, and explore how it began and evolved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Junghyun; Gangwon, Jo; Jaehoon, Jung
Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
Multilevel SEM Strategies for Evaluating Mediation in Three-Level Data
ERIC Educational Resources Information Center
Preacher, Kristopher J.
2011-01-01
Strategies for modeling mediation effects in multilevel data have proliferated over the past decade, keeping pace with the demands of applied research. Approaches for testing mediation hypotheses with 2-level clustered data were first proposed using multilevel modeling (MLM) and subsequently using multilevel structural equation modeling (MSEM) to…
Modular and hierarchical structure of social contact networks
NASA Astrophysics Data System (ADS)
Ge, Yuanzheng; Song, Zhichao; Qiu, Xiaogang; Song, Hongbin; Wang, Yong
2013-10-01
Social contact networks exhibit overlapping qualities of communities, hierarchical structure and spatial-correlated nature. We propose a mixing pattern of modular and growing hierarchical structures to reconstruct social contact networks by using an individual’s geospatial distribution information in the real world. The hierarchical structure of social contact networks is defined based on the spatial distance between individuals, and edges among individuals are added in turn from the modular layer to the highest layer. It is a gradual process to construct the hierarchical structure: from the basic modular model up to the global network. The proposed model not only shows hierarchically increasing degree distribution and large clustering coefficients in communities, but also exhibits spatial clustering features of individual distributions. As an evaluation of the method, we reconstruct a hierarchical contact network based on the investigation data of a university. Transmission experiments of influenza H1N1 are carried out on the generated social contact networks, and results show that the constructed network is efficient to reproduce the dynamic process of an outbreak and evaluate interventions. The reproduced spread process exhibits that the spatial clustering of infection is accordant with the clustering of network topology. Moreover, the effect of individual topological character on the spread of influenza is analyzed, and the experiment results indicate that the spread is limited by individual daily contact patterns and local clustering topology rather than individual degree.
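The clustering coefficient mentioned above can be made concrete with a short sketch. It uses the standard local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked); the four-person contact graph is hypothetical and is not taken from the university survey.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are directly connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical undirected contact graph: a triangle a-b-c plus d attached to a.
contacts = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficients = {n: local_clustering(contacts, n) for n in contacts}
print(coefficients)
```

Node a has the highest degree but a lower coefficient than b or c, which illustrates how degree and local clustering can carry different information about an individual's contacts.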
A phase field model for segregation and precipitation induced by irradiation in alloys
NASA Astrophysics Data System (ADS)
Badillo, A.; Bellon, P.; Averback, R. S.
2015-04-01
A phase field model is introduced to model the evolution of multicomponent alloys under irradiation, including radiation-induced segregation and precipitation. The thermodynamic and kinetic components of this model are derived using a mean-field model. The mobility coefficient and the contribution of chemical heterogeneity to free energy are rescaled by the cell size used in the phase field model, yielding microstructural evolutions that are independent of the cell size. A new treatment is proposed for point defect clusters, using a mixed discrete-continuous approach to capture the stochastic character of defect cluster production in displacement cascades, while retaining the efficient modeling of the fate of these clusters using diffusion equations. The model is tested on unary and binary alloy systems using two-dimensional simulations. In a unary system, the evolution of point defects under irradiation is studied in the presence of defect clusters, either pre-existing ones or those created by irradiation, and compared with rate theory calculations. Binary alloys with zero and positive heats of mixing are then studied to investigate the effect of point defect clustering on radiation-induced segregation and precipitation in undersaturated solid solutions. Lastly, irradiation conditions and alloy parameters leading to irradiation-induced homogeneous precipitation are investigated. The results are discussed in the context of experimental results reported for Ni-Si and Al-Zn undersaturated solid solutions subjected to irradiation.
NASA Astrophysics Data System (ADS)
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
Traditional microblog recommendation algorithms suffer from low efficiency and modest effectiveness in the era of big data. To address these issues, this paper proposes a mixed recommendation algorithm with user clustering. The paper first introduces the state of the microblog marketing industry. It then elaborates the user interest modeling process and the detailed advertisement recommendation methods. Finally, it compares the mixed recommendation algorithm with the traditional classification algorithm and with the mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering achieves good accuracy and recall in microblog advertisement promotion.
Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary
2014-11-01
Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K
2015-06-04
Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts.
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.
Health state evaluation of shield tunnel SHM using fuzzy cluster method
NASA Astrophysics Data System (ADS)
Zhou, Fa; Zhang, Wei; Sun, Ke; Shi, Bin
2015-04-01
Shield tunnel SHM is currently developing rapidly, but processing massive monitoring data and grading health quantitatively remain real challenges, since multiple sensors of different types are employed in an SHM system. This paper addresses the fuzzy cluster method based on the fuzzy equivalence relationship for the health evaluation of shield tunnel SHM. The method was optimized by exporting the FSV map to automatically generate the threshold value. A new holistic health score (HHS) was proposed and its effectiveness was validated by conducting a pilot test. A case study on the Nanjing Yangtze River Tunnel was presented to apply this method. Three types of indicators, namely soil pressure, pore pressure and steel strain, were used to develop the evaluation set U. The clustering results were verified by analyzing the engineering geological conditions; the applicability and validity of the proposed method were also demonstrated. Besides, the advantage of multi-factor evaluation over the single-factor model was discussed using the proposed HHS. This investigation indicated that the fuzzy cluster method and HHS are capable of characterizing the fuzziness of tunnel health, and that they help clarify the uncertainties in tunnel health evaluation.
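Clustering from a fuzzy equivalence relationship is commonly done by taking the max-min transitive closure of a fuzzy similarity matrix and cutting it at a threshold. The sketch below follows that textbook procedure under stated assumptions: the 4x4 similarity matrix for four monitoring sections is hypothetical, and the threshold is picked by hand rather than generated from an FSV map as in the paper.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Square a reflexive, symmetric relation until it stops changing."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared

def lambda_cut_clusters(closure, lam):
    """Group indices whose closure similarity is at least `lam`."""
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters

# Hypothetical fuzzy similarity between four monitoring sections.
similarity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
closure = transitive_closure(similarity)
print(lambda_cut_clusters(closure, 0.6))
```

Lowering the threshold merges clusters and raising it splits them, so the choice of threshold controls how coarse the resulting health grading is.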
Clustering approach for unsupervised segmentation of malarial Plasmodium vivax parasite
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Mohamed, Zeehaida
2017-10-01
Malaria is a global health problem, particularly in Africa and South Asia, where it causes countless deaths and morbidity cases. Efficient and prompt control of this disease requires early detection and accurate diagnosis, given the large number of cases reported yearly. To achieve this aim, this paper proposes an image segmentation approach via unsupervised pixel segmentation of the malaria parasite to automate the diagnosis of malaria. In this study, a modified clustering algorithm, namely enhanced k-means (EKM) clustering, is proposed for malaria image segmentation. In the proposed EKM clustering, the concept of variance and a new version of the transferring process for clustered members are used to assist the assignment of data to the proper centre during clustering, so that a well-segmented malaria image can be generated. The effectiveness of the proposed EKM clustering has been analyzed qualitatively and quantitatively by comparing this algorithm with two popular image segmentation techniques, namely Otsu's thresholding and k-means clustering. The experimental results show that the proposed EKM clustering has successfully segmented 100 malaria images of the P. vivax species with segmentation accuracy, sensitivity and specificity of 99.20%, 87.53% and 99.58%, respectively. Hence, the proposed EKM clustering can be considered as an image segmentation tool for segmenting malaria images.
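The three reported measures can be computed from a predicted and a reference binary mask as in the sketch below. The pixel labels are hypothetical (1 marks a parasite pixel), and the sketch assumes the usual confusion-matrix definitions; it does not implement the EKM algorithm itself.

```python
def segmentation_metrics(predicted, truth):
    """Accuracy, sensitivity and specificity of a binary pixel mask."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # parasite pixels correctly found
        "specificity": tn / (tn + fp),   # background pixels correctly rejected
    }

# Hypothetical flattened masks, for illustration only.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = segmentation_metrics(predicted, truth)
print(metrics)
```

Because background pixels usually far outnumber parasite pixels, accuracy and specificity can be very high while sensitivity stays noticeably lower, which matches the pattern of the figures quoted above.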
Estimating Ω from Galaxy Redshifts: Linear Flow Distortions and Nonlinear Clustering
NASA Astrophysics Data System (ADS)
Bromley, B. C.; Warren, M. S.; Zurek, W. H.
1997-02-01
We propose a method to determine the cosmic mass density Ω from redshift-space distortions induced by large-scale flows in the presence of nonlinear clustering. Nonlinear structures in redshift space, such as fingers of God, can contaminate distortions from linear flows on scales as large as several times the small-scale pairwise velocity dispersion $\sigma_v$. Following Peacock & Dodds, we work in the Fourier domain and propose a model to describe the anisotropy in the redshift-space power spectrum; tests with high-resolution numerical data demonstrate that the model is robust for both mass and biased galaxy halos on translinear scales and above. On the basis of this model, we propose an estimator of the linear growth parameter $\beta = \Omega^{0.6}/b$, where b measures bias, derived from sampling functions that are tuned to eliminate distortions from nonlinear clustering. The measure is tested on the numerical data and found to recover the true value of β to within ~10%. An analysis of IRAS 1.2 Jy galaxies yields $\beta = 0.8^{+0.4}_{-0.3}$ at a scale of 1000 km s$^{-1}$, which is close to optimal given the shot noise and finite size of the survey. This measurement is consistent with dynamical estimates of β derived from both real-space and redshift-space information. The importance of the method presented here is that nonlinear clustering effects are removed to enable linear correlation anisotropy measurements on scales approaching the translinear regime. We discuss implications for analyses of forthcoming optical redshift surveys in which the dispersion is more than a factor of 2 greater than in the IRAS data.
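The relation between the growth parameter and the mass density can be inverted with simple arithmetic, as in the sketch below. It assumes β is Ω raised to the power 0.6 divided by b, and the bias value b = 1 is a purely illustrative assumption; the abstract does not state a bias for the IRAS galaxies.

```python
def omega_from_beta(beta, bias):
    """Invert beta = omega**0.6 / bias for the mass density omega."""
    return (beta * bias) ** (1 / 0.6)

def beta_from_omega(omega, bias):
    """Linear growth parameter for a given mass density and bias."""
    return omega ** 0.6 / bias

# Central value and error bounds quoted for the IRAS 1.2 Jy sample.
beta_central, beta_low, beta_high = 0.8, 0.5, 1.2

bias = 1.0  # assumed for illustration; not given in the abstract
omega_central = omega_from_beta(beta_central, bias)
omega_range = (omega_from_beta(beta_low, bias), omega_from_beta(beta_high, bias))
print(round(omega_central, 3), omega_range)
```

Because Ω scales as the 1/0.6 power of the product of β and b, any uncertainty in the bias propagates strongly into the inferred density, so the numbers depend heavily on the assumed b.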
A channel differential EZW coding scheme for EEG data compression.
Dehkordi, Vahid R; Daou, Hoda; Labeau, Fabrice
2011-11-01
In this paper, a method is proposed to compress multichannel electroencephalographic (EEG) signals in a scalable fashion. Correlation between EEG channels is exploited through clustering using a k-means method. Representative channels for each of the clusters are encoded individually while other channels are encoded differentially, i.e., with respect to their respective cluster representatives. The compression is performed using the embedded zero-tree wavelet encoding adapted to 1-D signals. Simulations show that the scalable features of the scheme lead to a flexible quality/rate tradeoff, without requiring detailed EEG signal modeling.
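The differential step can be sketched as follows: within one cluster, the representative channel is kept as is and every other channel is stored as its sample-wise difference from the representative. The channel names and integer samples are hypothetical, the cluster membership is assumed to be given already, and the wavelet (EZW) coding stage is not included.

```python
def encode_cluster(channels, representative):
    """Keep the representative channel; store the others as residuals."""
    rep = channels[representative]
    residuals = {
        name: [x - r for x, r in zip(signal, rep)]
        for name, signal in channels.items()
        if name != representative
    }
    return rep, residuals

def decode_cluster(rep, residuals):
    """Rebuild the non-representative channels from their residuals."""
    return {name: [d + r for d, r in zip(res, rep)]
            for name, res in residuals.items()}

# Hypothetical integer samples from three correlated channels.
channels = {
    "C3": [10, 12, 15, 11],
    "C4": [11, 12, 16, 10],
    "Cz": [9, 13, 15, 12],
}
rep, residuals = encode_cluster(channels, "C3")
restored = decode_cluster(rep, residuals)
print(residuals)
```

When channels in a cluster are strongly correlated the residuals have small magnitudes, which is what makes them cheaper to encode than the original signals.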
Xia, Shang; Xue, Jing-Bo; Zhang, Xia; Hu, He-Hua; Abe, Eniola Michael; Rollinson, David; Bergquist, Robert; Zhou, Yibiao; Li, Shi-Zhu; Zhou, Xiao-Nong
2017-04-26
The prevalence of schistosomiasis remains a key public health issue in China. Jiangling County in Hubei Province is a typical lake and marshland endemic area. The pattern analysis of schistosomiasis prevalence in Jiangling County is of significant importance for promoting schistosomiasis surveillance and control in similar endemic areas. The dataset was constructed based on the annual schistosomiasis surveillance as well as the socio-economic data in Jiangling County covering the years from 2009 to 2013. A village clustering method modified from the K-means algorithm was used to identify different types of endemic villages. For these identified village clusters, a matrix-based predictive model was developed by means of exploring the one-step backward temporal correlation inference algorithm aiming to estimate the predictive correlations of schistosomiasis prevalence among different years. Field sampling of faeces from domestic animals, as an indicator of potential schistosomiasis prevalence, was carried out and the results were used to validate the results of proposed models and methods. The prevalence of schistosomiasis in Jiangling County declined year by year. A total of 198 endemic villages in Jiangling County can be divided into four clusters with reference to the 5 years' occurrences of schistosomiasis in human, cattle and snail populations. For each identified village cluster, a predictive matrix was generated to characterize the relationships of schistosomiasis prevalence with the historic infection level as well as their associated impact factors. Furthermore, the results of sampling faeces from the front field agreed with the results of the identified clusters of endemic villages. The results of village clusters and the predictive matrix can be regarded as the basis to conduct targeted measures for schistosomiasis surveillance and control.
Furthermore, the proposed models and methods can be modified to investigate the schistosomiasis prevalence in other regions as well as be used for investigating other parasitic diseases.
Swarm Intelligence for Urban Dynamics Modelling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghnemat, Rawan; Bertelle, Cyrille; Duchamp, Gerard H. E.
2009-04-16
In this paper, we propose swarm intelligence algorithms to deal with dynamical and spatial organization emergence. The goal is to model and simulate the development of spatial centers using multiple criteria. We combine a decentralized approach based on emergent clustering mixed with spatial constraints or attractions. We propose an extension of the ant nest building algorithm with multi-center and adaptive process. Typically, this model is suitable to analyse and simulate urban dynamics like gentrification or the dynamics of the cultural equipment in urban areas.
Swarm Intelligence for Urban Dynamics Modelling
NASA Astrophysics Data System (ADS)
Ghnemat, Rawan; Bertelle, Cyrille; Duchamp, Gérard H. E.
2009-04-01
In this paper, we propose swarm intelligence algorithms to deal with dynamical and spatial organization emergence. The goal is to model and simulate the development of spatial centers using multiple criteria. We combine a decentralized approach based on emergent clustering mixed with spatial constraints or attractions. We propose an extension of the ant nest building algorithm with multi-center and adaptive process. Typically, this model is suitable to analyse and simulate urban dynamics like gentrification or the dynamics of the cultural equipment in urban areas.
NASA Astrophysics Data System (ADS)
Kim, Chan Moon; Parnichkun, Manukid
2017-11-01
Coagulation is an important process in drinking water treatment to attain acceptable treated water quality. However, the determination of coagulant dosage is still a challenging task for operators, because coagulation is a nonlinear and complicated process. Feedback control to achieve the desired treated water quality is difficult due to the lengthy process time. In this research, a hybrid of k-means clustering and adaptive neuro-fuzzy inference system (k-means-ANFIS) is proposed for the settled water turbidity prediction and the optimal coagulant dosage determination using full-scale historical data. To build a model that adapts well to different process states of the influent water, raw water quality data are classified into four clusters according to their properties by a k-means clustering technique. The sub-models are developed individually on the basis of each clustered data set. Results reveal that the sub-models constructed by a hybrid k-means-ANFIS perform better than not only a single ANFIS model, but also seasonal models by artificial neural network (ANN). The final model consisting of the sub-models shows more accurate and consistent prediction ability than a single ANFIS model and a single ANN model based on all five evaluation indices. Therefore, the hybrid model of k-means-ANFIS can be employed as a robust tool for managing both treated water quality and production costs simultaneously.
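The cluster-then-model idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the cluster centroids are taken as already computed, a single raw-water feature is used, and an ordinary least-squares line stands in for each ANFIS sub-model (the abstract's actual sub-models are ANFIS, not linear fits). The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one cluster's data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def nearest(x, centroids):
    """Index of the centroid closest to the scalar feature x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def fit_submodels(xs, ys, centroids):
    """Train one sub-model per cluster (a linear stand-in for ANFIS)."""
    groups = [([], []) for _ in centroids]
    for x, y in zip(xs, ys):
        gx, gy = groups[nearest(x, centroids)]
        gx.append(x)
        gy.append(y)
    return [fit_line(gx, gy) if gx else None for gx, gy in groups]


def predict(x, centroids, models):
    """Route x to the sub-model of its nearest cluster."""
    slope, intercept = models[nearest(x, centroids)]
    return slope * x + intercept
```

Routing each sample to the sub-model of its nearest centroid is what lets a piecewise-simple model follow a process whose behaviour differs between raw-water regimes.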
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
Competitive cluster growth in complex networks.
Moreira, André A; Paula, Demétrius R; Costa Filho, Raimundo N; Andrade, José S
2006-06-01
In this work we propose an idealized model for competitive cluster growth in complex networks. Each cluster can be thought of as a fraction of a community that shares some common opinion. Our results show that the cluster size distribution depends on the particular choice for the topology of the network of contacts among the agents. As an application, we show that the cluster size distributions obtained when the growth process is performed on hierarchical networks, e.g., the Apollonian network, have a scaling form similar to what has been observed for the distribution of the number of votes in an electoral process. We suggest that this similarity may be due to the fact that social networks involved in the electoral process may also possess an underlying hierarchical structure.
Where Water Is Oxidized to Dioxygen: Structure of the Photosynthetic Mn4Ca Cluster
Yano, Junko; Kern, Jan; Sauer, Kenneth; Latimer, Matthew J.; Pushkar, Yulia; Biesiadka, Jacek; Loll, Bernhard; Saenger, Wolfram; Messinger, Johannes; Zouni, Athina; Yachandra, Vittal K.
2014-01-01
The oxidation of water to dioxygen is catalyzed within photosystem II (PSII) by a Mn4Ca cluster, the structure of which remains elusive. Polarized extended x-ray absorption fine structure (EXAFS) measurements on PSII single crystals constrain the Mn4Ca cluster geometry to a set of three similar high-resolution structures. Combining polarized EXAFS and x-ray diffraction data, the cluster was placed within PSII, taking into account the overall trend of the electron density of the metal site and the putative ligands. The structure of the cluster from the present study is unlike either the 3.0 or 3.5 angstrom–resolution x-ray structures or other previously proposed models. PMID:17082458
NASA Astrophysics Data System (ADS)
Meng, Xiaocheng; Che, Renfei; Gao, Shi; He, Juntao
2018-04-01
With the advent of the big data era, power system research has entered a new stage. At present, the main application of big data in power systems is early warning analysis of power equipment: historical fault data are collected and used to predict the early warning level and failure rate of different kinds of equipment under given related factors, which improves system security. In this paper, a method of line failure rate warning is proposed. First, fuzzy dynamic clustering is carried out on the collected historical information. To account for the imbalance between attributes, the coefficient of variation is used to assign the corresponding weights, and the weighted fuzzy clustering then handles the data more effectively. Next, by analyzing the basic idea and basic properties of relational analysis model theory, the gray relational model is improved by combining the slope and the Deng model. The incremental composition and composition of the two sequences are also incorporated into the gray relational model to obtain the gray relational degree between the various samples. The failure rate is predicted according to the principle of weighting. Finally, the concrete process is illustrated with an example, and the validity and superiority of the proposed method are verified.
SEMIPARAMETRIC EFFICIENT ESTIMATION FOR SHARED-FRAILTY MODELS WITH DOUBLY-CENSORED CLUSTERED DATA
Wang, Jane-Ling
2018-01-01
In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed “doubly-censored data”. This model extends current survival literature by broadening the application of frailty models from right-censoring to a more complicated situation with additional left censoring. Our approach is motivated by a recent Hepatitis B study where the sample consists of families. We adopt a likelihood approach that aims at the nonparametric maximum likelihood estimators (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves over the existing algorithm for independent and doubly-censored data, a special case when the frailty variable is a constant equal to one. This special case is well known to be a computational challenge due to the left censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively. Asymptotic properties of the NPMLE are established along with semi-parametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of Bootstrap estimators for the standard errors of the NPMLE is also discussed. We conducted some simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data. PMID:29527068
Lopez-Meyer, Paulo; Schuckers, Stephanie; Makeyev, Oleksandr; Fontana, Juan M; Sazonov, Edward
2012-09-01
The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes the use of information contained in chewing and swallowing sequences for meal segmentation by food types. Data collected from experiments of 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, performance of the unsupervised AP method was compared to a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method was able to obtain 90% accuracy in predicting the number of food items, the AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be utilized as part of an integral application for objective Monitoring of Ingestive Behavior in free living conditions.
Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System
NASA Astrophysics Data System (ADS)
Demirdjian, L.; Zhou, Y.; Huffman, G. J.
2016-12-01
This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peak-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized-Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
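The peak-over-threshold step described above can be illustrated with a short sketch. It is not the project's implementation: the Generalized Pareto parameters are estimated here by the method of moments (the abstract does not say which estimator is used), the data are treated as one pooled cluster, and all function names are hypothetical. The average recurrence interval follows from the fitted tail probability and the mean number of threshold exceedances per year.

```python
import math


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (shape, scale) for threshold excesses."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale


def gpd_survival(y, shape, scale):
    """P(excess > y) under the Generalized Pareto distribution, y >= 0."""
    if abs(shape) < 1e-9:
        return math.exp(-y / scale)  # exponential limit as shape -> 0
    base = 1.0 + shape * y / scale
    if base <= 0.0:
        return 0.0  # y lies beyond the upper endpoint (negative shape)
    return base ** (-1.0 / shape)


def average_recurrence_interval(level, threshold, shape, scale, exceedances_per_year):
    """Years expected between events larger than `level` (level >= threshold)."""
    p = gpd_survival(level - threshold, shape, scale)
    return math.inf if p == 0.0 else 1.0 / (exceedances_per_year * p)


# Usage: daily totals (mm) from a pooled cluster, threshold chosen beforehand.
daily = [3.0, 55.0, 12.0, 71.0, 64.0, 8.0, 90.0, 58.0, 120.0, 52.0]
threshold = 50.0
excesses = [d - threshold for d in daily if d > threshold]
shape, scale = fit_gpd_moments(excesses)
ari = average_recurrence_interval(100.0, threshold, shape, scale, exceedances_per_year=3.5)
```

Evaluating `average_recurrence_interval` on a grid of locations, one fitted cluster per location, is what would produce an ARI map of the kind the abstract describes.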
Cooperativity in self-limiting equilibrium self-associating systems
NASA Astrophysics Data System (ADS)
Freed, Karl F.
2012-11-01
A wide variety of highly cooperative self-assembly processes in biological and synthetic systems involve the assembly of a large number (m) of units into clusters, with m narrowly peaked about a large size m0 ≫ 1 and with a second peak centered about the m = 1 unassembled monomers. While very specific models have been proposed for the assembly of, for example, viral capsids and core-shell micelles of β-casein, no available theory describes a thermodynamically general mechanism for this double peaked, highly cooperative equilibrium assembly process. This study provides a general mechanism for these cooperative processes by developing a minimal Flory-Huggins type theory. Beginning from the simplest non-cooperative, free association model in which the equilibrium constant for addition of a monomer to a cluster is independent of cluster size, the new model merely allows more favorable growth for clusters of intermediate sizes. The theory is illustrated by computing the phase diagram for cases of self-assembly on cooling or heating and for the mass distribution of the two phases.
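A short worked equation shows why the free association starting point cannot produce the second peak. It uses ideal mass-action concentrations c_m as a simplifying assumption (the study itself works within a Flory-Huggins type lattice theory), with K the size-independent equilibrium constant and K_j a hypothetical size-dependent one.

```latex
% Stepwise association A_{m-1} + A_1 <=> A_m with size-independent K:
\[
  c_m = K\, c_{m-1}\, c_1
  \quad\Longrightarrow\quad
  c_m = K^{\,m-1} c_1^{\,m} = \frac{(K c_1)^m}{K}, \qquad K c_1 < 1 .
\]
% The distribution decays geometrically in m, so its only peak is at m = 1.
% Allowing the constant to depend on cluster size gives
\[
  c_m = c_1^{\,m} \prod_{j=2}^{m} K_j ,
\]
% which can rise again and peak near a large size m_0 when K_j c_1 > 1 for intermediate j.
```

Convergence of the total concentration requires K c_1 < 1 in the size-independent case, which is why the monotonic decay is unavoidable there.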
Critical exponents of the explosive percolation transition
NASA Astrophysics Data System (ADS)
da Costa, R. A.; Dorogovtsev, S. N.; Goltsev, A. V.; Mendes, J. F. F.
2014-04-01
In a new type of percolation phase transition, which was observed in a set of nonequilibrium models, each new connection between vertices is chosen from a number of possibilities by an Achlioptas-like algorithm. This causes preferential merging of small components and delays the emergence of the percolation cluster. First simulations led to a conclusion that a percolation cluster in this irreversible process is born discontinuously, by a discontinuous phase transition, which results in the term "explosive percolation transition." We have shown that this transition is actually continuous (second order) though with an anomalously small critical exponent of the percolation cluster. Here we propose an efficient numerical method enabling us to find the critical exponents and other characteristics of this second-order transition for a representative set of explosive percolation models with different number of choices. The method is based on gluing together the numerical solutions of evolution equations for the cluster size distribution and power-law asymptotics. For each of the models, with high precision, we obtain critical exponents and the critical point.
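The effect of choosing each new connection from several possibilities can be reproduced with a small simulation. The selection rule below is an assumption made for illustration: each end of a new link is the candidate, out of m randomly sampled vertices, that sits in the smallest component. With m = 1 this reduces to ordinary random percolation. The sketch only tracks the largest component and does not implement the evolution-equation method proposed in the abstract.

```python
import random


class DisjointSet:
    """Union-find with component sizes and a running largest-component size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def pick_node(ds, n, m, rng):
    """Sample m vertices and keep the one in the smallest component."""
    candidates = [rng.randrange(n) for _ in range(m)]
    return min(candidates, key=lambda v: ds.size[ds.find(v)])


def largest_fraction(n, links, m, seed=0):
    """Fraction of vertices in the largest component after adding `links` links."""
    rng = random.Random(seed)
    ds = DisjointSet(n)
    for _ in range(links):
        ds.union(pick_node(ds, n, m, rng), pick_node(ds, n, m, rng))
    return ds.largest / n
```

Comparing `largest_fraction(n, links, 1)` with `largest_fraction(n, links, 2)` at the same link density shows the delayed emergence of the percolation cluster when small components are merged preferentially.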
Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K
2003-11-01
Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Theory and modeling of particles with DNA-mediated interactions
NASA Astrophysics Data System (ADS)
Licata, Nicholas A.
2008-05-01
In recent years significant attention has been attracted to proposals which utilize DNA for nanotechnological applications. Potential applications of these ideas range from the programmable self-assembly of colloidal crystals, to biosensors and nanoparticle based drug delivery platforms. In Chapter I we introduce the system, which generically consists of colloidal particles functionalized with specially designed DNA markers. The sequence of bases on the DNA markers determines the particle type. Due to the hybridization between complementary single-stranded DNA, specific, type-dependent interactions can be introduced between particles by choosing the appropriate DNA marker sequences. In Chapter II we develop a statistical mechanical description of the aggregation and melting behavior of particles with DNA-mediated interactions. In Chapter III a model is proposed to describe the dynamical departure and diffusion of particles which form reversible key-lock connections. In Chapter IV we propose a method to self-assemble nanoparticle clusters using DNA scaffolds. A natural extension is discussed in Chapter V, the programmable self-assembly of nanoparticle clusters where the desired cluster geometry is encoded using DNA-mediated interactions. In Chapter VI we consider a nanoparticle based drug delivery platform for targeted, cell specific chemotherapy. In Chapter VII we present prospects for future research: the connection between DNA-mediated colloidal crystallization and jamming, and the inverse problem in self-assembly.
Image quality guided approach for adaptive modelling of biometric intra-class variations
NASA Astrophysics Data System (ADS)
Abboud, Ali J.; Jassim, Sabah A.
2010-04-01
The high intra-class variability of acquired biometric data can be attributed to several factors, such as the quality of the acquisition sensor (e.g. thermal), environmental conditions (e.g. lighting), and behavioural changes (e.g. a change in face pose). Such large fuzziness of biometric data can cause a big difference between acquired and stored biometric data, which eventually leads to reduced performance. Many systems store multiple templates in order to account for such variations in the biometric data during the enrolment stage. The number and typicality of these templates affect system performance more than other factors. In this paper, a novel offline approach is proposed for systematic modelling of intra-class variability and typicality in biometric data by regularly selecting new templates from a set of available biometric images. Our proposed technique is a two-stage algorithm: in the first stage, image samples are clustered in terms of their image quality profile vectors rather than their biometric feature vectors, and in the second stage a per-cluster template is selected from a small number of samples in each cluster to create the final template set. Experiments were conducted on five face image databases, and their results demonstrate the effectiveness of the proposed quality-guided approach.
Identify High-Quality Protein Structural Models by Enhanced K-Means.
Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
Identify High-Quality Protein Structural Models by Enhanced K-Means
Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198
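The squared-distance initialization mentioned in the abstract (K-means++) can be sketched as follows. This is the standard seeding procedure, shown on plain coordinate tuples with Euclidean distance as an assumption; the abstract does not state which distance between decoys the authors use, and the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids; each new one is drawn with probability
    proportional to its squared distance from the nearest centroid so far."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            new = rng.choice(points)
        else:
            new = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new)
        d2 = [min(old, sq_dist(p, new)) for old, p in zip(d2, points)]
    return centers


# Usage: seed three centroids, then hand them to any K-means iteration.
rng = random.Random(0)
data = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1), (10.0, 0.0)]
initial = kmeans_pp_init(data, 3, rng)
```

Because far-away points carry the largest weights, the initial centroids tend to spread across the data, which is the property that makes the subsequent K-means iterations less sensitive to a poor random start.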
Carbon Fibers Conductivity Studies
NASA Technical Reports Server (NTRS)
Yang, C. Y.; Butkus, A. M.
1980-01-01
In an attempt to understand the process of electrical conduction in polyacrylonitrile (PAN)-based carbon fibers, calculations were carried out on cluster models of the fiber consisting of carbon, nitrogen, and hydrogen atoms using the modified intermediate neglect of differential overlap (MINDO) molecular orbital (MO) method. The models were developed based on the assumption that PAN carbon fibers obtained with heat treatment temperatures (HTT) below 1000 °C retain nitrogen in a graphite-like lattice. For clusters modeling an edge nitrogen site, analysis of the occupied MOs indicated an electron distribution similar to that of graphite. A similar analysis for the somewhat less stable interior nitrogen site revealed a partially localized π-electron distribution around the nitrogen atom. The differences in bonding trends and structural stability between edge and interior nitrogen clusters led to a two-step process proposed for nitrogen evolution with increasing HTT.
Riemannian multi-manifold modeling and clustering in brain networks
NASA Astrophysics Data System (ADS)
Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.
2017-08-01
This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brain-network time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points in the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2018-01-30
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
A stress sensitivity model for the permeability of porous media based on bi-dispersed fractal theory
NASA Astrophysics Data System (ADS)
Tan, X.-H.; Liu, C.-Y.; Li, X.-P.; Wang, H.-Q.; Deng, H.
A stress sensitivity model for the permeability of porous media based on bi-dispersed fractal theory is established, considering the change of the flow path, the fractal geometry approach and the mechanics of porous media. It is noted that the two fractal parameters of the porous media construction behave differently when the stress changes. The tortuosity fractal dimension of the solid cluster, DcTσ, becomes larger with an increase of stress. However, the pore fractal dimensions of the solid cluster, Dcfσ, and of the capillary bundle, Dpfσ, remain the same with an increase of stress. The definition of normalized permeability is introduced for the analysis of the impacts of stress sensitivity on permeability. The normalized permeability is related to the solid cluster tortuosity dimension, pore fractal dimension, solid cluster maximum diameter, Young’s modulus and Poisson’s ratio. Every parameter has a clear physical meaning without the use of empirical constants. The permeability predictions of the model are consistent with the available experimental data. Thus, the proposed model can precisely depict the flow of fluid in porous media under stress.
A soft X-ray map of the Perseus cluster of galaxies
NASA Technical Reports Server (NTRS)
Cash, W.; Malina, R. F.; Wolff, R. S.
1976-01-01
A 0.5-3-keV X-ray map of the Perseus cluster of galaxies is presented. The map shows a region of strong emission centered near NGC 1275 plus a highly elongated emission region which lies along the line of bright galaxies that dominates the core of the cluster. The data are compared with various models that include point and diffuse sources. One model which adequately represents the data is the superposition of a point source at NGC 1275 and an isothermal ellipsoid resulting from the bremsstrahlung emission of cluster gas. The ellipsoid has a major core radius of 20.5 arcmin and a minor core radius of 5.5 arcmin, consistent with the values obtained from galaxy counts. All acceptable models provide evidence for a compact source (less than 3 arcmin FWHM) at NGC 1275 containing about 25% of the total emission. Since the diffuse X-ray and radio components have radically different morphologies, it is unlikely that the emissions arise from a common source, as proposed in inverse-Compton models.
Novel layered clustering-based approach for generating ensemble of classifiers.
Rahman, Ashfaqur; Verma, Brijesh
2011-05-01
This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clustering of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results as evidenced from the experimental results.
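The layered scheme in this abstract can be sketched compactly. The sketch makes several assumptions that are not in the abstract: each layer is a plain K-means run with a different random initialization, the base classifier of a cluster is simply the majority training label inside it (the paper trains real base classifiers), and all names are hypothetical. The fusion step is the majority vote the abstract describes.

```python
import random
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the centroid closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points, k, rng, iters=10):
    """Plain K-means with randomly sampled initial centroids."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def train_layers(X, y, k, n_layers, seed=0):
    """One clustering per layer; each cluster stores its majority label."""
    rng = random.Random(seed)
    fallback = Counter(y).most_common(1)[0][0]
    layers = []
    for _ in range(n_layers):
        centers = kmeans(X, k, rng)
        votes = [Counter() for _ in range(k)]
        for p, label in zip(X, y):
            votes[nearest(p, centers)][label] += 1
        labels = [v.most_common(1)[0][0] if v else fallback for v in votes]
        layers.append((centers, labels))
    return layers


def predict(layers, p):
    """Find the cluster of p in every layer and fuse decisions by majority vote."""
    decisions = [labels[nearest(p, centers)] for centers, labels in layers]
    return Counter(decisions).most_common(1)[0][0]
```

Different random initializations give different partitions of the same data, so the per-layer decisions are not identical, which is the source of the diversity the abstract attributes to layering.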
Coevolutionary dynamics with clustering behaviors on cyclic competition
NASA Astrophysics Data System (ADS)
Dong, Linrong; Yang, Guangcan
2012-05-01
We propose a dynamic model for describing clustering behaviors on a cyclic game, in which individuals of the same species form a cluster to compete. The rates of consuming the prey depend not only on the individual competing ability v, but also on the sizes of the two interacting clusters. The fragmentation and coagulation rates of the clusters are related to the cohesive strength among the individuals. A new parameter u is introduced to indicate the uniting degree. We find that the probability distribution of the cluster sizes is almost a power law in a large regime specified by the two parameters, which reflects the scale-free behavior in complex systems. In addition, the exponents are mostly in the range observed in real social systems. Our simulation shows that clustering promotes biodiversity. At steady state, the populations of the three species fluctuate strongly with an asymmetric period, and the aggregation of large clusters to compete is evident and shows on-off intermittency.
Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses
ERIC Educational Resources Information Center
Huang, Guan-Hua; Wang, Su-Mei; Hsu, Chung-Chu
2011-01-01
Statisticians typically estimate the parameters of latent class and latent profile models using the Expectation-Maximization algorithm. This paper proposes an alternative two-stage approach to model fitting. The first stage uses the modified k-means and hierarchical clustering algorithms to identify the latent classes that best satisfy the…
Chen, Qing; Zhang, Jinxiu; Hu, Ze
2017-01-01
This article investigates the dynamic topology control problem of satellite cluster networks (SCNs) in Earth observation (EO) missions by applying a novel metric of stability for inter-satellite links (ISLs). The properties of the periodicity and predictability of satellites’ relative position are involved in the link cost metric which is to give a selection criterion for choosing the most reliable data routing paths. Also, a cooperative work model with reliability is proposed for the situation of emergency EO missions. Based on the link cost metric and the proposed reliability model, a reliability assurance topology control algorithm and its corresponding dynamic topology control (RAT) strategy are established to maximize the stability of data transmission in the SCNs. The SCNs scenario is tested through some numeric simulations of the topology stability of average topology lifetime and average packet loss rate. Simulation results show that the proposed reliable strategy applied in SCNs significantly improves the data transmission performance and prolongs the average topology lifetime. PMID:28241474
Chen, Qing; Zhang, Jinxiu; Hu, Ze
2017-02-23
This article investigates the dynamic topology control problem of satellite cluster networks (SCNs) in Earth observation (EO) missions by applying a novel metric of stability for inter-satellite links (ISLs). The properties of the periodicity and predictability of satellites' relative position are involved in the link cost metric which is to give a selection criterion for choosing the most reliable data routing paths. Also, a cooperative work model with reliability is proposed for the situation of emergency EO missions. Based on the link cost metric and the proposed reliability model, a reliability assurance topology control algorithm and its corresponding dynamic topology control (RAT) strategy are established to maximize the stability of data transmission in the SCNs. The SCNs scenario is tested through some numeric simulations of the topology stability of average topology lifetime and average packet loss rate. Simulation results show that the proposed reliable strategy applied in SCNs significantly improves the data transmission performance and prolongs the average topology lifetime.
Competitive Deep-Belief Networks for Underwater Acoustic Target Recognition
Shen, Sheng; Yao, Xiaohui; Sheng, Meiping; Wang, Chen
2018-01-01
Underwater acoustic target recognition based on ship-radiated noise belongs to the small-sample-size recognition problems. A competitive deep-belief network is proposed to learn features with more discriminative information from labeled and unlabeled samples. The proposed model consists of four stages: (1) A standard restricted Boltzmann machine is pretrained using a large number of unlabeled data to initialize its parameters; (2) the hidden units are grouped according to categories, which provides an initial clustering model for competitive learning; (3) competitive training and back-propagation algorithms are used to update the parameters to accomplish the task of clustering; (4) by applying layer-wise training and supervised fine-tuning, a deep neural network is built to obtain features. Experimental results show that the proposed method can achieve classification accuracy of 90.89%, which is 8.95% higher than the accuracy obtained by the compared methods. In addition, the highest accuracy of our method is obtained with fewer features than other methods. PMID:29570642
A method for analyzing clustered interval-censored data based on Cox's model.
Kor, Chew-Teng; Cheng, Kuang-Fu; Chen, Yi-Hau
2013-02-28
Methods for analyzing interval-censored data are well established. Unfortunately, these methods are inappropriate for the studies with correlated data. In this paper, we focus on developing a method for analyzing clustered interval-censored data. Our method is based on Cox's proportional hazard model with piecewise-constant baseline hazard function. The correlation structure of the data can be modeled by using Clayton's copula or independence model with proper adjustment in the covariance estimation. We establish estimating equations for the regression parameters and baseline hazards (and a parameter in copula) simultaneously. Simulation results confirm that the point estimators follow a multivariate normal distribution, and our proposed variance estimations are reliable. In particular, we found that the approach with independence model worked well even when the true correlation model was derived from Clayton's copula. We applied our method to a family-based cohort study of pandemic H1N1 influenza in Taiwan during 2009-2010. Using the proposed method, we investigate the impact of vaccination and family contacts on the incidence of pH1N1 influenza. Copyright © 2012 John Wiley & Sons, Ltd.
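The building blocks named above (a proportional hazards model with piecewise-constant baseline hazard, and an interval-censored observation) can be sketched as follows. This shows only the standard per-observation likelihood contribution under the independence working model; the copula term and the adjusted covariance estimation from the paper are not reproduced, and the function names and argument layout are assumptions made for illustration.

```python
import math

def cumulative_baseline_hazard(t, cuts, rates):
    """Integrate a piecewise-constant hazard from 0 to t.

    cuts  : increasing interior breakpoints [c1, ..., cm]
    rates : m + 1 hazard values; rates[j] applies between breakpoint j and j + 1,
            with 0 as the first breakpoint and infinity as the last.
    """
    total, start = 0.0, 0.0
    for end, rate in zip(list(cuts) + [math.inf], rates):
        if t <= start:
            break
        total += rate * (min(t, end) - start)
        start = end
    return total

def survival(t, x, beta, cuts, rates):
    """S(t | x) = exp(-H0(t) * exp(beta . x)) under proportional hazards."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-cumulative_baseline_hazard(t, cuts, rates) * math.exp(linear))

def interval_censored_loglik(left, right, x, beta, cuts, rates):
    """log P(left < T <= right | x); pass right = math.inf for right-censoring."""
    s_left = survival(left, x, beta, cuts, rates)
    s_right = 0.0 if math.isinf(right) else survival(right, x, beta, cuts, rates)
    return math.log(s_left - s_right)
```

Summing `interval_censored_loglik` over all subjects gives the working log-likelihood that ignores within-cluster correlation; the paper's point is that estimates from such a working model can remain usable if the covariance is adjusted appropriately.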
Multilevel covariance regression with correlated random effects in the mean and variance structure.
Quintero, Adrian; Lesaffre, Emmanuel
2017-09-01
Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches can be restrictive in the literature. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Theory and modeling of particles with DNA-mediated interactions
NASA Astrophysics Data System (ADS)
Licata, Nicholas A.
In recent years significant attention has been attracted to proposals which utilize DNA for nanotechnological applications. Potential applications of these ideas range from the programmable self-assembly of colloidal crystals, to biosensors and nanoparticle based drug delivery platforms. In Chapter I we introduce the system, which generically consists of colloidal particles functionalized with specially designed DNA markers. The sequence of bases on the DNA markers determines the particle type. Due to the hybridization between complementary single-stranded DNA, specific, type-dependent interactions can be introduced between particles by choosing the appropriate DNA marker sequences. In Chapter II we develop a statistical mechanical description of the aggregation and melting behavior of particles with DNA-mediated interactions. A quantitative comparison between the theory and experiments is made by calculating the experimentally observed melting profile. In Chapter III a model is proposed to describe the dynamical departure and diffusion of particles which form reversible key-lock connections. The model predicts a crossover from localized to diffusive behavior. The random walk statistics for the particles' in plane diffusion is discussed. The lateral motion is analogous to dispersive transport in disordered semiconductors, ranging from standard diffusion with a renormalized diffusion coefficient to anomalous, subdiffusive behavior. In Chapter IV we propose a method to self-assemble nanoparticle clusters using DNA scaffolds. An optimal concentration ratio is determined for the experimental implementation of our self-assembly proposal. A natural extension is discussed in Chapter V, the programmable self-assembly of nanoparticle clusters where the desired cluster geometry is encoded using DNA-mediated interactions. 
We determine the probability that the system self-assembles the desired cluster geometry, and discuss the connections to jamming in granular and colloidal systems. In Chapter VI we consider a nanoparticle based drug delivery platform for targeted, cell specific chemotherapy. A key-lock model is proposed to describe the results of in-vitro experiments, and the situation in-vivo is discussed. The cooperative binding, and hence the specificity to cancerous cells, is kinetically limited. The implications for optimizing the design of nanoparticle based drug delivery platforms is discussed. In Chapter VII we present prospects for future research: the connection between DNA-mediated colloidal crystallization and jamming, and the inverse problem in self-assembly.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an innovative obstacle distance measure for spatial clustering under obstacle constraints. First, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as the similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and the classical clustering algorithms. Our clustering model based on an artificial immune system is also applied to the public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect.
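To make the idea of an obstacle distance concrete, the sketch below measures the shortest path between two cells on a grid with blocked cells, using breadth-first search. The abstract does not specify the paper's path searching algorithm, so the grid discretization, the 4-connected moves, and the function name `obstacle_distance` are all assumptions; facilitators are not modeled here.

```python
from collections import deque

def obstacle_distance(grid, start, goal):
    """Shortest 4-connected path length (in cells) between start and goal.

    grid  : list of rows, where 1 marks an obstacle cell and 0 a free cell
    start : (row, col) of the first point
    goal  : (row, col) of the second point
    Returns None when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] or grid[goal[0]][goal[1]]:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not grid[nr][nc] and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None
```

With a wall between two nearby points, this distance is larger than the straight-line distance, so a clustering algorithm that uses it would tend to keep points on opposite sides of the wall in different clusters.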
BiP clustering facilitates protein folding in the endoplasmic reticulum.
Griesemer, Marc; Young, Carissa; Robinson, Anne S; Petzold, Linda
2014-07-01
The chaperone BiP participates in several regulatory processes within the endoplasmic reticulum (ER): translocation, protein folding, and ER-associated degradation. To facilitate protein folding, a cooperative mechanism known as entropic pulling has been proposed to demonstrate the molecular-level understanding of how multiple BiP molecules bind to nascent and unfolded proteins. Recently, experimental evidence revealed the spatial heterogeneity of BiP within the nuclear and peripheral ER of S. cerevisiae (commonly referred to as 'clusters'). Here, we developed a model to evaluate the potential advantages of accounting for multiple BiP molecules binding to peptides, while proposing that BiP's spatial heterogeneity may enhance protein folding and maturation. Scenarios were simulated to gauge the effectiveness of binding multiple chaperone molecules to peptides. Using two metrics, folding efficiency and chaperone cost, we determined that the single binding site model achieves a higher efficiency than models characterized by multiple binding sites, in the absence of cooperativity. Due to entropic pulling, however, multiple chaperones perform in concert to facilitate the resolubilization and ultimate yield of folded proteins. As a result of cooperativity, multiple binding site models used fewer BiP molecules and maintained a higher folding efficiency than the single binding site model. These in silico investigations reveal that clusters of BiP molecules bound to unfolded proteins may enhance folding efficiency through cooperative action via entropic pulling.
A Study Of Anomalous Stars and Binary Populations Within Open Clusters: Tests Of Theoretical Models
NASA Astrophysics Data System (ADS)
Geller, Aaron M.; Mathieu, Robert D.; Braden, Ella; Latham, David W.
2008-08-01
"Anomalous" stars, such as blue stragglers and more recently sub-subgiants, have been an enduring challenge for stellar evolution theory. Recently it has become clear that in star clusters these systems are closely linked to the binary star populations. Furthermore, through advances in N-body modeling, we have come to realize that stellar dynamical processes play a central role in the formation of such anomalous stars. Indeed, these stars trace the interface between the classical fields of stellar evolution and stellar dynamics. We propose a thesis study to directly probe this interface through high-precision radial-velocity measurements of the anomalous stars and the binary populations in four open clusters. We have selected NGC 188 (7 Gyr), M67 (NGC 2682; 4 Gyr), NGC 6819 (2.4 Gyr), and M35 (NGC 2168; 150 Myr), as these span a wide range in age, are rich enough to provide statistically significant conclusions, and already have an extensive base of kinematic, spectroscopic, and photometric observations from the WIYN Open Cluster Study. Our proposed observations will define the spectroscopic hard binary populations (fraction, frequency distributions of orbital parameters, mass ratios) for orbital periods approaching the hard-soft boundary. These observations will also provide a comprehensive survey for anomalous stars, including secure establishment of their cluster membership. These data will allow us to perform the first detailed comparison to predictions from open cluster simulations of the binary populations among normal and anomalous stars, and thereby to constrain the evolutionary paths from one to the other.
A Study Of Anomalous Stars and Binary Populations Within Open Clusters: Tests Of Theoretical Models
NASA Astrophysics Data System (ADS)
Geller, Aaron M.; Mathieu, Robert D.; Gosnell, Natalie; Latham, David W.
2009-02-01
"Anomalous" stars, such as blue stragglers and more recently sub-subgiants, have been an enduring challenge for stellar evolution theory. Recently it has become clear that in star clusters these systems are closely linked to the binary star populations. Furthermore, through advances in N-body modeling, we have come to realize that stellar dynamical processes play a central role in the formation of such anomalous stars. Indeed, these stars trace the interface between the classical fields of stellar evolution and stellar dynamics. We propose a thesis study to directly probe this interface through high-precision radial-velocity measurements of the anomalous stars and the binary populations in four open clusters. We have selected NGC 188 (7 Gyr), M67 (NGC 2682; 4 Gyr), NGC 6819 (2.4 Gyr), and M35 (NGC 2168; 150 Myr), as these span a wide range in age, are rich enough to provide statistically significant conclusions, and already have an extensive base of kinematic, spectroscopic, and photometric observations from the WIYN Open Cluster Study. Our proposed observations will define the spectroscopic hard binary populations (fraction, frequency distributions of orbital parameters, mass ratios) for orbital periods approaching the hard-soft boundary. These observations will also provide a comprehensive survey for anomalous stars, including secure establishment of their cluster membership. These data will allow us to perform the first detailed comparison to predictions from open cluster simulations of the binary populations among normal and anomalous stars, and thereby to constrain the evolutionary paths from one to the other.
A Study Of Anomalous Stars and Binary Populations Within Open Clusters: Tests Of Theoretical Models
NASA Astrophysics Data System (ADS)
Geller, Aaron M.; Mathieu, Robert D.; Braden, Ella; Latham, David W.
2008-02-01
"Anomalous" stars, such as blue stragglers and more recently sub-subgiants, have been an enduring challenge for stellar evolution theory. Recently it has become clear that in star clusters these systems are closely linked to the binary star populations. Furthermore, through advances in N-body modeling, we have come to realize that stellar dynamical processes play a central role in the formation of such anomalous stars. Indeed, these stars trace the interface between the classical fields of stellar evolution and stellar dynamics. We propose a thesis study to directly probe this interface through high-precision radial-velocity measurements of the anomalous stars and the binary populations in four open clusters. We have selected NGC 188 (7 Gyr), M67 (NGC 2682; 4 Gyr), NGC 6819 (2.4 Gyr), and M35 (NGC 2168; 150 Myr), as these span a wide range in age, are rich enough to provide statistically significant conclusions, and already have an extensive base of kinematic, spectroscopic, and photometric observations from the WIYN Open Cluster Study. Our proposed observations will define the spectroscopic hard binary populations (fraction, frequency distributions of orbital parameters, mass ratios) for orbital periods approaching the hard-soft boundary. These observations will also provide a comprehensive survey for anomalous stars, including secure establishment of their cluster membership. These data will allow us to perform the first detailed comparison to predictions from open cluster simulations of the binary populations among normal and anomalous stars, and thereby to constrain the evolutionary paths from one to the other.
Language Acquisition and Machine Learning.
1986-02-01
machine learning and examine its implications for computational models of language acquisition. As a framework for understanding this research, the authors propose four component tasks involved in learning from experience-aggregation, clustering, characterization, and storage. They then consider four common problems studied by machine learning researchers-learning from examples, heuristics learning, conceptual clustering, and learning macro-operators-describing each in terms of our framework. After this, they turn to the problem of grammar
A self-adapting herding model: The agent judge-abilities influence the dynamic behaviors
NASA Astrophysics Data System (ADS)
Dong, Linrong
2008-10-01
We propose a self-adapting herding model in which the financial market consists of agent clusters with different sizes and market desires. The rates of successful exchange and merger depend on the volatility of the market and the market desires of the agent clusters. When clusters merge, the desires are assigned in terms of the wealth of the agent clusters. After an exchange, the desire of the benefiting cluster stays the same, while the desire of the losing cluster is altered in a way that correlates with the agents' judge-ability. A parameter R is assigned to all agents to denote the judge-ability. Numerical calculation shows that R distinctly influences the dynamic behavior of the market, including the exponents of the probability distribution of agent-cluster sizes, the volatility autocorrelation of the returns, and the intensity and frequency of the volatility.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kristensen, Lars E.; Bergin, Edwin A., E-mail: lkristensen@cfa.harvard.edu
2015-07-10
Most low-mass protostars form in clusters, in particular high-mass clusters; however, how low-mass stars form in high-mass clusters and what the mass distribution is are still open questions both in our own Galaxy and elsewhere. To access the population of forming embedded low-mass protostars observationally, we propose using molecular outflows as tracers. Because the outflow emission scales with mass, the effective contrast between low-mass protostars and their high-mass cousins is greatly lowered. In particular, maps of methanol emission at 338.4 GHz (J = 7_0–6_0 A^+) in low-mass clusters illustrate that this transition is an excellent probe of the low-mass population. We present here a model of a forming cluster where methanol emission is assigned to every embedded low-mass protostar. The resulting model image of methanol emission is compared to recent ALMA observations toward a high-mass cluster and the similarity is striking: the toy model reproduces observations to better than a factor of two and suggests that approximately 50% of the total flux originates in low-mass outflows. Future fine-tuning of the model will eventually make it a tool for interpreting the embedded low-mass population of distant regions within our own Galaxy and ultimately higher-redshift starburst galaxies, not just for methanol emission but also water and high-J CO.
Asteroid clusters similar to asteroid pairs
NASA Astrophysics Data System (ADS)
Pravec, P.; Fatka, P.; Vokrouhlický, D.; Scheeres, D. J.; Kušnirák, P.; Hornoch, K.; Galád, A.; Vraštil, J.; Pray, D. P.; Krugly, Yu. N.; Gaftonyuk, N. M.; Inasaridze, R. Ya.; Ayvazian, V. R.; Kvaratskhelia, O. I.; Zhuzhunadze, V. T.; Husárik, M.; Cooney, W. R.; Gross, J.; Terrell, D.; Világi, J.; Kornoš, L.; Gajdoš, Š.; Burkhonov, O.; Ehgamberdiev, Sh. A.; Donchev, Z.; Borisov, G.; Bonev, T.; Rumyantsev, V. V.; Molotov, I. E.
2018-04-01
We studied the membership, size ratio and rotational properties of 13 asteroid clusters consisting of between 3 and 19 known members that are on similar heliocentric orbits. By backward integrations of their orbits, we confirmed their cluster membership and estimated times elapsed since separation of the secondaries (the smaller cluster members) from the primary (i.e., cluster age) that are between 10^5 and a few 10^6 years. We ran photometric observations for all the cluster primaries and a sample of secondaries and we derived their accurate absolute magnitudes and rotation periods. We found that 11 of the 13 clusters follow the same trend of primary rotation period vs mass ratio as asteroid pairs that was revealed by Pravec et al. (2010). We generalized the model of the post-fission system for asteroid pairs by Pravec et al. (2010) to a system of N components formed by rotational fission and we found excellent agreement between the data for the 11 asteroid clusters and the prediction from the theory of their formation by rotational fission. The two exceptions are the high-mass ratio (q > 0.7) clusters of (18777) Hobson and (22280) Mandragora for which a different formation mechanism is needed. Two candidate mechanisms for formation of more than one secondary by rotational fission were published: the secondary fission process proposed by Jacobson and Scheeres (2011) and a cratering collision event onto a nearly critically rotating primary proposed by Vokrouhlický et al. (2017). It will have to be revealed from future studies which of the clusters were formed by one or the other process. To that point, we found certain further interesting properties and features of the asteroid clusters that place constraints on the theories of their formation, among them the most intriguing being the possibility of a cascade disruption for some of the clusters.
Cluster Dynamics Modeling with Bubble Nucleation, Growth and Coalescence
DOE Office of Scientific and Technical Information (OSTI.GOV)
de Almeida, Valmor F.; Blondel, Sophie; Bernholdt, David E.
The topic of this communication pertains to defect formation in irradiated solids such as plasma-facing tungsten submitted to helium implantation in fusion reactor components, and nuclear fuel (metal and oxides) submitted to volatile fission product generation in nuclear reactors. The purpose of this progress report is to describe efforts towards addressing the prediction of long-time evolution of defects via continuum cluster dynamics simulation. The difficulties are twofold. First, realistic, long-time dynamics in reactor conditions leads to a non-dilute diffusion regime which is not accommodated by the prevailing dilute, stressless cluster dynamics theory. Second, long-time dynamics calls for a large set of species (ideally an infinite set) to capture all possible emerging defects, and this represents a computational bottleneck. Extension beyond the dilute limit is a significant undertaking since no model has been advanced to extend cluster dynamics to non-dilute, deformable conditions. Here our proposed approach to model the non-dilute limit is to monitor the appearance of a spatially localized void volume fraction in the solid matrix with a bell shape profile and insert an explicit geometrical bubble onto the support of the bell function. The newly created internal moving boundary provides the means to account for the interfacial flux of mobile species into the bubble, and the growth of bubbles allows for coalescence phenomena which captures highly non-dilute interactions. We present a preliminary interfacial kinematic model with associated interfacial diffusion transport to follow the evolution of the bubble in any number of spatial dimensions and any number of bubbles, which can be further extended to include a deformation theory. Finally we comment on a computational front-tracking method to be used in conjunction with conventional cluster dynamics simulations in the non-dilute model proposed.
Reconciling mass functions with the star-forming main sequence via mergers
NASA Astrophysics Data System (ADS)
Steinhardt, Charles L.; Yurk, Dominic; Capak, Peter
2017-06-01
We combine star formation along the 'main sequence', quiescence and clustering and merging to produce an empirical model for the evolution of individual galaxies. Main-sequence star formation alone would significantly steepen the stellar mass function towards low redshift, in sharp conflict with observation. However, a combination of star formation and merging produces a consistent result for correct choice of the merger rate function. As a result, we are motivated to propose a model in which hierarchical merging is disconnected from environmentally independent star formation. This model can be tested via correlation functions and would produce new constraints on clustering and merging.
Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita
2016-04-01
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
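One plausible reading of a neighborhood graph built from several rounds of MST is sketched below: compute an MST, set its edges aside, compute an MST of the remaining edges, and take the union. This interpretation, the Euclidean edge weights, and the names `mst_edges` and `k_round_mst_graph` are assumptions for illustration; the eigenanalysis step of E-MST is not shown.

```python
import math
from itertools import combinations

def mst_edges(n, edges):
    """Kruskal's algorithm with union-find.

    edges : list of (weight, i, j) tuples over vertices 0..n-1
    Returns the edges of a minimum spanning forest of the given edge set.
    """
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    chosen = []
    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            chosen.append((w, i, j))
    return chosen

def k_round_mst_graph(points, k):
    """Union of the edges found in k successive MST rounds."""
    n = len(points)
    remaining = [(math.dist(points[i], points[j]), i, j)
                 for i, j in combinations(range(n), 2)]
    graph = []
    for _ in range(k):
        tree = mst_edges(n, remaining)
        if not tree:  # no edges left to build another round
            break
        graph.extend(tree)
        used = {(i, j) for _, i, j in tree}
        remaining = [e for e in remaining if (e[1], e[2]) not in used]
    return graph
```

The resulting edge list could then be turned into a similarity matrix for spectral analysis; a larger `k` gives a denser graph, with the complete graph as the upper limit.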
Emotional disorders: cluster 4 of the proposed meta-structure for DSM-V and ICD-11.
Goldberg, D P; Krueger, R F; Andrews, G; Hobbs, M J
2009-12-01
The extant major psychiatric classifications DSM-IV, and ICD-10, are atheoretical and largely descriptive. Although this achieves good reliability, the validity of a medical diagnosis would be greatly enhanced by an understanding of risk factors and clinical manifestations. In an effort to group mental disorders on the basis of aetiology, five clusters have been proposed. This paper considers the validity of the fourth cluster, emotional disorders, within that proposal. We reviewed the literature in relation to 11 validating criteria proposed by a Study Group of the DSM-V Task Force, as applied to the cluster of emotional disorders. An emotional cluster of disorders identified using the 11 validators is feasible. Negative affectivity is the defining feature of the emotional cluster. Although there are differences between disorders in the remaining validating criteria, there are similarities that support the feasibility of an emotional cluster. Strong intra-cluster co-morbidity may reflect the action of common risk factors and also shared higher-order symptom dimensions in these emotional disorders. Emotional disorders meet many of the salient criteria proposed by the Study Group of the DSM-V Task Force to suggest a classification cluster.
Rajab, Maher I
2011-11-01
Since the introduction of epiluminescence microscopy (ELM), image analysis tools have been extended to the field of dermatology, in an attempt to algorithmically reproduce clinical evaluation. Accurate image segmentation of skin lesions is one of the key steps for useful, early and non-invasive diagnosis of cutaneous melanomas. This paper proposes two image segmentation algorithms based on frequency domain processing and k-means clustering/fuzzy k-means clustering. The two methods are capable of segmenting and extracting the true border that reveals the global structure irregularity (indentations and protrusions), which may suggest excessive cell growth or regression of a melanoma. As a pre-processing step, Fourier low-pass filtering is applied to reduce the surrounding noise in a skin lesion image. A quantitative comparison of the techniques is enabled by the use of synthetic skin lesion images that model lesions covered with hair to which Gaussian noise is added. The proposed techniques are also compared with an established optimal-based thresholding skin-segmentation method. It is demonstrated that for lesions with a range of different border irregularity properties, the k-means clustering and fuzzy k-means clustering segmentation methods provide the best performance over a range of signal to noise ratios. The proposed segmentation techniques are also demonstrated to have similar performance when tested on real skin lesions representing high-resolution ELM images. This study suggests that the segmentation results obtained using a combination of low-pass frequency filtering and k-means or fuzzy k-means clustering are superior to the result that would be obtained by using k-means or fuzzy k-means clustering segmentation methods alone. © 2011 John Wiley & Sons A/S.
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
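For orientation, the sketch below computes the usual Cohen's kappa point estimate for two binary procedures after pooling the 2x2 agreement counts over clusters. Pooling the counts in this way is an assumption about how the point estimate is formed; the paper's nonparametric variance estimator is not reproduced, and the function name and input layout are hypothetical.

```python
def pooled_kappa(clusters):
    """Cohen's kappa from 2x2 counts pooled over clusters.

    clusters : iterable of (n11, n10, n01, n00) tuples, one per cluster, where
               n11 = both procedures positive, n10 = only procedure 1 positive,
               n01 = only procedure 2 positive, n00 = both negative.
    """
    n11 = n10 = n01 = n00 = 0
    for a, b, c, d in clusters:
        n11, n10, n01, n00 = n11 + a, n10 + b, n01 + c, n00 + d
    n = n11 + n10 + n01 + n00
    p_observed = (n11 + n00) / n
    p1 = (n11 + n10) / n  # marginal positive rate, procedure 1
    p2 = (n11 + n01) / n  # marginal positive rate, procedure 2
    p_expected = p1 * p2 + (1 - p1) * (1 - p2)
    # Undefined (division by zero) when chance agreement is exactly 1.
    return (p_observed - p_expected) / (1 - p_expected)
```

The point estimate does not change with how pairs are grouped into clusters; the clustering matters for the variance, which is where an estimator that accounts for within-cluster dependence differs from the naive one.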
A Multiphase Model for the Intracluster Medium
NASA Technical Reports Server (NTRS)
Nagai, Daisuke; Sulkanen, Martin E.; Evrard, August E.
1999-01-01
Constraints on the clustered mass density of the universe derived from the observed population mean intracluster gas fraction of x-ray clusters may be biased by reliance on a single-phase assumption for the thermodynamic structure of the intracluster medium (ICM). We propose a descriptive model for multiphase structure in which a spherically symmetric ICM contains isobaric density perturbations with a radially dependent variance. Fixing the x-ray emission and emission weighted temperature, we explore two independently observable signatures of the model in the parameter space. For bremsstrahlung dominated emission, the central Sunyaev-Zel'dovich (SZ) decrement in the multiphase case is increased over the single-phase case and multiphase x-ray spectra in the range 0.1-20 keV are flatter in the continuum and exhibit stronger low energy emission lines than their single-phase counterpart. We quantify these effects for a fiducial 10^8 K cluster and demonstrate how the combination of SZ and x-ray spectroscopy can be used to identify a preferred location in the plane of the model parameter space. From these parameters the correct value of mean intracluster gas fraction in the multiphase model results, allowing an unbiased estimate of clustered mass density to be recovered.
Leão, Erico; Montez, Carlos; Moraes, Ricardo; Portugal, Paulo; Vasques, Francisco
2017-01-01
The use of Wireless Sensor Network (WSN) technologies is an attractive option to support wide-scale monitoring applications, such as the ones that can be found in precision agriculture, environmental monitoring and industrial automation. The IEEE 802.15.4/ZigBee cluster-tree topology is a suitable topology to build wide-scale WSNs. Despite some of its known advantages, including timing synchronisation and duty-cycle operation, cluster-tree networks may suffer from severe network congestion problems due to the convergecast pattern of its communication traffic. Therefore, the careful adjustment of transmission opportunities (superframe durations) allocated to the cluster-heads is an important research issue. This paper proposes a set of proportional Superframe Duration Allocation (SDA) schemes, based on well-defined protocol and timing models, and on the message load imposed by child nodes (Load-SDA scheme), or by number of descendant nodes (Nodes-SDA scheme) of each cluster-head. The underlying reasoning is to adequately allocate transmission opportunities (superframe durations) and parametrize buffer sizes, in order to improve the network throughput and avoid typical problems, such as: network congestion, high end-to-end communication delays and discarded messages due to buffer overflows. Simulation assessments show how proposed allocation schemes may clearly improve the operation of wide-scale cluster-tree networks. PMID:28134822
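The idea of allocating transmission opportunities in proportion to load or to the number of descendant nodes can be sketched as a simple proportional split of a time budget. This is only an illustration of proportional allocation under assumed inputs; it is not the Load-SDA or Nodes-SDA scheme itself, which also depends on the IEEE 802.15.4 protocol and timing models. The names `proportional_shares`, `weights` and `budget` are hypothetical.

```python
def proportional_shares(weights, budget):
    """Split `budget` among cluster-heads in proportion to `weights`.

    `weights` could hold the message load of each cluster-head (load-based)
    or its number of descendant nodes (node-based); both uses are assumptions
    made for illustration.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [budget * w / total for w in weights]


# Hypothetical example: three cluster-heads with 10, 30 and 60 descendants
# sharing 100 time units.
shares = proportional_shares([10, 30, 60], 100)
print(shares)
```

In a real 802.15.4 network the resulting shares would still have to be mapped onto the discrete superframe durations the standard allows.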
Removal of impulse noise clusters from color images with local order statistics
NASA Astrophysics Data System (ADS)
Ruchay, Alexey; Kober, Vitaly
2017-09-01
This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
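The removal step named in the abstract, vector median filtering, can be illustrated with a minimal sketch: among the color vectors in a window, the vector median is the one with the smallest summed distance to all others. The detection stage based on local order statistics is not reproduced here, and the window values below are made up for illustration.

```python
import math


def vector_median(vectors):
    """Return the vector whose summed Euclidean distance to all others is smallest."""
    def total_distance(v):
        return sum(math.dist(v, other) for other in vectors)
    return min(vectors, key=total_distance)


# Hypothetical 5-pixel window of RGB values containing one impulse (pure red).
window = [(10, 10, 10), (12, 11, 9), (255, 0, 0), (11, 10, 10), (9, 12, 11)]
restored = vector_median(window)
print(restored)
```

Because the output is always one of the input vectors, the filter does not introduce colors that were absent from the window.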
Huo, Guanying; Yang, Simon X; Li, Qingwu; Zhou, Yan
2017-04-01
Sidescan sonar image segmentation is a very important issue in underwater object detection and recognition. In this paper, a robust and fast method for sidescan sonar image segmentation is proposed, which deals with both speckle noise and intensity inhomogeneity that may cause considerable difficulties in image segmentation. The proposed method integrates the nonlocal means-based speckle filtering (NLMSF), coarse segmentation using k-means clustering, and fine segmentation using an improved region-scalable fitting (RSF) model. The NLMSF is used before the segmentation to effectively remove speckle noise while preserving meaningful details such as edges and fine features, which can make the segmentation easier and more accurate. After despeckling, a coarse segmentation is obtained by using k-means clustering, which can reduce the number of iterations. In the fine segmentation, to better deal with possible intensity inhomogeneity, an edge-driven constraint is combined with the RSF model, which can not only accelerate the convergence speed but also avoid becoming trapped in local minima. The proposed method has been successfully applied to both noisy and inhomogeneous sonar images. Experimental and comparative results on real and synthetic sonar images demonstrate that the proposed method is robust against noise and intensity inhomogeneity, and is also fast and accurate.
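The coarse segmentation step can be pictured as k-means clustering of pixel intensities. The sketch below runs Lloyd's algorithm on a flat list of intensities with centers initialised evenly between the minimum and maximum; this initialisation, the function name `kmeans_1d` and the sample values are assumptions, and the despeckling and RSF refinement stages are not shown.

```python
def kmeans_1d(values, k=2, max_iters=50):
    """Cluster scalar intensities with Lloyd's k-means (k >= 2).

    Centers start evenly spaced between min and max (an assumed, deterministic
    initialisation). Returns (labels, centers).
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(max_iters):
        labels = [nearest(v) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster receives no points.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers

    # Final labels are computed against the final centers.
    labels = [nearest(v) for v in values]
    return labels, centers


# Hypothetical intensities: dark background and a bright object.
pixels = [10, 12, 11, 200, 198, 205]
labels, centers = kmeans_1d(pixels, k=2)
print(labels, centers)
```

For an image, the intensities would be the flattened pixel array and the labels would be reshaped back into a coarse segmentation mask.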
Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.
Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib
2017-03-01
A video is understood by users in terms of entities present in it. Entity Discovery is the task of building an appearance model for each entity (e.g., a person), and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames, and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at tracklet-level. We extend Chinese Restaurant Process (CRP) to TC-CRP, and further to Temporally Coherent Chinese Restaurant Franchise (TC-CRF) to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data like scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. The proposed methods can perform online tracklet clustering on streaming videos unlike existing approaches, and can automatically reject false tracklets. Finally we discuss entity-driven video summarization, where temporal segments of the video are selected based on the discovered entities, to create a semantically meaningful summary.
Tantalum induced butterfly-like clusters on Si (111)-7 × 7 surface: STM/STS study at low coverage
NASA Astrophysics Data System (ADS)
Shukrynau, Pavel; Mutombo, Pingo; Švec, Martin; Hietschold, Michael; Cháb, Vladimír
2012-02-01
The adsorption of small amounts of tantalum on the Si (111)-7 × 7 reconstructed surface is investigated systematically using scanning tunneling microscopy and tunneling spectroscopy combined with first-principles density functional theory calculations. We find that moderate annealing of the Ta-covered surface results in the formation of clusters with a butterfly-like shape. The clusters are sporadically distributed over the surface and their density depends on the metal coverage. Filled and empty state STM images of the clusters differ strongly, suggesting the existence of covalent bonds within the cluster. Tunneling spectroscopy measurements reveal a small energy gap, showing semiconductor-like behavior of the constituent atoms. A cluster model based on experimental images and theoretical calculations is proposed and discussed. The presented results show that Ta joins the family of adsorbates that are known to form magic clusters on Si (111)-7 × 7, but its magic cluster has structural and electronic properties that are different from those reported before.
NASA Astrophysics Data System (ADS)
Vathsala, H.; Koolagudi, Shashidhar G.
2017-01-01
In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
NASA Astrophysics Data System (ADS)
Zhang, Ying; Moges, Semu; Block, Paul
2018-01-01
Prediction of seasonal precipitation can provide actionable information to guide management of various sectoral activities. For instance, it is often translated into hydrological forecasts for better water resources management. However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability. This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia. To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution, as compared with previous studies. The statistical model improves versus the non-clustered case or dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33 %, respectively. The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at a lower resolution and the raw predictions require bias correction to guarantee comparable skills.
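The ranked probability skill score (RPSS) quoted in the abstract compares the ranked probability score (RPS) of a categorical forecast with that of a reference forecast, commonly climatology. A minimal sketch follows, assuming three ordered categories and an equal-odds climatology; the forecast values are invented for illustration and are not taken from the study.

```python
from itertools import accumulate


def rps(forecast_probs, observed_category):
    """Ranked probability score for one forecast over ordered categories."""
    n = len(forecast_probs)
    observed = [1.0 if i == observed_category else 0.0 for i in range(n)]
    cum_forecast = list(accumulate(forecast_probs))
    cum_observed = list(accumulate(observed))
    return sum((f - o) ** 2 for f, o in zip(cum_forecast, cum_observed))


def rpss(forecasts, climatology, observations):
    """Skill relative to climatology: 1 is perfect, 0 matches the reference."""
    rps_forecast = sum(rps(f, o) for f, o in zip(forecasts, observations))
    rps_reference = sum(rps(climatology, o) for o in observations)
    return 1.0 - rps_forecast / rps_reference


# Hypothetical single forecast over (below, near, above normal); above normal observed.
forecasts = [[0.2, 0.3, 0.5]]
observations = [2]
climatology = [1 / 3, 1 / 3, 1 / 3]
score = rpss(forecasts, climatology, observations)
print(round(score, 3))
```

A positive score, such as the values of up to 33 % reported above, indicates that the forecast outperforms the reference.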
A first packet processing subdomain cluster model based on SDN
NASA Astrophysics Data System (ADS)
Chen, Mingyong; Wu, Weimin
2017-08-01
To address the packet-processing performance bottlenecks and controller downtime problems of current controller clusters, an SDN (Software Defined Network) controller model is proposed that allocates a priority to each device in the network. Each domain contains several network devices and a controller; the controller is responsible for managing the network equipment within its domain, and the switch delivers data according to the load of the controller, which processes the network equipment data. The experimental results show that the model can effectively mitigate the risk of a single point of failure of the controller and can relieve the performance bottleneck of first-packet processing.
Hensman, James; Lawrence, Neil D; Rattray, Magnus
2013-08-20
Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
A Probabilistic Embedding Clustering Method for Urban Structure Detection
NASA Astrophysics Data System (ADS)
Lin, X.; Li, H.; Zhang, Y.; Gao, L.; Zhao, L.; Deng, M.
2017-09-01
Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information such as human behaviour and human social activity suffer from the complexity of high dimension and high noise. Unfortunately, state-of-the-art clustering methods do not handle high dimension and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we propose a Probabilistic Embedding Model (PEM) to find latent features in high-dimensional urban sensing data by "learning" via a probabilistic model. With the latent features, we can capture the essential features hidden in high-dimensional data, known as patterns; with the probabilistic model, we can also reduce the uncertainty caused by high noise. Secondly, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, that is, communities with intensive interaction or with the same roles in the urban structure. We evaluated the performance of our model by conducting experiments on real-world data. Experiments with real data from Shanghai (China) showed that our method can discover both kinds of urban structure.
Operating room scheduling using hybrid clustering priority rule and genetic algorithm
NASA Astrophysics Data System (ADS)
Santoso, Linda Wahyuni; Sinawan, Aisyah Ashrinawati; Wijaya, Andi Rahadiyan; Sudiarso, Andi; Masruroh, Nur Aini; Herliansyah, Muhammad Kusumawan
2017-11-01
The operating room is a bottleneck resource in most hospitals, so the operating room scheduling system influences the overall performance of the hospital. This research develops a mathematical model of operating room scheduling for elective patients which considers patient priority with a limited number of surgeons, operating rooms, and nurse teams. Clustering analysis was conducted on the surgery duration data using hierarchical and non-hierarchical methods. The priority rule of each resulting cluster was determined using the Shortest Processing Time method. A Genetic Algorithm was used to generate the daily operating room schedule that resulted in the lowest values of patient waiting time and nurse overtime. The computational results show that the proposed model reduced patient waiting time by approximately 32.22% and nurse overtime by approximately 32.74% when compared to the actual schedule.
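The Shortest Processing Time rule mentioned in the abstract orders jobs by increasing duration, which minimises total waiting time on a single resource. The sketch below applies it to one operating room; the case names and durations are hypothetical, and the clustering and genetic algorithm stages of the proposed model are not represented.

```python
def spt_schedule(durations):
    """Order cases by Shortest Processing Time and compute each case's wait.

    `durations` maps a case name to its expected duration in minutes.
    Returns (order, waits), where waits[case] is the start time of that case
    if all cases run back to back in a single room (a simplifying assumption).
    """
    order = sorted(durations, key=durations.get)
    waits = {}
    clock = 0
    for case in order:
        waits[case] = clock
        clock += durations[case]
    return order, waits


# Hypothetical elective cases with expected durations in minutes.
cases = {"case_a": 90, "case_b": 30, "case_c": 60}
order, waits = spt_schedule(cases)
print(order, sum(waits.values()))
```

Running the same cases in the order given (a, b, c) would produce a total wait of 210 minutes, compared with 120 minutes under SPT.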
Molecular dynamics study of the melting of a supported 887-atom Pd decahedron.
Schebarchov, D; Hendy, S C; Polak, W
2009-04-08
We employ classical molecular dynamics simulations to investigate the melting behaviour of a decahedral Pd(887) cluster on a single layer of graphite (graphene). The interaction between Pd atoms is modelled with an embedded-atom potential, while the adhesion of Pd atoms to the substrate is approximated with a Lennard-Jones potential. We find that the decahedral structure persists at temperatures close to the melting point, but that just below the melting transition, the cluster accommodates to the substrate by means of complete melting and then recrystallization into an fcc structure. These structural changes are in qualitative agreement with recently proposed models, and they verify the existence of an energy barrier preventing softly deposited clusters from 'wetting' the substrate at temperatures below the melting point.
Computational Modeling of Radiation Phenomenon in SiC for Nuclear Applications
NASA Astrophysics Data System (ADS)
Ko, Hyunseok
Silicon carbide (SiC) has been investigated as a promising nuclear material owing to its superior thermo-mechanical properties and low neutron cross-section. While the interest in SiC has been increasing, the lack of fundamental understanding of many radiation phenomena is an important issue. More specifically, these phenomena in SiC include fission gas transport, radiation-induced defects and their evolution, radiation effects on mechanical stability, matrix brittleness of SiC composites, and the low thermal conductivities of SiC composites. To better design SiC and SiC composite materials for various nuclear applications, understanding each phenomenon and its significance under specific reactor conditions is important. In this thesis, we used various modeling approaches to understand the fundamental radiation phenomena in SiC for nuclear applications in three aspects: (a) fission product diffusion through SiC, (b) optimization of thermodynamically stable self-interstitial atom clusters, (c) interface effects in SiC composites and their change upon radiation. In (a) the fission product transport work, we proposed that Ag/Cs diffusion in high energy grain boundaries may be the upper boundary in unirradiated SiC at relevant temperature, and that radiation enhanced diffusion is responsible for the fast diffusion measured in post-irradiated fuel particles. For (b) the self-interstitial cluster work, thermodynamically stable clusters are identified as a function of cluster size, shape, and composition using a genetic algorithm. We found that there are compositional and configurational transitions for stable clusters as the cluster size increases. For (c) the interface effect in SiC composites, we investigated a recently proposed interface, that of the CNT-reinforced SiC composite. The analytical model suggests that CNT/SiC composites have attractive mechanical and thermal properties, and these fortify the argument that SiC composites are good candidate materials for the cladding.
We used grand canonical Monte Carlo to optimize the interface, as a stepping stone for further study using the interface.
Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F
2015-01-01
Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets, and the principal modes of variations in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles in heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.
NASA Astrophysics Data System (ADS)
Arévalo, Germán. V.; Hincapié, Roberto C.; Sierra, Javier E.
2015-09-01
UDWDM PON is a leading technology for providing ultra-high bandwidth to end users while exploiting the capacity of the physical channels. One of the main drawbacks of the UDWDM technique is that nonlinear effects, such as FWM, become stronger because of the close spectral proximity of the channels. This work proposes a model for the optimal deployment of this type of network, taking into account the fiber length limitations imposed by physical restrictions on data transmission in the fiber as well as the asymmetric distribution of users in a given region. The proposed model uses the data-transmission-related effects in UDWDM PON as restrictions in the optimization problem and also considers the asymmetric clustering of users and the subdivision of the user region through a Voronoi geometric partition technique. The dual graph of the Voronoi partition, that is, the Delaunay triangulation, is used as the planar graph for solving the minimum-weight problem for the fiber links.
Su, Jin-He; Piao, Ying-Chao; Luo, Ze; Yan, Bao-Ping
2018-04-26
With the application of various data acquisition devices, a large amount of animal movement data can be used to label presence data in remote sensing images and predict species distribution. In this paper, a two-stage classification approach for combining movement data and moderate-resolution remote sensing images was proposed. First, we introduced a new density-based clustering method to identify stopovers from migratory birds’ movement data and generated classification samples based on the clustering result. We split the remote sensing images into 16 × 16 patches and labeled them as positive samples if they overlap with stopovers. Second, a multi-convolution neural network model is proposed for extracting the features from temperature data and remote sensing images, respectively. Then a Support Vector Machine (SVM) model was used to combine the features and predict the final classification results. The experimental analysis was carried out on public Landsat 5 TM images and a GPS dataset collected from 29 birds over three years. The results indicated that our proposed method outperforms the existing baseline methods and was able to achieve good performance in habitat suitability prediction.
Patel, Vidushi S; Ezaz, Tariq; Deakin, Janine E; Graves, Jennifer A Marshall
2010-12-01
The haemoglobin protein, required for oxygen transportation in the body, is encoded by α- and β-globin genes that are arranged in clusters. The transpositional model for the evolution of distinct α-globin and β-globin clusters in amniotes is much simpler than the previously proposed whole genome duplication model. According to this model, all jawed vertebrates share one ancient region containing α- and β-globin genes and several flanking genes in the order MPG-C16orf35-(α-β)-GBY-LUC7L that has been conserved for more than 410 million years, whereas amniotes evolved a distinct β-globin cluster by insertion of a transposed β-globin gene from this ancient region into a cluster of olfactory receptors flanked by CCKBR and RRM1. It could not be determined whether this organisation is conserved in all amniotes because of the paucity of information from non-avian reptiles. To fill in this gap, we examined globin gene organisation in a squamate reptile, the Australian bearded dragon lizard, Pogona vitticeps (Agamidae). We report here that the α-globin cluster (HBK, HBA) is flanked by C16orf35 and GBY and is located on a pair of microchromosomes, whereas the β-globin cluster is flanked by RRM1 on the 3' end and is located on the long arm of chromosome 3. However, the CCKBR gene that flanks the β-globin cluster on the 5' end in other amniotes is located on the short arm of chromosome 5 in P. vitticeps, indicating that a chromosomal break between the β-globin cluster and CCKBR occurred at least in the agamid lineage. Our data from a reptile species provide further evidence to support the transpositional model for the evolution of β-globin gene cluster in amniotes.
Origin of the high velocity gas in NGC 6231
NASA Astrophysics Data System (ADS)
Massa, Derck
2017-08-01
It is well known that clusters of massive stars are influenced by the presence of strong winds, that they are sources of diffuse X-rays from shocked gas, and that this gas can be vented into the surrounding region or the halo, forming a critical element in the process of galactic feedback. However, the details of how these different environments interact and evolve are far from complete. Recently, Massa (2017) showed that the peculiar C IV 1550 Ang absorption seen in several otherwise normal main sequence B stars in NGC 6231 is not intrinsic to the stars. Instead, this absorption, which extends to more than -2000 km/s, is due to intervening carbon rich, high speed gas in the cluster environment. In this proposal, we seek to identify the origin of the high speed gas. The proposed observations will enable us to determine whether it is due to the outer wind of the WC star WR79, or to a collective cluster wind, enriched by carbon from the wind of WR79. If it is due to the wind of WR79, then the new data will furnish a novel, less model dependent estimate of the mass loss rate of a WC star. If it is due to a collective wind from the cluster, then we could be witnessing an important stage of galactic feedback. In either case, the proposed observations will provide a unique and significant insight on how massive, open clusters evolve - insight that can only be obtained through UV spectroscopy.
Neurocognitive disorders: cluster 1 of the proposed meta-structure for DSM-V and ICD-11.
Sachdev, P; Andrews, G; Hobbs, M J; Sunderland, M; Anderson, T M
2009-12-01
In an effort to group mental disorders on the basis of aetiology, five clusters have been proposed. In this paper, we consider the validity of the first cluster, neurocognitive disorders, within this proposal. These disorders are categorized as 'Dementia, Delirium, and Amnestic and Other Cognitive Disorders' in DSM-IV and 'Organic, including Symptomatic Mental Disorders' in ICD-10. We reviewed the literature in relation to 11 validating criteria proposed by a Study Group of the DSM-V Task Force as applied to the cluster of neurocognitive disorders. 'Neurocognitive' replaces the previous terms 'cognitive' and 'organic' used in DSM-IV and ICD-10 respectively as the descriptor for disorders in this cluster. Although cognitive/organic problems are present in other disorders, this cluster distinguishes itself by the demonstrable neural substrate abnormalities and the salience of cognitive symptoms and deficits. Shared biomarkers, co-morbidity and course offer less persuasive evidence for a valid cluster of neurocognitive disorders. The occurrence of these disorders subsequent to normal brain development sets this cluster apart from neurodevelopmental disorders. The aetiology of the disorders is varied, but the neurobiological underpinnings are better understood than for mental disorders in any other cluster. Neurocognitive disorders meet some of the salient criteria proposed by the Study Group of the DSM-V Task Force to suggest a classification cluster. Further developments in the aetiopathogenesis of these disorders will enhance the clinical utility of this cluster.
Tsai, Jack; Harpaz-Rotem, Ilan; Armour, Cherie; Southwick, Steven M; Krystal, John H; Pietrzak, Robert H
2015-05-01
To evaluate the prevalence of DSM-5 posttraumatic stress disorder (PTSD) and factor structure of PTSD symptomatology in a nationally representative sample of US veterans and examine how PTSD symptom clusters are related to depression, anxiety, suicidal ideation, hostility, physical and mental health-related functioning, and quality of life. Data were analyzed from the National Health and Resilience in Veterans Study, a nationally representative survey of 1,484 US veterans conducted from September through October 2013. Confirmatory factor analyses were conducted to evaluate the factor structure of PTSD symptoms, and structural equation models were constructed to examine the association between PTSD symptom clusters and external correlates. 12.0% of veterans screened positive for lifetime PTSD and 5.2% for past-month PTSD. A 5-factor dysphoric arousal model and a newly proposed 6-factor model both fit the data significantly better than the 4-factor model of DSM-5. The 6-factor model fit the data best in the full sample, as well as in subsamples of female veterans and veterans with lifetime PTSD. The emotional numbing symptom cluster was more strongly related to depression (P < .001) and worse mental health-related functioning (P < .001) than other symptom clusters, while the externalizing behavior symptom cluster was more strongly related to hostility (P < .001). A total of 5.2% of US veterans screened positive for past-month DSM-5 PTSD. A 6-factor model of DSM-5 PTSD symptoms, which builds on extant models and includes a sixth externalizing behavior factor, provides the best dimensional representation of DSM-5 PTSD symptom clusters and demonstrates validity in assessing health outcomes of interest in this population. © Copyright 2015 Physicians Postgraduate Press, Inc.
Semantic-based surveillance video retrieval.
Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve
2007-04-01
Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation.
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
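The soft-decision idea in the abstract above, where each internal node passes a context to both children with membership degrees, can be sketched as follows. This is a minimal illustration only: the sigmoid gate, the `sharpness` parameter, the class names, and the scalar leaf values are assumptions made for the example, and the maximum entropy output distribution derived in the paper is not reproduced.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Leaf:
    def __init__(self, value):
        self.value = value  # hypothetical scalar parameter, e.g. a mean F0 in Hz


class SoftNode:
    """Internal node that routes a context to BOTH children with soft weights."""

    def __init__(self, feature, threshold, sharpness, left, right):
        self.feature = feature      # index into the context vector
        self.threshold = threshold  # soft split point (assumed gate form)
        self.sharpness = sharpness  # larger values approach a hard decision
        self.left = left
        self.right = right


def leaf_memberships(node, context, weight=1.0):
    """Return (leaf, membership) pairs; the memberships sum to `weight`."""
    if isinstance(node, Leaf):
        return [(node, weight)]
    go_right = sigmoid(node.sharpness * (context[node.feature] - node.threshold))
    return (leaf_memberships(node.left, context, weight * (1.0 - go_right))
            + leaf_memberships(node.right, context, weight * go_right))


def predict(node, context):
    """Membership-weighted average of the leaf values."""
    return sum(w * leaf.value for leaf, w in leaf_memberships(node, context))


tree = SoftNode(0, 0.5, 4.0,
                Leaf(100.0),
                SoftNode(1, 0.0, 4.0, Leaf(150.0), Leaf(200.0)))
print(predict(tree, [0.5, 0.0]))
```

Because every leaf receives a non-zero membership, a context that falls near a split boundary draws on the parameters of several overlapping leaves, which is the property the abstract credits for better generalization than a hard tree.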
Group Facilitation: Functions and Skills.
ERIC Educational Resources Information Center
Anderson, L. Frances; Robertson, Sharon E.
1985-01-01
Discusses a model based on a specific set of assumptions about causality and effectiveness in interactional groups. Discusses personal qualities of group facilitators and proposes five major functions and seven skill clusters central to effective group facilitation. (Author/BH)
Abedini, Mohammad; Moradi, Mohammad H; Hosseinian, S M
2016-03-01
This paper proposes a novel method to address reliability and technical problems of microgrids (MGs) based on designing a number of self-adequate autonomous sub-MGs via adopting MGs clustering thinking. In doing so, a multi-objective optimization problem is developed where power losses reduction, voltage profile improvement and reliability enhancement are considered as the objective functions. To solve the optimization problem a hybrid algorithm, named HS-GA, is provided, based on genetic and harmony search algorithms, and a load flow method is given to model different types of DGs as droop controller. The performance of the proposed method is evaluated in two case studies. The results provide support for the performance of the proposed method. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Stability and dynamic of strain mediated adatom superlattices on Cu<111>
NASA Astrophysics Data System (ADS)
Kappus, Wolfgang
2013-03-01
Substrate strain mediated adatom equilibrium density distributions have been calculated for Cu<111> surfaces using two complementary methods. A hexagonal adatom superlattice in a coverage range up to 0.045 ML is derived for repulsive short range interactions. For zero short range interactions a hexagonal superstructure of adatom clusters is derived in a coverage range around 0.08 ML. Conditions for the stability of the superlattice against formation of dimers or clusters and degradation are analyzed using simple neighborhood models. Such models are also used to investigate the dynamics of adatoms within their superlattice neighborhood. Collective modes of adatom diffusion are proposed from the analogy with bulk lattice dynamics and methods for measurement are suggested. The recently put forward explanation of surface state mediated interactions for superstructures found in scanning tunneling microscopy experiments is called into question and strain mediated interactions are proposed as an alternative.
Multilayer Statistical Intrusion Detection in Wireless Networks
NASA Astrophysics Data System (ADS)
Hamdi, Mohamed; Meddeb-Makhlouf, Amel; Boudriga, Noureddine
2008-12-01
The rapid proliferation of mobile applications and services has introduced new vulnerabilities that do not exist in fixed wired networks. Traditional security mechanisms, such as access control and encryption, turn out to be inefficient in modern wireless networks. Given the shortcomings of these protection mechanisms, an important line of research focuses on intrusion detection systems (IDSs). This paper proposes a multilayer statistical intrusion detection framework for wireless networks. The architecture is suited to wireless networks because the underlying detection models rely on radio parameters and traffic models. Accurate correlation between radio and traffic anomalies enhances the efficiency of the IDS. A radio signal fingerprinting technique based on the maximal overlap discrete wavelet transform (MODWT) is developed. Moreover, a geometric clustering algorithm is presented. Depending on the characteristics of the fingerprinting technique, the clustering algorithm makes it possible to control the false positive and false negative rates. Finally, simulation experiments have been carried out to validate the proposed IDS.
Bayesian multivariate hierarchical transformation models for ROC analysis.
O'Malley, A James; Zou, Kelly H
2006-02-15
A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
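As a brief illustration of the Box-Cox transformation mentioned in the abstract above, the sketch below applies the standard one-parameter form to a single positive outcome. The function name `box_cox` and the example values are illustrative assumptions; the hierarchical Bayesian estimation of the transformation parameter within each cluster is not reproduced here.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive outcome")
    if lam == 0:
        return math.log(y)         # limiting case as lam approaches 0
    return (y ** lam - 1.0) / lam  # power transform for lam != 0


print(box_cox(4.0, 0.5))  # → 2.0
print(box_cox(1.0, 0))    # → 0.0
```

A monotone transform of this kind preserves the ordering of test outcomes, so it leaves an empirical ROC curve unchanged while bringing the outcomes closer to a common parametric family.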
Bayesian multivariate hierarchical transformation models for ROC analysis
O'Malley, A. James; Zou, Kelly H.
2006-01-01
SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
Lagrangian analysis by clustering. An example in the Nordic Seas.
NASA Astrophysics Data System (ADS)
Koszalka, Inga; Lacasce, Joseph H.
2010-05-01
We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in uniform geographical bins, as is commonly done, we group a specified number of nearest-neighbor velocities. This is done via a clustering algorithm operating on the instantaneous positions of the drifters. Thus it is the data distribution itself which determines the positions of the averages and the areal extent of the clusters. A major advantage is that because the number of members is essentially the same for all clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter is an accurate representation of the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities (both of which are known from the stochastic model). We also compare the results to those obtained with fixed geographical bins. Clustering is more successful at capturing spatial variability of the mean flow and also improves convergence in the eddy diffusivity estimates. We discuss both the future prospects and shortcomings of the new method.
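The grouping step described above, averaging drifter velocities within clusters of positions rather than within fixed geographical bins, can be sketched as follows. The abstract does not name the clustering algorithm, so plain Lloyd's k-means is swapped in here as an assumption; unlike the authors' method it does not enforce a near-equal number of members per cluster. All function names are hypothetical.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on 2D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


def cluster_mean_velocities(positions, velocities, k):
    """Return (centroid, mean velocity, member count) for each non-empty cluster."""
    labels, centroids = kmeans(positions, k)
    summary = []
    for c in range(k):
        members = [v for v, label in zip(velocities, labels) if label == c]
        if not members:
            continue
        mean_u = sum(u for u, _ in members) / len(members)
        mean_v = sum(v for _, v in members) / len(members)
        summary.append((centroids[c], (mean_u, mean_v), len(members)))
    return summary
```

The centroid returned for each cluster plays the role of the bin center: its location is set by where the data actually are, not by a predefined grid.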
Wu, Changsheng; Ichinose, Koji; Choi, Young Hae; van Wezel, Gilles P
2017-07-18
The biosynthesis of aromatic polyketides derived from type II polyketide synthases (PKSs) is complex, and it is not uncommon that highly similar gene clusters give rise to diverse structural architectures. The act biosynthetic gene cluster (BGC) of the model actinomycete Streptomyces coelicolor A3(2) is an archetypal type II PKS. Here we show that the act BGC also specifies the aromatic polyketide GTRI-02 (1) and propose a mechanism for the biogenesis of its 3,4-dihydronaphthalen-1(2H)-one backbone. Polyketide 1 was also produced by Streptomyces sp. MBT76 after activation of the act-like qin gene cluster by overexpression of the pathway-specific activator. Mining of this strain also identified dehydroxy-GTRI-02 (2), which most likely originated from dehydration of 1 during the isolation process. This work shows that even extensively studied model gene clusters such as act of S. coelicolor can still produce new chemistry, offering new perspectives for drug discovery. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Examination of evidence for collinear cluster tri-partition
NASA Astrophysics Data System (ADS)
Pyatkov, Yu. V.; Kamanin, D. V.; Alexandrov, A. A.; Alexandrova, I. A.; Goryainova, Z. I.; Malaza, V.; Mkaza, N.; Kuznetsova, E. A.; Strekalovsky, A. O.; Strekalovsky, O. V.; Zhuchko, V. E.
2017-12-01
Background: In a series of experiments at different time-of-flight spectrometers of heavy ions we have observed manifestations of a new at least ternary decay channel of low excited heavy nuclei. Due to specific features of the effect, it was called collinear cluster tri-partition (CCT). The obtained experimental results have initiated a number of theoretical articles dedicated to different aspects of the CCT. Special attention was paid to kinematics constraints and stability of collinearity. Purpose: To compare theoretical predictions with our experimental data, only partially published so far. To develop the model of one of the most populated CCT modes that gives rise to the so-called "Ni-bump." Method: The fission events under analysis form regular two-dimensional linear structures in the mass correlation distributions of the fission fragments. The structures were revealed both at a highly statistically reliable level but on the background substrate, and at the low statistics in almost noiseless distribution. The structures are bounded by the known magic fragments and were reproduced at different spectrometers. All this provides high reliability of our experimental findings. The model of the CCT proposed here is based on theoretical results, published recently, and the detailed analysis of all available experimental data. Results: Under our model, the CCT mode giving rise to the Ni bump occurs as a two-stage breakup of the initial three body chain like the nuclear configuration with an elongated central cluster. After the first scission at the touching point with one of the side clusters, the predominantly heavier one, the deformation energy of the central cluster allows the emission of up to four neutrons flying apart isotropically. The heavy side cluster and a dinuclear system, consisting of the light side cluster and the central one, relaxed to a less elongated shape, are accelerated in the mutual Coulomb field. 
The "tip" of the dinuclear system at the moment of its rupture faces the heavy fragment or the opposite direction due to a single turn of the system around its center of gravity. Conclusions: Additional experimental information regarding the energies of the CCT partners and the proposed model of the process respond to criticisms concerning the kinematic constraints and the stability of collinearity in the CCT. The octupole deformed system formed after the first scission is oriented along the fission axis, and its rupture occurs predominantly after the full acceleration. Noncollinear true ternary fission and far asymmetric binary fission, observed earlier, appear to be the special cases of the decay of the prescission configuration leading to the CCT. Detection of the Ni-7268 fission fragments with a kinetic energy E <25 MeV at the mass-separator Lohengrin is proposed for an independent experimental verification of the CCT.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Shao-Gang; Liao, Ji-Hai; Zhao, Yu-Jun
The diverse structures of boron (B) clusters, induced by boron's unique electronic properties, have attracted much interest from experimentalists and theorists. B_{30–40} clusters were recently reported to be planar fragments of a triangular lattice with proper concentrations of vacancies. Here, we have performed high-throughput screening for possible B clusters through first-principles calculations, including various shapes and distributions of vacancies. As a result, we have determined the structures of B_n clusters with n = 30–51 and found a stable planar cluster of B_{49} with a double-hexagon vacancy. Considering the 8-electron rule and the electron delocalization, a concise model for the distribution of the 2c–2e and 3c–2e bonds has been proposed to explain the stability of B planar clusters, as well as the reported B cages.
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of autonomous coprocessors interconnected together, called Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
Structure of S-shaped growth in innovation diffusion
NASA Astrophysics Data System (ADS)
Shimogawa, Shinsuke; Shinno, Miyuki; Saito, Hiroshi
2012-05-01
A basic question on innovation diffusion is why the growth curve of the adopter population in a large society is often S shaped. From macroscopic, microscopic, and mesoscopic viewpoints, the growth of the adopter population is observed as the growth curve, individual adoptions, and differences among individual adoptions, respectively. The S shape can be explained if an empirical model of the growth curve can be deduced from models of microscopic and mesoscopic structures. However, even the structure of growth curve has not been revealed yet because long-term extrapolations by proposed models of S-shaped curves are unstable and it has been very difficult to predict the long-term growth and final adopter population. This paper studies the S-shaped growth from the viewpoint of social regularities. Simple methods to analyze power laws enable us to extract the structure of the growth curve directly from the growth data of recent basic telecommunication services. This empirical model of growth curve is singular at the inflection point and a logarithmic function of time after this point, which explains the unstable extrapolations obtained using previously proposed models and the difficulty in predicting the final adopter population. Because the empirical S curve can be expressed in terms of two power laws of the regularity found in social performances of individuals, we propose the hypothesis that the S shape represents the heterogeneity of the adopter population, and the heterogeneity parameter is distributed under the regularity in social performances of individuals. This hypothesis is so powerful as to yield models of microscopic and mesoscopic structures. In the microscopic model, each potential adopter adopts the innovation when the information accumulated by the learning about the innovation exceeds a threshold. 
The accumulation rate of information is heterogeneous among the adopter population, whereas the threshold is a constant, which is the opposite of previously proposed models. In the mesoscopic model, flows of innovation information incoming to individuals are organized as dimorphic and partially clustered. These microscopic and mesoscopic models yield the empirical model of the S curve and explain the S shape as representing the regularities of information flows generated through a social self-organization. To demonstrate the validity and importance of the hypothesis, the models of three level structures are applied to reveal the mechanism determining and differentiating diffusion speeds. The empirical model of S curves implies that the coefficient of variation of the flow rates determines the diffusion speed for later adopters. Based on this property, a model describing the inside of information flow clusters can be given, which provides a formula interconnecting the diffusion speed, cluster populations, and a network topological parameter of the flow clusters. For two recent basic telecommunication services in Japan, the formula represents the variety of speeds in different areas and enables us to explain speed gaps between urban and rural areas and between the two services. Furthermore, the formula provides a method to estimate the final adopter population.
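The microscopic model described above, in which every potential adopter shares one threshold but accumulates information at an individual rate, can be sketched as follows. The linear accumulation law (information = rate × time) and the function names are assumptions made for illustration; the abstract does not state the functional form or the distribution of rates.

```python
def adoption_times(rates, threshold):
    """Time at which each adopter's accumulated information reaches the threshold.

    Assumes information grows linearly: info(t) = rate * t.
    """
    return sorted(threshold / rate for rate in rates)


def adopters_at(times, t):
    """Cumulative number of adopters by time t."""
    return sum(1 for adoption_time in times if adoption_time <= t)


rates = [0.5, 1.0, 2.0, 4.0]  # heterogeneous accumulation rates (made-up values)
times = adoption_times(rates, threshold=2.0)
print([adopters_at(times, t) for t in (0.5, 1.0, 2.0, 4.0)])  # → [1, 2, 3, 4]
```

Under this assumption the shape of the cumulative curve is determined entirely by the distribution of rates, which is consistent with the abstract's hypothesis that the S shape reflects heterogeneity in the adopter population.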
A Bayesian cluster analysis method for single-molecule localization microscopy data.
Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick
2016-12-01
Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM), for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
Diffusion-limited aggregation in two dimensions
NASA Astrophysics Data System (ADS)
Hurd, Alan J.; Schaefer, Dale W.
1985-03-01
We have studied the aggregation of silica microspheres confined to two dimensions at an air-water interface. Under microscopic observation, both monomers and clusters are seen to aggregate by a diffusion-limited process. The clusters' fractal dimension is 1.20 ± 0.15, smaller than values obtained from current models of aggregation. We propose that anisotropic repulsive interactions account for the low dimensionality by more effectively repelling particles from the side of an existing dendrite than from the end.
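A fractal dimension such as the one quoted above is commonly estimated from mass-radius scaling, N(r) ∝ r^D, by fitting a straight line to log N against log r. The sketch below shows that fit on synthetic numbers; it is an assumed, generic procedure with hypothetical names, and the abstract does not say which estimator the authors used.

```python
import math


def fit_fractal_dimension(radii, counts):
    """Least-squares slope of log(count) against log(radius)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check: particle counts generated with an exponent of exactly 1.2.
radii = [1.0, 2.0, 4.0, 8.0, 16.0]
counts = [r ** 1.2 for r in radii]
print(round(fit_fractal_dimension(radii, counts), 6))  # → 1.2
```

On real images the counts would come from the number of particles within radius r of a cluster's center of mass, and scatter in those counts is what produces an uncertainty like the ± 0.15 reported.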
Continuous Human Action Recognition Using Depth-MHI-HOG and a Spotter Model
Eum, Hyukmin; Yoon, Changyong; Lee, Heejin; Park, Mignon
2015-01-01
In this paper, we propose a new method for spotting and recognizing continuous human actions using a vision sensor. The method is comprised of depth-MHI-HOG (DMH), action modeling, action spotting, and recognition. First, to effectively separate the foreground from background, we propose a method called DMH. It includes a standard structure for segmenting images and extracting features by using depth information, MHI, and HOG. Second, action modeling is performed to model various actions using extracted features. The modeling of actions is performed by creating sequences of actions through k-means clustering; these sequences constitute HMM input. Third, a method of action spotting is proposed to filter meaningless actions from continuous actions and to identify precise start and end points of actions. By employing the spotter model, the proposed method improves action recognition performance. Finally, the proposed method recognizes actions based on start and end points. We evaluate recognition performance by employing the proposed method to obtain and compare probabilities by applying input sequences in action models and the spotter model. Through various experiments, we demonstrate that the proposed method is efficient for recognizing continuous human actions in real environments. PMID:25742172
Search For Cosmic-Ray-Induced Gamma-Ray Emission In Galaxy Clusters
Ackermann, M.
2014-04-30
Current theories predict relativistic hadronic particle populations in clusters of galaxies in addition to the already observed relativistic leptons. In these scenarios hadronic interactions give rise to neutral pions which decay into γ rays that are potentially observable with the Large Area Telescope (LAT) on board the Fermi space telescope. We present a joint likelihood analysis searching for spatially extended γ-ray emission at the locations of 50 galaxy clusters in 4 years of Fermi-LAT data under the assumption of the universal cosmic-ray model proposed by Pinzke & Pfrommer (2010). We find an excess at a significance of 2.7σ which upon closer inspection is however correlated to individual excess emission towards three galaxy clusters: Abell 400, Abell 1367 and Abell 3112. We discuss these cases in detail and conservatively attribute the emission to unmodeled background (for example, radio galaxies within the clusters). Through the combined analysis of 50 clusters we exclude hadronic injection efficiencies in simple hadronic models above 21% and establish limits on the cosmic-ray to thermal pressure ratio within the virial radius, R200, to be below 1.2-1.4% depending on the morphological classification. In addition we derive new limits on the γ-ray flux from individual clusters in our sample.
Search for Cosmic-Ray-Induced Gamma-Ray Emission in Galaxy Clusters
NASA Technical Reports Server (NTRS)
Ackermann, M.; Ajello, M.; Albert, A.; Allafort, A.; Atwood, W. B.; Baldini, L.; Ballet, J.; Barbiellini, G.; Bastieri, D.; Bechtol, K.;
2014-01-01
Current theories predict relativistic hadronic particle populations in clusters of galaxies in addition to the already observed relativistic leptons. In these scenarios hadronic interactions give rise to neutral pions which decay into gamma rays that are potentially observable with the Large Area Telescope (LAT) on board the Fermi space telescope. We present a joint likelihood analysis searching for spatially extended gamma-ray emission at the locations of 50 galaxy clusters in four years of Fermi-LAT data under the assumption of the universal cosmic-ray (CR) model proposed by Pinzke & Pfrommer. We find an excess at a significance of 2.7σ, which upon closer inspection, however, is correlated to individual excess emission toward three galaxy clusters: A400, A1367, and A3112. We discuss these cases in detail and conservatively attribute the emission to unmodeled background systems (for example, radio galaxies within the clusters). Through the combined analysis of 50 clusters, we exclude hadronic injection efficiencies in simple hadronic models above 21% and establish limits on the CR to thermal pressure ratio within the virial radius, R200, to be below 1.25%-1.4% depending on the morphological classification. In addition, we derive new limits on the gamma-ray flux from individual clusters in our sample.
Vagne, Quentin; Turner, Matthew S.; Sens, Pierre
2015-01-01
The formation of dynamical clusters of proteins is ubiquitous in cellular membranes and is in part regulated by the recycling of membrane components. We show, using stochastic simulations and analytic modeling, that the out-of-equilibrium cluster size distribution of membrane components undergoing continuous recycling is strongly influenced by lateral confinement. This result has significant implications for the clustering of plasma membrane proteins whose mobility is hindered by cytoskeletal “corrals” and for protein clustering in cellular organelles of limited size that generically support material fluxes. We show how the confinement size can be sensed through its effect on the size distribution of clusters of membrane heterogeneities and propose that this could be regulated to control the efficiency of membrane-bound reactions. To illustrate this, we study a chain of enzymatic reactions sensitive to membrane protein clustering. The reaction efficiency is found to be a non-monotonic function of the system size, and can be optimal for sizes comparable to those of cellular organelles. PMID:26656912
NASA Astrophysics Data System (ADS)
El-Sebakhy, Emad A.
2009-09-01
Pressure-volume-temperature (PVT) properties are very important in reservoir engineering computations. There are many empirical approaches for predicting various PVT properties based on empirical correlations and statistical regression models. In the last decade, researchers have used neural networks to develop more accurate PVT correlations. These achievements of neural networks open the door for data mining techniques to play a major role in the oil and gas industry. Unfortunately, the developed neural network correlations are often limited, and global correlations are usually less accurate than local correlations. Recently, adaptive neuro-fuzzy inference systems have been proposed as a new intelligence framework for both prediction and classification based on fuzzy clustering optimization criterion and ranking. This paper proposes neuro-fuzzy inference systems for estimating PVT properties of crude oil systems. This new framework is an efficient hybrid intelligence machine learning scheme for modeling the kind of uncertainty associated with vagueness and imprecision. We briefly describe the learning steps and the use of the Takagi-Sugeno-Kang model and the Gustafson-Kessel clustering algorithm with K-detected clusters from the given database. The framework has featured in a wide range of medical, power control system, and business journals, often with promising results. A comparative study is carried out to compare the performance of this new framework with the most popular modeling techniques, such as neural networks, nonlinear regression, and empirical correlation algorithms. The results show that neuro-fuzzy systems are accurate and reliable and outperform most of the existing forecasting techniques. Future work could apply neuro-fuzzy systems to clustering 3D seismic data, identifying lithofacies types, and other reservoir characterization tasks.
A cloud-based framework for large-scale traditional Chinese medical record retrieval.
Liu, Lijun; Liu, Li; Fu, Xiaodong; Huang, Qingsong; Zhang, Xianwen; Zhang, Yin
2018-01-01
Electronic medical records are increasingly common in medical practice. The secondary use of medical records has become increasingly important. It relies on the ability to retrieve complete information about desired patient populations. How to effectively and accurately retrieve relevant medical records from large-scale medical big data is becoming a big challenge. Therefore, we propose an efficient and robust cloud-based framework for large-scale Traditional Chinese Medical Records (TCMRs) retrieval. We propose a parallel index building method and build a distributed search cluster; the former is used to improve the performance of index building, and the latter is used to provide highly concurrent online TCMRs retrieval. Then, a real-time multi-indexing model is proposed to ensure the latest relevant TCMRs are indexed and retrieved in real-time, and a semantics-based query expansion method and a multi-factor ranking model are proposed to improve retrieval quality. Third, we implement a template-based visualization method for displaying medical reports. The proposed parallel indexing method and distributed search cluster can improve the performance of index building and provide highly concurrent online TCMRs retrieval. The multi-indexing model can ensure the latest relevant TCMRs are indexed and retrieved in real-time. The semantics expansion method and the multi-factor ranking model can enhance retrieval quality. The template-based visualization method can enhance availability and universality, with medical reports displayed via a friendly web interface. In conclusion, compared with current medical record retrieval systems, our system provides some advantages that are useful in improving the secondary use of large-scale traditional Chinese medical records in a cloud environment. The proposed system is more easily integrated with existing clinical systems and can be used in various scenarios. Copyright © 2017. Published by Elsevier Inc.
Kurczynska, Monika; Kotulska, Malgorzata
2018-01-01
Mirror protein structures are often considered artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction based on residue-residue contact maps need a methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not yield appropriate clusters: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of an amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
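As a rough illustration of the clustering step described in this abstract, the following is a minimal sketch of plain k-means with k = 2 on three-component feature vectors. The numeric values, the function name `kmeans`, and the deterministic choice of initial centroids are assumptions made for illustration; the study itself computed its energy terms with PyRosetta, which is not shown here.

```python
import math


def kmeans(points, centers, n_iter=100):
    """Plain Lloyd's k-means; `centers` are the initial centroids."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # keep the centroid of an empty cluster
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return labels, centers


# Hypothetical (made-up) values of three energy terms per model.
models = [
    (-1.0, -0.9, -1.1), (-1.2, -1.0, -0.8), (-0.9, -1.1, -1.0),  # one group
    (1.0, 0.9, 1.1), (1.1, 1.2, 0.9), (0.8, 1.0, 1.0),           # other group
]
labels, centers = kmeans(models, centers=[models[0], models[3]])
```

In practice the features would typically be standardized first so that no single energy term dominates the Euclidean distance; that step is omitted here.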
Salim, Shelly; Moh, Sangman; Choi, Dongmin; Chung, Ilyong
2014-08-11
A cognitive radio sensor network (CRSN) is a wireless sensor network whose sensor nodes are equipped with cognitive radio capability. Clustering is one of the most challenging issues in CRSNs, as all sensor nodes, including the cluster head, have to use the same frequency band in order to form a cluster. However, due to the nature of heterogeneous channels in cognitive radio, it is difficult for sensor nodes to find a cluster head. This paper proposes a novel energy-efficient and compact clustering scheme named clustering with temporary support nodes (CENTRE). CENTRE efficiently achieves a compact cluster formation by adopting two-phase cluster formation with fixed duration. By introducing a novel concept of temporary support nodes to improve the cluster formation, the proposed scheme enables sensor nodes in a network to find a cluster head efficiently. The performance study shows that not only is the clustering process efficient and compact but it also results in remarkable energy savings that prolong the overall network lifetime. In addition, the proposed scheme decreases both the clustering overhead and the average distance between cluster heads and their members. PMID:25116905
K, Jalal Deen; R, Ganesan; A, Merline
2017-07-27
Objective: Accurate segmentation of abnormal and healthy lungs is crucial for reliable computer-aided disease diagnosis. Methods: For this purpose, a stack of chest CT scans is processed. In this paper, novel methods are proposed for segmentation of multimodal grayscale lung CT scans. In the conventional methods, the required regions of interest (ROI) are identified using the Markov–Gibbs Random Field (MGRF) model. Result: The results of the proposed FCM- and CNN-based process are compared with the results obtained from the conventional method using the MGRF model. The results illustrate that the proposed method is able to segment various kinds of complex multimodal medical images precisely. Conclusion: In this paper, to obtain an exact boundary of the regions, every empirical dispersion of the image is computed by Fuzzy C-Means (FCM) clustering segmentation. A classification process based on the Convolutional Neural Network (CNN) classifier is carried out to distinguish normal tissue from abnormal tissue. The experimental evaluation is done using the Interstitial Lung Disease (ILD) database. PMID:28749127
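To make the Fuzzy C-Means step named in this abstract concrete, here is a minimal sketch of the standard FCM iteration on one-dimensional grayscale intensities. The pixel values, the initial centers, the fuzzifier `m = 2`, and the function name `fuzzy_c_means` are assumptions for illustration only; the paper's actual pipeline and the CNN classifier are not reproduced.

```python
def fuzzy_c_means(values, centers, m=2.0, n_iter=50, eps=1e-12):
    """1-D fuzzy C-means; returns (memberships, centers)."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: row[i] is the degree to which one value
        # belongs to cluster i; each row sums to 1.
        memberships = []
        for x in values:
            # eps avoids division by zero when a value sits on a center.
            dists = [abs(x - c) + eps for c in centers]
            row = [
                1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                for d_i in dists
            ]
            memberships.append(row)
        # Center update: mean of the values weighted by membership ** m.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return memberships, centers


# Hypothetical grayscale intensities: a dark group and a bright group.
pixels = [10, 12, 14, 200, 205, 210]
memberships, centers = fuzzy_c_means(pixels, centers=[0.0, 255.0])
```

Unlike hard k-means, every pixel keeps a graded membership in each cluster, which is what allows soft boundaries between regions.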
Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno
2016-01-22
Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for preselection of the covariates that need to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
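The c-statistic used in this abstract is the usual concordance measure (area under the ROC curve) of the propensity score model: a value near 0.5 means the covariates do not discriminate between arms, while values well above 0.5 signal imbalance. A minimal sketch of the computation follows, assuming the propensity scores have already been fitted; the score values and the function name `c_statistic` are hypothetical, and the paper's decision rule is not reproduced.

```python
def c_statistic(scores_arm1, scores_arm0):
    """Probability that a random arm-1 unit has a higher propensity score
    than a random arm-0 unit, with ties counted as one half."""
    concordant = 0.0
    for s1 in scores_arm1:
        for s0 in scores_arm0:
            if s1 > s0:
                concordant += 1.0
            elif s1 == s0:
                concordant += 0.5
    return concordant / (len(scores_arm1) * len(scores_arm0))


# Hypothetical fitted propensity scores (probability of being in arm 1).
balanced = c_statistic([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])    # 0.5
imbalanced = c_statistic([0.7, 0.8, 0.9], [0.1, 0.2, 0.3])  # 1.0
```

The pairwise loop is quadratic in the sample size; for large trials a rank-based computation would be preferable, but the definition is the same.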
Federal Register 2010, 2011, 2012, 2013, 2014
2011-05-19
.... Clustering and Effective Date i. Terra-Gen Tariff Provisions 15. Terra-Gen proposes provisions to address clustering of transmission system impact studies, consistent with the guidance provided in the January 14... on how Terra-Gen may cluster studies. Terra-Gen's proposed clustering provisions provide, among...
Outcome-Driven Cluster Analysis with Application to Microarray Data.
Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A
2015-01-01
One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
Galaxy clustering dependence on the [O II] emission line luminosity in the local Universe
NASA Astrophysics Data System (ADS)
Favole, Ginevra; Rodríguez-Torres, Sergio A.; Comparat, Johan; Prada, Francisco; Guo, Hong; Klypin, Anatoly; Montero-Dorta, Antonio D.
2017-11-01
We study the galaxy clustering dependence on the [O II] emission line luminosity in the SDSS DR7 Main galaxy sample at mean redshift z ∼ 0.1. We select volume-limited samples of galaxies with different [O II] luminosity thresholds and measure their projected, monopole and quadrupole two-point correlation functions. We model these observations using the 1 h⁻¹ Gpc MultiDark-Planck cosmological simulation and generate light cones with the SUrvey GenerAtoR algorithm. To interpret our results, we adopt a modified (Sub)Halo Abundance Matching scheme, accounting for the stellar mass incompleteness of the emission line galaxies. The satellite fraction constitutes an extra parameter in this model and allows us to optimize the clustering fit on both small and intermediate scales (i.e. r_p ≲ 30 h⁻¹ Mpc), with no need for any velocity bias correction. We find that, in the local Universe, the [O II] luminosity correlates with all the clustering statistics explored and with the galaxy bias. This latter quantity correlates more strongly with the SDSS r-band magnitude than with the [O II] luminosity. In conclusion, we propose a straightforward method to produce reliable clustering models, entirely built on the simulation products, which provides robust predictions of the typical emission line galaxy (ELG) host halo masses and satellite fraction values. The SDSS galaxy data, MultiDark mock catalogues and clustering results are made publicly available.
Yin, Zhong; Zhang, Jianhua
2014-07-01
Identifying abnormal changes of mental workload (MWL) over time is crucial for preventing accidents due to cognitive overload and inattention of human operators in safety-critical human-machine systems. It is known that various neuroimaging technologies can be used to identify MWL variations. In order to classify MWL into a few discrete levels using representative MWL indicators and small-sized training samples, a novel EEG-based approach combining locally linear embedding (LLE), support vector clustering (SVC) and support vector data description (SVDD) techniques is proposed and evaluated using experimentally measured data. The MWL indicators from different cortical regions are first elicited using the LLE technique. Then, the SVC approach is used to find the clusters of these MWL indicators and thereby to detect MWL variations. It is shown that the clusters can be interpreted as binary-class MWL. Furthermore, a trained binary SVDD classifier is shown to be capable of detecting slight variations of those indicators. By combining the two schemes, an SVC-SVDD framework is proposed, where the clear-cut (smaller) cluster is detected by SVC first and then a subsequent SVDD model is utilized to divide the overlapped (larger) cluster into two classes. Finally, three-class MWL levels (low, normal and high) can be identified automatically. The experimental data analysis results are compared with those of several existing methods. It has been demonstrated that the proposed framework can lead to acceptable computational accuracy and has the advantages of both unsupervised and supervised training strategies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Comparison of four statistical and machine learning methods for crash severity prediction.
Iranitalab, Amirfarrokh; Khattak, Aemal
2017-11-01
Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparing the performance of four statistical and machine learning methods, namely Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of two data clustering methods, K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset, and the correct prediction rates for each crash severity level, the overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed that NNC had the best prediction performance overall and for more severe crashes. RF and SVM performed next best, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC improved MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost exactly opposite results compared to the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
Clustering gene expression data based on predicted differential effects of GV interaction.
Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu
2005-02-01
Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, with the result that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches is proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and demonstrated the utility of our approach using an external validation.
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
TWave: High-Order Analysis of Functional MRI
Barnathan, Michael; Megalooikonomou, Vasileios; Faloutsos, Christos; Faro, Scott; Mohamed, Feroze B.
2011-01-01
The traditional approach to functional image analysis models images as matrices of raw voxel intensity values. Although such a representation is widely utilized and heavily entrenched both within neuroimaging and in the wider data mining community, the strong interactions among space, time, and categorical modes such as subject and experimental task inherent in functional imaging yield a dataset with “high-order” structure, which matrix models are incapable of exploiting. Reasoning across all of these modes of data concurrently requires a high-order model capable of representing relationships between all modes of the data in tandem. We thus propose to model functional MRI data using tensors, which are high-order generalizations of matrices equivalent to multidimensional arrays or data cubes. However, several unique challenges exist in the high-order analysis of functional medical data: naïve tensor models are incapable of exploiting spatiotemporal locality patterns, standard tensor analysis techniques exhibit poor efficiency, and mixtures of numeric and categorical modes of data are very often present in neuroimaging experiments. Formulating the problem of image clustering as a form of Latent Semantic Analysis and using the WaveCluster algorithm as a baseline, we propose a comprehensive hybrid tensor and wavelet framework for clustering, concept discovery, and compression of functional medical images which successfully addresses these challenges. Our approach reduced runtime and dataset size on a 9.3 GB finger opposition motor task fMRI dataset by up to 98% while exhibiting improved spatiotemporal coherence relative to standard tensor, wavelet, and voxel-based approaches. 
Our clustering technique was capable of automatically differentiating between the frontal areas of the brain responsible for task-related habituation and the motor regions responsible for executing the motor task, in contrast to a widely used fMRI analysis program, SPM, which only detected the latter region. Furthermore, our approach discovered latent concepts suggestive of subject handedness nearly 100x faster than standard approaches. These results suggest that a high-order model is an integral component to accurate scalable functional neuroimaging. PMID:21729758
Dishion, Thomas J.; Ha, Thao; Véronneau, Marie-Hélène
2012-01-01
This study proposes the inclusion of peer relationships in a life history perspective on adolescent problem behavior. Longitudinal analyses were used to examine deviant peer clustering as the mediating link between attenuated family ties, peer marginalization, and social disadvantage in early adolescence and sexual promiscuity in middle adolescence and childbearing by early adulthood. Specifically, 998 youth and their families were assessed at age 11 years and periodically through age 24 years. Structural equation modeling revealed that the peer-enhanced life history model provided a good fit to the longitudinal data, with deviant peer clustering strongly predicting adolescent sexual promiscuity and other correlated problem behaviors. Sexual promiscuity, as expected, also strongly predicted the number of children by age 22–24 years. Consistent with a life history perspective, family social disadvantage directly predicted deviant peer clustering and number of children in early adulthood, controlling for all other variables in the model. These data suggest that deviant peer clustering is a core dimension of a fast life history strategy, with strong links to sexual activity and childbearing. The implications of these findings are discussed with respect to the need to integrate an evolutionary-based model of self-organized peer groups in developmental and intervention science. PMID:22409765
Dishion, Thomas J; Ha, Thao; Véronneau, Marie-Hélène
2012-05-01
The authors propose that peer relationships should be included in a life history perspective on adolescent problem behavior. Longitudinal analyses were used to examine deviant peer clustering as the mediating link between attenuated family ties, peer marginalization, and social disadvantage in early adolescence and sexual promiscuity in middle adolescence and childbearing by early adulthood. Specifically, 998 youths, along with their families, were assessed at age 11 years and periodically through age 24 years. Structural equation modeling revealed that the peer-enhanced life history model provided a good fit to the longitudinal data, with deviant peer clustering strongly predicting adolescent sexual promiscuity and other correlated problem behaviors. Sexual promiscuity, as expected, also strongly predicted the number of children by ages 22-24 years. Consistent with a life history perspective, family social disadvantage directly predicted deviant peer clustering and number of children in early adulthood, controlling for all other variables in the model. These data suggest that deviant peer clustering is a core dimension of a fast life history strategy, with strong links to sexual activity and childbearing. The implications of these findings are discussed with respect to the need to integrate an evolutionary-based model of self-organized peer groups in developmental and intervention science.
Wu, K; Daruwalla, Z J; Wong, K L; Murphy, D; Ren, H
2015-08-01
Commercial humeral implants based on the Western population are currently not entirely compatible with Asian patients, due to differences in bone size, shape and structure. Surgeons may have to compromise or use different implants that are less conforming, which may cause complications as well as difficulties with implant positioning. The construction of Asian humerus atlases of different clusters has therefore been proposed to address this problem and to facilitate planning minimally invasive surgical procedures [6,31]. According to the features of the atlases, new implants could be designed specifically for different patients. Furthermore, an automatic implant selection algorithm has been proposed in order to reduce the complications caused by implant and bone mismatch. Prior to the design of the implant, data clustering and extraction of the relevant features were carried out on the datasets of each gender. The fuzzy C-means clustering method is explored in this paper. In addition, two new implant selection schemes, namely the Procrustes analysis (PA)-based scheme and the group average distance (GAD)-based scheme, were proposed to better search the database for implants matching new patients. Neither algorithm has previously been used in this area, yet both turn out to have excellent performance in implant selection. Additionally, algorithms to calculate the matching scores between various implants and the patient data are proposed in this paper to assist the implant selection procedure. The results obtained have indicated the feasibility of the proposed development and selection scheme. The 16 sets of male data were divided into two clusters with 8 and 8 subjects, respectively, and the 11 female datasets were also divided into two clusters with 5 and 6 subjects, respectively.
Based on the features of each cluster, the implants designed by the proposed algorithm fit very well on their reference humeri, and the proposed implant selection procedure allows a patient to be treated with merely a preoperative anatomical model in order to correctly select the implant that has the best fit. Based on the leave-one-out validation, it can be concluded that both the PA-based method and the GAD-based method are able to achieve excellent performance when dealing with the problem of implant selection. The accuracy and average execution time for the PA-based method were 100% and 0.132 s, respectively, while those of the GAD-based method were 100% and 0.058 s. Therefore, the GAD-based method outperformed the PA-based method in terms of execution speed. The primary contributions of this paper include the proposal of methods for development of Asian-, gender- and cluster-specific implants based on shape features and selection of the best-fit implants for future patients according to their features. To the best of our knowledge, this is the first work that proposes implant design and selection for Asian patients automatically based on features extracted from cluster-specific statistical atlases.
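The idea behind Procrustes-based selection can be sketched in a few lines: remove position, scale and rotation from two landmark sets, then pick the candidate with the smallest remaining shape distance. The sketch below is a deliberately simplified two-dimensional version that represents landmarks as complex numbers; the outlines, the implant names, and the functions `preshape`, `procrustes_distance` and `select_implant` are all hypothetical. The paper works with humerus models and its own matching scores, none of which are reproduced here.

```python
import cmath
import math


def preshape(landmarks):
    """Center a 2-D landmark set (complex numbers) and scale it to unit size."""
    mean = sum(landmarks) / len(landmarks)
    centered = [z - mean for z in landmarks]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]


def procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two 2-D landmark sets (0 = same shape)."""
    a, b = preshape(shape_a), preshape(shape_b)
    # The modulus of the complex inner product is invariant to rotation.
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def select_implant(patient, implants):
    """Return the name of the implant outline closest in shape to the patient."""
    return min(
        implants, key=lambda name: procrustes_distance(patient, implants[name])
    )


# Hypothetical outlines, each a list of corresponding landmarks x + yj.
implants = {
    "square": [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j],
    "long": [0 + 0j, 3 + 0j, 3 + 1j, 0 + 1j],
}
# A rotated, scaled and shifted copy of the "long" outline.
patient = [5 + 2j + 2 * cmath.exp(0.5j) * z for z in implants["long"]]
best = select_implant(patient, implants)
```

Because scale is removed, this variant compares shape only; a selection scheme that must also match bone size would need to keep the size information.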
Schneider, Bradley B.; Coy, Stephen L.; Krylov, Evgeny V.; Nazarov, Erkinjon G.
2013-01-01
Differential mobility spectrometry (DMS) separates ions on the basis of the difference in their migration rates under high versus low electric fields. Several models describing the physical nature of this field-mobility dependence have been proposed, but the clusterization model, sometimes referred to as the dynamic cluster-decluster model, is emerging as the dominant effect. DMS resolution and peak capacity are strongly influenced by the addition of modifiers, which results in the formation and dissociation of clusters. This process increases selectivity due to the unique chemical interactions that occur between an ion and neutral gas-phase molecules. It is thus imperative to bring the parameters influencing the chemical interactions under control and find ways to exploit them in order to improve the analytical utility of the device. In this paper we describe three important areas that need consideration in order to stabilize and capitalize on the chemical processes that dominate a DMS separation. The first involves means of controlling the dynamic equilibrium of the clustering reactions with high concentrations of specific reagents. The second area involves a means to deal with the unwanted heterogeneous cluster ion populations emitted from the electrospray ionization process that degrade resolution and sensitivity. The third involves fine control of parameters that affect the fundamental collision processes, namely temperature and pressure. PMID:20065515
A Study of The Binary and Anomalous Stellar Populations in Two Intermediate-Aged Open Clusters
NASA Astrophysics Data System (ADS)
Mathieu, Robert D.; Milliman, Katelyn; Geller, Aaron M.; Gosnell, Natalie
2010-08-01
"Anomalous" stars, such as blue stragglers and more recently sub-subgiants, have been an enduring challenge for stellar evolution theory. It is now clear that in star clusters these systems are closely linked to the binary star populations. Furthermore, sophisticated N-body models show that stellar dynamical processes play a central role in the formation of such anomalous stars. These stars trace the interface between the classical fields of stellar evolution and stellar dynamics. We propose to expand our highly successful radial-velocity survey to include two new rich open clusters NGC 7789 (1.8 Gyr, -0.1 dex) and NGC 2506 (2.1 Gyr, -0.4 dex) as part of the WIYN Open Cluster Study (WOCS). Though these two clusters are both of intermediate age and of similar richness, they have quite different blue straggler populations. NGC 2506 has only 10 known blue stragglers, while NGC 7789 has at least 27, among the largest known populations of blue stragglers in an open cluster. Defining the hard-binary populations in these two clusters is critical for understanding the factors that determine blue straggler production rates. Our proposed observations will establish the hard-binary fraction and frequency distributions of orbital parameters (periods, eccentricities, mass-ratios, etc.) for orbital periods approaching the hard-soft boundary, and will provide a comprehensive survey of the blue stragglers and other anomalous stars, including secure cluster memberships and binary properties. These data will then form direct constraints for detailed N-body open cluster simulations from which we will study the impact of the hard-binary population on the production rates and mechanisms of blue stragglers.
NASA Astrophysics Data System (ADS)
Philit, S.; Soliva, R.; Chemenda, A. I.
2017-12-01
Because sandstones form good reservoirs for hydrocarbon, water or CO2 storage, understanding the deformation processes in sandstones is of major importance. Deformation band clusters result from the localization of deformation in porous sandstones in the form of grouped low-permeability cataclastic deformation bands. It has recently been shown that this localization is favored in extensional tectonics. The clusters measure tens to hundreds of meters in extent and propagate vertically as long as the sandstone is clean. Because the clusters can form networks several kilometers long, they are likely to hamper fluid flow during reservoir exploitation. Yet the processes of band accumulation linked to the evolution of the clusters toward potential faulting are poorly understood. An integrated study coupling a microscopic analysis of the deformed granular material in clusters from 7 sites around the world with distinct element numerical modeling allows us to propose a model for cluster growth. Our microscopic analysis reveals that the clusters display varying degrees of cataclasis, with the highest degrees in the bands. This cataclasis is accompanied by porosity reduction (with porosity more reduced in the thrust Andersonian regime) and an increased particle size distribution. This indicates substantial packing and implies an increased particle coordination number. During deformation, the grain shape is both smoothed and roughened; the averaged values of roundness and circularity indicate a rapid roughening of the clasts in the first stages of deformation, followed by slight smoothing. The roughening of the clasts in densely packed material induces high friction and strengthens the material. High residual porosity at some band edges suggests a local dilatant behavior of the sheared material. Our distinct element numerical models and other particle models in the literature confirm this observation. The development of force chains with low particle coordination at these locations would weaken the stress resistance at the contact points. Hence, cluster growth would be promoted by the successive localization of bands at the edges of preexisting bands. Faulting could occur at any stage of cluster development, probably favored along interfaces of minimized strength with smooth geometry.
Nielsen, J D; Dean, C B
2008-09-01
A flexible semiparametric model for analyzing longitudinal panel count data arising from mixtures is presented. Panel count data refers here to count data on recurrent events collected as the number of events that have occurred within specific follow-up periods. The model assumes that the counts for each subject are generated by mixtures of nonhomogeneous Poisson processes with smooth intensity functions modeled with penalized splines. Time-dependent covariate effects are also incorporated into the process intensity using splines. Discrete mixtures of these nonhomogeneous Poisson process spline models extract functional information from underlying clusters representing hidden subpopulations. The motivating application is an experiment to test the effectiveness of pheromones in disrupting the mating pattern of the cherry bark tortrix moth. Mature moths arise from hidden, but distinct, subpopulations and monitoring the subpopulation responses was of interest. Within-cluster random effects are used to account for correlation structures and heterogeneity common to this type of data. An estimating equation approach to inference requiring only low moment assumptions is developed and the finite sample properties of the proposed estimating functions are investigated empirically by simulation.
Formation and Assembly of Massive Star Clusters
NASA Astrophysics Data System (ADS)
McMillan, Stephen
The formation of stars and star clusters is a major unresolved problem in astrophysics. It is central to modeling stellar populations and understanding galaxy luminosity distributions in cosmological models. Young massive clusters are major components of starburst galaxies, while globular clusters are cornerstones of the cosmic distance scale and represent vital laboratories for studies of stellar dynamics and stellar evolution. Yet how these clusters form and how rapidly and efficiently they expel their natal gas remain unclear, as do the consequences of this gas expulsion for cluster structure and survival. Also unclear is how the properties of low-mass clusters, which form from small-scale instabilities in galactic disks and inform much of our understanding of cluster formation and star-formation efficiency, differ from those of more massive clusters, which probably formed in starburst events driven by fast accretion at high redshift, or colliding gas flows in merging galaxies. Modeling cluster formation requires simulating many simultaneous physical processes, placing stringent demands on both software and hardware. Simulations of galaxies evolving in cosmological contexts usually lack the numerical resolution to simulate star formation in detail. They do not include detailed treatments of important physical effects such as magnetic fields, radiation pressure, ionization, and supernova feedback. Simulations of smaller clusters include these effects, but fall far short of the mass of even single young globular clusters. With major advances in computing power and software, we can now directly address this problem. We propose to model the formation of massive star clusters by integrating the FLASH adaptive mesh refinement magnetohydrodynamics (MHD) code into the Astrophysical Multi-purpose Software Environment (AMUSE) framework, to work with existing stellar-dynamical and stellar evolution modules in AMUSE. 
All software will be freely distributed on-line, allowing open access to state-of-the-art simulation techniques within a modern, modular software environment. We will follow the gravitational collapse of 0.1-10 million-solar mass gas clouds through star formation and coalescence into a star cluster, modeling in detail the coupling of the gas and the newborn stars. We will study the effects of star formation by detecting accreting regions of gas in self-gravitating, turbulent, MHD, FLASH models that we will translate into collisional dynamical systems of stars modeled with an N-body code, coupled together in the AMUSE framework. Our FLASH models will include treatments of radiative transfer from the newly formed stars, including heating and radiative acceleration of the surrounding gas. Specific questions to be addressed are: (1) How efficiently does the gas in a star forming region form stars, how does this depend on mass, metallicity, and other parameters, and what terminates star formation? What observational predictions can be made to constrain our models? (2) How important are different mechanisms for driving turbulence and removing gas from a cluster: accretion, radiative feedback, and mechanical feedback? (3) How does the infant mortality rate of young clusters depend on the initial properties of the parent cloud? (4) What are the characteristic formation timescales of massive star clusters, and what observable imprints does the assembly process leave on their structure at an age of 10-20 Myr, when formation is essentially complete and many clusters can be observed? These studies are directly relevant to NASA missions at many electromagnetic wavelengths, including Chandra, GALEX, Hubble, and Spitzer. Each traces different aspects of cluster formation and evolution: X-rays trace supernovae, ultraviolet traces young stars, visible colors can distinguish between young blue stars and older red stars, and the infrared directly shows young embedded star clusters.
Cluster preformation law for heavy and superheavy nuclei
NASA Astrophysics Data System (ADS)
Wei, K.; Zhang, H. F.
2017-08-01
The concept of cluster radioactivity has been extended to allow emitted particles with $Z_C > 28$ for superheavy nuclei by nuclear theory [Poenaru et al., Phys. Rev. Lett. 107, 062503 (2011), 10.1103/PhysRevLett.107.062503]. The preformation and emission mechanics of heavy-ion particles must be examined again before the fascinating radioactivity is observed for superheavy nuclei in laboratory. We extract the cluster preformation factor for heavy and superheavy nuclei within a preformed cluster model, in which the decay constant is the product of the preformation factor, assault frequency, and penetration probability. The calculated results show that the cluster penetration probability for superheavy nuclei is larger than that for actinide elements. The preformation factor depends on the nuclear structures of the emitted cluster and mother nucleus, and the well-known cluster preformation law $S(A_C) = [S(\alpha)]^{(A_C-1)/3}$ [Blendowske and Walliser, Phys. Rev. Lett. 61, 1930 (1988), 10.1103/PhysRevLett.61.1930] will break down when the mass number of the emitted cluster $A_C > 28$, and new preformation formulas are proposed to estimate the preformation factor for heavy and superheavy nuclei.
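The two relations named in this abstract can be sketched numerically: the Blendowske-Walliser scaling of the preformation factor, and the decay constant as a product of three factors. This is a minimal illustration only; the function names and all numerical inputs are hypothetical, and it does not reproduce the new preformation formulas the authors propose for heavier clusters.

```python
import math


def preformation_factor(cluster_mass_number, alpha_preformation):
    """Blendowske-Walliser scaling: S(A_C) = S(alpha) ** ((A_C - 1) / 3).

    Per the abstract, this law is expected to break down for A_C > 28,
    so values beyond that range should not be trusted.
    """
    return alpha_preformation ** ((cluster_mass_number - 1) / 3.0)


def decay_constant(preformation, assault_frequency, penetration_probability):
    """Preformed cluster model: the decay constant is the product of the
    preformation factor, the assault frequency, and the penetration probability."""
    return preformation * assault_frequency * penetration_probability


def half_life(decay_const):
    """Half-life from the decay constant: T_1/2 = ln(2) / lambda."""
    return math.log(2) / decay_const


# Hypothetical inputs, chosen only to exercise the functions.
s_alpha = 0.1
s_c14 = preformation_factor(14, s_alpha)  # preformation factor for an A_C = 14 cluster
```

For an alpha particle ($A_C = 4$) the exponent is 1, so the function returns $S(\alpha)$ unchanged, which is a quick sanity check on the scaling.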
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of the main methods for identifying functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data that combines the firefly algorithm (FA) with the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects; however, the hierarchical search has difficulty finding the optimal neighborhood radius of synchronization, and its efficiency is low. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that the proposed algorithm outperforms traditional algorithms in precision, recall and f-measure.
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of approaches for clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. A density-based clustering algorithm is therefore a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method's processing time is fast enough for real-time applications on IoT devices. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.
NASA Astrophysics Data System (ADS)
Zhang, Jiangjiang; Lin, Guang; Li, Weixuan; Wu, Laosheng; Zeng, Lingzao
2018-03-01
Ensemble smoother (ES) has been widely used in inverse modeling of hydrologic systems. However, for problems where the distribution of model parameters is multimodal, using ES directly would be problematic. One popular solution is to use a clustering algorithm to identify each mode and update the clusters with ES separately. However, this strategy may not be very efficient when the dimension of parameter space is high or the number of modes is large. Alternatively, we propose in this paper a very simple and efficient algorithm, i.e., the iterative local updating ensemble smoother (ILUES), to explore multimodal distributions of model parameters in nonlinear hydrologic systems. The ILUES algorithm works by updating local ensembles of each sample with ES to explore possible multimodal distributions. To achieve satisfactory data matches in nonlinear problems, we adopt an iterative form of ES to assimilate the measurements multiple times. Numerical cases involving nonlinearity and multimodality are tested to illustrate the performance of the proposed method. It is shown that overall the ILUES algorithm can well quantify the parametric uncertainties of complex hydrologic models, no matter whether the multimodal distribution exists.
NASA Astrophysics Data System (ADS)
Lépinoux, J.; Sigli, C.
2018-01-01
In a recent paper, the authors showed how the cluster free energies are constrained by the coagulation probability, and explained various anomalies observed during the precipitation kinetics in concentrated alloys. This coagulation probability appeared to be too complex a function to be accurately predicted knowing only the cluster distribution in Cluster Dynamics (CD). Using atomistic Monte Carlo (MC) simulations, it is shown that during a transformation at constant temperature, after a short transient regime, the transformation occurs at quasi-equilibrium. It is proposed to use MC simulations until the system quasi-equilibrates and then to switch to CD, which is a mean-field method but, unlike MC, is not limited by a box size. In this paper, we explain how to take into account the information available before the quasi-equilibrium state to establish guidelines to safely predict the cluster free energies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Lester; Nachman, Benjamin; Schwartzman, Ariel
Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds additional information to conventional jet tagging variables in boosted topologies. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.
An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems
Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.
2014-01-01
This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on a local search and a parallelization process, inspired from an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235
Mackey, Lester; Nachman, Benjamin; Schwartzman, Ariel; ...
2016-06-01
Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds additional information to conventional jet tagging variables in boosted topologies. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.
Externalizing disorders: cluster 5 of the proposed meta-structure for DSM-V and ICD-11.
Krueger, R F; South, S C
2009-12-01
The extant major psychiatric classifications DSM-IV and ICD-10 are purportedly atheoretical and largely descriptive. Although this achieves good reliability, the validity of a medical diagnosis is greatly enhanced by an understanding of the etiology. In an attempt to group mental disorders on the basis of etiology, five clusters have been proposed. We consider the validity of the fifth cluster, externalizing disorders, within this proposal. We reviewed the literature in relation to 11 validating criteria proposed by the Study Group of the DSM-V Task Force, in terms of the extent to which these criteria support the idea of a coherent externalizing spectrum of disorders. This cluster distinguishes itself by the central role of disinhibitory personality in mental disorders spread throughout sections of the current classifications, including substance dependence, antisocial personality disorder and conduct disorder. Shared biomarkers, co-morbidity and course offer additional evidence for a valid cluster of externalizing disorders. Externalizing disorders meet many of the salient criteria proposed by the Study Group of the DSM-V Task Force to suggest a classification cluster.
Towards accurate modeling of noncovalent interactions for protein rigidity analysis.
Fox, Naomi; Streinu, Ileana
2013-01-01
Protein rigidity analysis is an efficient computational method for extracting flexibility information from static, X-ray crystallography protein data. Atoms and bonds are modeled as a mechanical structure and analyzed with a fast graph-based algorithm, producing a decomposition of the flexible molecule into interconnected rigid clusters. The result depends critically on noncovalent atomic interactions, primarily on how hydrogen bonds and hydrophobic interactions are computed and modeled. Ongoing research points to the stringent need for benchmarking rigidity analysis software systems, towards the goal of increasing their accuracy and validating their results, both against each other and against biologically relevant (functional) parameters. We propose two new methods for modeling hydrogen bonds and hydrophobic interactions that more accurately reflect a mechanical model, without being computationally more intensive. We evaluate them using a novel scoring method, based on the B-cubed score from the information retrieval literature, which measures how well two cluster decompositions match. To evaluate the modeling accuracy of KINARI, our pebble-game rigidity analysis system, we use a benchmark data set of 20 proteins, each with multiple distinct conformations deposited in the Protein Data Bank. Cluster decompositions for them were previously determined with the RigidFinder method from Gerstein's lab and validated against experimental data. When KINARI's default tuning parameters are used, an improvement of the B-cubed score over a crude baseline is observed in 30% of this data. With our new modeling options, improvements were observed in over 70% of the proteins in this data set. We investigate the sensitivity of the cluster decomposition score with case studies on pyruvate phosphate dikinase and calmodulin. 
To substantially improve the accuracy of protein rigidity analysis systems, thorough benchmarking must be performed on all current systems and future extensions. We have measured the gain in performance by comparing different modeling methods for noncovalent interactions. We showed that new criteria for modeling hydrogen bonds and hydrophobic interactions can significantly improve the results. The two new methods proposed here have been implemented and made publicly available in the current version of KINARI (v1.3), together with the benchmarking tools, which can be downloaded from our software's website, http://kinari.cs.umass.edu.
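The abstract above scores how well two cluster decompositions match using a measure derived from B-cubed. As a rough illustration, the standard B-cubed precision, recall and F-score from the information retrieval literature can be sketched as follows. This is the textbook definition, not necessarily the authors' modified scoring method, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def bcubed(predicted, reference):
    """Standard B-cubed precision, recall and F-score.

    `predicted` and `reference` map each item (e.g. an atom id) to a cluster
    label. Both mappings must cover exactly the same, non-empty set of items.
    """
    if not predicted or predicted.keys() != reference.keys():
        raise ValueError("both decompositions must label the same non-empty item set")

    # Group items by cluster label in each decomposition.
    pred_groups = defaultdict(set)
    ref_groups = defaultdict(set)
    for item in predicted:
        pred_groups[predicted[item]].add(item)
        ref_groups[reference[item]].add(item)

    precision_sum = 0.0
    recall_sum = 0.0
    for item in predicted:
        pred_cluster = pred_groups[predicted[item]]
        ref_cluster = ref_groups[reference[item]]
        overlap = len(pred_cluster & ref_cluster)  # always >= 1 (the item itself)
        precision_sum += overlap / len(pred_cluster)
        recall_sum += overlap / len(ref_cluster)

    n = len(predicted)
    precision = precision_sum / n
    recall = recall_sum / n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: the prediction merges two reference clusters into one.
reference = {"a": 0, "b": 0, "c": 1, "d": 1}
predicted = {"a": 0, "b": 0, "c": 0, "d": 0}
precision, recall, f_score = bcubed(predicted, reference)
```

Merging clusters, as in the toy example, keeps recall at 1 but halves precision, so the score penalizes under-segmentation and over-segmentation separately.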
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ≥ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data set and two simulated clustered physician-patients dichotomous data sets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
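For orientation, the point estimate of the kappa statistic for two raters and a dichotomous outcome can be computed from a 2x2 agreement table, as sketched below. This covers only the classical estimate; it does not implement the clustered-data variance estimators compared in the abstract, and the function name and counts are hypothetical.

```python
def cohen_kappa_2x2(both_yes, yes_no, no_yes, both_no):
    """Cohen's kappa from a 2x2 table of two raters' dichotomous ratings.

    both_yes: both raters say "yes"; yes_no: rater 1 "yes", rater 2 "no";
    no_yes: rater 1 "no", rater 2 "yes"; both_no: both raters say "no".
    """
    n = both_yes + yes_no + no_yes + both_no
    if n == 0:
        raise ValueError("the table must contain at least one rating pair")

    # Observed agreement: proportion of pairs on the diagonal.
    observed = (both_yes + both_no) / n

    # Chance agreement from the raters' marginal "yes" and "no" proportions.
    rater1_yes = (both_yes + yes_no) / n
    rater2_yes = (both_yes + no_yes) / n
    expected = rater1_yes * rater2_yes + (1 - rater1_yes) * (1 - rater2_yes)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: observed agreement 0.7, chance agreement 0.5.
kappa = cohen_kappa_2x2(both_yes=20, yes_no=5, no_yes=10, both_no=15)
```

With these counts the observed agreement is 0.7 and the chance agreement is 0.5, giving kappa = (0.7 - 0.5) / (1 - 0.5) = 0.4.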
Kong, Xiang-Zhen; Liu, Jin-Xing; Zheng, Chun-Hou; Hou, Mi-Xiao; Wang, Juan
2017-07-01
High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimension reduction method named p-norm singular value decomposition (PSVD) is proposed to seek a low-rank approximation matrix for biomolecular data. To enhance the robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm is used as the regularization function in the optimization model. To evaluate the performance of PSVD, the K-means clustering method is then employed for tumor clustering based on the low-rank approximation matrix. Extensive experiments are carried out on five gene expression data sets, including two benchmark data sets and three higher dimensional data sets from The Cancer Genome Atlas. The experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, the experiments show that the proposed method is more efficient for processing higher dimensional data, with good robustness, stability, and superior time performance.
Robust subspace clustering via joint weighted Schatten-p norm and Lq norm minimization
NASA Astrophysics Data System (ADS)
Zhang, Tao; Tang, Zhenmin; Liu, Qing
2017-05-01
Low-rank representation (LRR) has been successfully applied to subspace clustering. However, the nuclear norm in the standard LRR is not optimal for approximating the rank function in many real-world applications. Meanwhile, the L21 norm in LRR also fails to characterize various noises properly. To address the above issues, we propose an improved LRR method, which achieves low rank property via the new formulation with weighted Schatten-p norm and Lq norm (WSPQ). Specifically, the nuclear norm is generalized to be the Schatten-p norm and different weights are assigned to the singular values, and thus it can approximate the rank function more accurately. In addition, Lq norm is further incorporated into WSPQ to model different noises and improve the robustness. An efficient algorithm based on the inexact augmented Lagrange multiplier method is designed for the formulated problem. Extensive experiments on face clustering and motion segmentation clearly demonstrate the superiority of the proposed WSPQ over several state-of-the-art methods.
Sefuba, Maria; Walingo, Tom; Takawira, Fambirai
2015-09-18
This paper presents an Energy Efficient Medium Access Control (MAC) protocol for clustered wireless sensor networks that aims to improve energy efficiency and delay performance. The proposed protocol employs adaptive cross-layer intra-cluster scheduling and inter-cluster relay selection diversity. The scheduling is based on the available data packets and the remaining energy level of the source node (SN). This helps to minimize idle listening on nodes without data to transmit and to reduce control packet overhead. The relay selection diversity is carried out between clusters, by the cluster head (CH), and the base station (BS). The diversity helps to improve network reliability and prolong the network lifetime. Relay selection is determined based on the communication distance, the remaining energy and the channel quality indicator (CQI) for the relay cluster head (RCH). An analytical framework for energy consumption and transmission delay for the proposed MAC protocol is presented in this work. The performance of the proposed MAC protocol is evaluated based on transmission delay, energy consumption, and network lifetime. The results obtained indicate that the proposed MAC protocol provides better performance than traditional cluster-based MAC protocols.
The peculiar velocities of rich clusters in the hot and cold dark matter scenarios
NASA Technical Reports Server (NTRS)
Rhee, George F.; West, Michael J.; Villumsen, Jens V.
1993-01-01
We present the results of a study of the peculiar velocities of rich clusters of galaxies. The peculiar motion of rich clusters in various cosmological scenarios is of interest for a number of reasons. Observationally, one can measure the peculiar motion of clusters to greater distances than galaxies because cluster peculiar motions can be determined to greater accuracy. One can also test the slope of distance indicator relations using clusters to see if galaxy properties vary with environment. We have used N-body simulations to measure the amplitude and rms cluster peculiar velocity as a function of bias parameter in the hot and cold dark matter scenarios. In addition to measuring the mean and rms peculiar velocity of clusters in the two models, we determined whether the peculiar velocity vector of a given cluster is well aligned with the gravity vector due to all the particles in the simulation and the gravity vector due to the particles present only in the clusters. We have investigated the peculiar velocities of rich clusters of galaxies in the cold dark matter and hot dark matter galaxy formation scenarios. We have derived peculiar velocities and associated errors for the scenarios using four values of the bias parameter ranging from b = 1 to b = 2.5. The growth of the mean peculiar velocity with scale factor has been determined and compared to that predicted by linear theory. In addition, we have compared the orientation of force and velocity in these simulations to see if a program such as that proposed by Bertschinger and Dekel (1989) for elliptical galaxy peculiar motions can be applied to clusters. The method they describe enables one to recover the density field from large scale redshift distance samples. The method makes it possible to do this when only radial velocities are known by assuming that the velocity field is curl free. 
Our analysis suggests that this program if applied to clusters is only realizable for models with a low value of the bias parameter, i.e., models in which the peculiar velocities of clusters are large enough that the errors do not render the analysis impracticable.
The degree-related clustering coefficient and its application to link prediction
NASA Astrophysics Data System (ADS)
Liu, Yangyang; Zhao, Chengli; Wang, Xiaojie; Huang, Qiangjuan; Zhang, Xue; Yi, Dongyun
2016-07-01
Link prediction plays a significant role in explaining the evolution of networks. However, it is still a challenging problem that has been addressed only with topological information in recent years. Based on the belief that network nodes with a great number of common neighbors are more likely to be connected, many similarity indices have achieved considerable accuracy and efficiency. Motivated by the natural assumption that the effect of missing links on the estimation of a node's clustering ability could be related to node degree, in this paper, we propose a degree-related clustering coefficient index to quantify the clustering ability of nodes. Unlike the classical clustering coefficient, our new coefficient is highly robust when the observed bias of links is considered. Furthermore, we propose a degree-related clustering ability path (DCP) index, which applies the proposed coefficient to the link prediction problem. Experiments on 12 real-world networks show that our proposed method is highly accurate and robust compared with four common-neighbor-based similarity indices (Common Neighbors (CN), Adamic-Adar (AA), Resource Allocation (RA), and Preferential Attachment (PA)), and the recently introduced clustering ability (CA) index.
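The four baseline indices named in this abstract (CN, AA, RA, PA) and the classical local clustering coefficient have standard textbook definitions, sketched below on a small hypothetical graph. This is not the degree-related coefficient or the DCP index proposed by the authors, whose formulas are not given in the abstract; the graph and all function names are illustrative assumptions.

```python
import math
from itertools import combinations

# Toy undirected graph as an adjacency dict (hypothetical example data).
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def common_neighbors(g, x, y):
    """CN: number of neighbours shared by x and y."""
    return len(g[x] & g[y])

def adamic_adar(g, x, y):
    """AA: shared neighbours weighted by 1 / log(degree).
    A shared neighbour of two distinct nodes has degree >= 2, so log > 0."""
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])

def resource_allocation(g, x, y):
    """RA: shared neighbours weighted by 1 / degree."""
    return sum(1.0 / len(g[z]) for z in g[x] & g[y])

def preferential_attachment(g, x, y):
    """PA: product of the two node degrees."""
    return len(g[x]) * len(g[y])

def local_clustering(g, x):
    """Classical clustering coefficient: fraction of neighbour pairs that are linked."""
    k = len(g[x])
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(sorted(g[x]), 2) if v in g[u])
    return 2.0 * links / (k * (k - 1))

if __name__ == "__main__":
    for name, score in [("CN", common_neighbors), ("AA", adamic_adar),
                        ("RA", resource_allocation), ("PA", preferential_attachment)]:
        print(name, score(GRAPH, "b", "d"))
    print("C(a) =", local_clustering(GRAPH, "a"))
```

In a link prediction experiment, each unconnected node pair would be scored with one of these functions and the highest-scoring pairs taken as predicted links.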
An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images.
Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman
2015-10-09
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% accuracies using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.
An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images
Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman
2015-01-01
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% accuracies using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method. PMID:26450665
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulations studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
Query Results Clustering by Extending SPARQL with CLUSTER BY
NASA Astrophysics Data System (ADS)
Ławrynowicz, Agnieszka
The task of dynamic clustering of the search results proved to be useful in the Web context, where the user often does not know the granularity of the search results in advance. The goal of this paper is to provide a declarative way for invoking dynamic clustering of the results of queries submitted over Semantic Web data. To achieve this goal the paper proposes an approach that extends SPARQL by clustering abilities. The approach introduces a new statement, CLUSTER BY, into the SPARQL grammar and proposes semantics for such extension.
Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L
2016-01-01
An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Because the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms other conventional methods and attains higher credibility with even greater accuracy of the classification model.
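The oversampling step that SMOTE-style methods build on can be sketched as interpolation between a minority sample and one of its nearest minority neighbours. The sketch below shows only that basic step; it is not ASCB_DmSMOTE, and the function name, the sample data, and the defaults for the two parameters (k neighbours, n_new synthetic points) are illustrative assumptions.

```python
import random

def smote_like_samples(minority, k=2, n_new=4, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority samples, nearest first
        # (squared Euclidean distance is enough for ranking).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical two-feature minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_samples(minority)

if __name__ == "__main__":
    for point in new_points:
        print(point)
```

Each synthetic point lies on the segment between two real minority samples, which is why the choice of k and of the number of synthetic points matters for how far the minority region is stretched.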
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Blessy, S A Praylin Selva; Sulochana, C Helen
2015-01-01
Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of human brain and the presence of intensity inhomogeneities. The objective is to propose a method that effectively segments brain tumor from MR images and to evaluate the performance of unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with UOFC algorithm effectively segments brain tumor from MR images.
NASA Astrophysics Data System (ADS)
Seo, Junyeong; Sung, Youngchul
2018-06-01
In this paper, an efficient transmit beam design and user scheduling method is proposed for multi-user (MU) multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) downlink, based on Pareto-optimality. The proposed beam design and user scheduling method groups simultaneously-served users into multiple clusters with a practical two users in each cluster, and then applies spatial zero-forcing (ZF) across clusters to control inter-cluster interference (ICI) and Pareto-optimal beam design with successive interference cancellation (SIC) to two users in each cluster to remove interference to strong users and leverage signal-to-interference-plus-noise ratios (SINRs) of interference-experiencing weak users. The proposed method has flexibility to control the rates of strong and weak users and numerical results show that the proposed method yields good performance.
A Web service substitution method based on service cluster nets
NASA Astrophysics Data System (ADS)
Du, YuYue; Gai, JunJing; Zhou, MengChu
2017-11-01
Service substitution is an important research topic in the fields of Web services and service-oriented computing. This work presents a novel method to analyse and substitute Web services. A new concept, called a Service Cluster Net Unit, is proposed based on Web service clusters. A service cluster is converted into a Service Cluster Net Unit. Then it is used to analyse whether the services in the cluster can satisfy some service requests. Meanwhile, the substitution methods of an atomic service and a composite service are proposed. The correctness of the proposed method is proved, and the effectiveness is shown and compared with the state-of-the-art method via an experiment. It can be readily applied to e-commerce service substitution to meet the business automation needs.
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite the significant success, one limitation to most of the existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially in the case when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
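For orientation, the plain (unweighted) co-association matrix that ensemble clustering methods commonly start from can be sketched as follows: entry (i, j) is the fraction of base clusterings that place objects i and j in the same cluster. This is only the equal-weight baseline; the locally weighted variant and the entropy-based cluster uncertainty described in the abstract are not reproduced here, and the label lists are hypothetical.

```python
def co_association(base_clusterings):
    """Fraction of base clusterings in which each pair of objects shares a cluster.

    base_clusterings: list of label lists, one per base clustering,
    all of the same length (one label per object).
    """
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Every base clustering gets the same weight 1/m here.
    return [[c / m for c in row] for row in counts]

# Three hypothetical base clusterings of four objects.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_association(ensemble)

if __name__ == "__main__":
    for row in matrix:
        print(["%.2f" % v for v in row])
```

A consensus clustering can then be obtained by treating the matrix as a similarity matrix, for example with hierarchical agglomerative clustering; a locally weighted scheme would replace the constant 1/m with a per-cluster reliability weight.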
Cluster Correspondence Analysis.
van de Velden, M; D'Enza, A Iodice; Palumbo, F
2017-03-01
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
NASA Astrophysics Data System (ADS)
Mulchaey, John
Most galaxy formation models predict that massive low-redshift disk galaxies are embedded in extended hot halos of externally accreted gas. Such gas appears necessary to maintain ongoing star formation in isolated spirals like the Milky Way. To explain the large population of red galaxies in rich groups and clusters, most galaxy evolution models assume that these hot gas halos are stripped completely when a galaxy enters a denser environment. This simple model has been remarkably successful at reproducing many observed properties of galaxies. Although theoretical arguments suggest hot gas halos are an important component in galaxies, we know very little about this gas from an observational standpoint. In fact, previous observations have failed to detect soft X-ray emission from such halos in disk galaxies. Furthermore, the assumption that hot gas halos are stripped completely when a galaxy enters a group or cluster has not been verified. We propose to combine proprietary and archival XMM-Newton observations of galaxies in the field, groups and clusters to study how hot gas halos are impacted by environment. Our proposed program has three components: 1) The deepest search to date for a hot gas halo in a quiescent spiral galaxy. A detection will confirm a basic tenet of disk galaxy formation models, whereas a non-detection will seriously challenge these models and impose new constraints on the growth mode and feedback history of disk galaxies. 2) A detailed study of the hot gas halos properties of field early-type galaxies. As environmental processes such as stripping are not expected to be important in the field, a study of hot gas halos in this environment will allow us to better understand how feedback and other internal processes impact hot gas halos. 3) A study of hot gas halos in the outskirts of groups and clusters. 
By comparing observations with our suite of simulations we can begin to understand what role the stripping of hot gas halos plays in galaxy evolution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geller, Aaron M.; Hurley, Jarrod R.; Mathieu, Robert D., E-mail: a-geller@northwestern.edu, E-mail: mathieu@astro.wisc.edu, E-mail: jhurley@astro.swin.edu.au
2013-01-01
Following on from a recently completed radial-velocity survey of the old (7 Gyr) open cluster NGC 188 in which we studied in detail the solar-type hard binaries and blue stragglers of the cluster, here we investigate the dynamical evolution of NGC 188 through a sophisticated N-body model. Importantly, we employ the observed binary properties of the young (180 Myr) open cluster M35, where possible, to guide our choices for parameters of the initial binary population. We apply pre-main-sequence tidal circularization and a substantial increase to the main-sequence tidal circularization rate, both of which are necessary to match the observed tidal circularization periods in the literature, including that of NGC 188. At 7 Gyr the main-sequence solar-type hard-binary population in the model matches that of NGC 188 in both binary frequency and distributions of orbital parameters. This agreement between the model and observations is in a large part due to the similarities between the NGC 188 and M35 solar-type binaries. Indeed, among the 7 Gyr main-sequence binaries in the model, only those with P ≳ 1000 days begin to show potentially observable evidence for modifications by dynamical encounters, even after 7 Gyr of evolution within the star cluster. This emphasizes the importance of defining accurate initial conditions for star cluster models, which we propose is best accomplished through comparisons with observations of young open clusters like M35. Furthermore, this finding suggests that observations of the present-day binaries in even old open clusters can provide valuable information on their primordial binary populations. However, despite the model's success at matching the observed solar-type main-sequence population, the model underproduces blue stragglers and produces an overabundance of long-period circular main-sequence-white-dwarf binaries as compared with the true cluster. 
We explore several potential solutions to the paucity of blue stragglers and conclude that the model dramatically underproduces blue stragglers through mass-transfer processes. We suggest that common-envelope evolution may have been incorrectly imposed on the progenitors of the spurious long-period circular main-sequence-white-dwarf binaries, which perhaps instead should have gone through stable mass transfer to create blue stragglers, thereby bringing both the number and binary frequency of the blue straggler population in the model into agreement with the true blue stragglers in NGC 188. Thus, improvements in the physics of mass transfer and common-envelope evolution employed in the model may in fact solve both discrepancies with the observations. This project highlights the unique accessibility of open clusters to both comprehensive observational surveys and full-scale N-body simulations, both of which have only recently matured sufficiently to enable such a project, and underscores the importance of open clusters to the study of star cluster dynamics.
Intergalactic stellar populations in intermediate redshift clusters
NASA Astrophysics Data System (ADS)
Melnick, J.; Giraud, E.; Toledo, I.; Selman, F.; Quintana, H.
2012-11-01
A substantial fraction of the total stellar mass in rich clusters of galaxies resides in a diffuse intergalactic component usually referred to as the intracluster light (ICL). Theoretical models indicate that these intergalactic stars originate mostly from the tidal interaction of the cluster galaxies during the assembly history of the cluster, and that a significant fraction of these stars could have formed in situ from the late infall of cold metal-poor gas clouds on to the cluster. However, these models also overpredict the fraction of stellar mass in the ICL by a substantial margin, something that is still not well understood. The models also make predictions about the age distribution of the ICL stars, which may provide additional observational constraints. Here we present population synthesis models for the ICL of an intermediate redshift (z = 0.29) X-ray cluster that we have extensively studied in previous papers. The advantage of observing intermediate redshift clusters rather than nearby ones is that the former fit the field of view of multi-object spectrographs in 8-m telescopes and therefore permit us to encompass most of the ICL with only a few well-placed slits. In this paper we show that by stacking spectra at different locations within the ICL it is possible to reach sufficiently high signal-to-noise ratios to fit population synthesis models and derive meaningful results. The models provide ages and metallicities for the dominant populations at several different locations within the ICL and the brightest cluster galaxies (BCG) halo, as well as measures of the kinematics of the stars as a function of distance from the BCG. We thus find that the ICL in our cluster is dominated by old metal-rich stars, at odds with what has been found in nearby clusters where the stars that dominate the ICL are old and metal poor. 
While we see weak evidence of a young, metal-poor component, if real, these young stars would amount to less than 1 per cent of the total ICL mass, much less than the up to 30 per cent predicted by the models. We propose that the very metal-rich (i.e. 2.5× solar) stars in the ICL of our cluster, which comprise ~40 per cent of the total mass, originate mostly from the central dumb-bell galaxy, while the remaining solar and metal-poor stars come from spiral, post-starburst (E+A) and metal-poor dwarf galaxies. About 16 per cent of the ICL stars are old and metal poor.
Weighted community detection and data clustering using message passing
NASA Astrophysics Data System (ADS)
Shi, Cheng; Liu, Yanchen; Zhang, Pan
2018-03-01
Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.
Color analysis and image rendering of woodblock prints with oil-based ink
NASA Astrophysics Data System (ADS)
Horiuchi, Takahiko; Tanimoto, Tetsushi; Tominaga, Shoji
2012-01-01
This paper proposes a method for analyzing the color characteristics of woodblock prints having oil-based ink and rendering realistic images based on camera data. The analysis results of woodblock prints show some characteristic features in comparison with oil paintings: 1) A woodblock print can be divided into several cluster areas, each with similar surface spectral reflectance; and 2) strong specular reflection from the influence of overlapping paints arises only in specific cluster areas. By considering these properties, we develop an effective rendering algorithm by modifying our previous algorithm for oil paintings. A set of surface spectral reflectances of a woodblock print is represented by using only a small number of average surface spectral reflectances and the registered scaling coefficients, whereas the previous algorithm for oil paintings required surface spectral reflectances of high dimension at all pixels. In the rendering process, in order to reproduce the strong specular reflection in specific cluster areas, we use two sets of parameters in the Torrance-Sparrow model for cluster areas with or without strong specular reflection. An experiment on a woodblock printing with oil-based ink was performed to demonstrate the feasibility of the proposed method.
NASA Astrophysics Data System (ADS)
Shen, Fei; Chen, Chao; Yan, Ruqiang
2017-05-01
Classical bearing fault diagnosis methods, being designed according to one specific task, always pay attention to the effectiveness of extracted features and the final diagnostic performance. However, most of these approaches suffer from inefficiency when multiple tasks exist, especially in a real-time diagnostic scenario. A fault diagnosis method based on Non-negative Matrix Factorization (NMF) and Co-clustering strategy is proposed to overcome this limitation. Firstly, some high-dimensional matrices are constructed using the Short-Time Fourier Transform (STFT) features, where the dimension of each matrix equals the number of target tasks. Then, the NMF algorithm is carried out to obtain different components in each dimension direction through optimized matching, such as Euclidean distance and divergence distance. Finally, a Co-clustering technique based on information entropy is utilized to realize classification of each component. To verify the effectiveness of the proposed approach, a series of bearing data sets were analysed in this research. The tests indicated that although the diagnostic performance of single task is comparable to traditional clustering methods such as K-means algorithm and Gaussian Mixture Model, the accuracy and computational efficiency in multi-tasks fault diagnosis are improved.
Fulton, Kara A.; Liu, Danping; Haynie, Denise L.; Albert, Paul S.
2016-01-01
The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss–Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored. PMID:26937263
The Next Generation of Numerical Modeling in Mergers- Constraining the Star Formation Law
NASA Astrophysics Data System (ADS)
Chien, Li-Hsin
2010-09-01
Spectacular images of colliding galaxies like the "Antennae", taken with the Hubble Space Telescope, have revealed that a burst of star/cluster formation occurs whenever gas-rich galaxies interact. The ages and locations of these clusters reveal the interaction history and provide crucial clues to the process of star formation in galaxies. We propose to carry out state-of-the-art numerical simulations to model six nearby galaxy mergers (Arp 256, NGC 7469, NGC 4038/39, NGC 520, NGC 2623, NGC 3256), hence increasing the number with this level of sophistication by a factor of 3. These simulations provide specific predictions for the age and spatial distributions of young star clusters. The comparison between these simulation results and the observations will allow us to answer a number of fundamental questions including: 1) is shock-induced or density-dependent star formation the dominant mechanism; 2) are the demographics (i.e. mass and age distributions) of the clusters in different mergers similar, i.e. "universal", or very different; and 3) will it be necessary to include other mechanisms, e.g., locally triggered star formation, in the models to better match the observations?
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision, typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo (MCMC) methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. To address this, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed out of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extends to the very general domain of statistical inference.
A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.
Tango, Toshiro; Takahashi, Kunihiko
2012-12-30
Spatial scan statistics are widely used tools for detection of disease clusters. In particular, the circular spatial scan statistic proposed by Kulldorff (1997) has been utilized in a wide variety of epidemiological studies and disease surveillance. However, as it cannot detect noncircular, irregularly shaped clusters, many authors have proposed different spatial scan statistics, including the elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used for detecting irregularly shaped clusters. However, this method sets a practical limit of a maximum of 30 nearest neighbors for searching candidate clusters because of heavy computational load. In this paper, we show a flexible spatial scan statistic implemented with a restricted likelihood ratio proposed by Tango (2008) to (1) eliminate the limitation of 30 nearest neighbors and (2) require much less computational time than the original flexible spatial scan statistic. As a side effect, Monte Carlo simulation shows that it can detect clusters of any shape reasonably well as the relative risk of the cluster becomes large. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan. Copyright © 2012 John Wiley & Sons, Ltd.
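As background for the scan statistics discussed above, the following is a minimal sketch of the standard Poisson log likelihood ratio used to score a candidate zone in a Kulldorff-type scan. It is not the restricted likelihood ratio of Tango (2008), whose form is not given in this abstract. The function names, the candidate list, and the counts are hypothetical, and expected counts are assumed to be scaled so that they sum to the total number of observed cases.

```python
import math

def poisson_llr(c, e, total):
    """Log likelihood ratio of a candidate zone under the Poisson model.

    c:     observed cases inside the zone
    e:     expected cases inside the zone under the null hypothesis
    total: total observed cases in the study region (assumed > e)
    """
    if c <= e:
        # Only an excess of cases inside the zone counts as a cluster.
        return 0.0
    inside = c * math.log(c / e)
    outside_cases = total - c
    if outside_cases == 0:
        return inside
    return inside + outside_cases * math.log(outside_cases / (total - e))

def best_zone(zones, total):
    """Return (name, llr) for the candidate zone with the largest LLR."""
    scored = [(name, poisson_llr(c, e, total)) for name, c, e in zones]
    return max(scored, key=lambda item: item[1])

# Hypothetical candidates: (name, observed cases, expected cases).
candidates = [("A", 12, 10.0), ("B", 30, 12.0), ("C", 5, 8.0)]
print(best_zone(candidates, total=100))
```

In a full analysis the significance of the maximum would be assessed by Monte Carlo replication under the null hypothesis; that step is omitted here.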
NASA Astrophysics Data System (ADS)
Zhang, Han; Chen, Xuefeng; Du, Zhaohui; Li, Xiang; Yan, Ruqiang
2016-04-01
Fault information of aero-engine bearings presents two particular phenomena, i.e., waveform distortion and impulsive feature frequency band dispersion, which pose a challenging problem for current techniques of bearing fault diagnosis. Moreover, although much progress has been made in sparse representation theory for feature extraction of fault information, the theory also confronts inevitable performance degradation because relatively weak fault information does not have sufficiently prominent and sparse representations. Therefore, a novel nonlocal sparse model (coined NLSM) and its algorithm framework are proposed in this paper, which go beyond simple sparsity by introducing more intrinsic structures of feature information. This work exploits the underlying prior information that feature information exhibits nonlocal self-similarity by clustering similar signal fragments and stacking them together into groups. Within this framework, the prior information is transformed into a regularization term, and a sparse optimization problem, which can be solved through the block coordinate descent (BCD) method, is formulated. Additionally, the adaptive structural clustering sparse dictionary learning technique, which utilizes k-Nearest-Neighbor (kNN) clustering and principal component analysis (PCA) learning, is adopted to further enable sufficient sparsity of feature information. Moreover, the selection rule of the regularization parameter and the computational complexity are described in detail. The performance of the proposed framework is evaluated through numerical experiments, and its superiority with respect to the state-of-the-art method in the field is demonstrated on vibration signals from an experimental rig of aircraft engine bearings.
A Smart Checkpointing Scheme for Improving the Reliability of Clustering Routing Protocols
Min, Hong; Jung, Jinman; Kim, Bongjae; Cho, Yookun; Heo, Junyoung; Yi, Sangho; Hong, Jiman
2010-01-01
In wireless sensor networks, system architectures and applications are designed to consider both resource constraints and scalability, because such networks are composed of numerous sensor nodes with various sensors and actuators, small memories, low-power microprocessors, radio modules, and batteries. Clustering routing protocols based on data aggregation schemes aimed at minimizing packet numbers have been proposed to meet these requirements. In clustering routing protocols, the cluster head plays an important role. The cluster head collects data from its member nodes and aggregates the collected data. To improve reliability and reduce recovery latency, we propose a checkpointing scheme for the cluster head. In the proposed scheme, backup nodes monitor and checkpoint the current state of the cluster head periodically. We also derive the checkpointing interval that maximizes reliability while using the same amount of energy consumed by clustering routing protocols that operate without checkpointing. Experimental comparisons with existing non-checkpointing schemes show that our scheme reduces both energy consumption and recovery latency. PMID:22163389
Competitive aggregation dynamics using phase wave signals.
Sakaguchi, Hidetsugu; Maeyama, Satomi
2014-10-21
Coupled equations of the phase equation and the equation of cell concentration n are proposed for competitive aggregation dynamics of slime mold in two dimensions. Phase waves are used as tactic signals of aggregation in this model. Several aggregation clusters are formed initially, and target patterns appear around the localized aggregation clusters. Owing to the competition among target patterns, the number of the localized aggregation clusters decreases, and finally one dominant localized pattern survives. If the phase equation is replaced with the complex Ginzburg-Landau equation, several spiral patterns appear, and n is localized near the center of the spiral patterns. After the competition among spiral patterns, one dominant spiral survives. Copyright © 2014 Elsevier Ltd. All rights reserved.
Arcs from gravitational lensing
NASA Technical Reports Server (NTRS)
Grossman, Scott A.; Narayan, Ramesh
1988-01-01
The proposal made by Paczynski (1987) that the arcs of blue light found recently in two cluster cores are gravitationally lensed elongated images of background galaxies is investigated. It is shown that lenses that are circularly symmetric in projection produce pairs of arcs, in conflict with the observations. However, more realistic asymmetric lenses produce single arcs, which can become as elongated as the observed ones whenever the background galaxy is located on or close to a cusp caustic. Detailed computer simulations of lensing by clusters using a reasonable model of the mass distribution are presented. Elongated and curved lensed images longer than 10 arcsec occur in 12 percent of the simulated clusters. It is concluded that the lensing hypothesis must be taken seriously.
A critical assessment of models for the origin of multiple populations in globular clusters
NASA Astrophysics Data System (ADS)
Bastian, Nate
2017-03-01
A number of scenarios have been put forward to explain the origin of the chemical anomalies (and resulting complex colour-magnitude diagrams) observed in globular clusters (GCs), namely the AGB, Fast Rotating Massive Star, Very Massive Star, and Early Disc Accretion scenarios. We compare the predictions of these scenarios with a range of observations (including young massive clusters (YMCs), chemical patterns, and GC population properties) and find that all models are inconsistent with observations. In particular, YMCs do not show evidence for multiple epochs of star-formation and appear to be gas free by an age of ~3 Myr. Also, the chemical patterns displayed in GCs vary from one to the next in such a way that cannot be reproduced by standard nucleosynthetic yields. Finally, we show that the "mass budget problem" for the scenarios cannot be solved by invoking heavy cluster mass loss (i.e. that clusters were 10-100 times more massive at birth) as this solution makes basic predictions about the GC population that are inconsistent with observations. We conclude that none of the proposed scenarios can explain the multiple population phenomenon, hence alternative theories are needed.
Mechanisms behind overshoots in mean cluster size profiles in aggregation-breakup processes.
Sadegh-Vaziri, Ramiar; Ludwig, Kristin; Sundmacher, Kai; Babler, Matthaus U
2018-05-26
Aggregation and breakup of small particles in stirred suspensions often show an overshoot in the time evolution of the mean cluster size: starting from a suspension of primary particles, the mean cluster size first increases before going through a maximum beyond which a slow relaxation sets in. Such behavior was observed in various systems, including polymeric latices, inorganic colloids, asphaltenes, proteins, and, as shown by independent experiments in this work, in the flocculation of microalgae. This work aims at investigating possible mechanisms to explain this phenomenon using detailed population balance modeling that incorporates refined rate models for aggregation and breakup of small particles in turbulence. Four mechanisms are considered: (1) restructuring, (2) decay of aggregate strength, (3) deposition of large clusters, and (4) primary particle aggregation where only aggregation events between clusters and primary particles are permitted. We show that all four mechanisms can lead to an overshoot in the mean size profile, while in contrast, aggregation and breakup alone lead to a monotonic, "S"-shaped size evolution profile. In order to distinguish between the different mechanisms, simple protocols based on variations of the shear rate during the aggregation-breakup process are proposed. Copyright © 2018 Elsevier Inc. All rights reserved.
Applying Petri nets to modeling the chemical stage of radiobiological mechanism
NASA Astrophysics Data System (ADS)
Barilla, J.; Lokajíček, M.; Pisaková, H.; Simr, P.
2015-03-01
The chemical stage represents an important part of the radiobiological mechanism, as double strand breaks (DSBs) of DNA molecules represent the main damages leading to the final biological effect. These breaks are formed mainly by water radicals arising in clusters formed by densely ionizing ends of primary or secondary charged particles in the neighborhood of a DNA molecule. The given effect may be significantly influenced by other species present in water, which may depend on the size and diffusion of the corresponding clusters. We have already proposed a model describing the corresponding process (i.e., the combined effect of cluster diffusion and chemical reactions) running in individual radical clusters and influencing the formation probability of the main damages (i.e., DSBs). Now the full set of corresponding species will be considered. With the help of Continuous Petri nets it will then be possible to follow the time evolution of the corresponding species in individual clusters, which might be important especially when studying the biological effect of very low-LET radiation. The results in deoxygenated water will be presented; the ratio of final and initial contents of the corresponding species is in good agreement with values established experimentally.
Constrained variation in Jastrow method at high density
DOE Office of Scientific and Technical Information (OSTI.GOV)
Owen, J.C.; Bishop, R.F.; Irvine, J.M.
1976-11-01
A method is derived for constraining the correlation function in a Jastrow variational calculation which permits the truncation of the cluster expansion after two-body terms, and which permits exact minimization of the two-body cluster by functional variation. This method is compared with one previously proposed by Pandharipande and is found to be superior both theoretically and practically. The method is tested both on liquid ^3He, by using the Lennard-Jones potential, and on the model system of neutrons treated as Boltzmann particles ("homework" problem). Good agreement is found both with experiment and with other calculations involving the explicit evaluation of higher-order terms in the cluster expansion. The method is then applied to a more realistic model of a neutron gas up to a density of 4 neutrons per F^3, and is found to give ground-state energies considerably lower than those of Pandharipande. (AIP)
Stochastic competitive learning in complex networks.
Silva, Thiago Christiano; Zhao, Liang
2012-03-01
Competitive learning is an important machine learning approach which is widely employed in artificial neural networks. In this paper, we present a rigorous definition of a new type of competitive learning scheme realized on large-scale networks. The model consists of several particles walking within the network and competing with each other to occupy as many nodes as possible, while attempting to reject intruder particles. The particle's walking rule is composed of a stochastic combination of random and preferential movements. The model has been applied to solve community detection and data clustering problems. Computer simulations reveal that the proposed technique achieves high precision in community and cluster detection, as well as low computational complexity. Moreover, we have developed an efficient method for estimating the most likely number of clusters by using an evaluator index that monitors the information generated by the competition process itself. We hope this paper will provide an alternative approach to the study of competitive learning.
Origin of the pre-tropical storm Debby (2006) African easterly wave-mesoscale convective system
NASA Astrophysics Data System (ADS)
Lin, Yuh-Lang; Liu, Liping; Tang, Guoqing; Spinks, James; Jones, Wilson
2013-05-01
The origins of the pre-Debby (2006) mesoscale convective system (MCS) and African easterly wave (AEW) and their precursors were traced back to the southwest Arabian Peninsula, Asir Mountains (AS), and Ethiopian Highlands (EH) in the vicinity of the ITCZ using satellite imagery, GFS analysis data and the ARW model. The sources of the convective cloud clusters and vorticity perturbations were attributed to the cyclonic convergence of the northeasterly Shamal wind and the Somali jet, especially when the Mediterranean High shifted eastward and the Indian Ocean high strengthened and its associated Somali jet penetrated farther to the north. The cyclonic vorticity perturbations were strengthened by the vorticity stretching associated with convective cloud clusters in the genesis region, the southwest Arabian Peninsula. A conceptual model was proposed to explain the genesis of convective cloud clusters and cyclonic vorticity perturbations preceding the pre-Debby (2006) AEW-MCS system.
The Solar-Type Hard-Binary Frequency and Distributions of Orbital Parameters in the Open Cluster M37
NASA Astrophysics Data System (ADS)
Geller, Aaron M.; Meibom, Soren; Barnes, Sydney A.; Mathieu, Robert D.
2014-02-01
Binary stars, and particularly the short-period "hard" binaries, govern the dynamical evolution of star clusters and determine the formation rates and mechanisms for exotic stars like blue stragglers and X-ray sources. Understanding the near-primordial hard-binary population of star clusters is of primary importance for dynamical models of star clusters, which have the potential to greatly advance our understanding of star cluster evolution. Yet the binary frequencies and distributions of binary orbital parameters (period, eccentricity, etc.) for young coeval stellar populations are poorly known, due to a lack of necessary observations. The young (~540 Myr) open cluster M37 hosts a rich binary population that can be used to empirically define these initial conditions. Importantly, this cluster has been the target of a comprehensive WIYN/Hydra radial-velocity (RV) survey, from which we have already identified a nearly complete sample of 329 solar-type (1.0 <= M [M_⊙] <= 1.5) members in M37. Of these stars, 82 show significant RV variability, indicative of a binary companion. We propose to build upon these data with a multi-epoch RV survey using WIYN/Hydra to derive kinematic orbital solutions for these 82 binaries in M37. This project was granted time in 2013B and scheduled for later this year. We anticipate that about half of the detected binaries in M37 will acquire enough RV measurements (>=10) in 2013B to begin searching for orbital solutions. With this proposal and perhaps one additional semester we should achieve >=10 RV measurements for the remaining binaries.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
Exploring multicollinearity using a random matrix theory approach.
Feher, Kristen; Whelan, James; Müller, Samuel
2012-01-01
Clustering of gene expression data is often done with the latent aim of dimension reduction, by finding groups of genes that have a common response to potentially unknown stimuli. However, what is poorly understood to date is the behaviour of a low dimensional signal embedded in high dimensions. This paper introduces a multicollinear model which is based on random matrix theory results, and shows potential for the characterisation of a gene cluster's correlation matrix. This model projects a one dimensional signal into many dimensions and is based on the spiked covariance model, but rather characterises the behaviour of the corresponding correlation matrix. The eigenspectrum of the correlation matrix is empirically examined by simulation, under the addition of noise to the original signal. The simulation results are then used to propose a dimension estimation procedure of clusters from data. Moreover, the simulation results warn against considering pairwise correlations in isolation, as the model provides a mechanism whereby a pair of genes with `low' correlation may simply be due to the interaction of high dimension and noise. Instead, collective information about all the variables is given by the eigenspectrum.
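The warning about pairwise correlations can be illustrated with a small simulation. This is a rough sketch and not the authors' model: it assumes equal loadings and independent Gaussian noise, and the function name and parameter values are hypothetical. A single latent signal is copied into many noisy variables, and the eigenvalues of the sample correlation matrix are inspected.

```python
import numpy as np

def correlation_eigenspectrum(n_samples, n_vars, noise_sd, seed=0):
    """Eigenvalues (descending) of the sample correlation matrix of a
    one-dimensional signal copied into n_vars noisy variables."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal(n_samples)            # latent 1-D signal
    noise = noise_sd * rng.standard_normal((n_samples, n_vars))
    data = signal[:, None] + noise                     # same signal in every column
    corr = np.corrcoef(data, rowvar=False)             # n_vars x n_vars
    return np.linalg.eigvalsh(corr)[::-1]

n_vars = 50
for noise_sd in (0.5, 2.0):
    eigvals = correlation_eigenspectrum(200, n_vars, noise_sd)
    # Population pairwise correlation for unit signal variance.
    pairwise = 1.0 / (1.0 + noise_sd**2)
    print(f"noise_sd={noise_sd}: pairwise corr ~ {pairwise:.2f}, "
          f"top eigenvalue = {eigvals[0]:.1f}, second = {eigvals[1]:.1f}")
```

With the larger noise level each pair of variables has a population correlation of only about 0.2, yet the leading eigenvalue remains clearly separated from the rest, which is the kind of collective information the eigenspectrum carries.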
NASA Astrophysics Data System (ADS)
Ren, Fei; Li, Sai-Ping; Liu, Chuang
2017-03-01
Recently, there is a growing interest in the modeling and simulation based on real social networks among researchers in multi-disciplines. Using an empirical social network constructed from the calling records of a Chinese mobile service provider, we here propose a new model to simulate the information spreading process. This model takes into account two important ingredients that exist in real human behaviors: information prevalence and preferential spreading. The fraction of informed nodes when the system reaches an asymptotically stable state is primarily determined by information prevalence, and the heterogeneity of link weights would slow down the information diffusion. Moreover, the sizes of blind clusters which consist of connected uninformed nodes show a power-law distribution, and these uninformed nodes correspond to a particular portion of nodes which are located at special positions in the network, namely at the edges of large clusters or inside the clusters connected through weak links. Since the simulations are performed on a real world network, the results should be useful in the understanding of the influences of social network structures and human behaviors on information propagation.
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.
Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si
2017-07-01
Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from coastal water of the Bohai Sea and North Yellow Sea of China, and apply the clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which is widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
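A minimal sketch of hierarchical clustering under Mahalanobis versus Euclidean distance, using SciPy. The synthetic data, the average-linkage choice, and the function name are assumptions made for illustration; the abstract does not state which linkage or preprocessing the authors used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_samples(samples, n_clusters, metric):
    """Agglomerative clustering with the given pairwise metric."""
    if metric == "mahalanobis":
        # Inverse covariance of the variables accounts for their correlations.
        inv_cov = np.linalg.inv(np.cov(samples, rowvar=False))
        dists = pdist(samples, metric="mahalanobis", VI=inv_cov)
    else:
        dists = pdist(samples, metric="euclidean")
    tree = linkage(dists, method="average")   # linkage choice is an assumption
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Synthetic stand-in for two strongly correlated water quality variables.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
group_a = rng.multivariate_normal([0.0, 0.0], cov, size=30)
group_b = rng.multivariate_normal([1.5, -1.5], cov, size=30)
samples = np.vstack([group_a, group_b])

labels_maha = cluster_samples(samples, 2, "mahalanobis")
labels_eucl = cluster_samples(samples, 2, "euclidean")
print("Mahalanobis cluster sizes:", np.bincount(labels_maha)[1:])
print("Euclidean cluster sizes:  ", np.bincount(labels_eucl)[1:])
```

Comparing the two label vectors on real data is one way to see how much the correlation structure changes the grouping.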
MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.
Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N
2018-04-15
Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate clustering refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.
Unsupervised classification of multivariate geostatistical data: Two algorithms
NASA Astrophysics Data System (ADS)
Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques
2015-12-01
With the increasing development of remote sensing platforms and the evolution of sampling facilities in mining and oil industry, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables in hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
Yang, Yang; Saleemi, Imran; Shah, Mubarak
2013-07-01
This paper proposes a novel representation of articulated human actions, gestures, and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one-shot or k-shot learning, and 2) meaningful organization of unlabeled datasets by unsupervised clustering. Our proposed representation is obtained by automatically discovering high-level subactions or motion primitives, by hierarchical clustering of observed optical flow in four-dimensional, spatial, and motion flow space. The completely unsupervised proposed method, in contrast to state-of-the-art representations like bag of video words, provides a meaningful representation conducive to visual interpretation and textual labeling. Each primitive action depicts an atomic subaction, like directional motion of limb or torso, and is represented by a mixture of four-dimensional Gaussian distributions. For one-shot and k-shot learning, the sequence of primitive labels discovered in a test video is labeled using KL divergence, and can then be represented as a string and matched against similar strings of training videos. The same sequence can also be collapsed into a histogram of primitives or be used to learn a Hidden Markov model to represent classes. We have performed extensive experiments on recognition by one-shot and k-shot learning as well as unsupervised action clustering on six human actions and gesture datasets, a composite dataset, and a database of facial expressions. These experiments confirm the validity and discriminative nature of the proposed representation.
An information model for use in software management estimation and prediction
NASA Technical Reports Server (NTRS)
Li, Ningda R.; Zelkowitz, Marvin V.
1993-01-01
This paper describes the use of cluster analysis for determining the information model within collected software engineering development data at the NASA/GSFC Software Engineering Laboratory. We describe the Software Management Environment tool that allows managers to predict development attributes during early phases of a software project and the modifications we propose to allow it to develop dynamic models for better predictions of these attributes.
Data Mining Technologies Inspired from Visual Principle
NASA Astrophysics Data System (ADS)
Xu, Zongben
In this talk we review the recent work done by our group on data mining (DM) technologies deduced from simulating visual principle. Through viewing a DM problem as a cognition problem and treating a data set as an image with each light point located at a datum position, we developed a series of highly efficient algorithms for clustering, classification and regression via mimicking visual principles. In pattern recognition, human eyes seem to possess a singular aptitude to group objects and find important structure in an efficient way. Thus, a DM algorithm simulating the visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach for data clustering by modeling the blurring effect of lateral retinal interconnections based on scale space theory. In this approach, as the data image blurs, smaller light blobs merge into larger ones until the whole image becomes one light blob at a low enough level of resolution. By identifying each blob with a cluster, the blurring process then generates a family of clusterings along the hierarchy. The proposed approach provides unique solutions to many long-standing problems in clustering, such as cluster validity and sensitivity to initialization. We extended such an approach to classification and regression problems, through combined use of Weber's law in physiology and the cell response classification facts. The resultant classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to huge data sets in DM technologies. We finally applied a similar idea to the difficult parameter setting problem in support vector machines (SVM). Viewing the parameter setting problem as a recognition problem of choosing a visual scale at which the global and local structures of a data set can be preserved, and the difference between the two structures be maximized in the feature space, we derived a direct parameter setting formula for the Gaussian SVM. The simulations and applications show that the suggested formula significantly outperforms the known model selection methods in terms of efficiency and precision.
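The blurring idea can be sketched in one dimension. This is an illustrative toy, not the group's algorithm: each datum is treated as a light point, the "image" is blurred with a Gaussian kernel of width sigma, and each local maximum of the blurred image is counted as one blob (cluster). The data values, grid, and function name are hypothetical.

```python
import numpy as np

def count_blobs(points, sigma, grid):
    """Number of local maxima of the Gaussian-blurred point set on the grid."""
    # Blurred image: sum of Gaussian bumps centred on the data points.
    density = np.exp(-0.5 * ((grid[:, None] - points[None, :]) / sigma) ** 2).sum(axis=1)
    interior = density[1:-1]
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    return int(is_peak.sum())

points = np.array([0.0, 0.2, 0.4, 3.0, 3.2, 7.0])
grid = np.linspace(-5.0, 12.0, 3401)

for sigma in (0.05, 0.5, 5.0):
    print(f"sigma={sigma}: {count_blobs(points, sigma, grid)} blob(s)")
```

At the finest scale every point is its own blob, at the intermediate scale nearby points merge into three groups, and at the coarsest scale a single blob remains, which mirrors the hierarchy of clusterings described in the talk.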
Lin, Yu-Ching; Yu, Nan-Ying; Jiang, Ching-Fen; Chang, Shao-Hsia
2018-06-02
In this paper, we introduce a newly developed multi-scale wavelet model for the interpretation of surface electromyography (SEMG) signals and validate the model's capability to characterize changes in neuromuscular activation in cases with myofascial pain syndrome (MPS) via machine learning methods. The SEMG data collected from normal (N = 30; 27 women, 3 men) and MPS subjects (N = 26; 22 women, 4 men) were adopted for this retrospective analysis. SEMGs were measured from the taut-band loci on both sides of the trapezius muscle on the upper back while the subject performed a cyclic bilateral backward shoulder extension movement within 1 min. Classification accuracy of the SEMG model to differentiate MPS patients from normal subjects was 77% using template matching and 60% using K-means clustering. Classification consistency between the two machine learning methods was 87% in the normal group and 93% in the MPS group. The 2D feature graphs derived from the proposed multi-scale model revealed distinct patterns between normal subjects and MPS patients. The classification consistency using template matching and K-means clustering suggests the potential of using the proposed model to characterize interference pattern changes induced by MPS. Copyright © 2018. Published by Elsevier Ltd.
Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla
2013-12-01
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
Clustering Coefficients for Correlation Networks.
Masuda, Naoki; Sakaki, Michiko; Ezaki, Takahiro; Watanabe, Takamitsu
2018-01-01
Graph theory is a useful tool for deciphering structural and functional networks of the brain on various spatial and temporal scales. The clustering coefficient quantifies the abundance of connected triangles in a network and is a major descriptive statistic of networks. For example, it finds an application in the assessment of small-worldness of brain networks, which is affected by attentional and cognitive conditions, age, psychiatric disorders and so forth. However, it remains unclear how the clustering coefficient should be measured in a correlation-based network, which is among the major representations of brain networks. In the present article, we propose clustering coefficients tailored to correlation matrices. The key idea is to use three-way partial correlation or partial mutual information to measure the strength of the association between the two neighboring nodes of a focal node relative to the amount of pseudo-correlation expected from indirect paths between the nodes. Our method avoids the difficulties of previous applications of clustering coefficient (and other) measures in defining correlational networks, i.e., thresholding on the correlation value, discarding of negative correlation values, the pseudo-correlation problem and full partial correlation matrices whose estimation is computationally difficult. For proof of concept, we apply the proposed clustering coefficient measures to functional magnetic resonance imaging data obtained from healthy participants of various ages and compare them with conventional clustering coefficients. We show that the clustering coefficients decline with age. The proposed clustering coefficients are more strongly correlated with age than the conventional ones are. We also show that the local variants of the proposed clustering coefficients (i.e., abundance of triangles around a focal node) are useful in characterizing individual nodes. 
In contrast, the conventional local clustering coefficients were strongly correlated with and therefore may be confounded by the node's connectivity. The proposed methods are expected to help us to understand clustering and lack thereof in correlational brain networks, such as those derived from functional time series and across-participant correlation in neuroanatomical properties. PMID:29599714
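As a small illustration of the "three-way partial correlation" idea mentioned in this abstract, the sketch below evaluates the textbook first-order partial correlation between two neighbors j and k of a focal node i. It is only the building block, not the clustering coefficient defined by the authors; the function name and the numeric values are hypothetical.

```python
import math

def partial_correlation(r_jk, r_ij, r_ik):
    """Correlation between nodes j and k after removing the linear effect of node i."""
    return (r_jk - r_ij * r_ik) / math.sqrt((1 - r_ij ** 2) * (1 - r_ik ** 2))

# Hypothetical correlations of neighbors j and k with the focal node i.
r_ij, r_ik = 0.6, 0.5

# Case 1: the j-k correlation equals the product r_ij * r_ik, i.e. it is
# exactly what the indirect path through i would produce on its own.
indirect_only = partial_correlation(0.3, r_ij, r_ik)

# Case 2: the j-k correlation is much larger than the indirect path explains.
direct_link = partial_correlation(0.8, r_ij, r_ik)

print(indirect_only, direct_link)
```

In the first case the partial correlation vanishes, so the observed j-k correlation would count as pseudo-correlation; in the second it stays clearly positive, which suggests a genuine triangle around node i.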
A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition
NASA Astrophysics Data System (ADS)
Oh, Yoo Rhee; Kim, Hong Kook
In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods achieve relative reductions in average word error rate (WER) of 17.1% and 22.1% for non-native speech, respectively, compared to a baseline ASR system.
Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang
2017-10-30
In this work, we propose two k-means-clustering-based algorithms to mitigate fiber nonlinearity for 64-quadrature amplitude modulation (64-QAM) signals: the training-sequence-assisted k-means algorithm and the blind k-means algorithm. We experimentally demonstrate the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in a 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy, and they are able to quickly find appropriate initial centroids and correctly select the cluster centroids to obtain globally optimal solutions for large k values. We measured the bit-error-ratio (BER) performance of the 64-QAM signal at different powers launched into 50 km of single-mode fiber; the proposed techniques greatly mitigate the signal impairments caused by amplified spontaneous emission noise and the fiber Kerr nonlinearity and improve the BER performance.
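The general idea of clustering received symbols around constellation points can be sketched with plain Lloyd's k-means on complex samples. This is not the authors' training-sequence-assisted or blind algorithm: the sketch uses a 4-point constellation instead of 64-QAM, a fixed phase rotation as a stand-in for nonlinear distortion, and the ideal constellation points as initial centroids. All of these, and every name in the code, are assumptions for illustration.

```python
import cmath

def assign(symbols, centroids):
    """Index of the nearest centroid (Euclidean distance in the I/Q plane) per symbol."""
    return [min(range(len(centroids)), key=lambda k: abs(s - centroids[k]))
            for s in symbols]

def kmeans(symbols, initial_centroids, iterations=10):
    """Lloyd's algorithm on complex samples, starting from the given centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(symbols, centroids)
        for k in range(len(centroids)):
            members = [s for s, lab in zip(symbols, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return centroids, assign(symbols, centroids)

# Hypothetical 4-point constellation and a synthetic distorted received signal.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
rotation = cmath.exp(0.2j)            # assumed phase rotation (distortion stand-in)
jitter = [0.1, -0.1, 0.1j, -0.1j]     # small symmetric perturbations around each point
received = [p * rotation + d for p in ideal for d in jitter]

centroids, labels = kmeans(received, ideal)
print(centroids)
print(labels)
```

Because the centroids move to the rotated cluster centers, decisions made against the learned centroids follow the distortion, whereas decisions against the ideal points would sit closer to the cluster edges.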
Decategorizing Teacher Preparation in Special Education.
ERIC Educational Resources Information Center
Stephens, Thomas M.; Joseph, Ellis A.
1982-01-01
The authors propose a model for preparing special education teachers which accepts those differences among students that require categorical decisions for instructional purposes but not those categories existing as mere historical artifacts. Three noncategorical teacher training programs are described, and six clusters of teacher competencies are…
Data-driven heterogeneity in mathematical learning disabilities based on the triple code model.
Peake, Christian; Jiménez, Juan E; Rodríguez, Cristina
2017-12-01
Many classifications of heterogeneity in mathematical learning disabilities (MLD) have been proposed over the past four decades; however, no empirical research had been conducted until recently, and none of the classifications are derived from Triple Code Model (TCM) postulates. The TCM proposes MLD as a heterogeneous disorder, with two distinguishable profiles: a representational subtype and a verbal subtype. A sample of elementary school 3rd to 6th graders was divided into two age cohorts (3rd - 4th grades, and 5th - 6th grades). Using data-driven strategies, based on the cognitive classification variables predicted by the TCM, our sample of children with MLD clustered as expected: a group with representational deficits and a group with number-fact retrieval deficits. In the younger group, a spatial subtype also emerged, while in both cohorts a non-specific cluster was produced whose profile could not be explained by this theoretical approach. Copyright © 2017 Elsevier Ltd. All rights reserved.
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis
ERIC Educational Resources Information Center
Hwang, Heungsun; Dillon, William R.
2010-01-01
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…
Experimental Tests of the Algebraic Cluster Model
NASA Astrophysics Data System (ADS)
Gai, Moshe
2018-02-01
The Algebraic Cluster Model (ACM) of Bijker and Iachello, proposed in 2000, has recently been applied to 12C and 16O with much success. We review the current status in 12C with the outstanding observation of the ground state rotational band composed of the spin-parity states 0+, 2+, 3-, 4± and 5-. The observation of the 4± parity doublet is characteristic of a (triatomic) molecular configuration in which the three alpha particles are arranged in the equilateral triangular configuration of a symmetric spinning top. We discuss future measurements with electron scattering, 12C(e,e’), to test the B(Eλ) predicted by the ACM.
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
Based on an analysis of the clustering algorithms that have been proposed for MANETs, a novel clustering strategy is proposed in this paper. With trust defined by a statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can detect malicious nodes, a function neglected by other clustering algorithms. It also overcomes a deficiency of the MOBIC algorithm, which cannot implement its relative mobility metric for corresponding nodes when the receiving power of two consecutive HELLO packets cannot be measured. It is an effective solution for clustering a MANET securely.
Interactive Inverse Groundwater Modeling - Addressing User Fatigue
NASA Astrophysics Data System (ADS)
Singh, A.; Minsker, B. S.
2006-12-01
This paper builds on ongoing research on developing an interactive and multi-objective framework to solve the groundwater inverse problem. In this work we solve the classic groundwater inverse problem of estimating a spatially continuous conductivity field, given field measurements of hydraulic heads. The proposed framework is based on an interactive multi-objective genetic algorithm (IMOGA) that not only considers quantitative measures such as calibration error and degree of regularization, but also takes into account expert knowledge about the structure of the underlying conductivity field expressed as subjective rankings of potential conductivity fields by the expert. The IMOGA converges to the optimal Pareto front representing the best trade-off among the qualitative as well as quantitative objectives. However, since the IMOGA is a population-based iterative search, it requires the user to evaluate hundreds of solutions. This leads to the problem of 'user fatigue'. We propose a two-step methodology to combat user fatigue in such interactive systems. The first step is choosing only a few highly representative solutions to be shown to the expert for ranking. Spatial clustering is used to group the search space based on the similarity of the conductivity fields. Sampling is then carried out from different clusters to improve the diversity of solutions shown to the user. Once the expert has ranked representative solutions from each cluster, a machine learning model is used to 'learn user preference' and extrapolate these rankings to the solutions not ranked by the expert. We investigate different machine learning models such as Decision Trees, a Bayesian learning model, and instance-based weighting to model user preference. In addition, we also investigate ways to improve the performance of these models by providing information about the spatial structure of the conductivity fields (which is what the expert bases his or her rank on). 
Results are shown for each of these machine learning models and the advantages and disadvantages for each approach are discussed. These results indicate that using the proposed two-step methodology leads to significant reduction in user-fatigue without deteriorating the solution quality of the IMOGA.
NASA Astrophysics Data System (ADS)
Miyama, Masamichi J.; Hukushima, Koji
2018-04-01
A sparse modeling approach is proposed for analyzing scanning tunneling microscopy topography data, which contain numerous peaks originating from the electron density of surface atoms and/or impurities. The method, based on the relevance vector machine with L1 regularization and k-means clustering, enables separation of the peaks and peak center positioning with accuracy beyond the resolution of the measurement grid. The validity and efficiency of the proposed method are demonstrated using synthetic data in comparison with the conventional least-squares method. An application of the proposed method to experimental data of a metallic oxide thin-film clearly indicates the existence of defects and corresponding local lattice distortions.
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Motion estimation in the frequency domain using fuzzy c-planes clustering.
Erdem, C E; Karabulut, G Z; Yanmaz, E; Anarim, E
2001-01-01
A recent work explicitly models the discontinuous motion estimation problem in the frequency domain, where the motion parameters are estimated using a harmonic retrieval approach. The vertical and horizontal components of the motion are independently estimated from the locations of the peaks of the respective periodogram analyses, and they are then paired by a dedicated procedure to obtain the motion vectors. In this paper, we present a more efficient method that replaces the motion component pairing task and hence eliminates the problems of the pairing method described. The method described in this paper uses the fuzzy c-planes (FCP) clustering approach to fit planes to three-dimensional (3-D) frequency domain data obtained from the peaks of the periodograms. Experimental results are provided to demonstrate the effectiveness of the proposed method.
NASA Astrophysics Data System (ADS)
Ghaffarian, Saman; Ghaffarian, Salar
2014-11-01
This paper proposes an improved FastICA model, named Purposive FastICA (PFICA), which is initialized by a simple color space transformation and uses a novel masking approach to automatically detect buildings from high-resolution Google Earth imagery. ICA and FastICA algorithms are defined as Blind Source Separation (BSS) techniques for unmixing source signals using the reference data sets. In order to overcome the limitations of the ICA and FastICA algorithms and make them purposeful, we developed a novel method involving three main steps: (1) improving the FastICA algorithm using the Moore-Penrose pseudo-inverse matrix model; (2) automated seeding of the PFICA algorithm based on the LUV color space and proposed simple rules to split the image into three regions (shadow + vegetation, bare soil + roads, and buildings); (3) masking out the final building detection results from the PFICA outputs using the K-means clustering algorithm with two clusters and conducting simple morphological operations to remove noise. Evaluation of the results shows that buildings detected in dense and suburban districts with diverse characteristics and color combinations using our proposed method have 88.6% and 85.5% overall pixel-based and object-based precision, respectively.
A space-time scan statistic for detecting emerging outbreaks.
Tango, Toshiro; Takahashi, Kunihiko; Kohriyama, Kazuaki
2011-03-01
As a major analytical method for outbreak detection, Kulldorff's space-time scan statistic (2001, Journal of the Royal Statistical Society, Series A 164, 61-72) has been implemented in many syndromic surveillance systems. Since, however, it is based on circular windows in space, it has difficulty correctly detecting actual noncircular clusters. Takahashi et al. (2008, International Journal of Health Geographics 7, 14) proposed a flexible space-time scan statistic with the capability of detecting noncircular areas. It seems to us, however, that the detection of the most likely cluster defined in these space-time scan statistics is not the same as the detection of localized emerging disease outbreaks because the former compares the observed number of cases with the conditional expected number of cases. In this article, we propose a new space-time scan statistic which compares the observed number of cases with the unconditional expected number of cases, takes a time-to-time variation of Poisson mean into account, and implements an outbreak model to capture localized emerging disease outbreaks more timely and correctly. The proposed models are illustrated with data from weekly surveillance of the number of absentees in primary schools in Kitakyushu-shi, Japan, 2006. © 2010, The International Biometric Society.
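For orientation, the conventional Poisson scan statistic that this abstract takes as its starting point scores each candidate window by a log-likelihood ratio computed from the observed and the conditional expected number of cases. The sketch below shows only that baseline score. It does not implement the new statistic proposed in the abstract (unconditional expectation, time-varying Poisson mean, outbreak model), and the function name and example windows are hypothetical.

```python
import math

def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one window under the conditional Poisson model.

    expected_in is the expected count inside the window, scaled so that the
    expected counts over the whole study region sum to total_cases.
    Windows without an excess of cases score zero.
    """
    if cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr

# Hypothetical candidate windows: (label, observed cases, expected cases).
total_cases = 100
windows = [("A", 20, 10.0), ("B", 12, 10.0), ("C", 5, 10.0)]

scores = {name: poisson_scan_llr(c, e, total_cases) for name, c, e in windows}
most_likely_cluster = max(scores, key=scores.get)
print(scores, most_likely_cluster)
```

The window with the largest score is reported as the most likely cluster; in practice its significance is assessed by Monte Carlo replication, which is omitted here.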
Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks
Mall, Raghvendra; Langone, Rocco; Suykens, Johan A. K.
2014-01-01
Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows a model to be built on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically show that real-world networks have a multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks. PMID:24949877
Mining subspace clusters from DNA microarray data using large itemset techniques.
Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi
2009-05-01
Mining subspace clusters from DNA microarrays could help researchers identify those genes which commonly contribute to a disease, where a subspace cluster indicates a subset of genes whose expression levels are similar under a subset of conditions. Since the number of genes in a DNA microarray is far larger than the number of conditions, previously proposed algorithms which compute the maximum dimension sets (MDSs) for any two genes take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for any two genes, we construct MDSs only for any two conditions. Then, we transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in those subspace clusters with gene sets as large as possible, it is desirable to pay attention to those gene sets which have reasonably large support values in the condition-pair MDSs. Our simulation results show that the proposed algorithm needs shorter processing time than the previously proposed algorithms which need to construct gene-pair MDSs.
Hichri, Echrak; Abriel, Hugues; Kucera, Jan P
2018-02-15
It has been proposed that ephaptic conduction, relying on interactions between the sodium (Na+) current and the extracellular potential in intercalated discs, might contribute to cardiac conduction when gap junctional coupling is reduced, but this mechanism is still controversial. In intercalated discs, Na+ channels form clusters near gap junction plaques, but the functional significance of these clusters has never been evaluated. In HEK cells expressing cardiac Na+ channels, we show that restricting the extracellular space modulates the Na+ current, as predicted by corresponding simulations accounting for ephaptic effects. In a high-resolution model of the intercalated disc, clusters of Na+ channels that face each other across the intercellular cleft facilitate ephaptic impulse transmission when gap junctional coupling is reduced. Thus, our simulations reveal a functional role for the clustering of Na+ channels in intercalated discs, and suggest that rearrangement of these clusters in disease may influence cardiac conduction. It has been proposed that ephaptic interactions in intercalated discs, mediated by extracellular potentials, contribute to cardiac impulse propagation when gap junctional coupling is reduced. However, experiments demonstrating ephaptic effects on the cardiac Na+ current (I_Na) are scarce. Furthermore, Na+ channels form clusters around gap junction plaques, but the electrophysiological significance of these clusters has never been investigated. In patch clamp experiments with HEK cells stably expressing human Nav1.5 channels, we examined how restricting the extracellular space modulates I_Na elicited by an activation protocol. In parallel, we developed a high-resolution computer model of the intercalated disc to investigate how the distribution of Na+ channels influences ephaptic interactions. 
Bringing the HEK cells close to a non-conducting obstacle always increased peak I_Na at step potentials near the threshold of I_Na activation and decreased peak I_Na at step potentials far above threshold (7 cells, P = 0.0156, Wilcoxon signed rank test). These effects were consistent with corresponding control simulations with a uniform Na+ channel distribution. In the intercalated disc computer model, redistributing the Na+ channels into a central cluster of the disc potentiated ephaptic effects. Moreover, ephaptic impulse transmission from one cell to another was facilitated by clusters of Na+ channels facing each other across the intercellular cleft when gap junctional coupling was reduced. In conclusion, our proof-of-principle experiments demonstrate that confining the extracellular space modulates cardiac I_Na, and our simulations reveal the functional role of the aggregation of Na+ channels in the perinexus. These findings highlight novel concepts in the physiology of cardiac excitation. © 2017 The Authors. The Journal of Physiology © 2017 The Physiological Society.
NASA Astrophysics Data System (ADS)
Antenucci, F.; Crisanti, A.; Leuzzi, L.
2014-07-01
The Ising and Blume-Emery-Griffiths (BEG) models' critical behavior is analyzed in two dimensions and three dimensions by means of a renormalization group scheme on small clusters made of a few lattice cells. Different kinds of cells are proposed for both ordered and disordered model cases. In particular, cells preserving a possible antiferromagnetic ordering under renormalization allow for the determination of the Néel critical point and its scaling indices. These also provide more reliable estimates of the Curie fixed point than those obtained using cells preserving only the ferromagnetic ordering. In all studied dimensions, the present procedure does not yield a strong-disorder critical point corresponding to the transition to the spin-glass phase. This limitation is thoroughly analyzed and motivated.
NASA Astrophysics Data System (ADS)
Dreano, Denis; Tsiaras, Kostas; Triantafyllou, George; Hoteit, Ibrahim
2017-07-01
Forecasting the state of large marine ecosystems is important for many economic and public health applications. However, advanced three-dimensional (3D) ecosystem models, such as the European Regional Seas Ecosystem Model (ERSEM), are computationally expensive, especially when implemented within an ensemble data assimilation system requiring several parallel integrations. As an alternative to 3D ecological forecasting systems, we propose to implement a set of regional one-dimensional (1D) water-column ecological models that run at a fraction of the computational cost. The 1D model domains are determined using a Gaussian mixture model (GMM)-based clustering method and satellite chlorophyll-a (Chl-a) data. Regionally averaged Chl-a data are assimilated into the 1D models using the singular evolutive interpolated Kalman (SEIK) filter. To laterally exchange information between subregions and improve the forecasting skill, we introduce a new correction step to the assimilation scheme, in which we assimilate a statistical forecast of future Chl-a observations based on information from neighbouring regions. We apply this approach to the Red Sea and show that the assimilative 1D ecological models can forecast surface Chl-a concentration with high accuracy. The statistical assimilation step further improves the forecasting skill by as much as 50%. This general approach of clustering large marine areas and running several interacting 1D ecological models is very flexible. It allows many combinations of clustering, filtering and regression techniques to be used and can be applied to build efficient forecasting systems in other large marine ecosystems.
Hough transform for clustered microcalcifications detection in full-field digital mammograms
NASA Astrophysics Data System (ADS)
Fanizzi, A.; Basile, T. M. A.; Losurdo, L.; Amoroso, N.; Bellotti, R.; Bottigli, U.; Dentamaro, R.; Didonna, V.; Fausto, A.; Massafra, R.; Moschetta, M.; Tamborra, P.; Tangaro, S.; La Forgia, D.
2017-09-01
Many screening programs use mammography as the principal diagnostic tool for detecting breast cancer at a very early stage. Despite the efficacy of mammograms in highlighting breast diseases, the detection of some lesions is still uncertain for radiologists. In particular, the extremely minute and elongated salt-like particles of microcalcifications are sometimes no larger than 0.1 mm and account for approximately half of all cancers detected by means of mammograms. Hence the need for automatic tools able to support radiologists in their work. Here, we propose a computer-assisted diagnostic (CAD) tool to support radiologists in identifying microcalcifications in full (native) digital mammographic images. The proposed CAD system consists of a pre-processing step, which improves contrast and reduces noise by applying the Sobel edge detection algorithm and a Gaussian filter, followed by a microcalcification detection step performed by exploiting the circular Hough transform. The performance of the procedure was tested on 200 images from the Breast Cancer Digital Repository (BCDR), a publicly available database. The automatically detected clusters of microcalcifications were evaluated by skilled radiologists, who assessed the validity of the correctly identified regions of interest as well as the system error in the case of missed clustered microcalcifications. The system performance was evaluated in terms of sensitivity and false positives per image (FPi) rate. The proposed model was able to accurately detect the microcalcification clusters, obtaining a performance (sensitivity = 91.78% and FPi rate = 3.99) that compares favorably with other state-of-the-art approaches.
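The detection step relies on the circular Hough transform, which can be sketched as follows. This is a minimal illustration for a single, known radius on synthetic edge points, not the CAD pipeline of the paper; the names `circular_hough` and `circle_points`, the image size, and the radius are assumptions.

```python
import math

def circular_hough(edge_points, radius, width, height, n_angles=360):
    """Vote for circle centres of a fixed radius; return (accumulator, best_centre)."""
    acc = [[0] * width for _ in range(height)]
    for (x, y) in edge_points:
        seen = set()  # each edge point votes at most once per accumulator cell
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            a = int(round(x - radius * math.cos(theta)))
            b = int(round(y - radius * math.sin(theta)))
            if 0 <= a < width and 0 <= b < height and (a, b) not in seen:
                seen.add((a, b))
                acc[b][a] += 1
    # The cell with the most votes is the most likely circle centre
    best = max((acc[b][a], a, b) for b in range(height) for a in range(width))
    return acc, (best[1], best[2])

def circle_points(cx, cy, r, n=72):
    """Rasterised points on a circle, standing in for an edge-detector output."""
    pts = set()
    for k in range(n):
        t = 2.0 * math.pi * k / n
        pts.add((int(round(cx + r * math.cos(t))), int(round(cy + r * math.sin(t)))))
    return sorted(pts)

edge = circle_points(20, 15, 6)
_, centre = circular_hough(edge, radius=6, width=40, height=30)
print(centre)
```

A practical detector would sweep a range of radii and keep accumulator peaks above a vote threshold rather than a single maximum.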
Special and general superatoms.
Luo, Zhixun; Castleman, A Welford
2014-10-21
Bridging the gap between atoms and macroscopic matter, clusters continue to be a subject of increasing research interest. Among the realm of cluster investigations, an exciting development is the realization that chosen stable clusters can mimic the chemical behavior of an atom or a group of the periodic table of elements. This major finding known as a superatom concept was originated experimentally from the study of aluminum cluster reactivity conducted in 1989 by noting a dramatic size dependence of the reactivity where cluster anions containing a certain number of Al atoms were unreactive toward oxygen while the other species were etched away. This observation was well interpreted by shell closings on the basis of the jellium model, and the related concept (originally termed "unified atom") spawned a wide range of pioneering studies in the 1990s pertaining to the understanding of factors governing the properties of clusters. Under the inspiration of a superatom concept, advances in cluster science in finding stable species not only shed light on magic clusters (i.e., superatomic noble gas) but also enlightened the exploration of stable clusters to mimic the chemical behavior of atoms leading to the discovery of superhalogens, alkaline-earth metals, superalkalis, etc. Among them, certain clusters could enable isovalent isomorphism of precious metals, indicating application potential for inexpensive superatoms for industrial catalysis, while a few superalkalis were found to validate the interesting "harpoon mechanism" involved in the superatomic cluster reactivity; recently also found were the magnetic superatoms of which the cluster-assembled materials could be used in spin electronics. Up to now, extensive studies in cluster science have allowed the stability of superatomic clusters to be understood within a few models, including the jellium model, also aromaticity and Wade-Mingos rules depending on the geometry and metallicity of the cluster. 
However, the scope of application of the jellium model and the modification of the theory to account for nonspherical symmetry and nonmetal-doped metal clusters remain elusive and need further development. It is also worth mentioning that the superatom concept has been introduced for ligand-stabilized metal clusters, which could also follow the major shell-closing electron count for a spherical, square-well potential. By proposing a new concept named special and general superatoms, we try herein to summarize all these investigations in series, expecting to provide an overview of this field with a primary focus on the joint undertakings that have given rise to the superatom concept. To be specific, for special superatoms, we limit ourselves to clusters under a strict jellium model and simply classify them into groups based on their valence electron counts, while for general superatoms we emphasize nonmetal-doped metal clusters and ligand-stabilized metal clusters, as well as a few isovalent cluster systems. Hopefully this summary of special and general superatoms benefits the further development of cluster-related theory, and lights up the prospect of using them as building blocks of new materials with tailored properties, such as inexpensive isovalent systems for industrial catalysis, semiconductive superatoms for transistors, and magnetic superatoms for spin electronics.
Analysis and Research on Spatial Data Storage Model Based on Cloud Computing Platform
NASA Astrophysics Data System (ADS)
Hu, Yong
2017-12-01
In this paper, the data processing and storage characteristics of cloud computing are analyzed and studied. On this basis, a cloud computing data storage model based on a BP neural network is proposed. The model selects a server cluster according to the attributes of the data, yielding a spatial data storage model with a load-balancing function that has certain feasibility and application advantages.
Short-term Power Load Forecasting Based on Balanced KNN
NASA Astrophysics Data System (ADS)
Lv, Xianlong; Cheng, Xingong; YanShuang; Tang, Yan-mei
2018-03-01
To improve the accuracy of load forecasting, a short-term load forecasting model based on a balanced KNN algorithm is proposed. According to the load characteristics, the massive historical power load data are divided into scenes by the K-means algorithm. In view of the unbalanced load scenes, the balanced KNN algorithm is proposed to classify the scenes accurately. The locally weighted linear regression algorithm is used to fit and predict the load. Adopting the Apache Hadoop programming framework for cloud computing, the proposed model is parallelized and improved to enhance its ability to deal with massive, high-dimensional data. The household electricity consumption data of a residential district are analyzed on a 23-node cloud computing cluster, and the experimental results show that the load forecasting accuracy and execution time of the proposed model are better than those of the traditional forecasting algorithm.
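The fitting step, locally weighted linear regression, can be sketched for a single input variable as follows. The Gaussian kernel, the bandwidth `tau`, the function name `lwlr_predict`, and the toy data are assumptions; the abstract does not specify the kernel or the features used.

```python
import math

def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Predict y at x_query with a weighted straight-line fit centred on x_query."""
    # Gaussian kernel weights: nearby samples count more than distant ones
    w = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Weighted least-squares line evaluated at the query point
    return my + slope * (x_query - mx)

# Hypothetical hourly samples (hour index, load); exactly linear for checking
hours = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
load = [2.0 * h + 1.0 for h in hours]
prediction = lwlr_predict(2.5, hours, load)
print(prediction)
```

A smaller `tau` makes the fit more local, which tracks sharp load changes more closely at the cost of higher variance.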
Mean-cluster approach indicates cell sorting time scales are determined by collective dynamics
NASA Astrophysics Data System (ADS)
Beatrici, Carine P.; de Almeida, Rita M. C.; Brunnet, Leonardo G.
2017-03-01
Cell migration is essential to cell segregation, playing a central role in tissue formation, wound healing, and tumor evolution. Considering random mixtures of two cell types, it is still not clear which cell characteristics define clustering time scales. The mass of diffusing clusters merging with one another is expected to grow as $t^{d/(d+2)}$ when the diffusion constant scales with the inverse of the cluster mass. Cell segregation experiments deviate from that behavior. Explanations for that could arise from specific microscopic mechanisms or from collective effects, typical of active matter. Here we consider a power law connecting diffusion constant and cluster mass to propose an analytic approach to model cell segregation where we explicitly take into account finite-size corrections. The results are compared with active matter model simulations and experiments available in the literature. To investigate the role played by different mechanisms we considered different hypotheses describing cell-cell interaction: differential adhesion hypothesis and different velocities hypothesis. We find that the simulations yield normal diffusion for long time intervals. Analytic and simulation results show that (i) cluster evolution clearly tends to a scaling regime, disrupted only at finite-size limits; (ii) cluster diffusion is greatly enhanced by cell collective behavior, such that for high enough tendency to follow the neighbors, cluster diffusion may become independent of cluster size; (iii) the scaling exponent for cluster growth depends only on the mass-diffusion relation, not on the detailed local segregation mechanism. These results apply for active matter systems in general and, in particular, the mechanisms found underlying the increase in cell sorting speed certainly have deep implications in biological evolution as a selection mechanism.
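The growth exponent quoted above can be recovered from a standard heuristic scaling argument, sketched here; it is not the authors' derivation. The symbols $M$ (cluster mass), $R$ (cluster radius), $D$ (cluster diffusion constant) and the exponent $\alpha$ are notation assumed for this sketch, and the argument assumes compact clusters at a fixed area or volume fraction, so that the typical distance between clusters scales with $R$ and the merging time is the time needed to diffuse over that distance.

```latex
R \sim M^{1/d}, \qquad D \sim M^{-\alpha}
\quad\Longrightarrow\quad
t \sim \frac{R^{2}}{D} \sim M^{2/d+\alpha}
\quad\Longrightarrow\quad
M(t) \sim t^{\,d/(2+\alpha d)} .
```

For $\alpha = 1$ (diffusion constant inversely proportional to mass) this gives $M \sim t^{d/(d+2)}$, while $\alpha = 0$ (diffusion independent of cluster size) gives the faster growth $M \sim t^{d/2}$, so in this sketch the exponent depends only on the mass-diffusion relation.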
Kurczynska, Monika
2018-01-01
Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not yield appropriate clusters: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class.
The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models. PMID:29787567
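Separating two groups of models with k-means on a few energy terms can be sketched as below. This is plain Lloyd's algorithm with a naive deterministic initialization; the function name `kmeans` and the feature values are hypothetical and are not PyRosetta outputs.

```python
def kmeans(points, k=2, n_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Hypothetical (energy term 1, energy term 2) pairs for six models
models = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(models, k=2)
print(labels)
```

With real energy terms, the features would normally be standardized first so that no single term dominates the distance.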
Multiconstrained gene clustering based on generalized projections
2010-01-01
Background: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution still remains an unsolved problem. Results: We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence belonging to more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions: The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions. PMID:20356386
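The iterative projection idea can be illustrated with the classical convex case: alternating projections onto two convex sets converge to a point in their intersection. The two sets below (vectors summing to one, and non-negative vectors) and the function names are assumptions chosen for illustration, loosely resembling a soft cluster membership vector; they are not the constraint sets of the paper, whose generalized projections go beyond this sketch.

```python
def project_hyperplane(x, total=1.0):
    """Euclidean projection onto the set {x : sum(x) == total}."""
    shift = (sum(x) - total) / len(x)
    return [xi - shift for xi in x]

def project_nonnegative(x):
    """Euclidean projection onto the set {x : x_i >= 0}."""
    return [max(xi, 0.0) for xi in x]

def pocs(x0, projections, n_iter=200):
    """Cyclically apply the projections, starting from x0."""
    x = list(x0)
    for _ in range(n_iter):
        for proj in projections:
            x = proj(x)
    return x

# Hypothetical starting vector that violates both constraints
membership = pocs([0.9, 0.8, -0.5], [project_hyperplane, project_nonnegative])
print(membership)
```

Each additional constraint would enter as one more projection in the list, which is what lets this scheme combine heterogeneous constraints.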
Wang, Juan; Nishikawa, Robert M; Yang, Yongyi
2016-01-01
In computer-aided detection of microcalcifications (MCs), the detection accuracy is often compromised by frequent occurrence of false positives (FPs), which can be attributed to a number of factors, including imaging noise, inhomogeneity in tissue background, linear structures, and artifacts in mammograms. In this study, the authors investigated a unified classification approach for combating the adverse effects of these heterogeneous factors for accurate MC detection. To accommodate FPs caused by different factors in a mammogram image, the authors developed a classification model to which the input features were adapted according to the image context at a detection location. For this purpose, the input features were defined in two groups, of which one group was derived from the image intensity pattern in a local neighborhood of a detection location, and the other group was used to characterize how a MC is different from its structural background. Owing to the distinctive effect of linear structures in the detector response, the authors introduced a dummy variable into the unified classifier model, which allowed the input features to be adapted according to the image context at a detection location (i.e., presence or absence of linear structures). To suppress the effect of inhomogeneity in tissue background, the input features were extracted from different domains aimed for enhancing MCs in a mammogram image. To demonstrate the flexibility of the proposed approach, the authors implemented the unified classifier model by two widely used machine learning algorithms, namely, a support vector machine (SVM) classifier and an Adaboost classifier. In the experiment, the proposed approach was tested for two representative MC detectors in the literature [difference-of-Gaussians (DoG) detector and SVM detector]. 
The detection performance was assessed using free-response receiver operating characteristic (FROC) analysis on a set of 141 screen-film mammogram (SFM) images (66 cases) and a set of 188 full-field digital mammogram (FFDM) images (95 cases). The FROC analysis results show that the proposed unified classification approach can significantly improve the detection accuracy of the two MC detectors on both SFM and FFDM images. Despite the difference in performance between the two detectors, the unified classifiers can reduce the FP rate in their output to a similar level. In particular, with the true-positive rate at 85%, the FP rate on SFM images for the DoG detector was reduced from 1.16 to 0.33 clusters/image (unified SVM) and 0.36 clusters/image (unified Adaboost), respectively; similarly, for the SVM detector, the FP rate was reduced from 0.45 clusters/image to 0.30 clusters/image (unified SVM) and 0.25 clusters/image (unified Adaboost), respectively. Similar FP reduction results were also achieved on FFDM images for the two MC detectors. The proposed unified classification approach can be effective for discriminating MCs from FPs caused by different factors (such as MC-like noise patterns and linear structures) in MC detection. The framework is general and can be applied to further improve the detection accuracy of existing MC detectors.
Interactive visual exploration and analysis of origin-destination data
NASA Astrophysics Data System (ADS)
Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.
2018-05-01
In this paper, we propose a visual analytics approach for the exploration of spatiotemporal interaction patterns of massive origin-destination data. Firstly, we visually query the movement database for data at certain time windows. Secondly, we conduct interactive clustering to allow the users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g., distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Thirdly, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experimental results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
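Agglomerative clustering with a distance threshold can be sketched as follows. The sketch assumes single linkage, for which cutting the cluster tree at a threshold is equivalent to taking connected components of the graph that links every pair of trips closer than the threshold. The linkage choice, the function name `threshold_clusters`, and the four-dimensional feature vectors (origin x, origin y, destination x, destination y) are assumptions; the abstract does not state them.

```python
import math

def threshold_clusters(points, threshold):
    """Single-linkage clusters at a distance cut-off, via union-find."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters
    # Relabel roots as consecutive integers in order of first appearance
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]

# Hypothetical trips: (origin_x, origin_y, dest_x, dest_y)
trips = [(0.0, 0.0, 10.0, 10.0), (0.5, 0.0, 10.0, 10.5),
         (20.0, 20.0, 0.0, 0.0), (20.0, 20.5, 0.5, 0.0)]
trip_labels = threshold_clusters(trips, threshold=2.0)
print(trip_labels)
```

Raising the threshold merges more trips into fewer, larger flows, which is the kind of parameter a user would adjust interactively.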