Zhang, Bo; Liu, Wei; Zhang, Zhiwei; Qu, Yanping; Chen, Zhen; Albert, Paul S
2017-08-01
Joint modeling and within-cluster resampling are two approaches used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performance and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to misspecification of the cluster size model in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate its validity for modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.
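As a concrete reading of the within-cluster resampling procedure evaluated above, the sketch below draws one member per cluster, fits an ordinary GLM to each resampled data set, and averages the coefficients over draws. The data frame, column names, and binomial outcome family are illustrative assumptions, and the resampling-based variance estimator is omitted for brevity.

```python
import numpy as np
import statsmodels.api as sm

def wcr_estimate(df, cluster_col, y_col, x_cols, n_resamples=500, seed=0):
    """Within-cluster resampling: draw one observation per cluster, fit an
    ordinary GLM to the reduced data, and average coefficients over draws."""
    rng = np.random.default_rng(seed)
    groups = df.groupby(cluster_col).indices          # cluster -> row positions
    betas = []
    for _ in range(n_resamples):
        rows = [rng.choice(idx) for idx in groups.values()]
        sub = df.iloc[rows]
        X = sm.add_constant(sub[x_cols])
        fit = sm.GLM(sub[y_col], X, family=sm.families.Binomial()).fit()
        betas.append(fit.params.to_numpy())
    return np.mean(betas, axis=0)                     # WCR point estimate
```

Because each resampled data set contains one observation per cluster, the fitted model needs no random effects, which is what makes the procedure insensitive to informative cluster size for cluster-specific covariates.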
MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...
Configurational coupled cluster approach with applications to magnetic model systems
NASA Astrophysics Data System (ADS)
Wu, Siyuan; Nooijen, Marcel
2018-05-01
A general exponential, coupled cluster like, approach is discussed to extract an effective Hamiltonian in configurational space, as a sum of 1-body, 2-body up to n-body operators. The simplest two-body approach is illustrated by calculations on simple magnetic model systems. A key feature of the approach is that equations up to a certain rank do not depend on higher body cluster operators.
Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H
2015-11-30
We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage approach uses the cluster-specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach.
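A minimal sketch of the best-performing two-stage analysis described above: cluster means as outcomes, weighted by the inverse of an estimated theoretical variance of each mean, with the between-cluster component constrained to be non-negative. The column names and the simple moment estimators of the variance components are illustrative assumptions rather than the authors' exact procedure.

```python
import statsmodels.api as sm

def two_stage_weighted(df, y="y", arm="arm", cluster="cluster"):
    """Two-stage model: regress cluster means on the arm indicator, with
    weights 1 / (sigma_b^2 + sigma_w^2 / n_i), sigma_b^2 truncated at zero."""
    g = df.groupby(cluster)
    means, sizes, arms = g[y].mean(), g[y].size(), g[arm].first()
    sigma_w2 = g[y].var(ddof=1).mean()                     # pooled within-cluster variance
    sigma_b2 = max(means.var(ddof=1) - (sigma_w2 / sizes).mean(), 0.0)
    w = 1.0 / (sigma_b2 + sigma_w2 / sizes)
    X = sm.add_constant(arms.astype(float))
    return sm.WLS(means, X, weights=w).fit()               # test the arm coefficient
```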
Recent developments of the quantum chemical cluster approach for modeling enzyme reactions.
Siegbahn, Per E M; Himo, Fahmi
2009-06-01
The quantum chemical cluster approach for modeling enzyme reactions is reviewed. Recent applications have used cluster models much larger than before, which has yielded new modeling insights. One important and rather surprising feature is the fast convergence of the reaction energetics with cluster size. Even for reactions with significant charge separation, it has in some cases been possible to obtain full convergence, in the sense that dielectric cavity effects from outside the cluster do not contribute to any significant extent. Direct comparisons between quantum mechanics (QM)-only and QM/molecular mechanics (MM) calculations for quite large clusters, in a case where the results differ significantly, have shown that care has to be taken when using the QM/MM approach where there is strong charge polarization. Experience with the methods used, generally hybrid density functional methods, has also made it possible to give reasonable error limits for the results. Examples are finally given from the most extensive study using the cluster model, that of oxygen formation at the oxygen-evolving complex in photosystem II.
Mixture modelling for cluster analysis.
McLachlan, G J; Chang, S U
2004-10-01
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
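The outright-clustering rule described here, assigning each observation to the component with the highest estimated posterior probability, is straightforward to make concrete; a minimal sketch using scikit-learn's GaussianMixture on simulated continuous data:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=1)   # toy continuous data
g = 3                                                         # specified number of clusters
gm = GaussianMixture(n_components=g, covariance_type="full",
                     random_state=1).fit(X)
posterior = gm.predict_proba(X)        # n x g posterior probabilities of membership
labels = posterior.argmax(axis=1)      # i-th cluster = observations assigned to component i
```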
Liu, Yuanchao; Liu, Ming; Wang, Xin
2015-01-01
The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach to semantically sensitive text clustering is proposed, along with the corresponding feature space construction and similarity computation method. By combining the similarity in the traditional feature space with that in the extension space, the adverse effects of the complexity and diversity of natural language can be addressed and the semantic sensitivity of clustering improved correspondingly. The generated clusters can be organized at different granularities. Experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172
Chaos theory perspective for industry clusters development
NASA Astrophysics Data System (ADS)
Yu, Haiying; Jiang, Minghui; Li, Chengzhang
2016-03-01
Industry clusters have been strong drivers of economic development in most developing countries. Their recognized contributions include the promotion of regional business and the reduction of economic and social costs. Globalization undoubtedly pushes clusters to accelerate the competitiveness of economic activities, and many ideas and concepts have accordingly been proposed to describe cluster evolution, stimulate cluster development, and avoid industrial cluster recession. Chaos theory is introduced here to explain the inherent relationships among features within industry clusters, and a life-cycle approach is proposed for analyzing industrial cluster recession. Lyapunov exponents and the Wolf model are used for the identification and examination of chaos. A case study of Tianjin, China, verifies the effectiveness of the model. The investigation indicates that these approaches explain the chaotic properties of industrial clusters well, demonstrating industrial cluster evolution, resolving empirical issues and generating corresponding strategies.
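The chaos-identification step rests on the largest Lyapunov exponent, with λ > 0 signalling chaos. The sketch below estimates λ for a one-dimensional map as the mean log-derivative along an orbit; the logistic map is a stand-in for a cluster-development time series, and the Wolf algorithm for observed data is not reproduced here.

```python
import numpy as np

r = 4.0                                  # fully chaotic logistic map
f = lambda x: r * x * (1 - x)
dfdx = lambda x: r * (1 - 2 * x)

x, acc, n = 0.3, 0.0, 100000
for _ in range(1000):                    # discard the transient
    x = f(x)
for _ in range(n):
    acc += np.log(abs(dfdx(x)))          # local stretching rate
    x = f(x)
print(acc / n)                           # ~ ln 2 = 0.693 > 0: chaotic
```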
Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies
Previously, a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity of large, noncongeneric data sets. In this study, these approaches applie...
Xu, Peng; Gordon, Mark S
2014-09-04
Anionic water clusters are generally considered to be extremely challenging to model using fragmentation approaches due to the diffuse nature of the excess electron distribution. The local correlation coupled cluster (CC) framework cluster-in-molecule (CIM) approach combined with the completely renormalized CR-CC(2,3) method [abbreviated CIM/CR-CC(2,3)] is shown to be a viable alternative for computing the vertical electron binding energies (VEBE). CIM/CR-CC(2,3) with the threshold parameter ζ set to 0.001, as a trade-off between accuracy and computational cost, demonstrates the reliability of predicting the VEBE, with an average percentage error of ∼15% compared to the full ab initio calculation at the same level of theory. The errors are predominantly from the electron correlation energy. The CIM/CR-CC(2,3) approach provides the ease of a black-box type calculation with few threshold parameters to manipulate. The cluster sizes that can be studied by high-level ab initio methods are significantly increased in comparison with full CC calculations. Therefore, the VEBE computed by the CIM/CR-CC(2,3) method can be used as benchmarks for testing model potential approaches in small-to-intermediate-sized water clusters.
Ng, Edmond S-W; Diaz-Ordaz, Karla; Grieve, Richard; Nixon, Richard M; Thompson, Simon G; Carpenter, James R
2016-10-01
Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular, how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters, and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We find that the most useful criterion for model choice is the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data.
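For reference, the deviance information criterion singled out above has the standard form (lower values indicate a better trade-off between fit and effective complexity):

```latex
\[
\mathrm{DIC} = \bar{D} + p_D, \qquad
\bar{D} = \mathbb{E}_{\theta \mid y}\!\left[-2 \log p(y \mid \theta)\right], \qquad
p_D = \bar{D} - D(\bar{\theta}),
\]
```

where \(\bar{\theta}\) is the posterior mean of the parameters and \(p_D\) is the effective number of parameters.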
Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy
2011-01-01
Background We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches consisted of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. Conclusions PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291
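A minimal sketch of the first of the nine similarity pipelines (tf-idf cosine on titles and abstracts, followed by the top-n filtering step); three toy strings stand in for the 2.15 million MEDLINE records, and top_n = 1 is illustrative rather than a production setting.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["insulin signaling in diabetes",
        "beta cell insulin secretion",
        "galaxy cluster surveys"]                  # stand-ins for MEDLINE records
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
sim = cosine_similarity(tfidf)                     # document-document similarity matrix
np.fill_diagonal(sim, 0.0)

top_n = 1                                          # keep top-n similarities per document
keep = np.argsort(sim, axis=1)[:, -top_n:]
rows = np.arange(sim.shape[0])[:, None]
filtered = np.zeros_like(sim)
filtered[rows, keep] = sim[rows, keep]             # sparse graph passed on to clustering
```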
Chiara, Matteo; Horner, David S; Spada, Alberto
2013-01-01
De novo transcriptome characterization from Next Generation Sequencing data has become an important approach in the study of non-model plants. Despite notable advances in the assembly of short reads, the clustering of transcripts into unigene-like (locus-specific) clusters remains a somewhat neglected subject. Indeed, closely related paralogous transcripts are often merged into single clusters by current approaches. Here, a novel heuristic method for locus-specific clustering is compared to that implemented in the de novo assembler Oases, using the same initial transcript collections, derived from Arabidopsis thaliana and the developmental model Streptocarpus rexii. We show that the proposed approach improves cluster specificity in the A. thaliana dataset for which the reference genome is available. Furthermore, for the S. rexii data our filtered transcript collection matches a larger number of distinct annotated loci in reference genomes than the Oases set, while containing a reduced overall number of loci. A detailed discussion of advantages and limitations of our approach in processing de novo transcriptome reconstructions is presented. The proposed method should be widely applicable to other organisms, irrespective of the transcript assembly method employed. The S. rexii transcriptome is available through a sophisticated and augmented public online database.
Garcia, Danilo; MacDonald, Shane; Archer, Trevor
2015-01-01
Background. The notion of the affective system as being composed of two dimensions led Archer and colleagues to develop the affective profiles model. The model consists of four different profiles based on combinations of individuals' experience of high/low positive and negative affect: self-fulfilling, low affective, high affective, and self-destructive. During the past 10 years, an increasing number of studies have used this person-centered model as the backdrop for the investigation of between- and within-individual differences in ill-being and well-being. The most common approach to this profiling is to divide individuals' self-reported affect scores using the median of the population as the reference for high/low splits. However, scores just above and just below the median might become high and low by arbitrariness, not by reality. Thus, it is plausible to criticize the validity of this variable-oriented approach. Our aim was to compare the median splits approach with a person-oriented approach, namely, cluster analysis. Method. The participants (N = 2,225) were recruited through Amazon's Mechanical Turk and asked to self-report affect using the Positive Affect Negative Affect Schedule. We compared the profiles' homogeneity and Silhouette coefficients to discern differences in homogeneity and heterogeneity between approaches. We also conducted exact cell-wise analyses matching the profiles from both approaches, and matching profiles and gender, to investigate profiling agreement with respect to affectivity levels and affectivity and gender. All analyses were conducted using the ROPstat software. Results. The cluster approach (weighted average of cluster homogeneity coefficients = 0.62, Silhouette coefficient = 0.68) generated profiles with greater homogeneity that were more distinct from each other compared to the median splits approach (weighted average of cluster homogeneity coefficients = 0.75, Silhouette coefficient = 0.59). Most of the participants (n = 1,736, 78.0%) were allocated to the same profile (Rand Index = .83); however, 489 (21.98%) were allocated to different profiles depending on the approach. Both approaches allocated females and males similarly in three of the four profiles. Only the cluster analysis approach classified men significantly more often than chance to a self-fulfilling profile (type) and females less often than chance to this very same profile (antitype). Conclusions. Although the question of whether one approach is more appropriate than the other remains open, the cluster method allocated individuals to profiles that are more in accordance with the conceptual basis of the model and also with expected gender differences. More importantly, regardless of the approach, our findings suggest that the model mirrors a complex and dynamic adaptive system.
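A minimal sketch of the two competing profiling schemes on simulated affect scores (the score distributions are invented; the four median-split profiles follow the high/low PA x NA combinations described above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
pa = rng.normal(30, 7, 2225)            # simulated positive affect scores
na = rng.normal(18, 6, 2225)            # simulated negative affect scores

# variable-oriented: median splits -> 4 profiles (e.g., high PA/low NA = self-fulfilling)
profile_median = 2 * (pa > np.median(pa)) + (na > np.median(na))

# person-oriented: k-means with 4 clusters on standardized scores
Z = np.column_stack([(pa - pa.mean()) / pa.std(), (na - na.mean()) / na.std()])
profile_cluster = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

print(adjusted_rand_score(profile_median, profile_cluster))   # agreement between schemes
```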
Arpino, Bruno; Cannas, Massimo
2016-05-30
This article focuses on the implementation of propensity score matching for clustered data. Different approaches to reduce bias due to cluster-level confounders are considered and compared using Monte Carlo simulations. We investigated methods that exploit the clustered structure of the data in two ways: in the estimation of the propensity score model (through the inclusion of fixed or random effects) or in the implementation of the matching algorithm. In addition to a pure within-cluster matching, we also assessed the performance of a new approach, 'preferential' within-cluster matching. This approach first searches for control units to be matched to treated units within the same cluster. If matching is not possible within-cluster, then the algorithm searches in other clusters. All considered approaches successfully reduced the bias due to the omission of a cluster-level confounder. The preferential within-cluster matching approach, combining the advantages of within-cluster and between-cluster matching, showed a relatively good performance in the presence of both large and small clusters, and it was often the best method. An important advantage of this approach is that it reduces the number of unmatched units as compared with a pure within-cluster matching. We applied these methods to the estimation of the effect of caesarean section on the Apgar score using birth register data.
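The 'preferential' search order is the heart of the proposal; a greedy nearest-propensity sketch (the caliper and the greedy one-to-one matching without replacement are illustrative simplifications):

```python
import numpy as np

def preferential_within_cluster_match(ps, treated, cluster, caliper=0.05):
    """Match each treated unit to the control with the closest propensity
    score, searching its own cluster first and falling back to the other
    clusters only when no acceptable within-cluster control is available."""
    pairs, used = [], set()
    for i in np.where(treated == 1)[0]:
        same = [j for j in np.where((treated == 0) & (cluster == cluster[i]))[0]
                if j not in used]
        other = [j for j in np.where((treated == 0) & (cluster != cluster[i]))[0]
                 if j not in used]
        for pool in (same, other):                 # preferential order
            if pool:
                j = min(pool, key=lambda k: abs(ps[i] - ps[k]))
                if abs(ps[i] - ps[j]) <= caliper:
                    pairs.append((i, j))
                    used.add(j)
                    break
    return pairs
```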
The "p"-Median Model as a Tool for Clustering Psychological Data
ERIC Educational Resources Information Center
Kohn, Hans-Friedrich; Steinley, Douglas; Brusco, Michael J.
2010-01-01
The "p"-median clustering model represents a combinatorial approach to partition data sets into disjoint, nonhierarchical groups. Object classes are constructed around "exemplars", that is, manifest objects in the data set, with the remaining instances assigned to their closest cluster centers. Effective, state-of-the-art implementations of…
Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.
2017-01-01
Reliable identification of near-native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge-based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach to cluster ranking based not on a single molecular descriptor (e.g., an energy function) but on a large number of descriptors that are integrated in a machine learning model, whereby an extremely randomized tree classifier is trained on 109 molecular descriptors. The protocol first locally enriches clusters with additional poses; the clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near-native from incorrect clusters. The results show that our approach is able to identify clusters containing near-native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near-native from incorrect clusters, and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. PMID:27935158
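A minimal sketch of the classification stage, with scikit-learn's ExtraTreesClassifier trained on pairwise cluster-comparison feature vectors; the synthetic features and labels are stand-ins for the 109 molecular descriptors and the near-native annotations:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 109))       # toy pairwise cluster-comparison features
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=1000) > 0).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = ExtraTreesClassifier(n_estimators=500, random_state=0).fit(Xtr, ytr)
print("pairwise discrimination accuracy:", clf.score(Xte, yte))
ranked = np.argsort(clf.feature_importances_)[::-1]   # most discriminative descriptors
```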
On aggregation in CA models in biology
NASA Astrophysics Data System (ADS)
Alber, Mark S.; Kiskowski, Audi
2001-12-01
Aggregation of randomly distributed particles into clusters of aligned particles is modeled using a cellular automata (CA) approach. The CA model accounts for interactions between more than one type of particle, in which pressures for angular alignment with neighbors compete with pressures for grouping by cell type. In the case of only one particle type, clusters tend to unite into one big cluster. In the case of several types of particles, the dynamics of the clusters is more complicated, and for specific choices of parameters particle sorting occurs simultaneously with the formation of clusters of aligned particles.
DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.
Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei
2018-01-01
Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifiers (UMI). Despite these technological advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. In particular, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available at www.pitt.edu/∼wec47/singlecell.html. Contact: wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online.
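A stripped-down stand-in for the model-based clustering step: EM for a K-component multinomial mixture over UMI count vectors. DIMM-SC places a Dirichlet prior on the per-cluster gene proportions; that prior is dropped here for brevity, but the posterior responsibilities still illustrate the per-cell clustering uncertainty the abstract highlights.

```python
import numpy as np

def multinomial_mixture_em(X, K, n_iter=100, seed=0):
    """EM for a multinomial mixture over count data X (cells x genes)."""
    rng = np.random.default_rng(seed)
    n, g = X.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(g), size=K)         # per-cluster gene proportions
    for _ in range(n_iter):
        logr = np.log(pi) + X @ np.log(theta.T)       # E-step (up to a constant)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        pi = r.mean(axis=0)                           # M-step
        theta = (r.T @ X) + 1e-8
        theta /= theta.sum(axis=1, keepdims=True)
    return r.argmax(axis=1), r                        # labels and per-cell uncertainty
```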
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations in these data arrays, and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models were applied with both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned observed values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performance. PMID:26689369
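A minimal sketch of the winning approach as described: for each imputation, distances are measured on a random subset of the observed attributes and the missing entries are copied from the nearest complete-case neighbour (assumes every incomplete row has at least one observed attribute):

```python
import numpy as np

def mi_nearest_neighbour(X, m=5, frac=0.5, seed=0):
    """Multiple imputation via nearest neighbours over random attribute subsets."""
    rng = np.random.default_rng(seed)
    complete = ~np.isnan(X).any(axis=1)
    imputations = []
    for _ in range(m):
        Xi = X.copy()
        for i in np.where(~complete)[0]:
            obs = ~np.isnan(X[i])
            cols = rng.choice(np.where(obs)[0],
                              size=max(1, int(frac * obs.sum())), replace=False)
            d = np.abs(X[complete][:, cols] - X[i, cols]).sum(axis=1)
            donor = X[complete][d.argmin()]           # nearest complete case
            Xi[i, ~obs] = donor[~obs]                 # copy its observed values
        imputations.append(Xi)
    return imputations
```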
Accelerating Information Retrieval from Profile Hidden Markov Model Databases.
Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem
2016-01-01
Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest in improving the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have focused on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for batch query searching, are strong motivations for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities, and a representative was assigned for each cluster. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41% and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
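The search-space reduction can be phrased in a few lines: score the cluster representatives first, then align the query only against members of the best-scoring clusters. Here query_score, labels, and reps are hypothetical stand-ins for an HMM alignment scorer and the clustering output.

```python
def clustered_search(query_score, labels, reps, top_clusters=3):
    """Search representatives first, then only members of the best clusters.
    query_score(i): similarity of the query to profile i (HMM score stand-in);
    labels[i]: cluster of profile i; reps: cluster -> representative index."""
    rep_scores = {c: query_score(r) for c, r in reps.items()}
    best = sorted(rep_scores, key=rep_scores.get, reverse=True)[:top_clusters]
    candidates = [i for i in range(len(labels)) if labels[i] in best]
    return max(candidates, key=query_score)           # best-matching profile
```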
A roadmap of clustering algorithms: finding a match for a biomedical application.
Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael
2009-05-01
Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
Light clusters in nuclear matter: Excluded volume versus quantum many-body approaches
NASA Astrophysics Data System (ADS)
Hempel, Matthias; Schaffner-Bielich, Jürgen; Typel, Stefan; Röpke, Gerd
2011-11-01
The formation of clusters in nuclear matter is investigated, as occurs, e.g., in low-energy heavy-ion collisions or core-collapse supernovae. In astrophysical applications, the excluded volume concept is commonly used for the description of light clusters. Here we compare a phenomenological excluded volume approach to two quantum many-body models, the quantum statistical model and the generalized relativistic mean-field model. All three models contain bound states of nuclei with mass number A ≤ 4. We explore the extent to which the complex medium effects can be mimicked by the simpler excluded volume model, with regard to the chemical composition and thermodynamic variables. Furthermore, the role of heavy nuclei and excited states is investigated using the excluded volume model. At temperatures of a few MeV the excluded volume model gives a poor description of the medium effects on the light clusters, but there the composition is actually dominated by heavy nuclei. At larger temperatures there is rather good agreement, whereas some smaller differences and model dependencies remain.
A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set
Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong
2012-01-01
Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods treat different numbers of clusters as alternatives and the outputs of a clustering algorithm on validity measures as criteria. The proposed method is examined in an experimental study using three MCDM methods, the well-known k-means clustering algorithm, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
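A minimal sketch of the idea: candidate values of k are the alternatives, validity indices the criteria, and a simple Borda rank aggregation stands in for the three MCDM methods used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=400, centers=4, random_state=2)
ks = list(range(2, 9))                      # alternatives: candidate numbers of clusters
scores = []
for k in ks:
    labels = KMeans(n_clusters=k, n_init=10, random_state=2).fit_predict(X)
    scores.append([silhouette_score(X, labels),          # higher is better
                   calinski_harabasz_score(X, labels),   # higher is better
                   -davies_bouldin_score(X, labels)])    # negated: higher is better
S = np.array(scores)
ranks = S.argsort(axis=0).argsort(axis=0)   # per-criterion ranks, 0 = worst
print("estimated k:", ks[ranks.sum(axis=1).argmax()])    # Borda aggregation
```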
Master-equation approach to the study of phase-change processes in data storage media
NASA Astrophysics Data System (ADS)
Blyuss, K. B.; Ashwin, P.; Bassom, A. P.; Wright, C. D.
2005-07-01
We study the dynamics of crystallization in phase-change materials using a master-equation approach in which the state of the crystallizing material is described by a cluster size distribution function. A model is developed using the thermodynamics of the processes involved, representing clusters of size two and greater as a continuum but treating clusters of size one (monomers) with a separate equation. We present some partial analytical results for the isothermal case and for large cluster sizes, but principally we use numerical simulations to investigate the model. We obtain results that are in good agreement with experimental data, and the model appears to be useful for the fast simulation of reading and writing processes in phase-change optical and electrical memories.
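One standard master-equation form consistent with this description is the Becker-Döring system (an assumption about the paper's notation, not a quotation from it):

```latex
\[
\frac{\partial c_n}{\partial t} = J_{n-1} - J_n, \qquad
J_n = a_n c_1 c_n - b_{n+1} c_{n+1}, \qquad n \ge 2,
\]
```

where \(c_n\) is the concentration of clusters of size \(n\), \(a_n\) and \(b_n\) are attachment and detachment rates, and the monomer concentration \(c_1\) obeys its own separate balance equation — exactly the split between the size continuum and the monomers mentioned above.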
Industry Cluster's Adaptive Co-competition Behavior Modeling Inspired by Swarm Intelligence
NASA Astrophysics Data System (ADS)
Xiang, Wei; Ye, Feifan
Adaptation helps an individual enterprise adjust its behavior to uncertainties in the environment and hence underpins the healthy growth of both the individual enterprises and the industry cluster as a whole. This paper focuses on the co-competition adaptation behavior of industry clusters, inspired by swarm intelligence mechanisms. With reference to ant cooperative transportation and ant foraging behavior and their related swarm intelligence approaches, cooperative adaptation and competitive adaptation behaviors are studied and relevant models are proposed. These adaptive co-competition behavior models can be integrated into a multi-agent system model of an industry cluster to make the industry cluster model more realistic.
A mixture model-based approach to the clustering of microarray expression data.
McLachlan, G J; Bean, R W; Peel, D
2002-03-01
This paper introduces the software EMMIX-GENE, developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples, by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic, used in conjunction with a threshold on the size of a cluster, allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, so mixtures of factor analyzers are exploited to effectively reduce the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
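The gene-ranking step is simple to emulate; the sketch below fits one- and two-component mixtures per gene and ranks by the likelihood ratio statistic, with Gaussian components standing in for the t mixtures used by EMMIX-GENE:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def rank_genes_by_lr(E, seed=0):
    """Rank genes by -2 log Lambda for one vs. two mixture components.
    E is genes x tissues."""
    stats = []
    for gene in E:
        x = gene.reshape(-1, 1)
        l1 = GaussianMixture(1, random_state=seed).fit(x).score(x) * len(x)
        l2 = GaussianMixture(2, n_init=5, random_state=seed).fit(x).score(x) * len(x)
        stats.append(2 * (l2 - l1))                 # likelihood ratio statistic
    return np.argsort(stats)[::-1]                  # largest statistic first
```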
Model-based Clustering of Categorical Time Series with Multinomial Logit Classification
NASA Astrophysics Data System (ADS)
Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea
2010-09-01
A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership, we present an extension of the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule, using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented, which leads to an interesting segmentation of the Austrian labour market.
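A cheap frequentist stand-in for Markov chain clustering (the paper instead fits a Bayesian finite mixture of Markov chains by MCMC): estimate a first-order transition matrix per categorical series and k-means the flattened matrices.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_markov_chains(series, n_states, K):
    """series: list of integer sequences with states 0..n_states-1."""
    feats = []
    for s in series:
        T = np.ones((n_states, n_states))           # add-one smoothing
        for a, b in zip(s[:-1], s[1:]):
            T[a, b] += 1
        feats.append((T / T.sum(axis=1, keepdims=True)).ravel())
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(np.array(feats))
```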
Ahn, Kwang Woo; Kosoy, Michael; Chan, Kung-Sik
2014-06-01
We developed a two-strain susceptible-infected-recovered (SIR) model that provides a framework for inferring the cross-immunity between two strains of a bacterial species in the host population from discretely sampled co-infection time-series data. Moreover, the model accounts for seasonality in host reproduction. We illustrate the approach using a dataset describing co-infections by several strains of bacteria circulating within a population of cotton rats (Sigmodon hispidus). Bartonella strains were clustered into three genetically close groups, between which the divergence corresponds to the accepted level of separation between bacterial species. The proposed approach revealed no cross-immunity between genetic clusters, while limited cross-immunity might exist between subgroups within the clusters.
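A minimal deterministic sketch of a two-strain SIR model with a cross-immunity parameter sigma (0 = none, 1 = complete); the compartment structure and parameter values are illustrative, and the paper's seasonal host reproduction is omitted.

```python
import numpy as np
from scipy.integrate import odeint

def two_strain_sir(y, t, beta1, beta2, gamma, sigma):
    """S naive; J1/J2 first infections; R1/R2 recovered from one strain;
    J12/J21 second infections; R recovered from both."""
    S, J1, J2, R1, R2, J12, J21, R = y
    l1 = beta1 * (J1 + J21)                 # force of infection, strain 1
    l2 = beta2 * (J2 + J12)                 # force of infection, strain 2
    return [-(l1 + l2) * S,
            l1 * S - gamma * J1,
            l2 * S - gamma * J2,
            gamma * J1 - (1 - sigma) * l2 * R1,   # cross-immunity damps reinfection
            gamma * J2 - (1 - sigma) * l1 * R2,
            (1 - sigma) * l2 * R1 - gamma * J12,
            (1 - sigma) * l1 * R2 - gamma * J21,
            gamma * (J12 + J21)]

y0 = [0.98, 0.01, 0.01, 0, 0, 0, 0, 0]
t = np.linspace(0, 200, 2001)
sol = odeint(two_strain_sir, y0, t, args=(0.5, 0.5, 0.2, 0.3))
```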
Jeon, Jihyoun; Hsu, Li; Gorfine, Malka
2012-07-01
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low; however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.
A New Approach for Simulating Galaxy Cluster Properties
NASA Astrophysics Data System (ADS)
Arieli, Y.; Rephaeli, Y.; Norman, M. L.
2008-08-01
We describe a subgrid model for including galaxies into hydrodynamical cosmological simulations of galaxy cluster evolution. Each galaxy construct—or galcon—is modeled as a physically extended object within which star formation, galactic winds, and ram pressure stripping of gas are modeled analytically. Galcons are initialized at high redshift (z ~ 3) after galaxy dark matter halos have formed but before the cluster has virialized. Each galcon moves self-consistently within the evolving cluster potential and injects mass, metals, and energy into intracluster (IC) gas through a well-resolved spherical interface layer. We have implemented galcons into the Enzo adaptive mesh refinement code and carried out a simulation of cluster formation in a ΛCDM universe. With our approach, we are able to economically follow the impact of a large number of galaxies on IC gas. We compare the results of the galcon simulation with a second, more standard simulation where star formation and feedback are treated using a popular heuristic prescription. One advantage of the galcon approach is explicit control over the star formation history of cluster galaxies. Using a galactic SFR derived from the cosmic star formation density, we find the galcon simulation produces a lower stellar fraction, a larger gas core radius, a more isothermal temperature profile, and a flatter metallicity gradient than the standard simulation, in better agreement with observations.
The cosmological analysis of X-ray cluster surveys. III. 4D X-ray observable diagrams
NASA Astrophysics Data System (ADS)
Pierre, M.; Valotti, A.; Faccioli, L.; Clerc, N.; Gastaud, R.; Koulouridis, E.; Pacaud, F.
2017-11-01
Context. Despite compelling theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties surrounding cluster-mass estimates. Aims: Our aim is to develop a fully self-consistent cosmological approach to X-ray cluster surveys, based exclusively on observable quantities rather than masses. This procedure is justified by the possibility of directly deriving the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. Methods: We model the population of detected clusters in the count-rate - hardness-ratio - angular size - redshift space and compare the corresponding four-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are estimated by averaging the results from ten independent survey realisations. The method allows a simultaneous fit of the cosmological parameters, of the cluster evolutionary physics and of the selection effects. Results: When using information from the X-ray survey alone plus redshifts, this approach is shown to be as accurate as the modelling of the mass function for the cosmological parameters and to perform better for the cluster physics, for a similar level of assumptions on the scaling relations. It enables the identification of degenerate combinations of parameter values. Conclusions: Given the considerably shorter computer times involved for running the minimisation procedure in the observed parameter space, this method appears to clearly outperform traditional mass-based approaches when X-ray survey data alone are available.
Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.
Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib
2017-03-01
A video is understood by users in terms of the entities present in it. Entity Discovery is the task of building an appearance model for each entity (e.g., a person) and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at the tracklet level. We extend the Chinese Restaurant Process (CRP) to TC-CRP, and further to the Temporally Coherent Chinese Restaurant Franchise (TC-CRF), to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data like scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. The proposed methods can perform online tracklet clustering on streaming videos, unlike existing approaches, and can automatically reject false tracklets. Finally, we discuss entity-driven video summarization, where temporal segments of the video are selected based on the discovered entities to create a semantically meaningful summary.
Ab initio calculation of one-nucleon halo states
NASA Astrophysics Data System (ADS)
Rodkin, D. M.; Tchuvil'sky, Yu M.
2018-02-01
We develop an approach to the microscopic and ab initio description of clustered systems, states with a halo nucleon, and one-nucleon resonances. For these purposes, a basis combining ordinary shell-model components and cluster-channel terms is built up. The transformation of clustered wave functions to the uniform Slater-determinant type is performed using the concept of cluster coefficients. The resulting basis of orthonormalized wave functions is used for calculating the eigenvalues and eigenvectors of Hamiltonians built in the framework of ab initio approaches. Calculations of resonance and halo states of ⁵He, ⁹Be and ⁹B nuclei demonstrate that the approach is workable and labor-saving.
Density-based cluster algorithms for the identification of core sets
NASA Astrophysics Data System (ADS)
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
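A minimal sketch of the density-based core identification on a toy two-dimensional stand-in for the conformational space, using DBSCAN (one of the three algorithms compared); eps and min_samples are illustrative values:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.15, (400, 2)),      # metastable region A
               rng.normal(1, 0.15, (400, 2)),      # metastable region B
               rng.uniform(-0.5, 1.5, (60, 2))])   # sparse transition region

labels = DBSCAN(eps=0.12, min_samples=10).fit_predict(X)
cores = [X[labels == k] for k in set(labels) if k != -1]   # candidate core sets
print(len(cores), "core sets;", int((labels == -1).sum()), "points between cores")
```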
ERIC Educational Resources Information Center
Nimon, Kim
2012-01-01
Using state achievement data that are openly accessible, this paper demonstrates the application of hierarchical linear modeling within the context of career technical education research. Three prominent approaches to analyzing clustered data (i.e., modeling aggregated data, modeling disaggregated data, modeling hierarchical data) are discussed…
Kéchichian, Razmig; Valette, Sébastien; Desvignes, Michel; Prost, Rémy
2013-11-01
We derive shortest-path constraints from graph models of structure adjacency relations and introduce them in a joint centroidal Voronoi image clustering and Graph Cut multiobject semiautomatic segmentation framework. The vicinity prior model thus defined is a piecewise-constant model incurring multiple levels of penalization, capturing the spatial configuration of structures in multiobject segmentation. Qualitative and quantitative analyses and comparison with a Potts prior-based approach and our previous contribution on synthetic, simulated, and real medical images show that the vicinity prior allows for the correct segmentation of distinct structures having identical intensity profiles and improves the precision of segmentation boundary placement, while being fairly robust to clustering resolution. The clustering approach we take to simplify images prior to segmentation strikes a good balance between boundary adaptivity and cluster compactness criteria, and furthermore allows the trade-off to be controlled. Compared with a direct application of segmentation on voxels, the clustering step improves the overall runtime and memory footprint of the segmentation process by up to an order of magnitude, without compromising the quality of the result.
Million-body star cluster simulations: comparisons between Monte Carlo and direct N-body
NASA Astrophysics Data System (ADS)
Rodriguez, Carl L.; Morscher, Meagan; Wang, Long; Chatterjee, Sourav; Rasio, Frederic A.; Spurzem, Rainer
2016-12-01
We present the first detailed comparison between million-body globular cluster simulations computed with a Hénon-type Monte Carlo code, CMC, and a direct N-body code, NBODY6++GPU. Both simulations start from an identical cluster model with 10⁶ particles, and include all of the relevant physics needed to treat the system in a highly realistic way. With the two codes 'frozen' (no fine-tuning of any free parameters or internal algorithms of the codes) we find good agreement in the overall evolution of the two models. Furthermore, we find that in both models, large numbers of stellar-mass black holes (>1000) are retained for 12 Gyr. Thus, the very accurate direct N-body approach confirms recent predictions that black holes can be retained in present-day, old globular clusters. We find only minor disagreements between the two models and attribute these to the small-N dynamics driving the evolution of the cluster core for which the Monte Carlo assumptions are less ideal. Based on the overwhelming general agreement between the two models computed using these vastly different techniques, we conclude that our Monte Carlo approach, which is more approximate, but dramatically faster compared to the direct N-body, is capable of producing an accurate description of the long-term evolution of massive globular clusters even when the clusters contain large populations of stellar-mass black holes.
Reniers, Genserik; Dullaert, Wout; Karel, Soudan
2009-08-15
Every company situated within a chemical cluster faces domino effect risks, whose magnitude depends on the company's own risk management strategies and on those of all the others. Preventing domino effects is therefore very important for avoiding catastrophes in the chemical process industry. Given that chemical companies are interlinked by potential domino effect accidents, there is some likelihood that even companies which fully invest in domino effect prevention measures will nonetheless experience an external domino effect caused by an accident in another chemical enterprise of the cluster. In this article, a game-theoretic approach is employed to interpret and model the behaviour of chemical plants within chemical clusters as they negotiate and decide on domino effect prevention investments.
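A toy two-player flavour of such a game: each plant chooses whether to invest in prevention, a plant's investment protects its neighbour from an external domino accident, and the unique equilibrium is under-investment. The payoff numbers are invented purely for illustration, not taken from the article.

```python
# cost of investing C; a neighbour's non-investment exposes a plant to an
# external domino accident with probability p and loss L
C, L, p = 2.0, 10.0, 0.4

def payoff(own, other):
    """Payoff of a plant playing `own` (1 = invest) against `other`."""
    return -C * own - (1 - other) * p * L

def is_nash(a1, a2):
    return (payoff(a1, a2) >= payoff(1 - a1, a2) and
            payoff(a2, a1) >= payoff(1 - a2, a1))

print([(a1, a2) for a1 in (0, 1) for a2 in (0, 1) if is_nash(a1, a2)])
# -> [(0, 0)]: neither plant invests, although mutual investment pays off more
```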
Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y
2014-05-01
This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for a boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of the boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operating region and develop the structure of the fuzzy model. Then, by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach.
NASA Astrophysics Data System (ADS)
Błaszczuk, Artur; Krzywański, Jarosław
2017-03-01
The interrelation between fuzzy logic and cluster renewal approaches for heat transfer modeling in a circulating fluidized bed (CFB) has been established based on local furnace data. The furnace data were measured in a 1296 t/h CFB boiler with a low level of flue gas recirculation. In the present study, the bed temperature and suspension density were treated as experimental variables along the furnace height. The measured bed temperature and suspension density varied in the ranges of 1131-1156 K and 1.93-6.32 kg/m³, respectively. Using the heat transfer coefficient for a commercial CFB combustor, two empirical heat transfer correlations were developed in terms of important operating parameters, including bed temperature and suspension density. The fuzzy logic results were found to be in good agreement with the corresponding experimental heat transfer data obtained with the cluster renewal approach. The predicted bed-to-wall heat transfer coefficient covered ranges of 109-241 W/(m²K) and 111-240 W/(m²K) for the fuzzy logic and cluster renewal approaches, respectively. The divergence in calculated heat flux recovery along the furnace height between the fuzzy logic and cluster renewal approaches did not exceed ±2%.
2015-06-23
T. Bates, S. Brocklebank, S. Pauls, and D. Rockmore, A spectral clustering approach to the structure of personality: contrasting the FFM and HEXACO models, Journal of Research in Personality, Volume 57
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
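A minimal sketch of the two analysis strategies compared above, on synthetic trial data with covariate-dependent missingness handled by complete records; all names are hypothetical and the multiple-imputation arm is omitted.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic CRT: 20 clusters of 30 individuals, arms assigned at cluster level.
rng = np.random.default_rng(0)
df = pd.DataFrame({"cluster": np.repeat(np.arange(20), 30),
                   "arm": np.repeat(np.arange(20) % 2, 30)})
df["x"] = rng.standard_normal(len(df))                       # baseline covariate
df["y"] = 0.5 * df["arm"] + 0.8 * df["x"] + rng.standard_normal(len(df))
p_miss = 1.0 / (1.0 + np.exp(-(-1.5 + df["x"])))             # covariate-dependent missingness
df.loc[rng.random(len(df)) < p_miss, "y"] = np.nan

# Unadjusted cluster-level analysis: two-sample t-test on complete-record cluster means.
cm = df.dropna().groupby(["cluster", "arm"], as_index=False)["y"].mean()
t, p = stats.ttest_ind(cm.loc[cm.arm == 1, "y"], cm.loc[cm.arm == 0, "y"])
print(f"cluster-level: t={t:.2f}, p={p:.3f}")

# Individual-level analysis: linear mixed model on complete records, adjusted for x.
fit = smf.mixedlm("y ~ arm + x", df.dropna(), groups="cluster").fit()
print(fit.params[["arm", "x"]])
```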
The Halo Boundary of Galaxy Clusters in the SDSS
NASA Astrophysics Data System (ADS)
Baxter, Eric; Chang, Chihway; Jain, Bhuvnesh; Adhikari, Susmita; Dalal, Neal; Kravtsov, Andrey; More, Surhud; Rozo, Eduardo; Rykoff, Eli; Sheth, Ravi K.
2017-05-01
Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the “infalling” regime outside the halo to the “collapsed” regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a “splashback”-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. With upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.
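The profile-steepening measurement can be illustrated with a toy halo profile: the sketch below builds an Einasto-like inner profile with a truncation term plus a power-law infall term (all parameter values invented) and locates the radius of steepest logarithmic slope.

```python
import numpy as np

# Toy 3D density: an Einasto-like inner profile times a steep transition term,
# plus a power-law infall contribution (purely illustrative parameter values).
r = np.logspace(-1, 1, 300)                       # radius in assumed units of R_200m
rho_in  = np.exp(-2 / 0.18 * ((r / 0.8) ** 0.18 - 1))
f_trans = (1 + (r / 1.2) ** 4) ** (-1.5)          # sharpens the halo truncation
rho_out = 0.4 * r ** (-1.5)                       # infalling material
rho = rho_in * f_trans + rho_out

logslope = np.gradient(np.log(rho), np.log(r))    # d ln(rho) / d ln(r)
r_splash = r[np.argmin(logslope)]
print(f"steepest slope {logslope.min():.2f} at r = {r_splash:.2f} R_200m")
```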
Strategy Generalization across Orientation Tasks: Testing a Computational Cognitive Model
2008-07-01
arranged in groups (clusters). The space, itself, was divided into four quadrants, which had 1, 2, 3, and 4 objects, respectively. The arrangement of... clusters, of objects play an important role in the model's performance, by providing some context for narrowing the search for the target to a portion of the... model uses a hierarchical approach to accomplish this. First, the model identifies a group or cluster of objects that contains the target. The number of
A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.
Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe
2018-02-01
In this article, a new Python package for nucleotide sequences clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Despite the fact that we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clusters is not required here. For the sake of illustration, this method is applied on a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are coherent too with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.
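A rough sketch of the same pipeline, using scikit-learn's SpectralEmbedding (an implementation of Laplacian eigenmaps) and BIC-selected Gaussian mixtures; the k-mer counting and random sequences are simplistic stand-ins for the package's actual feature extraction.

```python
import numpy as np
from itertools import product
from sklearn.manifold import SpectralEmbedding
from sklearn.mixture import GaussianMixture

def kmer_profile(seq, k=3):
    """Crude k-mer frequency vector (non-overlapping counts) as a sequence feature."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = np.array([seq.count(km) for km in kmers], dtype=float)
    return counts / max(counts.sum(), 1.0)

# Hypothetical input: a list of nucleotide sequences.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=200)) for _ in range(60)]
X = np.array([kmer_profile(s) for s in seqs])

# Laplacian eigenmap embedding of the sequences.
emb = SpectralEmbedding(n_components=3, affinity="nearest_neighbors",
                        n_neighbors=10).fit_transform(X)

# Pick the number of Gaussian mixture components by BIC, one way to obtain an
# "optimal" cluster count without a priori knowledge.
models = [GaussianMixture(n, random_state=0).fit(emb) for n in range(1, 8)]
best = min(models, key=lambda m: m.bic(emb))
print("chosen number of clusters:", best.n_components)
print("first labels:", best.predict(emb)[:10])
```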
Fiero, Mallorie H; Hsu, Chiu-Hsieh; Bell, Melanie L
2017-11-20
We extend the pattern-mixture approach to handle missing continuous outcome data in longitudinal cluster randomized trials, which randomize groups of individuals to treatment arms, rather than the individuals themselves. Individuals who drop out at the same time point are grouped into the same dropout pattern. We approach extrapolation of the pattern-mixture model by applying multilevel multiple imputation, which imputes missing values while appropriately accounting for the hierarchical data structure found in cluster randomized trials. To assess parameters of interest under various missing data assumptions, imputed values are multiplied by a sensitivity parameter, k, which increases or decreases imputed values. Using simulated data, we show that estimates of parameters of interest can vary widely under differing missing data assumptions. We conduct a sensitivity analysis using real data from a cluster randomized trial by increasing k until the treatment effect inference changes. By performing a sensitivity analysis for missing data, researchers can assess whether certain missing data assumptions are reasonable for their cluster randomized trial. Copyright © 2017 John Wiley & Sons, Ltd.
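A crude single-imputation sketch of the sensitivity-parameter idea (the paper uses multilevel multiple imputation): imputed values are scaled by k and the treatment effect is re-estimated at each k. Data and model are synthetic and simplified.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical CRT data with dropout; y holds NaN for missing outcomes.
rng = np.random.default_rng(2)
df = pd.DataFrame({"cluster": np.repeat(np.arange(12), 25),
                   "arm": np.repeat(np.arange(12) % 2, 25)})
df["y"] = 1.0 * df["arm"] + rng.standard_normal(len(df))
df.loc[rng.random(len(df)) < 0.25, "y"] = np.nan

def treatment_effect(data):
    return smf.mixedlm("y ~ arm", data, groups="cluster").fit().params["arm"]

# Single stochastic imputation from the observed-data model (a crude stand-in
# for multiple imputation), then scale imputed values by sensitivity parameter k.
fit = smf.mixedlm("y ~ arm", df.dropna(), groups="cluster").fit()
for k in (1.0, 0.8, 0.6, 0.4):
    dfk = df.copy()
    miss = dfk["y"].isna()
    pred = fit.predict(dfk.loc[miss])
    dfk.loc[miss, "y"] = k * (pred + np.sqrt(fit.scale) * rng.standard_normal(miss.sum()))
    print(f"k={k:.1f}: estimated arm effect = {treatment_effect(dfk):.3f}")
```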
Guenole, Nigel
2018-01-01
The test for item level cluster bias examines the improvement in model fit that results from freeing an item's between-level residual variance from a baseline model with equal within- and between-level factor loadings and between-level residual variances fixed at zero. A potential problem is that this approach may include a misspecified unrestricted model if any non-invariance is present, but the log-likelihood difference test requires that the unrestricted model is correctly specified. A free baseline approach, where the unrestricted model includes only the restrictions needed for model identification, should lead to better decision accuracy, but no studies have examined this yet. We ran a Monte Carlo study to investigate this issue. When the referent item is unbiased, compared to the free baseline approach, the constrained baseline approach led to similar true positive (power) rates but much higher false positive (Type I error) rates. The free baseline approach should be preferred when the referent indicator is unbiased. When the referent assumption is violated, the false positive rate was unacceptably high for both free and constrained baseline approaches, and the true positive rate was poor regardless of whether the free or constrained baseline approach was used. Neither the free nor the constrained baseline approach can be recommended when the referent indicator is biased. We recommend paying close attention to ensuring the referent indicator is unbiased in tests of cluster bias. All Mplus input and output files, R, and short Python scripts used to execute this simulation study are uploaded to an open access repository.
Complex networks as a unified framework for descriptive analysis and predictive modeling in climate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R
The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as a data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.
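A minimal sketch of the network-cluster-to-index pipeline on synthetic series: correlation-thresholded edges define the network, modularity communities define the clusters, and cluster-mean series serve as candidate indices. Thresholds and data are invented.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

# Synthetic "gridded" climate data: 100 locations x 240 monthly anomalies.
rng = np.random.default_rng(3)
data = rng.standard_normal((100, 240))
data[:50] += np.sin(np.linspace(0, 12 * np.pi, 240))    # one coherent, teleconnected region

# Climate network: nodes are locations, edges join strongly correlated series.
C = np.corrcoef(data)
A = ((np.abs(C) > 0.5) & ~np.eye(100, dtype=bool)).astype(int)
G = nx.from_numpy_array(A)

# Network clusters (communities); each cluster's mean series serves as a climate index.
clusters = community.greedy_modularity_communities(G)
indices = np.array([data[list(c)].mean(axis=0) for c in clusters])
print("clusters:", len(clusters), "index matrix shape:", indices.shape)
```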
Kee, Kerk F; Sparks, Lisa; Struppa, Daniele C; Mannucci, Mirco A; Damiano, Alberto
2016-01-01
By integrating the simplicial model of social aggregation with existing research on opinion leadership and diffusion networks, this article introduces the constructs of simplicial diffusers (mathematically defined as nodes embedded in simplexes; a simplex is a socially bonded cluster) and simplicial diffusing sets (mathematically defined as minimal covers of a simplicial complex; a simplicial complex is a social aggregation in which socially bonded clusters are embedded) to propose a strategic approach for information diffusion of cancer screenings as a health intervention on Facebook for community cancer prevention and control. This approach is novel in its incorporation of interpersonally bonded clusters, culturally distinct subgroups, and different united social entities that coexist within a larger community into a computational simulation to select sets of simplicial diffusers with the highest degree of information diffusion for health intervention dissemination. The unique contributions of the article also include seven propositions and five algorithmic steps for computationally modeling the simplicial model with Facebook data.
A quasichemical approach for protein-cluster free energies in dilute solution
NASA Astrophysics Data System (ADS)
Young, Teresa M.; Roberts, Christopher J.
2007-10-01
Reversible formation of protein oligomers or small clusters is a key step in processes such as protein polymerization, fibril formation, and protein phase separation from dilute solution. A straightforward, statistical mechanical approach to accurately calculate cluster free energies in solution is presented using a cell-based, quasichemical (QC) approximation for the partition function of proteins in an implicit solvent. The inputs to the model are the protein potential of mean force (PMF) and the corresponding subcell degeneracies up to relatively low particle densities. The approach is tested using simple two- and three-dimensional lattice models in which proteins interact with either isotropic or anisotropic nearest-neighbor attractions. Comparison with direct Monte Carlo simulation shows that cluster probabilities and free energies of oligomer formation (ΔG_i^0) are quantitatively predicted by the QC approach for protein volume fractions of ~10^-2 (weight/volume concentration ~10 g/l) and below. For small clusters, ΔG_i^0 depends weakly on the strength of short-ranged attractive interactions for most experimentally relevant values of the normalized osmotic second virial coefficient (b_2^*). For larger clusters (i ≫ 2), there is a small but non-negligible b_2^* dependence. The results suggest that nonspecific, hydrophobic attractions may not significantly stabilize prenuclei in processes such as non-native aggregation. Biased Monte Carlo methods are shown to accurately provide subcell degeneracies that are intractable to obtain analytically or by direct enumeration, and so offer a means to generalize the approach to mixtures and proteins with more complex PMFs.
Cluster-based control of a separating flow over a smoothly contoured ramp
NASA Astrophysics Data System (ADS)
Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek
2017-12-01
The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
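The core discretization and Markov steps can be sketched as follows on a toy oscillatory signal: k-means defines the clusters, consecutive labels give the transition matrix, and its leading left eigenvector gives the ergodic distribution that a controller would compare across control laws. All data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical flow snapshots: rows are (POD-like) feature vectors over time.
rng = np.random.default_rng(4)
t = np.linspace(0, 40 * np.pi, 4000)
snapshots = np.c_[np.sin(t), np.cos(t)] + 0.1 * rng.standard_normal((4000, 2))

# Stage 1: discretize the state space into a small number of clusters (centroids).
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(snapshots)
labels = km.labels_

# Stage 2: estimate the cluster transition (Markov) matrix from consecutive labels.
k = km.n_clusters
P = np.zeros((k, k))
for a, b in zip(labels[:-1], labels[1:]):
    P[a, b] += 1
P /= P.sum(axis=1, keepdims=True)

# Ergodic (stationary) distribution = left eigenvector of P for eigenvalue 1;
# under actuation one would recompute P per control law and compare these distributions.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
print("stationary cluster probabilities:", np.round(pi, 3))
```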
NASA Astrophysics Data System (ADS)
Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana
2018-01-01
This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations in the Krishna Basin, India. The climatic dataset from NCEP is used for training the proposed models (Jan.'69 to Dec.'94), and the models are applied to the corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data is obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better compared to the traditional Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA help reduce the dimensionality of the input climatic variables, while capturing more variability compared to stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods, whereas the models with clustering without MWE over-estimate the rainfall during the dry season.
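One plausible reading of the MWE feature is sketched below: per-scale wavelet energies from a discrete wavelet transform are turned into a Shannon entropy, on which k-means groups the variables. The wavelet, level, and data are assumptions, not the paper's settings.

```python
import numpy as np
import pywt
from sklearn.cluster import KMeans

def multiscale_wavelet_entropy(x, wavelet="db4", level=4):
    """Shannon entropy of the relative wavelet energy across decomposition scales."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()
    return -np.sum(p * np.log(p + 1e-12))

# Hypothetical NCEP-like predictor series: 30 climate variables x 312 months.
rng = np.random.default_rng(5)
series = rng.standard_normal((30, 312))
series[:10] += np.sin(np.linspace(0, 24 * np.pi, 312))   # a group with strong periodicity

# Cluster variables on their multi-scale wavelet entropy (the MWE feature).
mwe = np.array([[multiscale_wavelet_entropy(s)] for s in series])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(mwe)
print("cluster sizes:", np.bincount(labels))
```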
Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K
2003-11-01
Current Self-Organizing Map (SOM) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates the merits of hierarchical clustering with the robustness against noise known from self-organizing approaches. The proposed algorithm, applied to DNA microarray data sets of two types of cancers, has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm, tested on leukemia microarray data containing three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low; it is therefore labelled as an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data automatically derived two clusters, which is consistent with the number of classes in the data (cancerous and normal). JAVA software of the dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Chen, Yun; Yang, Hui
2016-12-14
In the era of big data, there is increasing interest in clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information-theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.
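A sketch of mutual-information-based variable clustering on synthetic data; note that hierarchical clustering is used here as a crisp stand-in for the paper's Dirichlet process models, and the binned MI estimator is the simplest possible choice.

```python
import numpy as np
from sklearn.metrics import mutual_info_score
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical dataset: 12 variables, some nonlinearly interdependent.
rng = np.random.default_rng(6)
n = 2000
z = rng.standard_normal(n)
X = np.c_[z, z**2, np.abs(z),                      # a nonlinearly dependent group
          rng.standard_normal((n, 9))]             # independent noise variables

def mi(a, b, bins=16):
    """Mutual information of two continuous variables via simple equal-width binning."""
    c_ab = np.histogram2d(a, b, bins=bins)[0]
    return mutual_info_score(None, None, contingency=c_ab)

p = X.shape[1]
D = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        # Convert MI to a dissimilarity: larger MI -> smaller distance.
        D[i, j] = D[j, i] = 1.0 / (1.0 + mi(X[:, i], X[:, j]))

labels = fcluster(linkage(squareform(D), method="average"), t=3, criterion="maxclust")
print("variable cluster labels:", labels)
```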
Clustering of financial time series
NASA Astrophysics Data System (ADS)
D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo
2013-05-01
This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models, both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. To this aim, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning-around-medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on a partitioning-around-medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances, which takes into account information about the volatility structure of the time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp versions.
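A simplified, crisp sketch of the second model's idea: fit a GARCH(1,1) to each series with the arch package, then cluster the series on distances between estimated parameters. Hierarchical clustering replaces the paper's fuzzy partitioning-around-medoids, and the Caiado covariance terms are omitted; all data are synthetic.

```python
import numpy as np
from arch import arch_model
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical daily returns (in percent) for 6 series, two volatility regimes.
rng = np.random.default_rng(7)
returns = [rng.standard_normal(1000) * s for s in (0.5, 0.5, 0.6, 1.5, 1.6, 1.4)]

# Fit a GARCH(1,1) to each series and collect (omega, alpha, beta).
params = []
for r in returns:
    res = arch_model(r, vol="Garch", p=1, q=1).fit(disp="off")
    params.append([res.params["omega"], res.params["alpha[1]"], res.params["beta[1]"]])
params = np.array(params)

# Crisp hierarchical clustering on parameter distances, a simplified stand-in
# for the paper's fuzzy medoid-based partitioning with a GARCH-based metric.
labels = fcluster(linkage(pdist(params), method="complete"), t=2, criterion="maxclust")
print("series cluster labels:", labels)
```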
Generating clustered scale-free networks using Poisson based localization of edges
NASA Astrophysics Data System (ADS)
Türker, İlker
2018-05-01
We introduce a variety of network models using a Poisson-based edge localization strategy, which results in clustered scale-free topologies. We first verify the success of our localization strategy by realizing a variant of the well-known Watts-Strogatz model with an inverse approach, implying a small-world regime of rewiring from a random network through a regular one. We then apply the rewiring strategy to a pure Barabasi-Albert model and successfully achieve a small-world regime, though with only a limited degree of the scale-free property. To imitate the high clustering property of scale-free networks with higher accuracy, we adapt the Poisson-based wiring strategy to a growing network with the ingredients of both preferential attachment and local connectivity. To achieve the collocation of these properties, we use a routine of flattening the edges array, sorting it, and applying a mixing procedure to assemble both global connections with preferential attachment and local clusters. As a result, we achieve clustered scale-free networks in a computational fashion, diverging from recent studies by following a simple but efficient approach.
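A minimal sketch of Poisson-based edge localization, assuming one concrete variant: edges of a ring lattice are rewired to targets whose ring distance from the source is Poisson-distributed, so rewired links stay mostly local. All parameters are illustrative.

```python
import numpy as np
import networkx as nx

def poisson_localized_rewire(n=500, k=6, p=0.3, lam=4.0, seed=0):
    """Ring lattice whose edges are rewired, with probability p, to targets whose
    ring distance from the source is Poisson-distributed (edge localization)."""
    rng = np.random.default_rng(seed)
    G = nx.watts_strogatz_graph(n, k, 0, seed=seed)       # p=0 gives a plain ring lattice
    for u, v in list(G.edges()):
        if rng.random() < p:
            d = 1 + rng.poisson(lam)                      # short hops are most probable
            w = int((u + rng.choice([-1, 1]) * d) % n)
            if w != u and not G.has_edge(u, w):
                G.remove_edge(u, v)
                G.add_edge(u, w)
    return G

G = poisson_localized_rewire()
Gc = G.subgraph(max(nx.connected_components(G), key=len))  # guard against rare splits
print(f"average clustering: {nx.average_clustering(G):.3f}, "
      f"path length (largest component): {nx.average_shortest_path_length(Gc):.2f}")
```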
Circulation Clusters--An Empirical Approach to Decentralization of Academic Libraries.
ERIC Educational Resources Information Center
McGrath, William E.
1986-01-01
Discusses the issue of centralization or decentralization of academic library collections, and describes a statistical analysis of book circulation at the University of Southwestern Louisiana that yielded subject area clusters as a compromise solution to the problem. Applications of the cluster model for all types of library catalogs are…
Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials
ERIC Educational Resources Information Center
Sanders, Elizabeth A.
2011-01-01
This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…
Modeling tensional homeostasis in multicellular clusters.
Tam, Sze Nok; Smith, Michael L; Stamenović, Dimitrije
2017-03-01
Homeostasis of mechanical stress in cells, or tensional homeostasis, is essential for normal physiological function of tissues and organs and is protective against disease progression, including atherosclerosis and cancer. Recent experimental studies have shown that isolated cells are not capable of maintaining tensional homeostasis, whereas multicellular clusters are, with stability increasing with the size of the clusters. Here, we proposed simple mathematical models to interpret experimental results and to obtain insight into factors that determine homeostasis. Multicellular clusters were modeled as one-dimensional arrays of linearly elastic blocks that were either jointed or disjointed. Fluctuating forces that mimicked experimentally measured cell-substrate tractions were obtained from Monte Carlo simulations. These forces were applied to the cluster models, and the corresponding stress field in the cluster was calculated by solving the equilibrium equation. It was found that temporal fluctuations of the cluster stress field became attenuated with increasing cluster size, indicating that the cluster approached tensional homeostasis. These results were consistent with previously reported experimental data. Furthermore, the models revealed that key determinants of tensional homeostasis in multicellular clusters included the cluster size, the distribution of traction forces, and mechanical coupling between adjacent cells. Based on these findings, we concluded that tensional homeostasis was a multicellular phenomenon. Copyright © 2016 John Wiley & Sons, Ltd.
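The size effect can be reproduced with a few lines of Monte Carlo, assuming the simplest jointed-cluster case where the cluster stress is the average of independently fluctuating cell tractions; all numbers are illustrative.

```python
import numpy as np

# Monte Carlo sketch: each cell exerts a fluctuating traction; the cluster-level
# stress is taken as the average over the cluster. Temporal stability is measured
# by the coefficient of variation (CV) of the cluster stress.
rng = np.random.default_rng(8)
T, mean_f, sd_f = 2000, 1.0, 0.6          # time steps, mean and spread of cell tractions

for n_cells in (1, 4, 16, 64):
    tractions = mean_f + sd_f * rng.standard_normal((T, n_cells))
    cluster_stress = tractions.mean(axis=1)        # mechanically coupled (jointed) blocks
    cv = cluster_stress.std() / cluster_stress.mean()
    print(f"cluster of {n_cells:2d} cells: stress CV = {cv:.3f}")
# The CV decays roughly as 1/sqrt(n): larger clusters hold stress nearly constant,
# i.e., they approach tensional homeostasis, consistent with the models above.
```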
Clustering of color map pixels: an interactive approach
NASA Astrophysics Data System (ADS)
Moon, Yiu Sang; Luk, Franklin T.; Yuen, K. N.; Yeung, Hoi Wo
2003-12-01
The demand for digital maps continues to rise as mobile electronic devices become more popular. Instead of creating an entire map from scratch, we may convert a scanned paper map into a digital one. Color clustering is the first step of this conversion process. Currently, most existing clustering algorithms are fully automatic. They are fast and efficient but may not work well in map conversion because of the numerous ambiguities associated with printed maps. Here we introduce two interactive approaches for color clustering on maps: color clustering with pre-calculated index colors (PCIC) and color clustering with pre-calculated color ranges (PCCR). We also introduce a memory model that can enhance and integrate different image processing techniques for fine-tuning the clustering results. Problems and examples of the algorithms are discussed in the paper.
Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.
Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka
2014-02-01
In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.
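The conjugacy that makes partition comparison analytical can be sketched directly: with Dirichlet priors on each transition row, the marginal likelihood of a set of transition counts is closed-form, so merging versus separating two sequences reduces to comparing two numbers. The prior strength and toy sequences below are arbitrary.

```python
import numpy as np
from scipy.special import gammaln

BASES = "ACGT"

def transition_counts(seq, order=1):
    """4^order x 4 matrix of Markov transition counts for a DNA sequence."""
    idx = {b: i for i, b in enumerate(BASES)}
    counts = np.zeros((4 ** order, 4))
    for i in range(order, len(seq)):
        ctx = sum(idx[c] * 4 ** j for j, c in enumerate(reversed(seq[i - order:i])))
        counts[ctx, idx[seq[i]]] += 1
    return counts

def log_marginal(counts, alpha=1.0):
    """Dirichlet-multinomial log marginal likelihood, row by row (conjugate prior)."""
    a = np.full(counts.shape[1], alpha)
    return np.sum(gammaln(a.sum()) - gammaln(a.sum() + counts.sum(axis=1))
                  + np.sum(gammaln(a + counts) - gammaln(a), axis=1))

# Compare two partitions of two sequences: {s1}{s2} (separate) vs {s1, s2} (merged).
s1, s2 = "ACGTACGTACGGACGT" * 10, "TTGCAATTGGCCAATT" * 10
c1, c2 = transition_counts(s1), transition_counts(s2)
separate = log_marginal(c1) + log_marginal(c2)
merged = log_marginal(c1 + c2)
print(f"log ML separate: {separate:.1f}, merged: {merged:.1f}")
# A greedy search would accept whichever configuration has the higher marginal likelihood.
```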
Wang, Juan; Nishikawa, Robert M; Yang, Yongyi
2017-04-01
In computerized detection of clustered microcalcifications (MCs) from mammograms, the traditional approach is to apply a pattern detector to locate the presence of individual MCs, which are subsequently grouped into clusters. Such an approach is often susceptible to the occurrence of false positives (FPs) caused by local image patterns that resemble MCs. We investigate the feasibility of a direct detection approach to determining whether an image region contains clustered MCs or not. Toward this goal, we develop a deep convolutional neural network (CNN) as the classifier model to which the input consists of a large image window. The multiple layers in the CNN classifier are trained to automatically extract image features relevant to MCs at different spatial scales. In the experiments, we demonstrated this approach on a dataset consisting of both screen-film mammograms and full-field digital mammograms. We evaluated the detection performance both on classifying image regions of clustered MCs using a receiver operating characteristic (ROC) analysis and on detecting clustered MCs from full mammograms by a free-response receiver operating characteristic analysis. For comparison, we also considered a recently developed MC detector with FP suppression. In classifying image regions of clustered MCs, the CNN classifier achieved 0.971 in the area under the ROC curve, compared to 0.944 for the MC detector. In detecting clustered MCs from full mammograms, at 90% sensitivity, the CNN classifier obtained an FP rate of 0.69 clusters/image, compared to 1.17 clusters/image by the MC detector. These results indicate that using global image features can be more effective in discriminating clustered MCs from FPs caused by various sources, such as linear structures, thereby providing a more accurate detection of clustered MCs on mammograms.
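A skeletal PyTorch version of such a window classifier is shown below; the window size (taken as 99x99 here), channel widths, and depth are assumptions, since the paper's architecture and training details are not reproduced.

```python
import torch
import torch.nn as nn

class MCClusterNet(nn.Module):
    """Small convolutional classifier for mammogram windows (architecture assumed)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # makes the net window-size agnostic
        )
        self.classifier = nn.Linear(64, 2)              # cluster present / absent

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = MCClusterNet()
windows = torch.randn(8, 1, 99, 99)                    # assumed 99x99 grayscale windows
logits = model(windows)
print(logits.shape)                                     # torch.Size([8, 2])
```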
Clustering Genes of Common Evolutionary History
Gori, Kevin; Suchan, Tomasz; Alvarez, Nadir; Goldman, Nick; Dessimoz, Christophe
2016-01-01
Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent—due to events such as incomplete lineage sorting or horizontal gene transfer—it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such “process-agnostic” approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward’s method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl). PMID:26893301
Fulton, Kara A.; Liu, Danping; Haynie, Denise L.; Albert, Paul S.
2016-01-01
The NEXT Generation Health study investigates dating violence among adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss–Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random-effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of the within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored. PMID:26937263
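The GHQ-marginalized likelihood at the heart of the ML approach can be sketched for an intercept-only zero-inflated model: a cluster is either a structural zero (probability pi) or generates Bernoulli responses around a normal random intercept that is integrated out by Gauss-Hermite quadrature. All parameter values and data are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit

def zib_loglik(y_clusters, pi, beta0, sigma, n_quad=20):
    """Log-likelihood of a zero-inflated clustered binary model with a random
    intercept, marginalized by Gauss-Hermite quadrature (intercept-only sketch)."""
    x, w = hermgauss(n_quad)                       # nodes/weights for weight exp(-x^2)
    ll = 0.0
    for y in y_clusters:                           # y: 0/1 responses of one cluster
        # Bernoulli part: integrate over the random intercept b ~ N(0, sigma^2).
        b = np.sqrt(2.0) * sigma * x
        p = expit(beta0 + b[:, None])              # n_quad x n_items probabilities
        bern = np.prod(p ** y * (1 - p) ** (1 - y), axis=1)
        integral = np.dot(w, bern) / np.sqrt(np.pi)
        # Zero inflation: a structural-zero cluster affirms nothing.
        ll += np.log(pi * float(np.all(y == 0)) + (1 - pi) * integral)
    return ll

rng = np.random.default_rng(9)
clusters = [rng.integers(0, 2, size=8) for _ in range(50)] + [np.zeros(8)] * 20
print("log-likelihood:", zib_loglik(clusters, pi=0.3, beta0=-0.5, sigma=1.0))
```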
Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering
NASA Technical Reports Server (NTRS)
Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland
2000-01-01
Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data-mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.
2013-01-01
Background: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
NASA Astrophysics Data System (ADS)
Asa'd, Randa S.; Vazdekis, Alexandre; Cerviño, Miguel; Noël, Noelia E. D.; Beasley, Michael A.; Kassab, Mahmoud
2017-11-01
The optical integrated spectra of three Large Magellanic Cloud young stellar clusters (NGC 1984, NGC 1994 and NGC 2011) exhibit concave continua and prominent molecular bands which deviate significantly from the predictions of single stellar population (SSP) models. In order to understand the appearance of these spectra, we create a set of young stellar population (MILES) models, which we make available to the community. We use archival International Ultraviolet Explorer integrated UV spectra to independently constrain the cluster masses and extinction, and rule out strong stochastic effects in the optical spectra. In addition, we also analyse deep colour-magnitude diagrams of the clusters to provide independent age determinations based on isochrone fitting. We explore hypotheses, including age spreads in the clusters, a top-heavy initial mass function, different SSP models and the role of red supergiant stars (RSG). We find that the strong molecular features in the optical spectra can be only reproduced by modelling an increased fraction of about ˜20 per cent by luminosity of RSG above what is predicted by canonical stellar evolution models. Given the uncertainties in stellar evolution at Myr ages, we cannot presently rule out the presence of Myr age spreads in these clusters. Our work combines different wavelengths as well as different approaches (resolved data as well as integrated spectra for the same sample) in order to reveal the complete picture. We show that each approach provides important information but in combination we can better understand the cluster stellar populations.
Enrichment Clusters: A Practical Plan for Real-World, Student-Driven Learning.
ERIC Educational Resources Information Center
Renzulli, Joseph S.; Gentry, Marcia; Reis, Sally M.
This guidebook provides a rationale and guidelines for implementing a student-driven learning approach using enrichment clusters. Enrichment clusters allow students who share a common interest to meet each week to produce a product, performance, or targeted service based on that common interest. Chapter 1 discusses different models of learning.…
Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice
2015-01-01
Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. Research designs, statistical methods and outcome metrics suitable for performing that testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
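A schematic of the two-stage procedure on synthetic data, with k-means standing in for the latent class models at both stages and hypothetical domain blocks:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical patient data split into three health domains (e.g., pain,
# activity limitation, psychological factors), five variables each.
rng = np.random.default_rng(10)
domains = [rng.standard_normal((580, 5)) for _ in range(3)]

# Stage 1: cluster within each domain, yielding a scoring pattern per patient.
stage1_labels = np.column_stack(
    [KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(d) for d in domains]
)

# Stage 2: cluster patients on their domain-level scoring patterns (one-hot
# encoded); k-means is a simple stand-in for Latent Class Analysis here.
patterns = np.concatenate([np.eye(3)[stage1_labels[:, d]] for d in range(3)], axis=1)
subgroups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(patterns)
print("subgroup sizes:", np.bincount(subgroups))
```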
General Framework for Effect Sizes in Cluster Randomized Experiments
ERIC Educational Resources Information Center
VanHoudnos, Nathan
2016-01-01
Cluster randomized experiments are ubiquitous in modern education research. Although a variety of modeling approaches are used to analyze these data, perhaps the most common methodology is a normal mixed effects model where some effects, such as the treatment effect, are regarded as fixed, and others, such as the effect of group random assignment…
Multilevel and Single-Level Models for Measured and Latent Variables When Data Are Clustered
ERIC Educational Resources Information Center
Stapleton, Laura M.; McNeish, Daniel M.; Yang, Ji Seung
2016-01-01
Multilevel models are often used to evaluate hypotheses about relations among constructs when data are nested within clusters (Raudenbush & Bryk, 2002), although alternative approaches are available when analyzing nested data (Binder & Roberts, 2003; Sterba, 2009). The overarching goal of this article is to suggest when it is appropriate…
Efficient generation of low-energy folded states of a model protein
NASA Astrophysics Data System (ADS)
Gordon, Heather L.; Kwan, Wai Kei; Gong, Chunhang; Larrass, Stefan; Rothstein, Stuart M.
2003-01-01
A number of short simulated annealing runs are performed on a highly-frustrated 46-"residue" off-lattice model protein. We perform, in an iterative fashion, a principal component analysis of the 946 nonbonded interbead distances, followed by two varieties of cluster analyses: hierarchical and k-means clustering. We identify several distinct sets of conformations with reasonably consistent cluster membership. Nonbonded distance constraints are derived for each cluster and are employed within a distance geometry approach to generate many new conformations, previously unidentified by the simulated annealing experiments. Subsequent analyses suggest that these new conformations are members of the parent clusters from which they were generated. Furthermore, several novel, previously unobserved structures with low energy were uncovered, augmenting the ensemble of simulated annealing results, and providing a complete distribution of low-energy states. The computational cost of this approach to generating low-energy conformations is small when compared to the expense of further Monte Carlo simulated annealing runs.
Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A
2008-04-01
Using multivariable models, we compared whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks, including weighted logistic regression, random effects models, generalized estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means of modelling these data. Some analytical techniques designed to account for clustering had difficulty converging or producing realistic measures of association.
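The inverse-outbreak-size weighting is easy to demonstrate; the sketch below uses scikit-learn's sample_weight purely to show the mechanics on a synthetic line list (the variables and outcome are invented, and no standard errors are produced this way).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 400
outbreak_id = rng.integers(0, 150, n)                  # cases sharing an id form an outbreak
sizes = np.bincount(outbreak_id)[outbreak_id]          # size of each case's outbreak
y = (sizes > 1).astype(int)                            # outbreak vs sporadic case
X = np.c_[rng.integers(1, 90, n),                      # age (hypothetical covariate)
          rng.integers(0, 2, n)]                       # foodborne transmission (0/1)

# Weight each observation by the inverse of its outbreak size so that every
# outbreak contributes one "effective" case, down-weighting large clusters.
w = 1.0 / sizes
clf = LogisticRegression().fit(X, y, sample_weight=w)
print("coefficients (age, foodborne):", clf.coef_.ravel())
```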
NASA Astrophysics Data System (ADS)
Kumar, Rohit; Puri, Rajeev K.
2018-03-01
Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulated annealing clusterization algorithm (SACA), to describe the experimental data on charge distribution and various event-by-event correlations among fragments. The calculations are constrained to the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study spans different system masses and system-mass asymmetries of colliding partners, and shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.
An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems
Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.
2014-01-01
This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies–Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on local search and a parallelization process, inspired by an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235
Dynamic Fuzzy Model Development for a Drum-type Boiler-turbine Plant Through GK Clustering
NASA Astrophysics Data System (ADS)
Habbi, Ahcène; Zelmat, Mimoun
2008-10-01
This paper discusses a TS fuzzy model identification method for an industrial drum-type boiler plant using the GK fuzzy clustering approach. The fuzzy model is constructed from a set of input-output data that covers a wide operating range of the physical plant. The reference data are generated using a complex first-principles mathematical model that describes the key dynamical properties of the boiler-turbine unit. The proposed fuzzy model is derived by means of a fuzzy clustering method, with particular attention to structure flexibility and model interpretability. This may provide the basis for a new way to design model-based control and diagnosis mechanisms for complex nonlinear plants.
Novel layered clustering-based approach for generating ensemble of classifiers.
Rahman, Ashfaqur; Verma, Brijesh
2011-05-01
This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clustering of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results as evidenced from the experimental results.
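A compact sketch of the layered ensemble: each layer is an alternative k-means clustering with one decision tree per cluster, and decisions are fused across layers by majority vote. Layer count, cluster count, and base learner are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Each layer = an alternative clustering (random initialization) plus one base
# classifier trained on the patterns falling in each cluster of that layer.
layers = []
for seed in range(5):                                    # 5 layers
    km = KMeans(n_clusters=4, n_init=1, random_state=seed).fit(X)
    base = {c: DecisionTreeClassifier(random_state=0)
               .fit(X[km.labels_ == c], y[km.labels_ == c])
            for c in range(4)}
    layers.append((km, base))

def predict(x):
    """Find x's cluster in every layer, apply that cluster's classifier,
    and fuse the per-layer decisions by majority voting."""
    votes = [int(base[int(km.predict(x[None])[0])].predict(x[None])[0])
             for km, base in layers]
    return np.bincount(votes).argmax()

acc = np.mean([predict(x) == t for x, t in zip(X, y)])
print(f"layered-ensemble training accuracy: {acc:.3f}")
```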
An alternative validation strategy for the Planck cluster catalogue and y-distortion maps
NASA Astrophysics Data System (ADS)
Khatri, Rishi
2016-07-01
We present an all-sky map of the y-type distortion calculated from the full mission Planck High Frequency Instrument (HFI) data using a recently proposed approach to component separation based on parametric model fitting and model selection. This simple model-selection approach enables us to distinguish between carbon monoxide (CO) line emission and y-type distortion, something that is not possible using internal-linear-combination-based methods. We create a mask to cover the regions of significant CO emission, relying on the information in the χ2 map obtained when fitting the y-distortion and CO emission to the lowest four HFI channels. We revisit the second Planck cluster catalogue and quantify the quality of the cluster candidates in an approach that is similar in spirit to Aghanim et al. (2015, A&A, 580, A138). We find that at least 93% of the clusters in the cosmology sample are free of CO contamination. We also find that 59% of unconfirmed candidates may have significant contamination from molecular clouds. We agree with Planck Collaboration XXVII (2016, A&A, in press) on the worst offenders. We suggest an alternative validation strategy of measuring and subtracting the CO emission from the Planck cluster candidates using radio telescopes, thus improving the reliability of the catalogue. Our CO mask and annotations to the Planck cluster catalogue, identifying cluster candidates with possible CO contamination, are made publicly available. The full Tables 1-3 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/592/A48
3D morphology-based clustering and simulation of human pyramidal cell dendritic spines.
Luengo-Sanchez, Sergio; Fernaud-Espinosa, Isabel; Bielza, Concha; Benavides-Piccione, Ruth; Larrañaga, Pedro; DeFelipe, Javier
2018-06-13
The dendritic spines of pyramidal neurons are the targets of most excitatory synapses in the cerebral cortex. They have a wide variety of morphologies, and their morphology appears to be critical from the functional point of view. To further characterize dendritic spine geometry, we used over 7,000 individually 3D-reconstructed dendritic spines from human cortical pyramidal neurons and grouped them using model-based clustering. This approach uncovered six separate groups of human dendritic spines. To better understand the differences between these groups, the discriminative characteristics of each group were identified as a set of rules. Model-based clustering was also useful for simulating accurate 3D virtual representations of spines that matched the morphological definitions of each cluster. This mathematical approach could provide a useful tool for theoretical predictions on the functional features of human pyramidal neurons based on the morphology of dendritic spines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rusek, Marian; Orlowski, Arkadiusz
2005-04-01
The dynamics of small (≤55 atoms) argon clusters ionized by an intense femtosecond laser pulse is studied using a time-dependent Thomas-Fermi model. The resulting Bloch-like hydrodynamic equations are solved numerically using the smooth particle hydrodynamics method without the necessity of grid simulations. As follows from recent experiments, absorption of radiation and subsequent ionization of clusters observed in the short-wavelength laser frequency regime (98 nm) differs considerably from that in the optical spectral range (800 nm). Our theoretical approach provides a unified framework for treating these very different frequency regimes and allows for a deeper understanding of the underlying cluster explosion mechanisms. The results of our analysis, following from extensive numerical simulations presented in this paper, are compared both with experimental findings and with predictions of other theoretical models.
Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf
2017-09-01
Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred in Abu Dhabi over the six-year period from 2008 to 2013 were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to organize the data set into six different forms, each with a different number of clusters (from 1 to 6). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% more accuracy than without clustering, while under the percentage split it achieves 2%-5% more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
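A hedged sketch of the cluster-then-classify pipeline on synthetic data (the crash dataset, variable set, and ANN configurations are not reproduced): agglomerative clustering partitions the training set, one neural network is trained per cluster, test records are routed to the nearest cluster centroid, and the overall accuracy equals the cluster-size-weighted average accuracy the study reports.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import pairwise_distances_argmin
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=900, n_features=12, random_state=1)
# 66/34 percentage split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.34, random_state=1)

k = 6  # number of clusters (the study examined 1 through 6)
labels = AgglomerativeClustering(n_clusters=k).fit_predict(X_tr)
centroids = np.vstack([X_tr[labels == c].mean(axis=0) for c in range(k)])

# One ANN per cluster; fall back to a majority-class predictor if a
# cluster happens to contain a single class.
models = []
for c in range(k):
    m = labels == c
    if len(np.unique(y_tr[m])) > 1:
        clf = MLPClassifier(max_iter=2000, random_state=0)
    else:
        clf = DummyClassifier(strategy="most_frequent")
    models.append(clf.fit(X_tr[m], y_tr[m]))

# Route each test record to the nearest training centroid, then to that
# cluster's network; overall accuracy is the size-weighted average accuracy.
assign = pairwise_distances_argmin(X_te, centroids)
correct = sum((models[c].predict(X_te[assign == c]) == y_te[assign == c]).sum()
              for c in range(k) if (assign == c).any())
print("weighted accuracy:", correct / len(y_te))
```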
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wykes, M., E-mail: mikewykes@gmail.com; Parambil, R.; Gierschner, J.
Here, we present a general approach to treating vibronic coupling in molecular crystals based on atomistic simulations of large clusters. Such clusters comprise model aggregates treated at the quantum chemical level embedded within a realistic environment treated at the molecular mechanics level. As we calculate ground and excited state equilibrium geometries and vibrational modes of model aggregates, our approach is able to capture effects arising from coupling to intermolecular degrees of freedom, absent from existing models relying on geometries and normal modes of single molecules. Using the geometries and vibrational modes of clusters, we are able to simulate the fluorescence spectra of aggregates for which the lowest excited state bears negligible oscillator strength (as is the case, e.g., in ideal H-aggregates) by including both Franck-Condon (FC) and Herzberg-Teller (HT) vibronic transitions. The latter terms allow the adiabatic excited state of the cluster to couple with vibrations in a perturbative fashion via derivatives of the transition dipole moment along nuclear coordinates. While vibronic coupling simulations employing FC and HT terms are well established for single molecules, to our knowledge this is the first time they have been applied to molecular aggregates. Here, we apply this approach to the simulation of the low-temperature fluorescence spectrum of para-distyrylbenzene single-crystal H-aggregates and draw comparisons with coarse-grained Frenkel-Holstein approaches previously extensively applied to such systems.
Implicit Priors in Galaxy Cluster Mass and Scaling Relation Determinations
NASA Technical Reports Server (NTRS)
Mantz, A.; Allen, S. W.
2011-01-01
Deriving the total masses of galaxy clusters from observations of the intracluster medium (ICM) generally requires some prior information, in addition to the assumptions of hydrostatic equilibrium and spherical symmetry. Often, this information takes the form of particular parametrized functions used to describe the cluster gas density and temperature profiles. In this paper, we investigate the implicit priors on hydrostatic masses that result from this fully parametric approach, and the implications of such priors for scaling relations formed from those masses. We show that the application of such fully parametric models of the ICM naturally imposes a prior on the slopes of the derived scaling relations, favoring the self-similar model, and argue that this prior may be influential in practice. In contrast, this bias does not exist for techniques which adopt an explicit prior on the form of the mass profile but describe the ICM non-parametrically. Constraints on the slope of the cluster mass-temperature relation in the literature show a separation based on the approach employed, with the results from fully parametric ICM modeling clustering nearer the self-similar value. Given that a primary goal of scaling relation analyses is to test the self-similar model, the application of methods subject to strong, implicit priors should be avoided. Alternative methods and best practices are discussed.
AN AGGREGATION AND EPISODE SELECTION SCHEME FOR EPA'S MODELS-3 CMAQ
The development of an episode selection and aggregation approach, designed to support distributional estimation for use with the Models-3 Community Multiscale Air Quality (CMAQ) model, is described. The approach utilized cluster analysis of the 700 hPa u and v wind field compo...
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
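The group Lasso step can be implemented with proximal gradient descent, where each iteration applies block soft-thresholding to the coefficients of one gene cluster. The sketch below is a minimal single-step illustration on synthetic data; it is not the paper's full two-step supervised procedure (which first runs Lasso within clusters and tunes by V-fold cross validation), and all sizes and penalties are illustrative assumptions.

```python
import numpy as np

def group_lasso_ista(X, y, groups, lam=0.2, n_iter=500):
    """Minimise 0.5/n * ||y - Xb||^2 + lam * sum_g ||b_g||_2 by proximal
    gradient descent (ISTA with block soft-thresholding).
    `groups` maps each column of X to a cluster id."""
    n, p = X.shape
    b = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    t = 1.0 / L
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ b) / n
        z = b - t * grad
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            shrink = max(0.0, 1.0 - t * lam / norm) if norm > 0 else 0.0
            b[idx] = shrink * z[idx]
    return b

# Toy example: 3 "gene clusters" of 5 features; only cluster 0 is truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 15))
groups = np.repeat([0, 1, 2], 5)
b_true = np.concatenate([rng.standard_normal(5), np.zeros(10)])
y = X @ b_true + 0.1 * rng.standard_normal(100)
b_hat = group_lasso_ista(X, y, groups)
# The inactive clusters' coefficient blocks shrink strongly toward zero.
print([round(np.linalg.norm(b_hat[groups == g]), 3) for g in range(3)])
```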
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Support Distribution Machines
NASA Astrophysics Data System (ADS)
Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff
2018-01-01
We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning (ML) algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark's publicly available N-body MDPL1 simulation, one with perfect galaxy cluster membership information and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. Assuming perfect membership knowledge, this unrealistic case produces a wide fractional mass error distribution, with a width E=0.87. Interlopers introduce additional scatter, significantly widening the error distribution further (E=2.13). We employ the support distribution machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (E=0.67) for the contaminated case. Remarkably, SDM applied to contaminated clusters is better able to recover masses than even the scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
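The baseline against which the SDM is compared is easy to state concretely. A minimal sketch with entirely synthetic numbers standing in for a mock catalog: fit the power-law scaling relation log M = a + b log σ by least squares and summarize the distribution of fractional mass errors by a percentile width (a statistic analogous to, though not necessarily identical to, the width E reported in the abstract).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy mock: true masses and LOS velocity dispersions following a noisy
# power law; all numbers are illustrative, not taken from MDPL1.
log_m_true = rng.uniform(14.0, 15.0, 1000)  # log10 M [Msun]
log_sigma = 2.5 + 0.33 * (log_m_true - 14.5) + 0.05 * rng.standard_normal(1000)

# Standard approach: fit the scaling relation log M = a + b log sigma.
b, a = np.polyfit(log_sigma, log_m_true, 1)
log_m_pred = a + b * log_sigma

# Fractional mass error eps = (M_pred - M_true) / M_true, summarised by
# the width of its central 68% interval.
eps = 10 ** (log_m_pred - log_m_true) - 1.0
print("68% width of fractional error:",
      np.percentile(eps, 84) - np.percentile(eps, 16))
```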
Description of alternating-parity bands within the dinuclear-system model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shneidman, T. M.; Adamian, G. G., E-mail: adamian@theor.jinr.ru; Antonenko, N. V.
2016-11-15
A cluster approach is used to describe ground-state-based alternating-parity bands in even–even nuclei and to study the band-termination mechanism. A method is proposed for testing the cluster nature of alternating-parity bands.
Chen, Ling; Feng, Yanqin; Sun, Jianguo
2017-10-01
This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in situations both with and without informative cluster size. They are applied to the dental study that motivated this work.
Inductive Approaches to Improving Diagnosis and Design for Diagnosability
NASA Technical Reports Server (NTRS)
Fisher, Douglas H. (Principal Investigator)
1995-01-01
The first research area under this grant addresses the problem of classifying time series according to their morphological features in the time domain. A supervised learning system called CALCHAS, which induces a classification procedure for signatures from preclassified examples, was developed. For each of several signature classes, the system infers a model that captures the class's morphological features using Bayesian model induction and the minimum message length approach to assign priors. After induction, a time series (signature) is classified in one of the classes when there is enough evidence to support that decision. Time series with sufficiently novel features, belonging to classes not present in the training set, are recognized as such. A second area of research assumes two sources of information about a system: a model or domain theory that encodes aspects of the system under study and data from actual system operations over time. A model, when it exists, represents strong prior expectations about how a system will perform. Our work with a diagnostic model of the RCS (Reaction Control System) of the Space Shuttle motivated the development of SIG, a system which combines information from a model (or domain theory) and data. As it tracks RCS behavior, the model computes quantitative and qualitative values. Induction is then performed over the data represented by both the 'raw' features and the model-computed high-level features. Finally, work on clustering for operating mode discovery motivated some important extensions to the clustering strategy we had used. One modification appends an iterative optimization technique onto the clustering system; this optimization strategy appears to be novel in the clustering literature. A second modification improves the noise tolerance of the clustering system. In particular, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus making post-clustering analysis easier.
Modeling and clustering water demand patterns from real-world smart meter data
NASA Astrophysics Data System (ADS)
Cheifetz, Nicolas; Noumir, Zineb; Samé, Allou; Sandraz, Anne-Claire; Féliers, Cédric; Heim, Véronique
2017-08-01
Nowadays, drinking water utilities need a precise understanding of the water demand on their distribution networks in order to efficiently optimize resources, manage billing, and propose new customer services. With the emergence of smart grids based on automated meter reading (AMR), a finer-grained understanding of consumption modes is now accessible for smart cities. In this context, this paper evaluates a novel methodology for identifying relevant usage profiles from the water consumption data produced by smart meters. The methodology is fully data-driven, using the consumption time series, which are seen as functions or curves observed with an hourly time step. First, a Fourier-based additive time series decomposition model is introduced to extract seasonal patterns from the time series. These patterns are intended to represent customer habits in terms of water consumption. Two functional clustering approaches are then used to classify the extracted seasonal patterns: the functional version of K-means, and the Fourier REgression Mixture (FReMix) model. The K-means approach produces a hard segmentation and K representative prototypes. On the other hand, FReMix is a generative model and also produces K profiles, as well as a soft segmentation based on the posterior probabilities. The proposed approach is applied to a smart grid deployed on the largest water distribution network (WDN) in France. The two clustering strategies are evaluated and compared. Finally, a realistic interpretation of the consumption habits is given for each cluster. The extensive experiments and the qualitative interpretation of the resulting clusters highlight the effectiveness of the proposed methodology.
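A simplified analogue of this pipeline, with assumed synthetic consumption series in place of the AMR data: extract each meter's seasonal pattern as least-squares Fourier coefficients, then cluster the coefficient vectors with plain k-means. The FReMix mixture model is not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def fourier_design(t, period, n_harmonics):
    """Design matrix of sine/cosine harmonics for a given period."""
    cols = [np.ones_like(t)]
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * t / period),
                 np.cos(2 * np.pi * h * t / period)]
    return np.column_stack(cols)

# Toy hourly consumption for 50 meters over 4 weeks: "morning" users
# peak around 6 h, "evening" users around 18 h (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 4, dtype=float)
F = fourier_design(t, period=24.0, n_harmonics=3)  # daily seasonality
phases = rng.choice([6.0, 18.0], size=50)
series = np.array([np.clip(np.sin(2 * np.pi * (t - ph) / 24), 0, None)
                   + 0.1 * rng.standard_normal(len(t)) for ph in phases])

# Least-squares fit per meter: the coefficients encode the seasonal pattern.
coefs = np.linalg.lstsq(F, series.T, rcond=None)[0].T

# Functional-style clustering on the extracted patterns (K-means variant).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coefs)
print(np.c_[phases, labels][:10])  # clusters should track the two usage types
```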
Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick
2017-03-09
Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
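For a concrete sense of the setting, the sketch below simulates a small cross-sectional stepped wedge trial with a binary outcome (six clusters, three steps, two clusters crossing per step, matching the design advice above) and fits one of the standard analyses the authors compare: a GEE with exchangeable working correlation and period fixed effects, via statsmodels. The recommended generalised linear mixed model would be fit analogously in software with binary mixed-model support; all simulation parameters here are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulated cross-sectional stepped wedge: 6 clusters, 4 periods, clusters
# crossing to the intervention in pairs (period 0 is an all-control baseline).
rows = []
cross = {c: 1 + c // 2 for c in range(6)}  # period at which cluster c crosses
for c in range(6):
    u = 0.3 * rng.standard_normal()        # cluster random effect
    for p in range(4):
        treat = int(p >= cross[c])
        for _ in range(25):                # 25 subjects per cluster-period
            logit = -0.5 + 0.4 * treat + 0.1 * p + u
            y = rng.random() < 1 / (1 + np.exp(-logit))
            rows.append(dict(cluster=c, period=p, treat=treat, y=int(y)))
df = pd.DataFrame(rows)

# GEE with exchangeable working correlation, adjusting for the secular
# time trend via period fixed effects.
model = smf.gee("y ~ treat + C(period)", groups="cluster", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```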
Huber, Heinrich J; Connolly, Niamh M C; Dussmann, Heiko; Prehn, Jochen H M
2012-03-01
We devised an approach to extract control principles of cellular bioenergetics for intact and impaired mitochondria from ODE-based models and applied it to a recently established bioenergetic model of cancer cells. The approach used two methods for varying ODE model parameters to determine those model components that, either alone or in combination with other components, most decisively regulated bioenergetic state variables. We found that, while polarisation of the mitochondrial membrane potential (ΔΨ(m)) and, therefore, the protomotive force were critically determined by respiratory complex I activity in healthy mitochondria, complex III activity was dominant for ΔΨ(m) during conditions of cytochrome-c deficiency. As a further important result, cellular bioenergetics in healthy, ATP-producing mitochondria was regulated by three parameter clusters that describe (1) mitochondrial respiration, (2) ATP production and consumption and (3) coupling of ATP-production and respiration. These parameter clusters resembled metabolic blocks and their intermediaries from top-down control analyses. However, parameter clusters changed significantly when cells changed from low to high ATP levels or when mitochondria were considered to be impaired by loss of cytochrome-c. This change suggests that the assumption of static metabolic blocks by conventional top-down control analyses is not valid under these conditions. Our approach is complementary to both ODE and top-down control analysis approaches and allows a better insight into cellular bioenergetics and its pathological alterations.
Exploring the Dynamics of Exoplanetary Systems in a Young Stellar Cluster
NASA Astrophysics Data System (ADS)
Thornton, Jonathan Daniel; Glaser, Joseph Paul; Wall, Joshua Edward
2018-01-01
I describe a dynamical simulation of planetary systems in a young star cluster. One rather arbitrary aspect of cluster simulations is the choice of initial conditions. These are typically chosen from some standard model, such as Plummer or King, or from a “fractal” distribution to try to model young clumpy systems. Here I adopt the approach of realizing an initial cluster model directly from a detailed magnetohydrodynamical model of cluster formation from a 1000-solar-mass interstellar gas cloud, with magnetic fields and radiative and wind feedback from massive stars included self-consistently. The N-body simulation of the stars and planets starts once star formation is largely over and feedback has cleared much of the gas from the region where the newborn stars reside. It continues until the cluster dissolves in the galactic field. Of particular interest is what would happen to the free-floating planets created in the gas cloud simulation. Are they captured by a star or are they ejected from the cluster? This method of building a dynamical cluster simulation directly from the results of a cluster formation model allows us to better understand the evolution of young star clusters and enriches our understanding of extrasolar planet development in them. These simulations were performed within the AMUSE simulation framework, and combine N-body, multiples and background potential code.
Exploring the Internal Dynamics of Globular Clusters
NASA Astrophysics Data System (ADS)
Watkins, Laura L.; van der Marel, Roeland; Bellini, Andrea; Luetzgendorf, Nora; HSTPROMO Collaboration
2018-01-01
The formation histories and structural properties of globular clusters are imprinted on their internal dynamics. Energy equipartition results in velocity differences for stars of different mass, and leads to mass segregation, which results in different spatial distributions for stars of different mass. Intermediate-mass black holes significantly increase the velocity dispersions at the centres of clusters. By combining accurate measurements of their internal kinematics with state-of-the-art dynamical models, we can characterise both the velocity dispersion and mass profiles of clusters, tease apart the different effects, and understand how clusters may have formed and evolved. Using proper motions from the Hubble Space Telescope Proper Motion (HSTPROMO) Collaboration for a set of 22 Milky Way globular clusters, and our discrete dynamical modelling techniques designed to work with large, high-quality datasets, we are studying a variety of internal cluster properties. We will present the results of theoretical work on simulated clusters that demonstrates the efficacy of our approach, and preliminary results from application to real clusters.
TWave: High-Order Analysis of Functional MRI
Barnathan, Michael; Megalooikonomou, Vasileios; Faloutsos, Christos; Faro, Scott; Mohamed, Feroze B.
2011-01-01
The traditional approach to functional image analysis models images as matrices of raw voxel intensity values. Although such a representation is widely utilized and heavily entrenched both within neuroimaging and in the wider data mining community, the strong interactions among space, time, and categorical modes such as subject and experimental task inherent in functional imaging yield a dataset with “high-order” structure, which matrix models are incapable of exploiting. Reasoning across all of these modes of data concurrently requires a high-order model capable of representing relationships between all modes of the data in tandem. We thus propose to model functional MRI data using tensors, which are high-order generalizations of matrices equivalent to multidimensional arrays or data cubes. However, several unique challenges exist in the high-order analysis of functional medical data: naïve tensor models are incapable of exploiting spatiotemporal locality patterns, standard tensor analysis techniques exhibit poor efficiency, and mixtures of numeric and categorical modes of data are very often present in neuroimaging experiments. Formulating the problem of image clustering as a form of Latent Semantic Analysis and using the WaveCluster algorithm as a baseline, we propose a comprehensive hybrid tensor and wavelet framework for clustering, concept discovery, and compression of functional medical images which successfully addresses these challenges. Our approach reduced runtime and dataset size on a 9.3 GB finger opposition motor task fMRI dataset by up to 98% while exhibiting improved spatiotemporal coherence relative to standard tensor, wavelet, and voxel-based approaches. Our clustering technique was capable of automatically differentiating between the frontal areas of the brain responsible for task-related habituation and the motor regions responsible for executing the motor task, in contrast to a widely used fMRI analysis program, SPM, which only detected the latter region. Furthermore, our approach discovered latent concepts suggestive of subject handedness nearly 100x faster than standard approaches. These results suggest that a high-order model is an integral component to accurate scalable functional neuroimaging. PMID:21729758
A Bayesian cluster analysis method for single-molecule localization microscopy data.
Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick
2016-12-01
Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
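The generative model described above is straightforward to simulate. A minimal sketch with illustrative parameters: Gaussian clusters overlaid on a completely spatially random background, with every localization then scrambled by a per-point localization precision, matching the data-generation step of the protocol (the Bayesian scoring itself is not reproduced).

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an SMLM-like region of interest (units: nm): Gaussian clusters
# on a CSR background, then scramble each localization by its precision.
roi = 3000.0                                     # square ROI side length
n_clusters, pts_per_cluster, radius = 10, 50, 30.0
centers = rng.uniform(0, roi, size=(n_clusters, 2))
clustered = np.vstack([c + radius * rng.standard_normal((pts_per_cluster, 2))
                       for c in centers])
background = rng.uniform(0, roi, size=(500, 2))  # unclustered CSR points
points = np.vstack([clustered, background])

# Per-point localization precision (nm); the gamma shape/scale are assumptions.
precision = rng.gamma(shape=4.0, scale=5.0, size=len(points))
observed = points + precision[:, None] * rng.standard_normal(points.shape)
print(observed.shape, round(precision.mean(), 1))
```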
High- and low-level hierarchical classification algorithm based on source separation process
NASA Astrophysics Data System (ADS)
Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber
2016-10-01
High-dimensional data applications have attracted great attention in recent years. We focus on remote sensing data analysis in high-dimensional spaces, such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality; the latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. One obvious method to perform dimensionality reduction is to use independent component analysis as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; in our approach, however, leaves are associated with the most relevant sources, represented along mutually independent axes, so as to represent particular land covers with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and improves the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources. At each new step, we thus obtain a finer partition that participates in the clustering process, enhancing semantic capabilities and identification rates.
Balzer, Laura B; Zheng, Wenjing; van der Laan, Mark J; Petersen, Maya L
2018-01-01
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
Copula based flexible modeling of associations between clustered event times.
Geerdens, Candida; Claeskens, Gerda; Janssen, Paul
2016-07-01
Multivariate survival data are characterized by the presence of correlation between event times within the same cluster. First, we build multi-dimensional copulas with flexible and possibly symmetric dependence structures for such data. In particular, clustered right-censored survival data are modeled using mixtures of max-infinitely divisible bivariate copulas. Second, these copulas are fit by a likelihood approach where the vast amount of copula derivatives present in the likelihood is approximated by finite differences. Third, we formulate conditions for clustered right-censored survival data under which an information criterion for model selection is either weakly consistent or consistent. Several of the familiar selection criteria are included. A set of four-dimensional data on time-to-mastitis is used to demonstrate the developed methodology.
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
Spatial clustering of average risks and risk trends in Bayesian disease mapping.
Anderson, Craig; Lee, Duncan; Dean, Nema
2017-01-01
Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Prediction of Fracture Behavior in Rock and Rock-like Materials Using Discrete Element Models
NASA Astrophysics Data System (ADS)
Katsaga, T.; Young, P.
2009-05-01
The study of fracture initiation and propagation in heterogeneous materials such as rock and rock-like materials is of principal interest in the field of rock mechanics and rock engineering. It is crucial for investigating failure prediction and safety measures in civil and mining structures. Our work offers a practical approach to predicting fracture behaviour using discrete element models. In this approach, the microstructures of materials are represented through the combination of clusters of bonded particles with different inter-cluster particle and bond properties, and intra-cluster bond properties. The geometry of clusters is transferred from information available in thin sections, computed tomography (CT) images, and other visual presentations of the modeled material using a customized AutoCAD built-in dialog-based Visual Basic application. Exact microstructures of the tested sample, including fractures, faults, inclusions, and void spaces, can be duplicated in the discrete element models. Although the microstructural fabrics of rocks and rock-like structures may differ in scale, fracture formation and propagation through these materials are alike and follow similar mechanics. Synthetic material provides an excellent condition for validating the modelling approach, as fracture behaviour is known given the composite's well-defined properties. Calibration of the macro-properties of the matrix material and inclusions (aggregates) was followed by calibration of the overall mechanical response through adjustment of the interfacial properties. The discrete element model predicted fracture propagation features and paths similar to those of the real sample material. The fracture paths and matrix-inclusion interactions were compared using computed tomography images. Fracture initiation and formation in the model and the real material were compared using acoustic emission data. Analysing the temporal and spatial evolution of AE events collected during sample testing, in relation to the CT images, allows precise reconstruction of the failure sequence. Our proposed modelling approach illustrates realistic fracture formation and growth predictions under different loading conditions.
Model selection for clustering of pharmacokinetic responses.
Guerra, Rui P; Carvalho, Alexandra M; Mateus, Paulo
2018-08-01
Pharmacokinetics comprises the study of drug absorption, distribution, metabolism and excretion over time. Clinical pharmacokinetics, focusing on therapeutic management, offers important insights towards personalised medicine through the study of efficacy and toxicity of drug therapies. This study is hampered by subjects' high variability in drug blood concentration when starting a therapy with the same drug dosage. Clustering of pharmacokinetic responses has been addressed recently as a way to stratify subjects and provide different drug doses for each stratum. This clustering method, however, is not able to automatically determine the correct number of clusters, relying instead on a user-defined parameter for collapsing clusters that are closer than a given heuristic threshold. We aim to use information-theoretical approaches to address parameter-free model selection. We propose two model selection criteria for clustering pharmacokinetic responses, founded on the Minimum Description Length and on the Normalised Maximum Likelihood. Experimental results show the ability of the model selection schemes to unveil the correct number of clusters underlying the mixture of pharmacokinetic responses. In this work we devised two model selection criteria to determine the number of clusters in a mixture of pharmacokinetic curves, advancing over previous works. A cost-efficient parallel implementation in Java of the proposed method is publicly available for the community. Copyright © 2018 Elsevier B.V. All rights reserved.
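As a hedged illustration of parameter-free selection of the cluster number, the sketch below scores Gaussian mixtures with BIC, an asymptotic criterion closely related to two-part MDL; the paper's MDL and NML criteria, and its pharmacokinetic mixture model, are not reproduced, and the toy features are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy per-subject summary features standing in for pharmacokinetic
# responses; three true strata.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(60, 2))
               for m in (0.0, 1.5, 3.0)])

# Score k = 1..6 with BIC and pick the minimiser; no heuristic
# collapsing threshold is needed.
bics = []
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gm.bic(X))
print("selected number of clusters:", int(np.argmin(bics)) + 1)
```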
Some Unsolved Problems, Questions, and Applications of the Brightsen Nucleon Cluster Model
NASA Astrophysics Data System (ADS)
Smarandache, Florentin
2010-10-01
The Brightsen Model is opposite to the Standard Model, and it was built on John Wheeler's Resonating Group Structure Model and on Linus Pauling's Close-Packed Spheron Model. Among the Brightsen Model's predictions and applications we cite the fact that it derives the average number of prompt neutrons per fission event, it provides a theoretical way for understanding the low temperature / low energy reactions and for approaching artificially induced fission, and it predicts that forces within nucleon clusters are stronger than forces between such clusters within isotopes; it predicts the unmatter entities inside nuclei that result from stable and neutral union of matter and antimatter, and so on. But these predictions have to be tested in the future at the new CERN laboratory.
Ayral, Thomas; Vučičević, Jaksa; Parcollet, Olivier
2017-10-20
We present an embedded-cluster method, based on the triply irreducible local expansion formalism. It turns the Fierz ambiguity, inherent to approaches based on a bosonic decoupling of local fermionic interactions, into a convergence criterion. It is based on the approximation of the three-leg vertex by a coarse-grained vertex computed from a self-consistently determined cluster impurity model. The computed self-energies are, by construction, continuous functions of momentum. We show that, in three interaction and doping regimes of the two-dimensional Hubbard model, self-energies obtained with clusters of size four only are very close to numerically exact benchmark results. We show that the Fierz parameter, which parametrizes the freedom in the Hubbard-Stratonovich decoupling, can be used as a quality control parameter. By contrast, the GW+extended dynamical mean field theory approximation with four cluster sites is shown to yield good results only in the weak-coupling regime and for a particular decoupling. Finally, we show that the vertex has spatially nonlocal components only at low Matsubara frequencies.
Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H
2017-04-20
Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.
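The hotspot step above rests on the Getis-Ord Gi* statistic, which can be computed directly. A minimal numpy sketch on a toy chain of areal units (the Massachusetts tract data, the actual weights specification, and the incremental spatial autocorrelation step are not reproduced):

```python
import numpy as np

def getis_ord_gi_star(x, W):
    """Getis-Ord Gi* z-scores. x: value per areal unit; W: binary spatial
    weights matrix with self-neighbours included (W[i, i] = 1)."""
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=0)
    wsum = W.sum(axis=1)
    num = W @ x - xbar * wsum
    den = s * np.sqrt((n * (W ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    return num / den

# Toy 1-D chain of 10 "census tracts" with elevated counts in the middle;
# contiguity weights plus self-neighbours.
x = np.array([1, 1, 2, 1, 9, 10, 9, 1, 2, 1], dtype=float)
W = np.eye(10)
for i in range(9):
    W[i, i + 1] = W[i + 1, i] = 1
z = getis_ord_gi_star(x, W)
print(np.round(z, 2))  # large positive z -> candidate hotspot tracts
```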
Inherent Structure versus Geometric Metric for State Space Discretization
Liu, Hanzhong; Li, Minghai; Fan, Jue; Huo, Shuanghong
2016-01-01
Inherent structure (IS) and geometry-based clustering methods are commonly used for analyzing molecular dynamics trajectories. ISs are obtained by minimizing the sampled conformations into local minima on potential/effective energy surface. The conformations that are minimized into the same energy basin belong to one cluster. We investigate the influence of the applications of these two methods of trajectory decomposition on our understanding of the thermodynamics and kinetics of alanine tetrapeptide. We find that at the micro cluster level, the IS approach and root-mean-square deviation (RMSD) based clustering method give totally different results. Depending on the local features of energy landscape, the conformations with close RMSDs can be minimized into different minima, while the conformations with large RMSDs could be minimized into the same basin. However, the relaxation timescales calculated based on the transition matrices built from the micro clusters are similar. The discrepancy at the micro cluster level leads to different macro clusters. Although the dynamic models established through both clustering methods are validated approximately Markovian, the IS approach seems to give a meaningful state space discretization at the macro cluster level. PMID:26915811
Multiple-Instance Regression with Structured Data
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Recapitulation of Ayurveda constitution types by machine learning of phenotypic traits.
Tiwari, Pradeep; Kutum, Rintu; Sethi, Tavpritesh; Shrivastava, Ankita; Girase, Bhushan; Aggarwal, Shilpi; Patil, Rutuja; Agarwal, Dhiraj; Gautam, Pramod; Agrawal, Anurag; Dash, Debasis; Ghosh, Saurabh; Juvekar, Sanjay; Mukerji, Mitali; Prasher, Bhavana
2017-01-01
In Ayurveda system of medicine individuals are classified into seven constitution types, "Prakriti", for assessing disease susceptibility and drug responsiveness. Prakriti evaluation involves clinical examination including questions about physiological and behavioural traits. A need was felt to develop models for accurately predicting Prakriti classes that have been shown to exhibit molecular differences. The present study was carried out on data of phenotypic attributes in 147 healthy individuals of three extreme Prakriti types, from a genetically homogeneous population of Western India. Unsupervised and supervised machine learning approaches were used to infer inherent structure of the data, and for feature selection and building classification models for Prakriti respectively. These models were validated in a North Indian population. Unsupervised clustering led to emergence of three natural clusters corresponding to three extreme Prakriti classes. The supervised modelling approaches could classify individuals, with distinct Prakriti types, in the training and validation sets. This study is the first to demonstrate that Prakriti types are distinct verifiable clusters within a multidimensional space of multiple interrelated phenotypic traits. It also provides a computational framework for predicting Prakriti classes from phenotypic attributes. This approach may be useful in precision medicine for stratification of endophenotypes in healthy and diseased populations.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
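The clustering step above relies on eigenvector correlation metrics. The classical baseline for such metrics is the modal assurance criterion (MAC); the paper's metrics extend this idea to also handle spatial aliasing and coalescent modes, which the simple sketch below does not attempt.

```python
import numpy as np

def mac(phi1, phi2):
    """Modal assurance criterion between two (possibly complex) mode
    shapes: 1 for parallel eigenvectors, 0 for orthogonal ones."""
    num = np.abs(np.vdot(phi1, phi2)) ** 2
    return num / (np.vdot(phi1, phi1).real * np.vdot(phi2, phi2).real)

# Two noisy copies of the same shape correlate strongly; an unrelated
# shape does not, which is the basis for grouping repeated modes.
rng = np.random.default_rng(0)
phi = rng.standard_normal(20)
print(round(mac(phi, phi + 0.05 * rng.standard_normal(20)), 3))
print(round(mac(phi, rng.standard_normal(20)), 3))
```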
Shi, Weifang; Zeng, Weihua
2013-01-01
Reducing human vulnerability to chemical hazards in an industrialized city is a matter of great urgency. Vulnerability mapping is an alternative approach for providing vulnerability-reducing interventions in a region. This study presents a method for mapping human vulnerability to chemical hazards by using clustering analysis for effective vulnerability reduction. Taking the city of Shanghai as the study area, we measure human exposure to chemical hazards by using a proximity model that additionally considers the toxicity of hazardous substances, and we capture sensitivity and coping capacity with corresponding indicators. We perform an improved k-means clustering approach, based on a genetic algorithm, using a 500 m × 500 m geographical grid as the basic spatial unit. The sum of squared errors and the silhouette coefficient are combined to measure the quality of clustering and to determine the optimal number of clusters. The clustering result reveals a set of six typical human vulnerability patterns that show distinct vulnerability dimension combinations. The vulnerability mapping of the study area reflects cluster-specific vulnerability characteristics and their spatial distribution. Finally, we suggest specific measures that can provide new insights into rationally allocating the limited funds for the vulnerability reduction of each cluster. PMID:23787337
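A minimal sketch of the cluster-number selection, with synthetic features standing in for the gridded vulnerability indicators and plain k-means standing in for the paper's GA-improved variant: compute the sum of squared errors (inertia) and the silhouette coefficient over a range of k and inspect them jointly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for per-cell exposure/sensitivity/coping indicators.
X, _ = make_blobs(n_samples=500, centers=6, cluster_std=1.0, random_state=0)

# SSE (inertia) decreases monotonically with k; the silhouette coefficient
# peaks near the natural cluster number, so the two are read together.
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1),
          round(silhouette_score(X, km.labels_), 3))
```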
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits the intrinsically rich structure of real-world multiarray or tensor datasets. In dealing with such datasets, standard practice is to use subspace clustering based on vectorizing the multiarray data. However, vectorization of tensorial data does not exploit the complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model that takes cluster membership information into account. We propose a new clustering algorithm that alternates between the different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
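A simplified analogue of the idea (not the paper's alternating algorithm with trust-region updates on the multinomial manifold): Tucker-decompose the data tensor without vectorizing, then cluster the sample-mode factor matrix. The use of the tensorly library and the toy two-group tensor are assumptions.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker
from sklearn.cluster import KMeans

# Toy tensor data: 60 samples of 8x8 "images" drawn from two latent groups.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
samples = np.stack([(A if i < 30 else B) + 0.3 * rng.standard_normal((8, 8))
                    for i in range(60)])
X = tl.tensor(np.moveaxis(samples, 0, -1))  # modes: (row, col, sample)

# Tucker decomposition preserves the multiway structure; the sample-mode
# factor matrix then serves as a low-dimensional embedding for clustering.
core, factors = tucker(X, rank=[4, 4, 2])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(factors[-1])
print(labels[:30].sum(), labels[30:].sum())  # the two groups should separate
```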
Low Temperature Kinetics of the First Steps of Water Cluster Formation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bourgalais, J.; Roussel, V.; Capron, M.
2016-03-01
We present a combined experimental and theoretical low temperature kinetic study of water cluster formation. Water cluster growth takes place in low temperature (23-69 K) supersonic flows. The observed kinetics of formation of water clusters are reproduced with a kinetic model based on theoretical predictions for the first steps of clusterization. The temperature- and pressure-dependent association and dissociation rate coefficients are predicted with an ab initio transition state theory based master equation approach over a wide range of temperatures (20-100 K) and pressures (10^-6 to 10 bar).
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning
NASA Astrophysics Data System (ADS)
Ntampaka, M.; Trac, H.; Sutherland, D. J.; Fromenteau, S.; Póczos, B.; Schneider, J.
2016-11-01
We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark's publicly available N-body MDPL1 simulation, one with perfect galaxy cluster membership information and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. Assuming perfect membership knowledge, this unrealistic case produces a wide fractional mass error distribution, with a width of Δε ≈ 0.87. Interlopers introduce additional scatter, significantly widening the error distribution further (Δε ≈ 2.13). We employ the support distribution machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (Δε ≈ 0.67) for the contaminated case. Remarkably, SDM applied to contaminated clusters is better able to recover masses than even the scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
A density-based clustering model for community detection in complex networks
NASA Astrophysics Data System (ADS)
Zhao, Xiang; Li, Yantao; Qu, Zehui
2018-04-01
Network clustering (or graph partitioning) is an important technique for uncovering the underlying community structures in complex networks, which has been widely applied in various fields including astronomy, bioinformatics, sociology, and bibliometrics. In this paper, we propose a density-based clustering model for community detection in complex networks (DCCN). The key idea is to find group centers with a higher density than their neighbors and a relatively large integrated-distance from nodes with higher density. The experimental results indicate that our approach is efficient and effective for community detection in complex networks.
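A rough sketch of the density-peak idea on a graph, under simplifying assumptions: node density is taken as degree, and the "integrated distance" as the shortest-path distance to the nearest higher-density node. This illustrates the principle, not the authors' DCCN implementation.

```python
# Density-peak style community detection sketch on a toy graph.
import networkx as nx

G = nx.karate_club_graph()
density = dict(G.degree())
spl = dict(nx.all_pairs_shortest_path_length(G))

delta = {}  # distance to nearest node of higher density
for u in G:
    higher = [spl[u][v] for v in G if density[v] > density[u]]
    delta[u] = min(higher) if higher else max(spl[u].values())

# Centers score high on both density and delta; assign others to nearest center.
centers = sorted(G, key=lambda u: density[u] * delta[u], reverse=True)[:2]
label = {u: min(centers, key=lambda c: spl[u][c]) for u in G}
print(label)
```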
NASA Astrophysics Data System (ADS)
Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.
2018-04-01
Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.
Schulz, Tizian; Stoye, Jens; Doerr, Daniel
2018-05-08
Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables the handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
State estimation and prediction using clustered particle filters.
Lee, Yoonsang; Majda, Andrew J
2016-12-20
Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filter are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighboring state variables through clustering, and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.
A hybrid algorithm for clustering of time series data based on affinity search technique.
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets.
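The two-stage idea can be sketched as follows, with assumed distance choices (Euclidean for similarity in time, one minus Pearson correlation for similarity in shape) that are illustrative rather than the paper's exact ones.

```python
# Two-stage hybrid clustering sketch: k-means pre-grouping in time,
# then a tiny k-medoids merge of the subcluster prototypes by shape.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 50)).cumsum(axis=1)   # 60 random-walk series

# Stage 1: many small subclusters based on similarity in time.
pre = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X)
prototypes = pre.cluster_centers_

# Stage 2: k-medoids on a shape distance (1 - Pearson correlation).
D = 1 - np.corrcoef(prototypes)
k = 3
medoids = list(range(k))
for _ in range(20):                                 # alternate assign / update
    assign = np.argmin(D[:, medoids], axis=1)
    new = []
    for j in range(k):
        members = np.where(assign == j)[0]
        new.append(int(members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]))
    if new == medoids:
        break
    medoids = new

proto_labels = np.argmin(D[:, medoids], axis=1)
labels = proto_labels[pre.labels_]                  # map back to the series
print(labels)
```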
Clustering gene expression data based on predicted differential effects of GV interaction.
Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu
2005-02-01
Microarrays have become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches is proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of the GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrate the application of our method with a gene expression dataset and elucidate the utility of our approach using an external validation.
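A minimal sketch of the idea follows, with plain two-way means standing in for the paper's mixed-model AUP predictions: decompose the expression matrix into grand mean, gene and variety main effects, take the residual GV interaction, and cluster genes on it.

```python
# Cluster genes on estimated gene-by-variety (GV) interaction effects.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
Y = rng.standard_normal((100, 8))          # log-ratios: 100 genes x 8 varieties

grand = Y.mean()
gene = Y.mean(axis=1, keepdims=True) - grand
variety = Y.mean(axis=0, keepdims=True) - grand
gv = Y - grand - gene - variety            # interaction effects (de-noised input)

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(gv)
print(np.bincount(labels))
```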
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bush, B.; Melaina, M.; Penev, M.
This report describes the development and analysis of detailed temporal and spatial scenarios for early-market hydrogen fueling infrastructure clustering and fuel cell electric vehicle rollout using the Scenario Evaluation, Regionalization and Analysis (SERA) model. The report provides an overview of the SERA scenario development framework and discusses the approach used to develop the nationwide scenario.
Machine learning approaches for estimation of prediction interval for the model output.
Shrestha, Durga L; Solomatine, Dimitri P
2006-03-01
A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of the empirical distribution of the errors associated with all instances belonging to the cluster under consideration, and is propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using the computed prediction limits as targets; finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods for estimating the prediction interval. A new method for evaluating the performance of prediction interval estimation is proposed as well.
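A compact sketch of the clustering step, assuming made-up heteroscedastic model errors: fuzzy c-means (written out in NumPy) partitions the input space, per-cluster empirical error quantiles form the interval, and the limits are propagated to each example through its membership grades.

```python
# Fuzzy c-means clustering of the input space, then per-cluster error
# quantiles propagated to examples via their membership grades.
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(400, 2))                 # model inputs
errors = rng.normal(0, 0.2 + 0.3 * np.abs(X[:, 0]))  # model errors (synthetic)

def fuzzy_cmeans(X, c=4, m=2.0, iters=100):
    U = rng.dirichlet(np.ones(c), size=len(X))        # membership grades
    for _ in range(iters):
        W = U ** m
        centers = W.T @ X / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    return U

U = fuzzy_cmeans(X)
hard = U.argmax(axis=1)                               # for the empirical quantiles
qlo = np.array([np.quantile(errors[hard == k], 0.05) for k in range(U.shape[1])])
qhi = np.array([np.quantile(errors[hard == k], 0.95) for k in range(U.shape[1])])
pi_lo, pi_hi = U @ qlo, U @ qhi                       # propagate by membership
print(pi_lo[:5].round(2), pi_hi[:5].round(2))
```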
NASA Astrophysics Data System (ADS)
Hassan, Kazi; Allen, Deonie; Haynes, Heather
2016-04-01
This paper considers 1D hydraulic model data on the effect of high flow clusters and sequencing on sediment transport. Using observed flow gauge data from the River Caldew, England, a novel stochastic modelling approach was developed in order to create alternative 50 year flow sequences. Whilst the observed probability density of gauge data was preserved in all sequences, the order in which those flows occurred was varied using the output from a Hidden Markov Model (HMM) with generalised Pareto distribution (GP). In total, one hundred 50 year synthetic flow series were generated and used as the inflow boundary conditions for individual flow series model runs using the 1D sediment transport model HEC-RAS. The model routed graded sediment through the case study river reach to define the long-term morphological changes. Comparison of individual simulations provided a detailed understanding of the sensitivity of channel capacity to flow sequence. Specifically, each 50 year synthetic flow sequence was analysed using a 3-month, 6-month or 12-month rolling window approach and classified for clusters in peak discharge. As a cluster is described as a temporal grouping of flow events above a specified threshold, the threshold condition used herein is considered a morphologically active, channel-forming discharge event. Thus, clusters were identified for peak discharges in excess of 10%, 20%, 50%, 100% and 150% of the 1 year Return Period (RP) event. The window of above-peak flows also required cluster definition and was tested for timeframes of 1, 2, 10 and 30 days. Subsequently, clusters could be described in terms of the number of events, maximum peak flow discharge, cumulative flow discharge and skewness (i.e. a description of the flow sequence). The model output for each cluster was analysed for the cumulative flow volume and cumulative sediment transport (mass). This was then compared to the total sediment transport of a single flow event of equivalent flow volume. Results illustrate that clustered flood events generated sediment loads up to an order of magnitude greater than those of individual events of the same flood volume. Correlations were significant for sediment volume compared to both maximum flow discharge (R² < 0.8) and the number of events (R² from −0.5 to −0.7) within the cluster. The strongest correlations occurred for clusters with a greater number of flow events only slightly above threshold. This illustrates that the numerical model can capture a degree of the non-linear morphological response to flow magnitude. The relationship between morphological change and the skewness of flow events within each cluster was also analysed, illustrating only minor sensitivity to cluster peak distribution skewness. This is surprising, and discussion is presented on model limitations, including the capability of sediment transport formulae to effectively account for temporal processes of antecedent flow, hysteresis, local supply, etc.
Atlas-guided cluster analysis of large tractography datasets.
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
Study of cluster behavior in the riser of CFB by the DSMC method
NASA Astrophysics Data System (ADS)
Liu, H. P.; Liu, D. Y.; Liu, H.
2010-03-01
The flow behaviors of clusters in the riser of a two-dimensional (2D) circulating fluidized bed (CFB) were numerically studied based on the Euler-Lagrangian approach. Gas turbulence was modeled by means of Large Eddy Simulation (LES), and particle collisions were modeled by means of the direct simulation Monte Carlo (DSMC) method. The clusters' hydrodynamic characteristics are obtained using a cluster identification method proposed by Sharma et al. (2000). The descending clusters near the wall region and the up- and down-flowing clusters in the core were studied separately due to their different flow behaviors, and the effects of superficial gas velocity on cluster behavior were analyzed. Simulated results showed that near-wall clusters flow downward with a descent velocity of about -45 cm/s. The occurrence frequency of up-flowing clusters is higher than that of down-flowing clusters in the core of the riser. With increasing superficial gas velocity, the solid concentration and occurrence frequency of clusters decrease, while the cluster axial velocity increases. Simulated results were in agreement with experimental data. The stochastic method used in the present paper is feasible for predicting cluster flow behavior in CFBs.
Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B
2017-04-01
Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo, and it is logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models that (1) the population-average parameters have an important interpretation for public health applications and (2) they avoid untestable assumptions on latent variable distributions and avoid parametric assumptions about error distributions, therefore, providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equation for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
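A small sketch of the marginal-mean approach on synthetic stepped wedge data follows, assuming statsmodels' GEE implementation: an exchangeable working correlation, with the usual sandwich variance compared against the bias-reduced (Mancl-DeRouen) small-sample correction. All data-generating values are invented.

```python
# GEE marginal mean model for a synthetic stepped wedge trial:
# compare robust (sandwich) and bias-reduced standard errors.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
clusters, periods, n_per = 10, 5, 20
rows = []
for c in range(clusters):
    step = 1 + c % (periods - 1)              # period at which the cluster crosses over
    u = rng.normal(0, 0.3)                    # latent cluster effect
    for t in range(periods):
        trt = int(t >= step)
        p = 1 / (1 + np.exp(-(-0.5 + 0.2 * t + 0.4 * trt + u)))
        for yi in rng.binomial(1, p, size=n_per):
            rows.append(dict(cluster=c, period=t, trt=trt, y=yi))
df = pd.DataFrame(rows)

X = sm.add_constant(pd.get_dummies(df["period"], prefix="t", drop_first=True)
                    .assign(trt=df["trt"]).astype(float))
model = sm.GEE(df["y"], X, groups=df["cluster"],
               family=sm.families.Binomial(),
               cov_struct=sm.cov_struct.Exchangeable())
print("sandwich SE:", model.fit(cov_type="robust").bse["trt"])
print("bias-reduced SE:", model.fit(cov_type="bias_reduced").bse["trt"])
```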
NASA Astrophysics Data System (ADS)
Dalkilic, Turkan Erbay; Apaydin, Aysen
2009-11-01
In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed using the general model Y = f(X) + ε. However, a data cluster may consist of a combination of observations that have different distributions derived from different clusters. When a regression model must be estimated for fuzzy inputs derived from different distributions, the model is termed a 'switching regression model' and is expressed with a separate regression function for each class. Here l_i indicates the class number of each independent variable and p the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model formed by gathering the obtained models. There are methods that suggest the class numbers of the independent variables heuristically; alternatively, this study aims to define the optimal class number of independent variables using a suggested validity criterion for fuzzy clustering. For the case in which the independent variables have an exponential distribution, an algorithm is suggested for defining the unknown parameters of the switching regression model and for obtaining the estimated values once an optimal membership function suitable for the exponential distribution has been obtained.
Neck formation and deformation effects in a preformed cluster model of exotic cluster decays
NASA Astrophysics Data System (ADS)
Kumar, Satish; Gupta, Raj K.
1997-01-01
Using the nuclear proximity approach and the two-center nuclear shape parametrization, the interaction potential between two deformed, pole-to-pole oriented nuclei forming a necked configuration in the overlap region is calculated, and its role in cluster decay half-lives is studied. The barrier is found to move to a larger relative separation, with its proximity minimum lying in the neighborhood of the Q value of the decay and its height and width reduced considerably. For cluster decay calculations in the preformed cluster model of Malik and Gupta, due to the deformations and orientations of the nuclei, the (empirical) preformation factor is found to be reduced considerably and agrees well with other model calculations known to be successful in predicting cluster decay half-lives. Comparison with the earlier case of nuclei treated as spheres suggests that the effects of both deformations and neck formation are compensated by choosing the position of cluster preformation and the inner classical turning point for penetrability calculations at the touching configuration of spherical nuclei.
Structure of the starch granule--a curved crystal.
Larsson, K
1991-09-01
A structure model of the molecular arrangement in native starch proposed earlier is further considered, with special regard to the lateral packing of cluster units. The amylopectin molecules are radially distributed, with branches concentrated in clusters. Within each cluster the polyglucan chains form double helices which are hexagonally packed. The clusters form spherically concentric crystalline layers, with amylose in an amorphous form acting as a space-filler. A translational mechanism for the change of helical direction at boundaries between clusters is proposed, which can account for variations in the curvature of the concentric layers. The model is related to X-ray diffraction data and optical birefringence, considering disassembly at gelatinization. The structure is also discussed in relation to biosynthesis. Some aspects of gelatinization, such as the recent glass-transition approach, are then considered.
Optimizing the maximum reported cluster size in the spatial scan statistic for ordinal data.
Kim, Sehwi; Jung, Inkyung
2017-01-01
The spatial scan statistic is an important tool for spatial cluster detection. There have been numerous studies on scanning window shapes. However, little research has been done on the maximum scanning window size or maximum reported cluster size. Recently, Han et al. proposed to use the Gini coefficient to optimize the maximum reported cluster size. However, the method has been developed and evaluated only for the Poisson model. We adopt the Gini coefficient to be applicable to the spatial scan statistic for ordinal data to determine the optimal maximum reported cluster size. Through a simulation study and application to a real data example, we evaluate the performance of the proposed approach. With some sophisticated modification, the Gini coefficient can be effectively employed for the ordinal model. The Gini coefficient most often picked the optimal maximum reported cluster sizes that were the same as or smaller than the true cluster sizes with very high accuracy. It seems that we can obtain a more refined collection of clusters by using the Gini coefficient. The Gini coefficient developed specifically for the ordinal model can be useful for optimizing the maximum reported cluster size for ordinal data and helpful for properly and informatively discovering cluster patterns.
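The Gini-based selection can be sketched as follows, with made-up cluster case counts for a few candidate size caps: compute the Gini coefficient of each reported collection of clusters and keep the cap that maximizes it.

```python
# Gini coefficient over reported cluster case counts, per candidate cap.
import numpy as np

def gini(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # mean-absolute-difference form: G = sum_ij |x_i - x_j| / (2 n^2 mu)
    return np.abs(x[:, None] - x[None, :]).sum() / (2 * n * n * x.mean())

# Hypothetical reported clusters (case counts) for three candidate caps:
candidates = {10: [40, 38, 35], 25: [90, 12, 8], 50: [140, 5]}
best = max(candidates, key=lambda cap: gini(candidates[cap]))
print({cap: round(gini(v), 3) for cap, v in candidates.items()}, "-> best cap:", best)
```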
Prediction models for clustered data: comparison of a random intercept and standard regression model
Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne
2013-02-15
When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
Light clusters and pasta phases in warm and dense nuclear matter
NASA Astrophysics Data System (ADS)
Avancini, Sidney S.; Ferreira, Márcio; Pais, Helena; Providência, Constança; Röpke, Gerd
2017-04-01
The pasta phases are calculated for warm stellar matter in a framework of relativistic mean-field models, including the possibility of light cluster formation. Results from three different semiclassical approaches are compared with a quantum statistical calculation. Light clusters are considered as point-like particles, and their abundances are determined from the minimization of the free energy. The couplings of the light clusters to mesons are determined from experimental chemical equilibrium constants and many-body quantum statistical calculations. The effect of these light clusters on the chemical potentials is also discussed. It is shown that, by including heavy clusters, light clusters are present up to larger nucleonic densities, although with smaller mass fractions.
Bayesian network meta-analysis for cluster randomized trials with binary outcomes.
Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard
2017-06-01
Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd.
CASP10-BCL::Fold efficiently samples topologies of large proteins.
Heinze, Sten; Putnam, Daniel K; Fischer, Axel W; Kohlmann, Tim; Weiner, Brian E; Meiler, Jens
2015-03-01
During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs), omitting the more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation for CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology, identify native-like models by scoring and/or clustering approaches, and our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies with a GDT_TS score > 33% for 12 of 18 and with a topology score > 0.8 for 11 of 18 test cases de novo. Despite the sampling success of BCL::Fold, significant challenges still exist in the clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. It was also observed that for some β-strand proteins model refinement failed, as β-strands were not properly aligned to form hydrogen bonds, removing otherwise accurate models from the pool. Further, BCL::Fold frequently samples non-natural topologies that require loop regions to pass through the center of the protein. © 2015 Wiley Periodicals, Inc.
Principal Component Clustering Approach to Teaching Quality Discriminant Analysis
ERIC Educational Resources Information Center
Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan
2016-01-01
Teaching quality is the lifeline of higher education, and many universities have made effective achievements in evaluating it. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…
Aftershock identification problem via the nearest-neighbor analysis for marked point processes
NASA Astrophysics Data System (ADS)
Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.
2007-12-01
A century of observations of world seismicity has revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide the most reliable information about earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on the hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in the space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time-oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from the ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
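A minimal sketch of the nearest-neighbor construction, assuming the Baiesi-Paczuski proximity η_ij = t_ij · r_ij^{d_f} · 10^{-b·m_i} between an earlier event i and a later event j: each event's parent is its nearest neighbor under η, which yields the time-oriented spanning tree. The catalog below is synthetic.

```python
# Build the nearest-neighbor (parent) tree under the Baiesi-Paczuski proximity.
import numpy as np

rng = np.random.default_rng(6)
n = 200
t = np.sort(rng.uniform(0, 1000, n))          # occurrence times
xy = rng.uniform(0, 100, (n, 2))              # epicenters
m = rng.exponential(1.0, n)                   # magnitudes above cutoff
b, df = 1.0, 1.6                              # Gutenberg-Richter b-value, fractal dim.

parent = np.full(n, -1)
for j in range(1, n):
    dt = t[j] - t[:j]                         # time gaps to all earlier events
    r = np.linalg.norm(xy[:j] - xy[j], axis=1) + 1e-9
    eta = dt * r**df * 10.0**(-b * m[:j])     # proximity to each candidate parent
    parent[j] = int(np.argmin(eta))
print(parent[:20])
```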
Characterization and analysis of a transcriptome from the boreal spider crab Hyas araneus.
Harms, Lars; Frickenhaus, Stephan; Schiffer, Melanie; Mark, Felix C; Storch, Daniela; Pörtner, Hans-Otto; Held, Christoph; Lucassen, Magnus
2013-12-01
Research investigating the genetic basis of physiological responses has significantly broadened our understanding of the mechanisms underlying organismic response to environmental change. However, genomic data are currently available for few taxa only, thus excluding physiological model species from this approach. In this study we report the transcriptome of the model organism Hyas araneus from Spitsbergen (Arctic). We generated 20,479 transcripts, using the 454 GS FLX sequencing technology in combination with an Illumina HiSeq sequencing approach. Annotation by Blastx revealed 7159 blast hits in the NCBI non-redundant protein database. The comparison between the spider crab H. araneus transcriptome and EST libraries of the American lobster Homarus americanus and the porcelain crab Petrolisthes cinctipes yielded 3229 and 2581 sequences with a significant hit, respectively. The clustering by the Markov Clustering Algorithm (MCL) revealed a common core of 1710 clusters present in all three species and 5903 unique clusters for H. araneus. The combined sequencing approaches generated transcripts that will greatly expand the limited genomic data available for crustaceans. We introduce MCL clustering for transcriptome comparisons as a simple approach to estimate similarities between transcriptomic libraries of different size and quality and to analyze homologies within the selected group of species. In particular, we identified a large variety of reverse transcriptase (RT) sequences not only in the H. araneus transcriptome and other decapod crustaceans, but also in the sea urchin, supporting the hypothesis of a heritable, anti-viral immunity and the proposed viral fragment integration by host-derived RTs in marine invertebrates. © 2013.
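For orientation, the textbook expansion/inflation loop of Markov clustering (MCL) on a toy similarity matrix is sketched below; this illustrates the algorithm used for the transcriptome comparison, not the authors' exact pipeline.

```python
# Textbook MCL: alternate matrix expansion and entrywise inflation until
# the column-stochastic matrix converges; attractor rows index clusters.
import numpy as np

def mcl(S, inflation=2.0, iters=50):
    M = S / S.sum(axis=0, keepdims=True)            # column-stochastic
    for _ in range(iters):
        M = np.linalg.matrix_power(M, 2)            # expansion
        M = M ** inflation                          # inflation
        M /= M.sum(axis=0, keepdims=True)
    clusters = {}
    for i in np.where(np.diag(M) > 1e-6)[0]:        # attractors
        clusters[int(i)] = np.where(M[i] > 1e-6)[0].tolist()
    return clusters

rng = np.random.default_rng(7)
A = (rng.uniform(size=(12, 12)) < 0.15).astype(float)
S = A + A.T + np.eye(12)                            # symmetric similarities + self-loops
print(mcl(S))
```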
Sani-Kast, Nicole; Scheringer, Martin; Slomberg, Danielle; Labille, Jérôme; Praetorius, Antonia; Ollivier, Patrick; Hungerbühler, Konrad
2015-12-01
Engineered nanoparticle (ENP) fate models developed to date - aimed at predicting ENP concentrations in the aqueous environment - have limited applicability because they employ either constant environmental conditions along the modeled system or a highly specific environmental representation; neither approach captures the effects of spatial and/or temporal variability. To address this conceptual gap, we developed a novel modeling strategy that: 1) incorporates spatial variability in environmental conditions in an existing ENP fate model; and 2) analyzes the effect of a wide range of randomly sampled environmental conditions (representing variations in water chemistry). This approach was employed to investigate the transport of nano-TiO2 in the Lower Rhône River (France) under numerous sets of environmental conditions. The predicted spatial concentration profiles of nano-TiO2 were then grouped according to their similarity using cluster analysis. The analysis resulted in a small number of clusters representing groups of spatial concentration profiles. All clusters show nano-TiO2 accumulation in the sediment layer, supporting results from previous studies. Analysis of the characteristic features of each cluster demonstrated a strong association between the water conditions in regions close to the ENP emission source and the cluster membership of the corresponding spatial concentration profiles. In particular, water compositions favoring heteroaggregation between the ENPs and suspended particulate matter resulted in clusters of low variability. These conditions are, therefore, reliable predictors of the eventual fate of the modeled ENPs. The conclusions from this study are also valid for ENP fate in other large river systems. Our results, therefore, shift the focus of future modeling and experimental research on ENP environmental fate to the water characteristics in regions near the expected ENP emission sources. Under conditions favoring heteroaggregation in these regions, the fate of the ENPs can be readily predicted. Copyright © 2014 Elsevier B.V. All rights reserved.
Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bibby, Jaclyn; Keegan, Ronan M.; Mayans, Olga
2013-11-01
Processing of NMR structures for molecular replacement by AMPLE works well. AMPLE is a program developed for clustering and truncating ab initio protein structure predictions into search models for molecular replacement. Here, it is shown that its core cluster-and-truncate methods also work well for processing NMR ensembles into search models. Rosetta remodelling helps to extend success to NMR structures bearing low sequence identity or high structural divergence from the target protein. Potential future routes to improved performance are considered and practical, general guidelines on using AMPLE are provided.
Clustering Multivariate Time Series Using Hidden Markov Models
Ghassempour, Shima; Girosi, Federico; Maeder, Anthony
2014-01-01
In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers. PMID:24662996
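A condensed sketch of the recipe, assuming hmmlearn, SciPy and synthetic continuous trajectories: fit one Gaussian HMM per series, build a symmetrized log-likelihood distance between the fitted models, and cluster the resulting distance matrix hierarchically.

```python
# HMM-based trajectory clustering: one model per series, then a
# model-swap log-likelihood distance fed to hierarchical clustering.
import numpy as np
from hmmlearn.hmm import GaussianHMM
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(8)
series = [rng.standard_normal((80, 2)) + (i % 2) for i in range(10)]  # 10 trajectories

models = [GaussianHMM(n_components=2, covariance_type="diag",
                      random_state=0).fit(s) for s in series]

n = len(series)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # symmetrized per-step log-likelihood loss when swapping models
        d = (models[i].score(series[i]) + models[j].score(series[j])
             - models[i].score(series[j]) - models[j].score(series[i])) / 80
        D[i, j] = D[j, i] = max(d, 0.0)

labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print(labels)
```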
Observing Stellar Clusters in the Computer
NASA Astrophysics Data System (ADS)
Borch, A.; Spurzem, R.; Hurley, J.
2006-08-01
We present a new approach that combines direct N-body simulations with stellar population synthesis modeling in order to model the dynamical and color evolution of globular clusters simultaneously. This allows us to model the spectrum, colors and luminosities of each star in the simulated cluster. For this purpose the NBODY6++ code (Spurzem 1999) is used, which is a parallel version of the NBODY code. J. Hurley implemented into the NBODY6++ code simple recipes to follow the changes in stellar masses, radii, and luminosities due to stellar evolution (Hurley et al. 2001), in the sense that each simulation particle represents one star. These prescriptions cover all evolutionary phases and solar to globular cluster metallicities. We used the stellar parameters obtained by this stellar evolution routine and coupled them to the stellar library BaSeL 2.0 (Lejeune et al. 1997). As a first application we investigated the integrated broad-band colors of simulated clusters. We modeled tidally disrupted globular clusters and compared the results with isolated globular clusters. Due to energy equipartition we expected a relative blueing of tidally disrupted clusters, because of the higher escape probability of red, low-mass stars, and we indeed observe this behaviour for concentrated globular clusters. The mass-to-light ratio of isolated clusters closely follows a color-M/L correlation, similar to that described by Bell and de Jong (2001) for spiral galaxies. In contrast to this correlation, in tidally disrupted clusters the M/L ratio becomes significantly lower at the time of cluster dissolution. Hence, for isolated clusters the behavior of the stellar population is not influenced by dynamical evolution, whereas the stellar population of tidally disrupted clusters is strongly influenced by dynamical effects.
Kwong, C K; Fung, K Y; Jiang, Huimin; Chan, K Y; Siu, Kin Wai Michael
2013-01-01
Affective design is an important aspect of product development to achieve a competitive edge in the marketplace. A neural-fuzzy network approach has recently been attempted to model customer satisfaction for affective design, and it has proved effective in dealing with the fuzziness and non-linearity of the modeling as well as in generating explicit customer satisfaction models. However, such an approach to modeling customer satisfaction has two limitations. First, it is not suitable for modeling problems that involve a large number of inputs. Second, it cannot adapt to new data sets, given that its structure is fixed once it has been developed. In this paper, a modified dynamic evolving neural-fuzzy approach is proposed to address the above-mentioned limitations. A case study on the affective design of mobile phones was conducted to illustrate the effectiveness of the proposed methodology. Validation tests were conducted and the test results indicated that: (1) the conventional Adaptive Neuro-Fuzzy Inference System (ANFIS) failed to run due to the large number of inputs; (2) the proposed dynamic neural-fuzzy model outperforms the subtractive clustering-based ANFIS model and the fuzzy c-means clustering-based ANFIS model in terms of modeling accuracy and computational effort.
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means (KM) approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself, and the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling to account for these sources of uncertainty, using Fuzzy C-means (FCM), the closest probabilistic analog to KM. We train softmax multi-layer perceptrons (MLPs) using the KM- and FCM-inferred cluster assignments as target labels, to then allow probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements, from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real, motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain-computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural-network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural-network-driven approach to microstate analysis are likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignments and analyses.
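A minimal sketch of the probabilistic labeling pipeline on synthetic data: detect GFP peaks, derive template labels with k-means (the deterministic step), then train a softmax MLP so the full recording receives per-time-point label probabilities. Settings are illustrative.

```python
# GFP-peak clustering followed by probabilistic microstate labeling.
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(9)
eeg = rng.standard_normal((64, 5000))               # channels x time (synthetic)

gfp = eeg.std(axis=0)                               # global field power
peaks, _ = find_peaks(gfp)
maps = eeg[:, peaks].T                              # one topography per GFP peak

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(maps)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(maps, km.labels_)

proba = mlp.predict_proba(eeg.T)                    # probabilistic labels, all samples
print(proba.shape, proba[:3].round(2))
```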
Di Costanzo, Ezio; Giacomello, Alessandro; Messina, Elisa; Natalini, Roberto; Pontrelli, Giuseppe; Rossi, Fabrizio; Smits, Robert; Twarogowska, Monika
2018-03-14
We propose a discrete-in-continuous mathematical model describing the in vitro growth of biopsy-derived mammalian cardiac progenitor cells growing as clusters in the form of spheres (Cardiospheres). The approach is hybrid: discrete at the cellular scale and continuous at the molecular level. In the present model, cells are subject to a self-organizing collective dynamics mechanism and, additionally, can proliferate and differentiate, depending also on stochastic processes. The latter two processes are triggered and regulated by chemical signals present in the environment. Numerical simulations show the structure and development of the clustered progenitors and are in good agreement with results obtained from in vitro experiments.
Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M
2018-06-01
Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men who have sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The Alternating Expectation Conditional Maximization (AECM) algorithm, a faster variant of EM, can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to implement AECM efficiently for finite mixture models. Results of our simulation experiments show improved performance in terms of both fewer iterations and less computation time. The methodology is applied to the problem of clustering mutual fund data on the basis of average annual percent returns and in the presence of economic indicators.
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.
Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si
2017-07-01
Water quality assessment is crucial for the assessment of marine eutrophication, prediction of harmful algal blooms, and environmental protection. Previous studies have developed many numerical modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have long been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment using hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from the coastal waters of the Bohai Sea and the North Yellow Sea of China, and apply the clustering results to evaluate water quality. To evaluate validity, we also cluster the same data with cluster analysis based on Euclidean distance, which has been widely adopted in previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, this is the first attempt to apply Mahalanobis distance to coastal water quality assessment.
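As a sketch (not the authors' code), hierarchical clustering under a Mahalanobis metric is available directly in SciPy by supplying the inverse covariance of the water quality variables; swapping the metric to 'euclidean' reproduces the baseline the paper compares against.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(100, 6)                    # rows: samples; columns: water quality variables
VI = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance captures variable correlations
D = pdist(X, metric='mahalanobis', VI=VI)     # condensed pairwise Mahalanobis distances
Z = linkage(D, method='average')              # agglomerative clustering on the distances
labels = fcluster(Z, t=4, criterion='maxclust')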
MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.
Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N
2018-04-15
Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexity in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a cluster refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.
Atlas-Guided Cluster Analysis of Large Tractography Datasets
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information from a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct, and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
NASA Astrophysics Data System (ADS)
Shah, Shishir
This paper presents a segmentation method for detecting cells in immunohistochemically stained cytological images. A two-phase approach to segmentation is used, in which an unsupervised clustering approach coupled with cluster merging based on a fitness function serves as the first phase to obtain a first approximation of the cell locations. A joint segmentation-classification approach incorporating an ellipse shape model is used as the second phase to detect the final cell contour. The segmentation model estimates a multivariate density function of low-level image features from training samples and uses it as a measure of how likely each image pixel is to be a cell. This estimate is constrained by the zero level set, which is obtained as a solution to an implicit representation of an ellipse. Results of segmentation are presented and compared to ground truth measurements.
NASA Astrophysics Data System (ADS)
Dreano, Denis; Tsiaras, Kostas; Triantafyllou, George; Hoteit, Ibrahim
2017-07-01
Forecasting the state of large marine ecosystems is important for many economic and public health applications. However, advanced three-dimensional (3D) ecosystem models, such as the European Regional Seas Ecosystem Model (ERSEM), are computationally expensive, especially when implemented within an ensemble data assimilation system requiring several parallel integrations. As an alternative to 3D ecological forecasting systems, we propose to implement a set of regional one-dimensional (1D) water-column ecological models that run at a fraction of the computational cost. The 1D model domains are determined using a Gaussian mixture model (GMM)-based clustering method and satellite chlorophyll-a (Chl-a) data. Regionally averaged Chl-a data are assimilated into the 1D models using the singular evolutive interpolated Kalman (SEIK) filter. To laterally exchange information between subregions and improve the forecasting skill, we introduce a new correction step to the assimilation scheme, in which we assimilate a statistical forecast of future Chl-a observations based on information from neighbouring regions. We apply this approach to the Red Sea and show that the assimilative 1D ecological models can forecast surface Chl-a concentration with high accuracy. The statistical assimilation step further improves the forecasting skill by as much as 50%. This general approach of clustering large marine areas and running several interacting 1D ecological models is very flexible. It allows many combinations of clustering, filtering and regression techniques to be used and can be applied to build efficient forecasting systems in other large marine ecosystems.
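A minimal sketch of the GMM-based regionalization step with scikit-learn, assuming chl holds one (log-transformed) Chl-a time series per pixel; the SEIK filtering and the cross-region statistical correction are beyond the scope of this sketch.

import numpy as np
from sklearn.mixture import GaussianMixture

chl = np.log(np.random.rand(5000, 52) + 0.01)   # placeholder: (n_pixels, n_weeks) Chl-a series
gmm = GaussianMixture(n_components=4, covariance_type='full', random_state=0).fit(chl)
region = gmm.predict(chl)                        # each pixel assigned to a 1D-model subregion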
Mean-cluster approach indicates cell sorting time scales are determined by collective dynamics
NASA Astrophysics Data System (ADS)
Beatrici, Carine P.; de Almeida, Rita M. C.; Brunnet, Leonardo G.
2017-03-01
Cell migration is essential to cell segregation, playing a central role in tissue formation, wound healing, and tumor evolution. Considering random mixtures of two cell types, it is still not clear which cell characteristics define clustering time scales. The mass of diffusing clusters merging with one another is expected to grow as t^{d/(d+2)} when the diffusion constant scales with the inverse of the cluster mass. Cell segregation experiments deviate from that behavior. Explanations for that could arise from specific microscopic mechanisms or from collective effects, typical of active matter. Here we consider a power law connecting the diffusion constant and cluster mass to propose an analytic approach to model cell segregation where we explicitly take into account finite-size corrections. The results are compared with active matter model simulations and experiments available in the literature. To investigate the role played by different mechanisms we considered different hypotheses describing cell-cell interaction: the differential adhesion hypothesis and the different velocities hypothesis. We find that the simulations yield normal diffusion for long time intervals. Analytic and simulation results show that (i) cluster evolution clearly tends to a scaling regime, disrupted only at finite-size limits; (ii) cluster diffusion is greatly enhanced by cell collective behavior, such that for a high enough tendency to follow the neighbors, cluster diffusion may become independent of cluster size; (iii) the scaling exponent for cluster growth depends only on the mass-diffusion relation, not on the detailed local segregation mechanism. These results apply to active matter systems in general and, in particular, the mechanisms found underlying the increase in cell sorting speed certainly have deep implications in biological evolution as a selection mechanism.
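The t^{d/(d+2)} law quoted above follows from a standard coalescence argument; a sketch, assuming D(M) ∝ 1/M and a conserved mass density ρ, where ℓ is the typical inter-cluster separation and τ the time to diffuse that distance before merging:

\[
\ell \sim \left(\frac{M}{\rho}\right)^{1/d}, \qquad
\tau \sim \frac{\ell^{2}}{D(M)} \sim M^{2/d}\, M = M^{(d+2)/d},
\]
\[
\frac{dM}{dt} \sim \frac{M}{\tau} \sim M^{-2/d}
\quad\Longrightarrow\quad
M(t) \sim t^{d/(d+2)} .
\]

The general power-law mass-diffusion relations considered in the paper modify τ and hence the growth exponent accordingly.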
Terminal-Area Aircraft Intent Inference Approach Based on Online Trajectory Clustering.
Yang, Yang; Zhang, Jun; Cai, Kai-quan
2015-01-01
Terminal-area aircraft intent inference (T-AII) is a prerequisite to detecting and avoiding potential aircraft conflicts in terminal airspace. T-AII challenges state-of-the-art AII approaches due to the uncertainty of the air traffic situation, in particular the undefined flight routes and frequent maneuvers. In this paper, a novel T-AII approach is introduced that addresses these limitations by solving the problem in two steps: intent modeling and intent inference. In the modeling step, an online trajectory clustering procedure is designed to recognize the routes actually available in real time, in place of the missing planned routes. In the inference step, we then present a probabilistic T-AII approach based on multiple flight attributes to improve inference performance in maneuvering scenarios. The proposed approach is validated with real radar trajectory and flight attribute data from 34 days collected from the Chengdu terminal area in China. Preliminary results show the efficacy of the presented approach.
On selecting a prior for the precision parameter of Dirichlet process mixture models
Dorazio, R.M.
2009-01-01
In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G0. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
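For a Dirichlet process, the prior expected number of clusters among n observations is E[K | α, n] = α(ψ(α + n) − ψ(α)), with ψ the digamma function. Below is a minimal sketch (not Dorazio's actual construction) of backing out a precision value from a prior guess at the level of clustering.

from scipy.special import digamma
from scipy.optimize import brentq

def expected_clusters(alpha, n):
    # E[K | alpha, n] under a Dirichlet process with precision alpha
    return alpha * (digamma(alpha + n) - digamma(alpha))

def alpha_for(k_expected, n):
    # invert E[K] = k_expected for alpha by root finding (valid for 1 < k_expected < n)
    return brentq(lambda a: expected_clusters(a, n) - k_expected, 1e-4, 1e4)

print(alpha_for(5, 200))   # precision implying about 5 clusters among 200 observations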
Modularization of biochemical networks based on classification of Petri net t-invariants.
Grafahrend-Belau, Eva; Schreiber, Falk; Heiner, Monika; Sackmann, Andrea; Junker, Björn H; Grunwald, Stefanie; Speer, Astrid; Winder, Katja; Koch, Ina
2008-02-08
Structural analysis of biochemical networks is a growing field in bioinformatics and systems biology. The availability of an increasing amount of biological data from molecular biological networks promises a deeper understanding but confronts researchers with the problem of combinatorial explosion. The amount of qualitative network data is growing much faster than the amount of quantitative data, such as enzyme kinetics. In many cases it is even impossible to measure quantitative data because of limitations of experimental methods, or for ethical reasons. Thus, a huge amount of qualitative data, such as interaction data, is available, but it was not sufficiently used for modeling purposes, until now. New approaches have been developed, but the complexity of data often limits the application of many of the methods. Biochemical Petri nets make it possible to explore static and dynamic qualitative system properties. One Petri net approach is model validation based on the computation of the system's invariant properties, focusing on t-invariants. T-invariants correspond to subnetworks, which describe the basic system behavior. With increasing system complexity, the basic behavior can only be expressed by a huge number of t-invariants. According to our validation criteria for biochemical Petri nets, the necessary verification of the biological meaning, by interpreting each subnetwork (t-invariant) manually, is not possible anymore. Thus, an automated, biologically meaningful classification would be helpful in analyzing t-invariants, and supporting the understanding of the basic behavior of the considered biological system. Here, we introduce a new approach to automatically classify t-invariants to cope with network complexity. We apply clustering techniques such as UPGMA, Complete Linkage, Single Linkage, and Neighbor Joining in combination with different distance measures to get biologically meaningful clusters (t-clusters), which can be interpreted as modules. To find the optimal number of t-clusters to consider for interpretation, the cluster validity measure, Silhouette Width, is applied. We considered two different case studies as examples: a small signal transduction pathway (pheromone response pathway in Saccharomyces cerevisiae) and a medium-sized gene regulatory network (gene regulation of Duchenne muscular dystrophy). We automatically classified the t-invariants into functionally distinct t-clusters, which could be interpreted biologically as functional modules in the network. We found differences in the suitability of the various distance measures as well as the clustering methods. In terms of a biologically meaningful classification of t-invariants, the best results are obtained using the Tanimoto distance measure. Considering clustering methods, the obtained results suggest that UPGMA and Complete Linkage are suitable for clustering t-invariants with respect to the biological interpretability. We propose a new approach for the biological classification of Petri net t-invariants based on cluster analysis. Due to the biologically meaningful data reduction and structuring of network processes, large sets of t-invariants can be evaluated, allowing for model validation of qualitative biochemical Petri nets. This approach can also be applied to elementary mode analysis.
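A sketch of the best-performing combination reported (Tanimoto distance with UPGMA, i.e. average linkage, plus Silhouette Width to pick the number of t-clusters); the random binary incidence matrix stands in for real t-invariant support vectors.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

T = (np.random.rand(60, 40) > 0.7).astype(int)   # rows: t-invariants over 40 transitions
D = pdist(T, metric='jaccard')                   # Jaccard = 1 - Tanimoto similarity on binaries
Z = linkage(D, method='average')                 # UPGMA
best_k = max(range(2, 15),
             key=lambda k: silhouette_score(squareform(D),
                                            fcluster(Z, k, criterion='maxclust'),
                                            metric='precomputed'))
t_clusters = fcluster(Z, best_k, criterion='maxclust')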
Calculation of the wetting parameter from a cluster model in the framework of nanothermodynamics.
García-Morales, V; Cervera, J; Pellicer, J
2003-06-01
The critical wetting parameter ω_c determines the strength of interfacial fluctuations in critical wetting transitions. In this Brief Report, we calculate ω_c from considerations on critical liquid clusters inside a vapor phase. The starting point is a cluster model developed by Hill and Chamberlin in the framework of nanothermodynamics [Proc. Natl. Acad. Sci. USA 95, 12779 (1998)]. Our calculations yield results for ω_c between 0.52 and 1.00, depending on the degrees of freedom considered. The findings are in agreement with previous experimental results and give an idea of the universal dynamical behavior of the clusters when approaching criticality. We suggest that this behavior is a combination of translation and vortex rotational motion (ω_c = 0.84).
Nova-driven winds in globular clusters
NASA Technical Reports Server (NTRS)
Scott, E. H.; Durisen, R. H.
1978-01-01
Recent sensitive searches for H-alpha emission from ionized intracluster gas in globular clusters have set upper limits that conflict with theoretical predictions. It is suggested that nova outbursts heat the gas, producing winds that resolve this discrepancy. The incidence of novae in globular clusters, the conversion of kinetic energy of the nova shell to thermal energy of the intracluster gas, and the characteristics of the resultant winds are discussed. Calculated emission from the nova-driven models does not conflict with any observations to date. Some suggestions are made concerning the most promising approaches for future detection of intracluster gas on the basis of these models. The possible relationship of nova-driven winds to globular cluster X-ray sources is also considered.
Spatiotemporal modeling of node temperatures in supercomputers
Storlie, Curtis Byron; Reich, Brian James; Rust, William Newton; ...
2016-06-10
Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Thus, cooling the components of the supercomputer becomes a critical and expensive endeavor. Recently a project was initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines that were housed there. Coupled with this goal was the aim to develop a general good practice for characterizing the effect of cooling changes and monitoring machine node temperatures in this and other machine rooms. This paper focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1,600 nodes that run a variety of jobs during general use. Since extreme temperatures are important, a Normal distribution plus a generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. Lastly, this same approach can easily be applied to monitor and investigate cooling systems at other data centers as well.
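The marginal model described (Normal bulk with a generalized Pareto upper tail) can be sketched with SciPy; the synthetic temperatures and the 95% threshold are illustrative, and the Gaussian copula and GMRF layers are omitted.

import numpy as np
from scipy import stats

temps = np.random.normal(45, 3, size=100_000)         # placeholder node temperatures (deg C)
u = np.quantile(temps, 0.95)                          # tail threshold
excess = temps[temps > u] - u
xi, _, sigma = stats.genpareto.fit(excess, floc=0)    # GPD fit to exceedances over u
p_hot = 0.05 * stats.genpareto.sf(50.0 - u, xi, loc=0, scale=sigma)  # P(T > 50 deg C)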
Multilevel joint competing risk models
NASA Astrophysics Data System (ADS)
Karunarathna, G. H. S.; Sooriyarachchi, M. R.
2017-09-01
Joint modeling approaches are often needed for mixed outcomes of competing-risk time-to-event and count data in biomedical and epidemiological studies in the presence of cluster effects. Hospital length of stay (LOS) is a widely used outcome measure of hospital utilization, and hospitalizations can end in multiple ways, such as discharge, transfer, or death, with patients who have not completed the event of interest by the end of follow-up treated as censored. Competing risk models provide a method of addressing such multiple destinations, since classical time-to-event models yield biased results when there are multiple events. In this study, the concept of joint modeling is applied to dengue epidemiology in Sri Lanka, 2006-2008, to assess the relationship between the different outcomes of LOS and the platelet count of dengue patients, with district as the cluster effect. Two key approaches are used to build the joint scenario. The first approach models each competing risk separately using a binary logistic model, treating all other events as censored, under a multilevel discrete time-to-event model, while the platelet counts are assumed to follow a lognormal regression model. The second approach is based on the endogeneity effect in the multilevel competing risks and count model. Model parameters were estimated using maximum likelihood based on the Laplace approximation. Moreover, the study reveals that the joint modeling approach yields more precise results than fitting two separate univariate models, in terms of AIC (Akaike Information Criterion).
Evolution of the Mass and Luminosity Functions of Globular Star Clusters
NASA Astrophysics Data System (ADS)
Goudfrooij, Paul; Fall, S. Michael
2016-12-01
We reexamine the dynamical evolution of the mass and luminosity functions of globular star clusters (GCMF and GCLF). Fall & Zhang (2001, FZ01) showed that a power-law MF, as commonly seen among young cluster systems, would evolve by dynamical processes over a Hubble time into a peaked MF with a shape very similar to the observed GCMF in the Milky Way and other galaxies. To simplify the calculations, the semi-analytical FZ01 model adopted the “classical” theory of stellar escape from clusters, and neglected variations in the M/L ratios of clusters. Kruijssen & Portegies Zwart (2009, KPZ09) modified the FZ01 model to include “retarded” and mass-dependent stellar escape, the latter causing significant M/L variations. KPZ09 asserted that their model was compatible with observations, whereas the FZ01 model was not. We show here that this claim is not correct; the FZ01 and KPZ09 models fit the observed Galactic GCLF equally well. We also show that there is no detectable correlation between M/L and L for GCs in the Milky Way and Andromeda galaxies, in contradiction with the KPZ09 model. Our comparisons of the FZ01 and KPZ09 models with observations can be explained most simply if stars escape at rates approaching the classical limit for high-mass clusters, as expected on theoretical grounds.
Sauzet, Odile; Peacock, Janet L
2017-07-20
The analysis of perinatal outcomes often involves datasets with some multiple births. These datasets consist mostly of independent observations and a limited number of clusters of size two (twins), and perhaps of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for logistic random intercept models, and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters; a logistic random intercept model, however, fails to estimate the correlation between siblings if the percentage of twins is too small, in which case it provides estimates similar to logistic regression. The method that seems to provide the best balance between estimation of the standard error and of the parameter, for any percentage of twins, is generalised estimating equations. This study has shown that the number of covariates and the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
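A minimal statsmodels sketch of the recommended analysis: generalised estimating equations with an exchangeable working correlation, so infants sharing a mother form a cluster. The inline data frame and variable names are placeholders for a real perinatal dataset.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({"outcome":  [0, 1, 1, 0, 1, 0, 1, 0],
                   "gest_age": [28, 30, 30, 32, 26, 33, 29, 31],
                   "mother":   [1, 2, 2, 3, 4, 5, 6, 7]})   # twins share a mother id
model = smf.gee("outcome ~ gest_age", groups="mother", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())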
Grošelj, Petra; Zadnik Stirn, Lidija
2015-09-15
Environmental management problems can be dealt with by combining participatory methods, which make it possible to include various stakeholders in a decision-making process, and multi-criteria methods, which offer a formal model for structuring and solving a problem. This paper proposes a three-phase decision making approach based on the analytic network process and SWOT (strengths, weaknesses, opportunities and threats) analysis. The approach enables inclusion of various stakeholders or groups of stakeholders in particular stages of decision making. The structure of the proposed approach is composed of a network consisting of an objective cluster, a cluster of strategic goals, a cluster of SWOT factors and a cluster of alternatives. The application of the suggested approach is applied to a management problem of Pohorje, a mountainous area in Slovenia. Stakeholders from sectors that are important for Pohorje (forestry, agriculture, tourism and nature protection agencies) who can offer a wide range of expert knowledge were included in the decision-making process. The results identify the alternative of "sustainable development" as the most appropriate for development of Pohorje. The application in the paper offers an example of employing the new approach to an environmental management problem. This can also be applied to decision-making problems in various other fields. Copyright © 2015 Elsevier Ltd. All rights reserved.
Understanding Teacher Users of a Digital Library Service: A Clustering Approach
ERIC Educational Resources Information Center
Xu, Beijie
2011-01-01
This research examined teachers' online behaviors while using a digital library service--the Instructional Architect (IA)--through three consecutive studies. In the first two studies, a statistical model called latent class analysis (LCA) was applied to cluster different groups of IA teachers according to their diverse online behaviors. The third…
A clustering approach applied to time-lapse ERT interpretation - Case study of Lascaux cave
NASA Astrophysics Data System (ADS)
Xu, Shan; Sirieix, Colette; Riss, Joëlle; Malaurent, Philippe
2017-09-01
The Lascaux cave, located in southwest France, is one of the most important prehistoric caves in the world, renowned for its Paleolithic paintings. This study aims to characterize the structure of the weathered epikarst above the cave using time-lapse Electrical Resistivity Tomography (ERT) combined with local hydrogeological and climatic environmental data. Twenty ERT profiles were acquired over two years, allowing us to record the seasonal and spatial variations of electrical resistivity in the hydraulic upstream area of the Lascaux cave. The 20 interpreted resistivity models were merged into a single synthetic model using a multidimensional statistical method (Hierarchical Agglomerative Clustering). Individual blocks from the synthetic model associated with similar resistivity variability were gathered into 7 clusters. We combined the temporal resistivity variations with climatic and hydrogeological data to propose a geo-electrical model related to a conceptual geological model, and we provide a geological interpretation of each cluster in terms of epikarst features. The superficial clusters (no. 1 & 2) are linked to effective rainfall and trees, probably fractured limestone. Two other clusters (no. 6 & 7) are linked to detrital formations (sand and clay, respectively). Cluster 3 may correspond to a marly limestone that forms a non-permeable horizon. Finally, the electrical behavior of the last two clusters (no. 4 & 5) is correlated with the variation of flow rate; they may constitute a privileged feed zone for the flow in the cave.
Subgraph augmented non-negative tensor factorization (SANTF) for modeling clinical narrative text
Xin, Yu; Hochberg, Ephraim; Joshi, Rohit; Uzuner, Ozlem; Szolovits, Peter
2015-01-01
Objective Extracting medical knowledge from electronic medical records requires automated approaches to combat scalability limitations and selection biases. However, existing machine learning approaches are often regarded by clinicians as black boxes. Moreover, training data for these automated approaches are often sparsely annotated at best. The authors target unsupervised learning for modeling clinical narrative text, aiming at improving both accuracy and interpretability. Methods The authors introduce a novel framework named subgraph augmented non-negative tensor factorization (SANTF). In addition to relying on atomic features (e.g., words in clinical narrative text), SANTF automatically mines higher-order features (e.g., relations of lymphoid cells expressing antigens) from clinical narrative text by converting sentences into a graph representation and identifying important subgraphs. The authors compose a tensor using patients, higher-order features, and atomic features as its respective modes, then apply non-negative tensor factorization to cluster patients and simultaneously identify latent groups of higher-order features that link to patient clusters, as in clinical guidelines where a panel of immunophenotypic features and laboratory results is used to specify diagnostic criteria. Results and Conclusion SANTF demonstrated over 10% improvement in averaged F-measure on patient clustering compared to widely used non-negative matrix factorization (NMF) and k-means clustering methods. Multiple baselines were established by modeling patient data using patient-by-feature matrices with different feature configurations and then performing NMF or k-means to cluster patients. Feature analysis identified latent groups of higher-order features that lead to medical insights. The authors also found that the latent groups of atomic features help to better correlate the latent groups of higher-order features. PMID:25862765
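The NMF baseline that SANTF is compared against takes a few lines of scikit-learn; SANTF itself adds a third mode (patients × higher-order features × atomic features), which this matrix sketch does not capture.

import numpy as np
from sklearn.decomposition import NMF

X = np.random.poisson(1.0, size=(200, 500)).astype(float)  # nonnegative patient-by-feature counts
W = NMF(n_components=8, init='nndsvda', random_state=0).fit_transform(X)
patient_cluster = W.argmax(axis=1)   # assign each patient to its dominant latent group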
Data Mining Technologies Inspired from Visual Principle
NASA Astrophysics Data System (ADS)
Xu, Zongben
In this talk we review recent work done by our group on data mining (DM) technologies deduced from simulating visual principles. By viewing a DM problem as a cognition problem and treating a data set as an image with a light point located at each datum position, we have developed a series of highly efficient algorithms for clustering, classification and regression via mimicking visual principles. In pattern recognition, human eyes seem to possess a singular aptitude to group objects and find important structure in an efficient way. Thus, a DM algorithm simulating the visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach for data clustering by modeling the blurring effect of lateral retinal interconnections based on scale space theory. In this approach, as the data image blurs, smaller light blobs merge into larger ones until the whole image becomes a single light blob at a sufficiently low level of resolution. By identifying each blob with a cluster, the blurring process then generates a family of clusterings along the hierarchy. The proposed approach provides unique solutions to many long-standing problems, such as cluster validity and sensitivity to initialization. We extended this approach to classification and regression problems by employing Weber's law from physiology together with facts about cell response classification. The resultant classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to huge data sets in DM technologies. We finally applied a similar idea to the difficult parameter-setting problem in support vector machines (SVM). Viewing the parameter-setting problem as a recognition problem of choosing a visual scale at which the global and local structures of a data set can be preserved, and the difference between the two structures maximized in the feature space, we derived a direct parameter-setting formula for the Gaussian SVM. Simulations and applications show that the suggested formula significantly outperforms known model selection methods in terms of efficiency and precision.
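A loose sketch of the blurring idea, assuming 2-D data rendered as a "data image": Gaussian smoothing at increasing scale merges light blobs, and the surviving blob count traces out a clustering hierarchy. Peak detection via a maximum filter is a simplification of the scale-space machinery described in the talk.

import numpy as np
from scipy import ndimage

def scale_space_clusters(points, sigmas, grid=256):
    img, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=grid)
    for s in sigmas:
        blurred = ndimage.gaussian_filter(img, sigma=s)
        peaks = (blurred == ndimage.maximum_filter(blurred, size=5)) & \
                (blurred > 0.01 * blurred.max())
        _, n_blobs = ndimage.label(peaks)          # surviving blobs = clusters at scale s
        print(f"sigma={s}: {n_blobs} clusters")

scale_space_clusters(np.random.randn(1000, 2), sigmas=[1, 2, 4, 8, 16])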
Solving the scalability issue in quantum-based refinement: Q|R#1.
Zheng, Min; Moriarty, Nigel W; Xu, Yanting; Reimers, Jeffrey R; Afonine, Pavel V; Waller, Mark P
2017-12-01
Accurately refining biomacromolecules using a quantum-chemical method is challenging because the cost of a quantum-chemical calculation scales approximately as n^m, where n is the number of atoms and m (≥3) depends on the quantum method of choice. This fundamental problem means that quantum-chemical calculations become intractable when the size of the system requires more computational resources than are available. In the development of the software package called Q|R, this issue is referred to as Q|R#1. A divide-and-conquer approach has been developed that fragments the atomic model into small manageable pieces in order to solve Q|R#1. Firstly, the atomic model of a crystal structure is analyzed to detect noncovalent interactions between residues, and the results of the analysis are represented as an interaction graph. Secondly, a graph-clustering algorithm is used to partition the interaction graph into a set of clusters in such a way as to minimize disruption to the noncovalent interaction network. Thirdly, the environment surrounding each individual cluster is analyzed and any residue that is interacting with a particular cluster is assigned to the buffer region of that particular cluster. A fragment is defined as a cluster plus its buffer region. The gradients for all atoms from each of the fragments are computed, and only the gradients from each cluster are combined to create the total gradients. A quantum-based refinement is carried out using the total gradients as chemical restraints. In order to validate this interaction graph-based fragmentation approach in Q|R, the entire atomic model of an amyloid cross-β spine crystal structure (PDB entry 2oNA) was refined.
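A networkx sketch of the fragmentation idea; greedy modularity communities stand in for the graph-clustering algorithm (the abstract does not name one), and the geometric graph is a placeholder for a real residue interaction graph.

import networkx as nx

G = nx.random_geometric_graph(200, 0.12, seed=1)   # nodes = residues, edges = noncovalent contacts
clusters = nx.algorithms.community.greedy_modularity_communities(G)
fragments = []
for cluster in clusters:
    # buffer region: every residue interacting with the cluster but not inside it
    buffer = set().union(*(G.neighbors(r) for r in cluster)) - set(cluster)
    fragments.append((set(cluster), buffer))       # fragment = cluster + buffer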
An integrated approach to reconstructing genome-scale transcriptional regulatory networks
Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.; ...
2015-02-27
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and these observations to build a large-scale TRN model for the α-proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall were also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
Qin, Lei; Snoussi, Hichem; Abdallah, Fahed
2014-01-01
We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that is able to extract compact discriminative features from a feature pool before computing the region covariance descriptor. As the feature extraction method is adaptive to a specific object of interest, we refer to the region covariance descriptor computed using the extracted features as the adaptive covariance descriptor. The second contribution is to propose a weakly supervised method for updating the object appearance model during tracking. The method performs a mean-shift clustering procedure among the tracking result samples accumulated during a period of time and selects a group of reliable samples for updating the object appearance model. As such, the object appearance model is kept up-to-date and is prevented from contamination even in case of tracking mistakes. We conducted comparative experiments on real-world video sequences, which confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method accomplished stable object tracking on challenging video sequences. PMID:24865883
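A minimal sketch of the clustering-based update, assuming samples collects appearance descriptors over a time window; keeping only the most populated mean-shift mode as the "reliable" group follows the spirit, not the letter, of the method.

import numpy as np
from sklearn.cluster import MeanShift

samples = np.random.rand(300, 16)              # placeholder appearance descriptors
ms = MeanShift().fit(samples)
main = np.bincount(ms.labels_).argmax()        # densest mode = reliable samples
reliable = samples[ms.labels_ == main]
appearance_model = reliable.mean(axis=0)       # update the model from reliable samples only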
Wu, Hsin-Hung; Lin, Shih-Yen; Liu, Chih-Wei
2014-01-01
This study combines cluster analysis and the LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach using self-organizing maps and the K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F (excluding the monetary variable, which is covered by the national health insurance program) are computed for each cluster. In addition, a customer value matrix is used to analyze the customer values of the twelve clusters in terms of frequency and monetary value. A customer relationship matrix considering length and recency is also applied to classify different types of customers among these twelve clusters. The results show that three clusters can be classified as loyal patients, with L, R, and F values greater than the respective averages, while three clusters can be viewed as lost patients, with no variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741
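A sketch of the two-stage segmentation with plain K-means replacing the SOM first stage (a SOM would need a third-party library); recency is assumed coded so that larger values are better, matching the classification rule quoted above.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = np.abs(np.random.randn(1462, 3))           # placeholder columns: L, R, F per patient
Z = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(Z)
centers, overall = km.cluster_centers_, Z.mean(axis=0)
loyal = [c for c in range(12) if (centers[c] > overall).all()]   # L, R, F all above average
lost = [c for c in range(12) if (centers[c] <= overall).all()]   # none above average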
Gay, Emilie; Senoussi, Rachid; Barnouin, Jacques
2007-01-01
Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached through continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposes to analyze the spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34,142 French dairy herds for the year 2000, together with important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with 3 clusters was highly significant, and the 3 clusters were attractive, i.e., closeness to a cluster center increased the occurrence of high SCS. The three localizations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. This semi-parametric method based on spatial hazard modeling applies to continuous variables and takes account of both risk factors and potential heterogeneity of the background population. The tool allows quantitative detection but assumes a spatially specified form for clusters.
Lotfi, Tamara; Bou-Karroum, Lama; Darzi, Andrea; Hajjar, Rayan; El Rahyel, Ahmed; El Eid, Jamale; Itani, Mira; Brax, Hneine; Akik, Chaza; Osman, Mona; Hassan, Ghayda; El-Jardali, Fadi; Akl, Elie
2016-08-03
Our objective was to identify published models of coordination between entities funding or delivering health services in humanitarian crises, whether the coordination took place during or after the crises. We included reports describing models of coordination in sufficient detail to allow reproducibility. We also included reports describing implementations of identified models, as case studies. We searched Medline, PubMed, EMBASE, the Cochrane Central Register of Controlled Trials, CINAHL, PsycINFO, and the WHO Global Health Library. We also searched the websites of relevant organizations. We followed standard systematic review methodology. Our search captured 14,309 citations. The screening process identified 34 eligible papers describing five models of coordination for delivering health services: the "Cluster Approach" (with 16 case studies), the 4Ws "Who is Where, When, doing What" mapping tool (with four case studies), the "Sphere Project" (with two case studies), the "5x5" model (with one case study), and the "model of information coordination" (with one case study). The 4Ws and the 5x5 focus on coordination of services for mental health; the remaining models do not focus on a specific health topic. The Cluster Approach appears to be the most widely used. One case study was a mixed implementation of the Cluster Approach and the Sphere model. We identified no model of coordination for the funding of health services. This systematic review identified five proposed coordination models that have been implemented by entities funding or delivering health services in humanitarian crises. There is a need to compare the effects of these different models on outcomes such as availability of and access to health services.
Clustering and variable selection in the presence of mixed variable types and missing data.
Storlie, C B; Myers, S M; Katusic, S K; Weaver, A L; Voigt, R G; Croarkin, P E; Stoeckel, R E; Port, J D
2018-05-17
We consider the problem of model-based clustering in the presence of many correlated, mixed continuous, and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines. Copyright © 2018 John Wiley & Sons, Ltd.
Classical plasma dynamics of Mie-oscillations in atomic clusters
NASA Astrophysics Data System (ADS)
Kull, H.-J.; El-Khawaldeh, A.
2018-04-01
Mie plasmons are of basic importance for the absorption of laser light by atomic clusters. In this work we first review the classical Rayleigh theory of a dielectric sphere in an external electric field and Thomson's plum-pudding model applied to atomic clusters. Both approaches allow for elementary discussions of Mie oscillations; however, they also indicate deficiencies in describing the damping mechanisms by electrons crossing the cluster surface. Nonlinear oscillator models have been widely studied to gain an understanding of damping and absorption by outer ionization of the cluster. In the present work, we attempt to address the issue of plasmon relaxation in atomic clusters in more detail based on classical particle simulations. In particular, we wish to study the role of thermal motion in plasmon relaxation, thereby extending nonlinear models of collective single-electron motion. Our simulations are particularly adapted to the regime of classical kinetics in weakly coupled plasmas and to cluster sizes exceeding the Debye screening length. It will be illustrated how surface scattering leads to the relaxation of Mie oscillations in the presence of thermal motion and of electron spill-out at the cluster surface. This work is intended to give, from a classical perspective, further insight into recent work on plasmon relaxation in quantum plasmas [1].
To center or not to center? Investigating inertia with a multilevel autoregressive model.
Hamaker, Ellen L; Grasman, Raoul P P P
2014-01-01
Whether level 1 predictors should be centered per cluster has received considerable attention in the multilevel literature. While most agree that there is no one preferred approach, it has also been argued that cluster mean centering is desirable when the within-cluster slope and the between-cluster slope are expected to deviate, and the main interest is in the within-cluster slope. However, we show in a series of simulations that if one has a multilevel autoregressive model in which the level 1 predictor is the lagged outcome variable (i.e., the outcome variable at the previous occasion), cluster mean centering will in general lead to a downward bias in the parameter estimate of the within-cluster slope (i.e., the autoregressive relationship). This is particularly relevant if the main question is whether there is on average an autoregressive effect. Nonetheless, we show that if the main interest is in estimating the effect of a level 2 predictor on the autoregressive parameter (i.e., a cross-level interaction), cluster mean centering should be preferred over other forms of centering. Hence, researchers should be clear on what is considered the main goal of their study, and base their choice of centering method on this when using a multilevel autoregressive model.
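The downward bias is easy to reproduce by simulation; a minimal sketch in which pooled within-person (cluster-mean-centered) least squares recovers less than the true autoregressive parameter, a finite-T effect known in the econometric literature as Nickell bias:

import numpy as np

rng = np.random.default_rng(1)
n_persons, n_times, phi = 200, 25, 0.4
est = []
for _ in range(200):
    y = np.zeros((n_persons, n_times))
    for t in range(1, n_times):
        y[:, t] = phi * y[:, t - 1] + rng.normal(size=n_persons)
    y_lag, y_out = y[:, :-1], y[:, 1:]
    y_lag_c = y_lag - y_lag.mean(axis=1, keepdims=True)   # person-mean centering
    y_out_c = y_out - y_out.mean(axis=1, keepdims=True)
    est.append((y_lag_c * y_out_c).sum() / (y_lag_c ** 2).sum())
print(np.mean(est))   # systematically below the true phi = 0.4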
Large-scale model quality assessment for improving protein tertiary structure prediction.
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2015-06-15
Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select the better models or rank a large number of models well. Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It applies an unprecedented 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains, and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment demonstrates that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/. © The Author 2015. Published by Oxford University Press.
Cascades on a class of clustered random networks
NASA Astrophysics Data System (ADS)
Hackett, Adam; Melnik, Sergey; Gleeson, James P.
2011-05-01
We present an analytical approach to determining the expected cascade size in a broad range of dynamical models on the class of random networks with arbitrary degree distribution and nonzero clustering introduced previously [M. E. J. Newman, Phys. Rev. Lett. 103, 058701 (2009)]. A condition for the existence of global cascades is derived, as well as a general criterion that determines whether increasing the level of clustering will increase, or decrease, the expected cascade size. Applications, examples of which are provided, include site percolation, bond percolation, and Watts' threshold model; in all cases analytical results give excellent agreement with numerical simulations.
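For readers who want to experiment, here is a minimal sketch of Watts' threshold model on a clustered random graph (Python with networkx; the Holme-Kim powerlaw_cluster_graph generator stands in for Newman's clustered configuration model, which networkx does not provide, and the threshold and seed count are illustrative assumptions).

    import networkx as nx
    import numpy as np

    rng = np.random.default_rng(0)
    # Clustered random graph (Holme-Kim model; p controls triangle formation)
    G = nx.powerlaw_cluster_graph(n=10_000, m=2, p=0.3, seed=0)

    theta = 0.18                       # uniform adoption threshold
    state = {v: False for v in G}
    for s in rng.choice(G.number_of_nodes(), size=10, replace=False):
        state[s] = True                # seed nodes

    changed = True
    while changed:                     # sweep until no node changes state
        changed = False
        for v in G:
            if not state[v] and G.degree(v) > 0:
                if sum(state[u] for u in G[v]) / G.degree(v) >= theta:
                    state[v] = True
                    changed = True

    print("final cascade size:", sum(state.values()) / G.number_of_nodes())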
Techniques to derive geometries for image-based Eulerian computations
Dillard, Seth; Buchholz, James; Vigmostad, Sarah; Kim, Hyunggun; Udaykumar, H.S.
2014-01-01
Purpose: The performance of three frequently used level set-based segmentation methods is examined for the purpose of defining features and boundary conditions for image-based Eulerian fluid and solid mechanics models. The focus of the evaluation is to identify an approach that produces the best geometric representation from a computational fluid/solid modeling point of view. In particular, extraction of geometries from a wide variety of imaging modalities and noise intensities, to supply to an immersed boundary approach, is targeted. Design/methodology/approach: Two- and three-dimensional images, acquired from optical, X-ray CT, and ultrasound imaging modalities, are segmented with active contours, k-means, and adaptive clustering methods. Segmentation contours are converted to level sets and smoothed as necessary for use in fluid/solid simulations. Results produced by the three approaches are compared visually and with contrast ratio, signal-to-noise ratio, and contrast-to-noise ratio measures. Findings: While the active contours method possesses built-in smoothing and regularization and produces continuous contours, the clustering methods (k-means and adaptive clustering) produce discrete (pixelated) contours that require smoothing using speckle-reducing anisotropic diffusion (SRAD). Thus, for images with high contrast and low to moderate noise, active contours are generally preferable. However, adaptive clustering is found to be far superior to the other two methods for images possessing high levels of noise and global intensity variations, due to its more sophisticated use of local pixel/voxel intensity statistics. Originality/value: It is often difficult to know a priori which segmentation will perform best for a given image type, particularly when geometric modeling is the ultimate goal. This work offers insight into the algorithm selection process, as well as outlining a practical framework for generating useful geometric surfaces in an Eulerian setting. PMID:25750470
Transformation and model choice for RNA-seq co-expression analysis.
Rau, Andrea; Maugis-Rabusseau, Cathy
2018-05-01
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of whether and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and the number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and the number of clusters, and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package, coseq (co-expression of RNA-seq data), to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
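A rough Python analogue of the workflow may help fix ideas (the actual coseq package is an R/Bioconductor implementation; here sklearn's GaussianMixture with BIC stands in for the penalized selection criterion, and the arcsine-root transformation of normalized expression profiles is one assumed choice among the transformations the paper considers).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Toy count matrix: genes x samples
    counts = np.random.default_rng(2).poisson(20, size=(500, 8)).astype(float)

    # Normalized expression profiles followed by a variance-stabilizing transform
    profiles = counts / counts.sum(axis=1, keepdims=True)
    y = np.arcsin(np.sqrt(profiles))

    best = min(
        (GaussianMixture(n_components=k, covariance_type="full",
                         random_state=0).fit(y) for k in range(2, 11)),
        key=lambda m: m.bic(y),
    )
    print("selected K:", best.n_components,
          "cluster sizes:", np.bincount(best.predict(y)))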
Clustering P-Wave Receiver Functions To Constrain Subsurface Seismic Structure
NASA Astrophysics Data System (ADS)
Chai, C.; Larmat, C. S.; Maceira, M.; Ammon, C. J.; He, R.; Zhang, H.
2017-12-01
The acquisition of high-quality data from permanent and temporary dense seismic networks provides the opportunity to apply statistical and machine learning techniques to a broad range of geophysical observations. Lekic and Romanowicz (2011) used clustering analysis on tomographic velocity models of the western United States to perform tectonic regionalization, and the velocity-profile clusters agree well with known geomorphic provinces. A complementary and somewhat less restrictive approach is to apply cluster analysis directly to geophysical observations. In this presentation, we apply clustering analysis to teleseismic P-wave receiver functions (RFs), continuing the efforts of Larmat et al. (2015) and Maceira et al. (2015). These earlier studies validated the approach with surface waves and stacked EARS RFs from the USArray stations. In this study, we experiment with both the K-means and hierarchical clustering algorithms. We also test different distance metrics defined in the vector space of RFs, following Lekic and Romanowicz (2011). We cluster data from two distinct data sets. The first, corresponding to the western US, was produced by smoothing/interpolation of the receiver-function wavefield (Chai et al. 2015). Spatial coherence and agreement with geologic region increase with this simpler, spatially smoothed set of observations. The second data set is composed of RFs for more than 800 stations of the China Digital Seismic Network (CSN). Preliminary results show first-order agreement between clusters and tectonic regions, and each regional cluster includes a distinct Ps arrival, which probably reflects differences in crustal thickness. Regionalization remains an important step in characterizing a model prior to the application of full waveform and/or stochastic imaging techniques because of the computational expense of these types of studies. Machine learning techniques can provide valuable information that can be used to design and characterize formal geophysical inversions, providing information on spatial variability in the subsurface geology.
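The core clustering step can be sketched in a few lines (Python; synthetic waveform vectors stand in for actual receiver functions, and the Euclidean/Ward choices and the number of clusters are assumptions for illustration).

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.cluster import KMeans

    # One receiver function per station, resampled to a common time axis (toy data)
    rfs = np.random.default_rng(3).normal(size=(800, 120))
    k = 6                                   # assumed number of regions

    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(rfs)

    Z = linkage(rfs, method="ward")         # hierarchical alternative
    hc_labels = fcluster(Z, t=k, criterion="maxclust")

    print(np.bincount(km_labels), np.bincount(hc_labels)[1:])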
The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts
NASA Astrophysics Data System (ADS)
Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.
2012-07-01
We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that, even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach entirely bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.
Simultaneous Co-Clustering and Classification in Customers Insight
NASA Astrophysics Data System (ADS)
Anggistia, M.; Saefuddin, A.; Sartono, B.
2017-04-01
Building a predictive model from a heterogeneous dataset may cause problems such as imprecise parameter estimates and poor prediction accuracy. Such problems can be addressed by segmenting the data into relatively homogeneous groups and then building a predictive model for each cluster. This strategy usually yields models that are simpler, more interpretable, and more actionable without any loss in accuracy or reliability. This work concerns a marketing data set recording customer behaviour across products, with several variables describing customer and product attributes. The basic idea of this approach is to combine co-clustering and classification simultaneously. The objective of this research is to analyse customer characteristics across products, so that the marketing strategy can be implemented precisely.
Lo, Kenneth
2011-01-01
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375
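In standard notation, the model couples a (componentwise) Box-Cox transformation with a multivariate t mixture; up to parametrization details such as whether lambda is component-specific, the density takes the form

    g_\lambda(y) = \begin{cases} (y^\lambda - 1)/\lambda, & \lambda \neq 0,\\ \log y, & \lambda = 0, \end{cases}
    \qquad
    f(\mathbf{y}) = \sum_{k=1}^{K} \pi_k \, t_p\big(g_\lambda(\mathbf{y});\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k, \nu_k\big)\, |J_{g_\lambda}(\mathbf{y})|,

where t_p is the p-dimensional Student t density and |J_{g_\lambda}| is the Jacobian of the transformation; the EM algorithm alternates responsibility updates with maximization over the component parameters and lambda.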
Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam
2017-10-27
Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data; by relaxing the homogeneity restriction of nonlinear mixed-effects (NLME) models, they can cluster individuals into one of several pre-specified classes with class membership probabilities. This clustering may have clinical significance and may be associated with clinically important time-to-event outcomes. This article develops a joint modeling approach combining a finite mixture of NLME models for longitudinal data with a proportional hazards Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies assessing the performance of the proposed joint model against a naive two-step model in which the finite mixture model and the Cox model are fitted separately.
Attempt to probe nuclear charge radii by cluster and proton emissions
NASA Astrophysics Data System (ADS)
Qian, Yibin; Ren, Zhongzhou; Ni, Dongdong
2013-05-01
We deduce the rms nuclear charge radii for ground states of light and medium-mass nuclei from experimental data on cluster radioactivity and proton emission in a unified framework. On the basis of the density-dependent cluster model, the calculated decay half-lives are obtained within the modified two-potential approach. The charge distribution of emitted clusters in cluster decay and that of daughter nuclei in proton emission are determined so as to reproduce the experimental half-lives within the folding model. The obtained charge distribution is then employed to give the rms charge radius of the studied nuclei. Satisfactory agreement between theory and experiment is achieved for the available experimental data, and the present results are consistent with theoretical estimates. This study is expected to be helpful for future measurements of nuclear sizes, especially for exotic nuclei near the proton drip line.
Nguyen, Hien D; Ullmann, Jeremy F P; McLachlan, Geoffrey J; Voleti, Venkatakaushik; Li, Wenze; Hillman, Elizabeth M C; Reutens, David C; Janke, Andrew L
2018-02-01
Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques enable visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via Gaussian mixtures is proposed for clustering data from such visualizations. The methodology is theoretically justified, and a computationally efficient approach to estimation is suggested. An example analysis of a zebrafish imaging experiment is presented.
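The essence of model-based functional clustering can be sketched briefly (Python; a fixed Legendre polynomial basis and a plain GaussianMixture are simplifying assumptions relative to the paper's methodology, and the traces are synthetic).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(4)
    t = np.linspace(0, 1, 200)
    traces = rng.normal(size=(1000, t.size))   # toy calcium time series

    # Represent each trace by its coefficients in a fixed functional basis
    basis = np.polynomial.legendre.legvander(2 * t - 1, deg=8)  # (200, 9)
    coefs, *_ = np.linalg.lstsq(basis, traces.T, rcond=None)    # (9, 1000)

    gmm = GaussianMixture(n_components=5, random_state=0).fit(coefs.T)
    print("cluster sizes:", np.bincount(gmm.predict(coefs.T)))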
Few-Photon Model of the Optical Emission of Semiconductor Quantum Dots
NASA Astrophysics Data System (ADS)
Richter, Marten; Carmele, Alexander; Sitek, Anna; Knorr, Andreas
2009-08-01
The Jaynes-Cummings model provides a well established theoretical framework for single electron two level systems in a radiation field. Similar exactly solvable models for semiconductor light emitters such as quantum dots dominated by many particle interactions are not known. We access these systems by a generalized cluster expansion, the photon-probability cluster expansion: a reliable approach for few-photon dynamics in many body electron systems. As a first application, we discuss vacuum Rabi oscillations and show that their amplitude determines the number of electrons in the quantum dot.
Modeling of Cluster-Induced Turbulence in Particle-Laden Channel Flow
NASA Astrophysics Data System (ADS)
Baker, Michael; Capecelatro, Jesse; Kong, Bo; Fox, Rodney; Desjardins, Olivier
2017-11-01
A phenomenon often observed in gas-solid flows is the formation of mesoscale clusters of particles, caused by the relative motion between the solid and fluid phases and sustained by the damping of collisional particle motion through interphase momentum coupling inside these clusters. The formation of such sustained clusters, leading to cluster-induced turbulence (CIT), can have a significant impact in industrial processes, particularly with regard to mixing, reaction progress, and heat transfer. Both Euler-Lagrange (EL) and Euler-Euler anisotropic Gaussian (EE-AG) approaches are used in this work to perform mesoscale simulations of CIT in fully developed gas-particle channel flow. The results from these simulations are applied in the development of a two-phase Reynolds-averaged Navier-Stokes (RANS) model to capture the wall-normal flow characteristics in a less computationally expensive manner. Parameters such as mass loading, particle size, and gas velocity are varied to examine their respective impact on cluster formation and turbulence statistics. Acknowledging support from the NSF (AN:1437865).
Analysis and Modeling of Structure Formation in Granular and Fluid-Solid Flows
NASA Astrophysics Data System (ADS)
Murphy, Eric
Granular and multiphase flows are encountered in a number of industrial processes, with particular emphasis in this manuscript on applications in cement pumping, pneumatic conveying, fluid catalytic cracking, CO2 capture, and fast pyrolysis of bio-materials. These processes are often modeled using averaged equations that may be simulated using computational fluid dynamics. Closure models are then required to describe the average forces that arise from both interparticle interactions, e.g. shear stress, and interphase interactions, such as mean drag. One of the biggest hurdles to this approach is the emergence of non-trivial spatio-temporal structures in the particulate phase, which can significantly modify the qualitative behavior of these forces and the resultant flow phenomenology. For example, the formation of large clusters in cohesive granular flows is responsible for a transition from solid-like to fluid-like rheology. Another example is found in gas-solid systems, where clustering at small scales is observed to significantly lower the observed drag. Moreover, structure formation may occur at all scales, leading to a lack of the scale separation required for traditional averaging approaches. In this context, three modeling problems are treated: 1) first-principles-based modeling of the rheology of cement slurries, 2) modeling the mean solid-solid drag experienced by polydisperse particles undergoing segregation, and 3) modeling clustering in homogeneous gas-solid flows. The first and third components are described in greater detail. In the study on the rheology of cements, several sub-problems are introduced, which systematically increase the number and complexity of interparticle interactions, including inelasticity, friction, cohesion, and fluid interactions. In the first study, the interactions between cohesive inelastic particles were fully characterized for the first time. Next, kinetic theory was used to predict the cooling of a gas of such particles, and DEM was used to validate this approach. A study on the rheology of dry cohesive granules with and without friction was then carried out, in which the physics of the different flow phenomenology was exhaustively explored. Lastly, homogeneous cement slurry simulations were carried out and compared with vane-rheometer experiments; qualitative agreement between simulation and experiment was observed. Finally, the physics of clustering in homogeneous gas-solid flows is explored in the hope of gaining a mechanistic explanation of how particle-fluid interactions lead to clustering. Exact equations are derived detailing the evolution of the two-particle density, which may be closed using high-fidelity particle-resolved direct numerical simulation. Two canonical gas-solid flows are then addressed: the homogeneously cooling gas-solid flow (HCGSF) and the sedimenting gas-solid flow (SGSF). A mechanism responsible for clustering in the HCGSF is identified. Clustering of plane-wave-like structures is observed in the SGSF, and the exact terms are quantified. A method for modeling the dynamics of clustering in these systems is proposed, which may aid in the prediction of clustering and other correlation length scales useful for less expensive computations.
IoT Service Clustering for Dynamic Service Matchmaking.
Zhao, Shuai; Yu, Le; Cheng, Bo; Chen, Junliang
2017-07-27
With the adoption of service-oriented paradigms in the IoT (Internet of Things) environment, real-world devices open their capabilities through service interfaces, which enable other functional entities to interact with them. In an IoT application, it is indispensable to find suitable services for satisfying users' requirements or replacing unavailable services. From a performance perspective, however, it is inappropriate to search for desired services directly in the online service repository; instead, it is necessary to cluster services offline according to their similarity and then perform matchmaking or discovery online within a limited number of clusters. This paper proposes a multidimensional model-based approach to measure the similarity between IoT services. Density-peaks-based clustering is then employed to gather similar services together according to the result of the similarity measurement. Based on this service clustering, algorithms for dynamic service matchmaking, discovery, and replacement can be performed efficiently. Experiments are conducted to validate the performance of the proposed approaches, and the results are promising.
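The density-peaks step (Rodriguez and Laio, 2014) is compact enough to sketch directly (Python/numpy; random vectors stand in for the paper's multidimensional service-similarity features, and the cutoff distance d_c and cluster count are assumed tuning parameters).

    import numpy as np

    def density_peaks(X, d_c, n_clusters):
        """Minimal density-peaks clustering sketch (Rodriguez & Laio, 2014)."""
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0   # Gaussian local density
        order = np.argsort(-rho)                          # densest first
        delta = np.full(len(X), D.max())                  # distance to denser point
        nearest_denser = np.full(len(X), -1)
        for i in range(1, len(order)):
            p, denser = order[i], order[:i]
            j = denser[np.argmin(D[p, denser])]
            delta[p], nearest_denser[p] = D[p, j], j
        centers = np.argsort(-(rho * delta))[:n_clusters] # high rho AND high delta
        labels = np.full(len(X), -1)
        labels[centers] = np.arange(n_clusters)
        for p in order:                                   # assign down the density ranking
            if labels[p] == -1:
                labels[p] = labels[nearest_denser[p]]
        return labels

    X = np.random.default_rng(5).normal(size=(300, 4))    # stand-in feature vectors
    print(np.bincount(density_peaks(X, d_c=1.0, n_clusters=3)))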
ERIC Educational Resources Information Center
Ho, Hsuan-Fu; Hung, Chia-Chi
2008-01-01
Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…
CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS
McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.
2014-01-01
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026
NASA Astrophysics Data System (ADS)
Hozé, Nathanaël; Holcman, David
2012-01-01
We develop a coagulation-fragmentation model to study a system composed of a small number of stochastic objects that move in a confined domain and can aggregate upon binding to form local clusters of arbitrary size. A cluster can also dissociate into two subclusters with uniform probability. To study the statistics of clusters, we combine a Markov chain analysis with a partition number approach. Interestingly, we obtain explicit formulas for the size and number of clusters in terms of hypergeometric functions. Finally, we apply our analysis to the statistical physics of telomere (chromosome end) clustering in the yeast nucleus and show that the diffusion-coagulation-fragmentation process can predict the organization of telomeres.
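A direct stochastic simulation of such merge/split dynamics is straightforward; the sketch below (Python; equal coagulation and fragmentation attempt rates and a uniform split are illustrative assumptions, and the confined diffusion itself is not modeled) produces an empirical cluster-size histogram that can be compared with closed-form results.

    import numpy as np

    rng = np.random.default_rng(6)
    N = 50                        # total number of objects, conserved
    clusters = [1] * N            # start from N singletons
    for _ in range(100_000):
        if len(clusters) > 1 and rng.random() < 0.5:
            # coagulation: merge two clusters chosen at random
            i, j = rng.choice(len(clusters), size=2, replace=False)
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
        else:
            # fragmentation: a cluster of size s splits uniformly into (m, s - m)
            i = rng.integers(len(clusters))
            s = clusters[i]
            if s > 1:
                m = int(rng.integers(1, s))
                clusters[i] = m
                clusters.append(s - m)

    print("size histogram (1..N):", np.bincount(clusters, minlength=N + 1)[1:])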
Density-based clustering analyses to identify heterogeneous cellular sub-populations
NASA Astrophysics Data System (ADS)
Heaster, Tiffany M.; Walsh, Alex J.; Landman, Bennett A.; Skala, Melissa C.
2017-02-01
Autofluorescence microscopy of NAD(P)H and FAD provides functional metabolic measurements at the single-cell level. Here, density-based clustering algorithms were applied to metabolic autofluorescence measurements to identify cell-level heterogeneity in tumor cell cultures. The performance of the density-based clustering algorithm, DENCLUE, was tested in samples with known heterogeneity (co-cultures of breast carcinoma lines). DENCLUE was found to better represent the distribution of cell clusters compared to Gaussian mixture modeling. Overall, DENCLUE is a promising approach to quantify cell-level heterogeneity, and could be used to understand single cell population dynamics in cancer progression and treatment.
Yelland, Lisa N; Salter, Amy B; Ryan, Philip
2011-10-15
Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
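A minimal statsmodels sketch of this analysis follows (Python; the data are synthetic and the assumed true relative risk of 1.8 is for illustration only; GEE reports robust sandwich standard errors by default).

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    df = pd.DataFrame({
        "cluster": np.repeat(np.arange(50), 20),     # 50 clusters of size 20
        "exposed": rng.integers(0, 2, size=1000),
    })
    p = 0.15 * np.where(df["exposed"] == 1, 1.8, 1.0)
    df["y"] = (rng.random(1000) < p).astype(float)   # binary outcome

    X = sm.add_constant(df[["exposed"]].astype(float))
    res = sm.GEE(df["y"], X, groups=df["cluster"],
                 family=sm.families.Poisson(),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
    print("estimated relative risk:", np.exp(res.params["exposed"]))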
Mixture Modeling: Applications in Educational Psychology
ERIC Educational Resources Information Center
Harring, Jeffrey R.; Hodis, Flaviu A.
2016-01-01
Model-based clustering methods, commonly referred to as finite mixture modeling, have been applied to a wide variety of cross-sectional and longitudinal data to account for heterogeneity in population characteristics. In this article, we elucidate 2 such approaches: growth mixture modeling and latent profile analysis. Both techniques are…
2007-01-01
including tree-based methods such as the unweighted pair group method of analysis (UPGMA) and neighbour-joining (NJ) (Saitou & Nei, 1987). By… based Bayesian approach and the tree-based UPGMA and NJ clustering methods. The results obtained suggest that far more species occur in the An… unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA
Population Structure With Localized Haplotype Clusters
Browning, Sharon R.; Weir, Bruce S.
2010-01-01
We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877
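As a simplified illustration of a cluster-based F_ST, the Nei-style G_ST computation below treats localized haplotype clusters at a single locus like alleles (Python; this is a stand-in for the estimators actually used in the paper, and a multilocus version would combine the diversity components across loci).

    import numpy as np

    def cluster_fst(freqs):
        """Nei-style G_ST from per-population haplotype-cluster frequencies.

        freqs: (n_pops, n_clusters) array, each row summing to 1.
        """
        h_s = (1.0 - (freqs ** 2).sum(axis=1)).mean()  # mean within-pop diversity
        pbar = freqs.mean(axis=0)                      # pooled frequencies
        h_t = 1.0 - (pbar ** 2).sum()                  # total diversity
        return (h_t - h_s) / h_t

    freqs = np.array([[0.7, 0.2, 0.1],
                      [0.3, 0.5, 0.2],
                      [0.1, 0.3, 0.6]])
    print(round(cluster_fst(freqs), 3))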
Medium resolution spectroscopy and chemical composition of Galactic globular clusters
NASA Astrophysics Data System (ADS)
Khamidullina, D. A.; Sharina, M. E.; Shimansky, V. V.; Davoust, E.
We used integrated-light medium-resolution spectra of six Galactic globular clusters and model stellar atmospheres to carry out population synthesis and to derive the chemical composition and age of the clusters. We used medium-resolution spectra of globular clusters published by Schiavon et al. (2005), as well as our long-slit observations with the 1.93 m telescope of the Haute Provence Observatory. The observed spectra were fitted to the theoretical ones interactively. As an initial approach, we used masses, radii and log g of stars in the clusters corresponding to the best-fitting isochrones in the observed color-magnitude diagrams. The computed synthetic blanketed spectra of stars were summed according to the Chabrier mass function. To improve the determination of age and helium content, the shape and depth of the Balmer absorption lines were analysed. The abundances of Mg, Ca, C and several other elements were derived. Reasonable agreement with the literature data, both in chemical composition and in age of the clusters, is found. Our method might be useful for the development of stellar population models and for a better understanding of extragalactic star clusters.
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
NASA Astrophysics Data System (ADS)
Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard
2014-09-01
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
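A bare-bones version of the pipeline (diffusion map, clustering, transition-matrix estimation) fits in a short script (Python; a single global kernel bandwidth and k-means in place of fuzzy C-means are simplifications, the frames are random stand-ins for simulation snapshots, and the transition-based state assignment of the paper is omitted).

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(8)
    X = rng.normal(size=(500, 10))          # stand-in trajectory frames

    # Diffusion map: Gaussian kernel, row-normalize to a Markov matrix, and use
    # the leading nontrivial eigenvectors as low-dimensional coordinates.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / np.median(D2))
    P = K / K.sum(axis=1, keepdims=True)
    evals, evecs = np.linalg.eig(P)
    idx = np.argsort(-evals.real)[1:4]      # skip the trivial constant eigenvector
    coords = evecs[:, idx].real

    # Cluster in diffusion space, then count transitions between cluster labels
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
    T = np.zeros((5, 5))
    for a, b in zip(labels[:-1], labels[1:]):
        T[a, b] += 1
    T /= T.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
    print(np.round(T, 2))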
Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay
2015-09-01
The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.
Vigre, Håkan; Domingues, Ana Rita Coutinho Calado; Pedersen, Ulrik Bo; Hald, Tine
2016-03-01
The aim of the overall project was to develop a generic structured quantitative microbiological risk assessment (QMRA) model of human salmonellosis due to pork consumption in EU member states (MSs); the objective of the cluster analysis was to group the EU MSs according to the relative contribution of different pathways of Salmonella in the farm-to-consumption chain of pork products. By selecting a case-study MS from each cluster, the model was developed to represent different aspects of pig production, pork production, and consumption of pork products across EU states. The cluster analysis aggregated MSs into groups of countries with a similar importance of the different Salmonella pathways in the farm-to-consumption chain, using available and, where possible, universal register data related to pork production and consumption in each country. Based on MS-specific information about the distribution of (i) small and large farms, (ii) small and large slaughterhouses, (iii) the amount of pork meat consumed, and (iv) the amount of sausages consumed, we used nonhierarchical and hierarchical cluster analysis to group the MSs. The cluster solutions were validated internally using statistical measures and externally by comparing the clustered MSs with an estimated human incidence of salmonellosis due to pork products in the MSs. Finally, each cluster was characterized qualitatively using the centroids of the clusters. © 2016 Society for Risk Analysis.
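The grouping-and-validation step can be sketched as follows (Python; the member-state feature matrix is random stand-in data with illustrative columns, and the silhouette score is one common internal validation measure, not necessarily the statistics used in the paper).

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.preprocessing import StandardScaler

    # Toy member-state profiles: e.g., shares of small/large farms and
    # slaughterhouses, pork and sausage consumption (columns are stand-ins).
    rng = np.random.default_rng(9)
    X = StandardScaler().fit_transform(rng.random((27, 6)))

    for k in range(2, 6):                   # internal validation across candidate k
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, round(silhouette_score(X, labels), 3))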
NASA Astrophysics Data System (ADS)
Hartman, Joshua D.; Monaco, Stephen; Schatschneider, Bohdan; Beran, Gregory J. O.
2015-09-01
We assess the quality of fragment-based ab initio isotropic 13C chemical shift predictions for a collection of 25 molecular crystals with eight different density functionals. We explore the relative performance of cluster, two-body fragment, combined cluster/fragment, and the planewave gauge-including projector augmented wave (GIPAW) models relative to experiment. When electrostatic embedding is employed to capture many-body polarization effects, the simple and computationally inexpensive two-body fragment model predicts both isotropic 13C chemical shifts and the chemical shielding tensors as well as both cluster models and the GIPAW approach. Unlike the GIPAW approach, hybrid density functionals can be used readily in a fragment model, and all four hybrid functionals tested here (PBE0, B3LYP, B3PW91, and B97-2) predict chemical shifts in noticeably better agreement with experiment than the four generalized gradient approximation (GGA) functionals considered (PBE, OPBE, BLYP, and BP86). A set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts are provided based on these benchmark calculations. Statistical cross-validation procedures are used to demonstrate the robustness of these fits.
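The final mapping step is an ordinary linear regression from computed shieldings to observed shifts, assessed here by cross-validation (Python/sklearn; the synthetic slope near -1 and the noise level are illustrative, not the recommended parameters reported in the paper).

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(10)
    sigma = rng.uniform(0, 200, size=300)                  # computed shieldings (ppm)
    delta = -0.97 * sigma + 185 + rng.normal(0, 1.5, 300)  # "observed" shifts (toy)

    X = sigma.reshape(-1, 1)
    reg = LinearRegression().fit(X, delta)
    rmse = -cross_val_score(LinearRegression(), X, delta, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print("slope:", round(reg.coef_[0], 3),
          "intercept:", round(reg.intercept_, 1),
          "5-fold CV RMSE (ppm):", round(rmse, 2))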
Analysis of cytokine release assay data using machine learning approaches.
Xiong, Feiyu; Janko, Marco; Walker, Mindi; Makropoulos, Dorie; Weinstock, Daniel; Kam, Moshe; Hrebien, Leonid
2014-10-01
The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from 44 donors. Three machine learning approaches were applied in combination to observations obtained from that assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches were able to identify the treatment that caused the most severe cytokine response. HCA provided information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed classification of treatments sample by sample and visualization of clusters of treatments. DTC models showed the relative importance of various cytokines, such as IFN-γ, TNF-α and IL-10, to CRS. The use of these approaches in tandem allows better selection of parameters for one method based on outcomes from another, and an overall improved analysis of the data through complementary approaches. The DTC analysis additionally suggested that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. Copyright © 2014 Elsevier B.V. All rights reserved.
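The three approaches combine naturally in a short pipeline (Python; random arrays of the stated dimensions stand in for the assay readouts, and the tree depth and cluster counts are illustrative choices).

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(11)
    X = rng.normal(size=(423, 12))              # samples x cytokine readouts (toy)
    treatment = rng.integers(0, 9, size=423)    # 7 mAbs + 2 negative controls

    hca = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")  # (i) HCA
    scores = PCA(n_components=2).fit_transform(X)                         # (ii) PCA
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, treatment)
    print("cytokine importances:", np.round(tree.feature_importances_, 2))  # (iii) DTC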
Two- and three-cluster decays of light nuclei within a hyperspherical harmonics approach
NASA Astrophysics Data System (ADS)
Vasilevsky, V. S.; Lashko, Yu. A.; Filippov, G. F.
2018-06-01
We consider a set of three-cluster systems (4He, 7Li, 7Be, 8Be, 10Be) within a microscopic model which employs hyperspherical harmonics to represent intercluster motion. We selected three-cluster systems that have at least one binary channel. Our aim is to study whether, and under what conditions, hyperspherical harmonics are able to describe two-body channels (nondemocratic motion), or whether they are suitable only for describing the three-cluster continuum (democratic motion). It is demonstrated that a rather restricted number of hyperspherical harmonics allows us to describe bound states and scattering states in the two-body continuum of a three-cluster system.
Perera, Angelo S.; Thomas, Javix; Poopari, Mohammad R.; Xu, Yunjie
2016-01-01
Vibrational optical activity spectroscopies, namely vibrational circular dichroism (VCD) and Raman optical activity (ROA), have emerged in the past decade as powerful spectroscopic tools for obtaining stereochemical information on a wide range of chiral compounds directly in solution. More recently, their applications in unveiling solvent effects, especially those associated with the water solvent, have been explored. In this review article, we first select a few examples to demonstrate the unique sensitivity of VCD spectral signatures to both bulk solvent effects and explicit hydrogen-bonding interactions in solution. Second, we discuss in detail the induced solvent chirality, or chirality transfer, VCD spectral features observed in the water bending band region. From these chirality transfer spectral data, the related conformer-specific gas phase spectroscopic studies of small chiral hydration clusters, and the associated matrix isolation VCD experiments on hydrogen-bonded complexes in cold rare gas matrices, a general picture of solvation in aqueous solution emerges. In such an aqueous solution, some small chiral hydration clusters, rather than the chiral solutes themselves, are the dominant species and are the ones that contribute mainly to the experimentally observed VCD features. We then review a series of VCD studies of amino acids and their derivatives in aqueous solution under different pHs to emphasize the importance of including bulk solvent effects. These experimental data and the associated theoretical analyses are the foundation for the proposed “clusters-in-a-liquid” approach to accounting for solvent effects effectively. We present several approaches to identify and build such representative chiral hydration clusters. Recent studies which applied molecular dynamics simulations and the subsequent snapshot-averaging approach to generate ROA, VCD, electronic CD, and optical rotatory dispersion spectra are also reviewed. Challenges associated with the molecular dynamics snapshot approach are discussed, and the successes of the seemingly random “ad hoc explicit solvation” reported before are also explained. To further test and improve the “clusters-in-a-liquid” model in practice, future work on conformer-specific gas phase spectroscopy of sequential solvation of a chiral solute, matrix isolation VCD measurements of small chiral hydration clusters, and more sophisticated models for bulk solvent effects would be highly valuable. PMID:26942177
A phase field model for segregation and precipitation induced by irradiation in alloys
NASA Astrophysics Data System (ADS)
Badillo, A.; Bellon, P.; Averback, R. S.
2015-04-01
A phase field model is introduced to describe the evolution of multicomponent alloys under irradiation, including radiation-induced segregation and precipitation. The thermodynamic and kinetic components of this model are derived using a mean-field model. The mobility coefficient and the contribution of chemical heterogeneity to the free energy are rescaled by the cell size used in the phase field model, yielding microstructural evolutions that are independent of the cell size. A new treatment is proposed for point defect clusters, using a mixed discrete-continuous approach to capture the stochastic character of defect cluster production in displacement cascades, while retaining the efficient modeling of the fate of these clusters using diffusion equations. The model is tested on unary and binary alloy systems using two-dimensional simulations. In a unary system, the evolution of point defects under irradiation is studied in the presence of defect clusters, either pre-existing ones or those created by irradiation, and compared with rate theory calculations. Binary alloys with zero and positive heats of mixing are then studied to investigate the effect of point defect clustering on radiation-induced segregation and precipitation in undersaturated solid solutions. Lastly, irradiation conditions and alloy parameters leading to irradiation-induced homogeneous precipitation are investigated.
de Lara-Castells, María Pilar; Stoll, Hermann; Mitrushchenkov, Alexander O
2014-08-21
As a prototypical dispersion-dominated physisorption problem, we analyze here the performance of dispersionless and dispersion-accounting methodologies on the helium interaction with cluster models of the TiO2(110) surface. A special focus has been given to the dispersionless density functional dlDF and the dlDF+Das construction for the total interaction energy (K. Pernal, R. Podeszwa, K. Patkowski, and K. Szalewicz, Phys. Rev. Lett. 2009, 103, 263201), where Das is an effective interatomic pairwise functional form for the dispersion. Likewise, the performance of the symmetry-adapted perturbation theory (SAPT) method is evaluated, where the interacting monomers are described by density functional theory (DFT) with the dlDF, PBE, and PBE0 functionals. Our benchmarks include CCSD(T)-F12b calculations and a comparative analysis of the nuclear bound states supported by the He-cluster potentials. Moreover, intra- and intermonomer correlation contributions to the physisorption interaction are analyzed through the method of increments (H. Stoll, J. Chem. Phys. 1992, 97, 8449) at the CCSD(T) level of theory. This method is further applied in conjunction with a partitioning of the Hartree-Fock interaction energy to estimate individual interaction energy components, comparing them with those obtained using the different SAPT(DFT) approaches. The cluster size evolution of dispersionless and dispersion-accounting energy components is then discussed, revealing the reduced role of the dispersionless interaction and intramonomer correlation when the extended nature of the surface is better accounted for. On the contrary, both post-Hartree-Fock and SAPT(DFT) results clearly demonstrate the highly transferable character of the effective pairwise dispersion interaction whatever the cluster model. Our contribution also illustrates how the method of increments can be used as a valuable tool not only to achieve the accuracy of CCSD(T) calculations using large cluster models but also to evaluate the performance of SAPT(DFT) methods for the physically well-defined contributions to the total interaction energy. Overall, our work indicates the excellent performance of a dlDF+Das approach in which the parameters are optimized using the smallest cluster model of the target surface to treat van der Waals adsorbate-surface interactions.
Goovaerts, Pierre; Jacquez, Geoffrey M
2004-01-01
Background: Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences, in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistic to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results: We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion: The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
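The local Moran statistic at the center of this analysis is simple to compute once a spatial weights matrix is fixed (Python/numpy; the random adjacency below stands in for real ZIP-code geography, and the neutral-model significance step is only indicated in a comment).

    import numpy as np

    def local_moran(rates, W):
        """Local Moran's I for one variable, W a row-standardized weights matrix."""
        z = (rates - rates.mean()) / rates.std()
        return z * (W @ z)

    rng = np.random.default_rng(12)
    n = 100
    rates = rng.normal(size=n)                 # e.g., smoothed mortality rates
    W = rng.random((n, n)) < 0.05              # toy adjacency matrix
    np.fill_diagonal(W, False)
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1)   # row-standardize
    I = local_moran(rates, W)
    # Significance against spatially correlated neutral models: recompute I on
    # many simulated neutral realizations and compare each observed I_i with
    # the resulting reference distribution.
    print("largest |I_i| values:", np.round(np.sort(np.abs(I))[-5:], 2))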
Dynamical Competition of IC-Industry Clustering from Taiwan to China
NASA Astrophysics Data System (ADS)
Tsai, Bi-Huei; Tsai, Kuo-Hui
2009-08-01
Most studies employ a qualitative approach to explore industrial clusters; little research has objectively quantified the evolution of industry clustering. The purpose of this paper is to quantitatively analyze clustering among the IC design, IC manufacturing, and IC packaging and testing industries by using foreign direct investment (FDI) data. The Lotka-Volterra system equations are first adopted here to capture the competition or cooperation among these three industries, thus explaining their clustering inclinations. The results indicate that the evolution of FDI into China for the IC design industry significantly inspires the subsequent FDI of the IC manufacturing as well as the IC packaging and testing industries. Since the IC design industry lies in the upstream stage of IC production, the middle-stream IC manufacturing and downstream IC packaging and testing enterprises tend to cluster together with IC design firms in order to sustain a steady business. Finally, Taiwan's cumulative IC-industry FDI into China is predicted to keep increasing, which supports the industrial clustering tendency for the Taiwan IC industry. In particular, the FDI prediction of the Lotka-Volterra model is superior to that of the conventional Bass model when the forecast accuracy of the two models is compared. The prediction ability is dramatically improved once the industrial mutualism among the IC production stages is taken into account.
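For readers unfamiliar with the model, a minimal sketch of a three-component competitive/mutualistic Lotka-Volterra system follows; all growth rates, saturation levels, and interaction coefficients are illustrative assumptions, not the paper's estimates:

```python
# Toy Lotka-Volterra system for cumulative FDI of three IC industry stages:
# x = (design, manufacturing, packaging/testing).
import numpy as np
from scipy.integrate import solve_ivp

r = np.array([0.30, 0.25, 0.20])        # intrinsic growth rates (assumed)
K = np.array([100.0, 80.0, 60.0])       # saturation levels (assumed)
# a[i, j] > 0: stage j stimulates stage i (mutualism); < 0 would be competition
a = np.array([[0.00, 0.05, 0.02],
              [0.10, 0.00, 0.03],
              [0.08, 0.04, 0.00]])

def lv(t, x):
    # logistic growth plus cross-stage stimulation scaled by capacity
    return r * x * (1 - x / K + (a @ x) / K)

sol = solve_ivp(lv, (0, 40), [5.0, 2.0, 1.0], dense_output=True)
print(np.round(sol.y[:, -1], 1))        # long-run FDI levels per stage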
Combining Multiobjective Optimization and Cluster Analysis to Study Vocal Fold Functional Morphology
Palaparthi, Anil; Riede, Tobias
2017-01-01
Morphological design and the relationship between form and function have great influence on the functionality of a biological organ. However, the simultaneous investigation of morphological diversity and function is difficult in complex natural systems. We have developed a multiobjective optimization (MOO) approach in association with cluster analysis to study the form-function relation in vocal folds. An evolutionary algorithm (NSGA-II) was used to integrate MOO with an existing finite element model of the laryngeal sound source. Vocal fold morphology parameters served as decision variables and acoustic requirements (fundamental frequency, sound pressure level) as objective functions. A two-layer and a three-layer vocal fold configuration were explored to produce the targeted acoustic requirements. The mutation and crossover parameters of the NSGA-II algorithm were chosen to maximize a hypervolume indicator. The results were expressed using cluster analysis and were validated against a brute force method. Results from the MOO and the brute force approaches were comparable. The MOO approach demonstrated greater resolution in the exploration of the morphological space. In association with cluster analysis, MOO can efficiently explore vocal fold functional morphology. PMID:24771563
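The core of NSGA-II-style search is non-dominated sorting; the sketch below extracts a Pareto set for two acoustic objectives from a hypothetical surrogate model (a stand-in for the finite element sound source, with made-up parameter-to-acoustics mappings) and then clusters the surviving morphologies:

```python
# Pareto extraction plus cluster analysis, mimicking the MOO workflow.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(2000, 4))            # morphology parameters (toy)

def surrogate(x):                                # hypothetical F0 / SPL model
    f0 = 100 + 300 * x[:, 0] * (1 - 0.5 * x[:, 1])
    spl = 60 + 30 * x[:, 2] + 10 * x[:, 3]
    return np.column_stack([np.abs(f0 - 220), np.abs(spl - 75)])

F = surrogate(X)                                 # objectives: target mismatch

def pareto_mask(F):
    keep = np.ones(len(F), dtype=bool)
    for i, f in enumerate(F):
        if keep[i]:
            dominated = np.all(F >= f, axis=1) & np.any(F > f, axis=1)
            keep &= ~dominated
    return keep

front = X[pareto_mask(F)]                        # non-dominated morphologies
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(front)
print(len(front), "non-dominated designs in", labels.max() + 1, "clusters")
```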
Perspective: Size selected clusters for catalysis and electrochemistry
NASA Astrophysics Data System (ADS)
Halder, Avik; Curtiss, Larry A.; Fortunelli, Alessandro; Vajda, Stefan
2018-03-01
Size-selected clusters containing a handful of atoms may possess novel catalytic properties different from nano-sized or bulk catalysts. Size- and composition-selected clusters can also serve as models of the catalytic active site, where the addition or removal of a single atom can have a dramatic effect on their activity and selectivity. In this perspective, we provide an overview of studies performed under both ultra-high vacuum and realistic reaction conditions aimed at the interrogation, characterization, and understanding of the performance of supported size-selected clusters in heterogeneous and electrochemical reactions, which address the effects of cluster size, cluster composition, cluster-support interactions, and reaction conditions, the key parameters for the understanding and control of catalyst functionality. Computational modeling based on density functional theory sampling of local minima and energy barriers or ab initio molecular dynamics simulations is an integral part of this research by providing fundamental understanding of the catalytic processes at the atomic level, as well as by predicting new materials compositions which can be validated in experiments. Finally, we discuss approaches which aim at the scale-up of the production of well-defined clusters for use in real-world applications.
Burte, Emilie; Bousquet, Jean; Varraso, Raphaëlle; Gormand, Frédéric; Just, Jocelyne; Matran, Régis; Pin, Isabelle; Siroux, Valérie; Jacquemin, Bénédicte; Nadif, Rachel
2015-01-01
A classification of rhinitis in adults is lacking in epidemiological studies. We aimed to identify phenotypes of adult rhinitis using an unsupervised (data-driven) approach compared with a classical hypothesis-driven approach. 983 adults of the French Epidemiological Study on the Genetics and Environment of Asthma (EGEA) were studied. Self-reported symptoms related to rhinitis, such as nasal symptoms, hay fever, sinusitis, conjunctivitis, and sensitivities to different triggers (dust, animals, hay/flowers, cold air…), were used. Allergic sensitization was defined by at least one positive skin prick test to 12 aeroallergens. A mixture model was used to cluster participants, independently in those without (Asthma-, n = 582) and with asthma (Asthma+, n = 401). Three clusters were identified in both groups: 1) Cluster A (55% in Asthma- and 22% in Asthma+), mainly characterized by the absence of nasal symptoms; 2) Cluster B (23% in Asthma-, 36% in Asthma+), mainly characterized by nasal symptoms all over the year, sinusitis, and a low prevalence of positive skin prick tests; and 3) Cluster C (22% in Asthma-, 42% in Asthma+), mainly characterized by a peak of nasal symptoms during spring, a high prevalence of positive skin prick tests, and frequent reports of hay fever, allergic rhinitis, and conjunctivitis. The highest rate of polysensitization (80%) was found in participants with comorbid asthma and allergic rhinitis. This cluster analysis highlighted three clusters of rhinitis with characteristics similar to those known by clinicians but differing according to allergic sensitization, whatever the asthma status. These clusters could easily be rebuilt using a small number of variables.
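A minimal sketch of the data-driven step, assuming simulated symptom data in place of the EGEA variables, fits Gaussian mixtures and selects the number of clusters by BIC:

```python
# Mixture-model clustering with BIC-based selection of the cluster count.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# toy symptom matrix: 983 subjects x 8 graded rhinitis indicators
X = rng.normal(size=(983, 8))

models = {k: GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
          for k in range(1, 7)}
bic = {k: m.bic(X) for k, m in models.items()}
best_k = min(bic, key=bic.get)               # lowest BIC wins
clusters = models[best_k].predict(X)
print("selected", best_k, "clusters; sizes:", np.bincount(clusters))
```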
Vera, José Fernando; de Rooij, Mark; Heiser, Willem J
2014-11-01
In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few specific classes and representing these together with the categories of the response variable in a low-dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters and the adjusted Bayesian information criterion statistic is employed to test the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented. © 2014 The British Psychological Society.
Crawford, Megan R.; Chirinos, Diana A.; Iurcotta, Toni; Edinger, Jack D.; Wyatt, James K.; Manber, Rachel; Ong, Jason C.
2017-01-01
Study Objectives: This study examined empirically derived symptom cluster profiles among patients who present with insomnia using clinical data and polysomnography. Methods: Latent profile analysis was used to identify symptom cluster profiles of 175 individuals (63% female) with insomnia disorder based on total scores on validated self-report instruments of daytime and nighttime symptoms (Insomnia Severity Index, Glasgow Sleep Effort Scale, Fatigue Severity Scale, Beliefs and Attitudes about Sleep, Epworth Sleepiness Scale, Pre-Sleep Arousal Scale), mean values from a 7-day sleep diary (sleep onset latency, wake after sleep onset, and sleep efficiency), and total sleep time derived from an in-laboratory PSG. Results: The best-fitting model had three symptom cluster profiles: “High Subjective Wakefulness” (HSW), “Mild Insomnia” (MI) and “Insomnia-Related Distress” (IRD). The HSW symptom cluster profile (26.3% of the sample) reported high wake after sleep onset, high sleep onset latency, and low sleep efficiency. Despite relatively comparable PSG-derived total sleep time, they reported greater levels of daytime sleepiness. The MI symptom cluster profile (45.1%) reported the least disturbance in the sleep diary and questionnaires and had the highest sleep efficiency. The IRD symptom cluster profile (28.6%) reported the highest mean scores on the insomnia-related distress measures (eg, sleep effort and arousal) and waking correlates (fatigue). Covariates associated with symptom cluster membership were older age for the HSW profile, greater obstructive sleep apnea severity for the MI profile, and, when adjusting for obstructive sleep apnea severity, being overweight/obese for the IRD profile. Conclusions: The heterogeneous nature of insomnia disorder is captured by this data-driven approach to identify symptom cluster profiles. The adaptation of a symptom cluster-based approach could guide tailored patient-centered management of patients presenting with insomnia, and enhance patient care. Citation: Crawford MR, Chirinos DA, Iurcotta T, Edinger JD, Wyatt JK, Manber R, Ong JC. Characterization of patients who present with insomnia: is there room for a symptom cluster-based approach? J Clin Sleep Med. 2017;13(7):911–921. PMID:28633722
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-01-01
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices’ service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes’ life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN. PMID:26703619
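The following toy sketch illustrates the general idea of service-aware cluster-head election (residual energy raises a node's chances, service load lowers them); the scoring weights are assumptions, not the SAC algorithm's exact rule:

```python
# Energy- and service-aware cluster-head election (schematic, not SAC itself).
import random

class Node:
    def __init__(self, nid):
        self.nid = nid
        self.energy = random.uniform(0.5, 1.0)   # residual energy (toy units)
        self.load = random.uniform(0.0, 1.0)     # normalised service load

def elect_cluster_heads(nodes, p_ch=0.1):
    # score grows with energy and shrinks with load; normalising by the mean
    # score keeps the expected number of heads near p_ch * len(nodes)
    scores = [n.energy * (1.0 - 0.7 * n.load) for n in nodes]
    mean_score = sum(scores) / len(scores)
    return [n.nid for n, s in zip(nodes, scores)
            if random.random() < p_ch * s / mean_score]

random.seed(3)
nodes = [Node(i) for i in range(100)]
print("elected cluster heads:", elect_cluster_heads(nodes))
```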
2n-emission from 205Pb* nucleus using clusterization approach at Ebeam˜14-20 MeV
NASA Astrophysics Data System (ADS)
Kaur, Amandeep; Sandhu, Kiran; Sharma, Manoj Kumar
2016-05-01
The dynamics involved in the n-induced reaction with a 204Pb target is analyzed, and the decay of the composite system 205Pb* is treated within the collective clusterization approach of the Dynamical Cluster-decay Model (DCM). The experimental data for the 2n-evaporation channel are available for the neutron energy range of 14-20 MeV and are addressed by optimizing the only parameter of the model, the neck-length parameter (ΔR). The calculations include the quadrupole (β2) deformations of the decaying fragments, and the calculated 2n-emission cross-sections show good agreement with the available data. An effort is made to study the role of the level density parameter in the decay of the hot rotating nucleus, and the mass dependence of the level density parameter is exercised for the first time in DCM-based calculations. The effects of deformation, temperature, angular momentum, etc., are studied to extract a better description of the dynamics involved.
NASA Astrophysics Data System (ADS)
Abbasi Baharanchi, Ahmadreza
This dissertation focused on the development and utilization of numerical and experimental approaches to improve the CFD modeling of the fluidization flow of cohesive micron-size particles. The specific objectives of this research were: (1) developing a cluster prediction mechanism applicable to Two-Fluid Modeling (TFM) of gas-solid systems; (2) developing more accurate drag models for TFM of gas-solid fluidization flow in the presence of cohesive interparticle forces; (3) using the developed models to explore the improvement in accuracy of TFM simulations of the fluidization flow of cohesive powders; (4) understanding and quantifying the causes and influential factors behind these improvements; and (5) gathering data from a fast fluidization flow and using these data for benchmark validations. Simulation results with the two developed cluster-aware drag models showed that cluster prediction could effectively influence the results in both cases. The improvement in accuracy of TFM modeling using three versions of the first hybrid model was significant, and the best improvements were obtained with the smallest values of the switch parameter, which captured the smallest chances of cluster prediction. In the case of the second hybrid model, the dependence of the critical model parameter on the Reynolds number alone meant that the improvement in accuracy was significant only in the dense section of the fluidized bed. This finding suggests that a more sophisticated particle-resolved DNS model, spanning a wide range of solid volume fractions, could be used in the formulation of the cluster-aware drag model. The experimental results using high-speed imaging indicated the presence of particle clusters in the fluidization flow of FCC inside the riser of the FIU-CFB facility. In addition, pressure data were successfully captured along the fluidization column of the facility and used as benchmark validation data for the second hybrid model developed in this dissertation. It was shown that the second hybrid model could predict the pressure data in the dense section of the fluidization column with better accuracy.
A comparison of regional flood frequency analysis approaches in a simulation framework
NASA Astrophysics Data System (ADS)
Ganora, D.; Laio, F.
2016-07-01
Regional frequency analysis (RFA) is a well-established methodology to provide an estimate of the flood frequency curve at ungauged (or scarcely gauged) sites. Different RFA approaches exist, depending on the way the information is transferred to the site of interest, but it is not clear in the literature whether a specific method systematically outperforms the others. The aim of this study is to provide a framework for carrying out such an intercomparison by building up a virtual environment based on synthetically generated data. The considered regional approaches include: (i) a unique regional curve for the whole region; (ii) a multiple-region model where homogeneous subregions are determined through cluster analysis; (iii) a Region-of-Influence model which defines a homogeneous subregion for each site; (iv) a spatially smooth estimation procedure where the parameters of the regional model vary continuously in space. Virtual environments are generated considering different patterns of heterogeneity, including step changes and smooth variations. If the region is heterogeneous, with the parent distribution changing continuously within the region, the spatially smooth regional approach outperforms the others, with overall errors 10-50% lower than the other methods. In the case of a step change, the spatially smooth and clustering procedures perform similarly if the heterogeneity is moderate, while clustering procedures work better when the step change is severe. To extend our findings, an extensive sensitivity analysis has been performed to investigate the effect of sample length, number of virtual stations, return period of the predicted quantile, variability of the scale parameter of the parent distribution, number of predictor variables, and the parent distribution itself. Overall, the spatially smooth approach appears to be the most robust, as its performance is more stable across different patterns of heterogeneity, especially when short records are considered.
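As a sketch of the Region-of-Influence ingredient, the code below pools GEV parameters from the nearest synthetic gauges with inverse-distance weights and reads off a 100-year quantile; the sites, records, ROI size, and weighting scheme are toy assumptions:

```python
# Region-of-Influence pooling of at-site GEV fits (synthetic data).
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(4)
n_sites = 20
dist = rng.uniform(1, 100, n_sites)                 # distance to target site
samples = [genextreme.rvs(-0.1, loc=100 + d / 5, scale=30,
                          size=40, random_state=rng) for d in dist]

params = np.array([genextreme.fit(s) for s in samples])  # (shape, loc, scale)

roi = np.argsort(dist)[:8]                          # 8 nearest sites form the ROI
w = 1.0 / dist[roi]
w /= w.sum()
c, loc, scale = (params[roi] * w[:, None]).sum(0)   # weighted parameter average

T = 100                                             # return period (years)
q100 = genextreme.ppf(1 - 1 / T, c, loc=loc, scale=scale)
print(f"pooled 100-year quantile at target site: {q100:.1f}")
```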
Excitonic Order and Superconductivity in the Two-Orbital Hubbard Model: Variational Cluster Approach
NASA Astrophysics Data System (ADS)
Fujiuchi, Ryo; Sugimoto, Koudai; Ohta, Yukinori
2018-06-01
Using the variational cluster approach based on the self-energy functional theory, we study the possible occurrence of excitonic order and superconductivity in the two-orbital Hubbard model with intra- and inter-orbital Coulomb interactions. It is known that an antiferromagnetic Mott insulator state appears in the regime of strong intra-orbital interaction, a band insulator state appears in the regime of strong inter-orbital interaction, and an excitonic insulator state appears between them. In addition to these states, we find that the s±-wave superconducting state appears in the small-correlation regime, and the dx²-y²-wave superconducting state appears on the boundary of the antiferromagnetic Mott insulator state. We calculate the single-particle spectral function of the model and compare the band gap formation due to the superconducting and excitonic orders.
Banerjee, Amit; Misra, Milind; Pai, Deepa; Shih, Liang-Yu; Woodley, Rohan; Lu, Xiang-Jun; Srinivasan, A R; Olson, Wilma K; Davé, Rajesh N; Venanzi, Carol A
2007-01-01
Six rigid-body parameters (Shift, Slide, Rise, Tilt, Roll, Twist) are commonly used to describe the relative displacement and orientation of successive base pairs in a nucleic acid structure. The present work adapts this approach to describe the relative displacement and orientation of any two planes in an arbitrary molecule: specifically, planes which contain important pharmacophore elements. Relevant code from the 3DNA software package (Nucleic Acids Res. 2003, 31, 5108-5121) was generalized to treat molecular fragments other than DNA bases as input for the calculation of the corresponding rigid-body (or "planes") parameters. These parameters were used to construct feature vectors for a fuzzy relational clustering study of over 700 conformations of a flexible analogue of the dopamine reuptake inhibitor, GBR 12909. Several cluster validity measures were used to determine the optimal number of clusters. Translational (Shift, Slide, Rise) rather than rotational (Tilt, Roll, Twist) features dominate clustering based on planes that are relatively far apart, whereas both types of features are important to clustering when the pair of planes are close by. This approach was able to classify the data set of molecular conformations into groups and to identify representative conformers for use as template conformers in future Comparative Molecular Field Analysis studies of GBR 12909 analogues. The advantage of using the planes parameters, rather than the combination of atomic coordinates and angles between molecular planes used in our previous fuzzy relational clustering of the same data set (J. Chem. Inf. Model. 2005, 45, 610-623), is that the present clustering results are independent of molecular superposition and the technique is able to identify clusters in the molecule considered as a whole. This approach is easily generalizable to any two planes in any molecule.
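A minimal sketch of the underlying geometry: fit a best plane to each fragment by SVD and express the displacement of the second plane in the first plane's local frame. The fragment coordinates are toy stand-ins, and a full Tilt/Roll/Twist decomposition of the rotation is omitted:

```python
# Plane fitting and translational "planes parameters" between two fragments.
import numpy as np

def plane_frame(coords):
    """Return (origin, 3x3 frame) for the best-fit plane of an atom set."""
    origin = coords.mean(axis=0)
    # rows of vt are principal axes; the last row is the plane normal
    _, _, vt = np.linalg.svd(coords - origin)
    return origin, vt

rng = np.random.default_rng(5)
frag1 = rng.normal(size=(6, 3)) * [3, 3, 0.1]              # nearly planar set
frag2 = rng.normal(size=(6, 3)) * [3, 3, 0.1] + [1, 2, 4]

o1, f1 = plane_frame(frag1)
o2, f2 = plane_frame(frag2)

shift, slide, rise = f1 @ (o2 - o1)        # translations along plane-1 axes
cos_angle = abs(f1[2] @ f2[2])             # angle between the two normals
print(f"rise = {rise:.2f}, inter-plane angle = "
      f"{np.degrees(np.arccos(cos_angle)):.1f} deg")
```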
A Study of Pupil Control Ideology: A Person-Oriented Approach to Data Analysis
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2010-01-01
Responses of urban school teachers to the Pupil Control Ideology questionnaire were studied using Latent Class Analysis. The results of the analysis suggest that the best fitting model to the data is a two-cluster solution. In particular, the pupil control ideology of the sample delineates into two clusters of teachers, those with humanistic and…
NASA Astrophysics Data System (ADS)
Vasylkivska, Veronika S.; Huerta, Nicolas J.
2017-07-01
Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog's inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.
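A compact sketch of a nearest-neighbour proximity analysis in this spirit (the metric form follows Zaliapin-type studies; the catalogue, b-value, fractal dimension, and clustering threshold are illustrative assumptions):

```python
# Nearest-neighbour proximity: for each event, find the earlier "parent"
# minimising eta = dt * dr**df * 10**(-b * m_parent).
import numpy as np

rng = np.random.default_rng(6)
n = 300
t = np.sort(rng.uniform(0, 1000, n))           # event times (days, toy)
x, y = rng.uniform(0, 200, (2, n))             # epicentres (km, toy)
m = rng.exponential(0.5, n) + 2.0              # magnitudes (toy G-R sample)

b, df = 1.0, 1.6                               # assumed b-value, fractal dim.
eta = np.full(n, np.inf)
parent = np.full(n, -1)
for j in range(1, n):
    i = np.arange(j)                           # all earlier events
    dt = t[j] - t[i]
    dr = np.hypot(x[j] - x[i], y[j] - y[i]) + 1e-3
    e = dt * dr**df * 10.0**(-b * m[i])
    k = np.argmin(e)
    eta[j], parent[j] = e[k], i[k]

# Small eta -> strongly linked (clustered) pairs; the quantile threshold
# splitting clustered from background events is an assumption here.
clustered = eta < np.quantile(eta[1:], 0.25)
print("events flagged as clustered:", int(clustered.sum()))
```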
A Game Theoretic Approach for Balancing Energy Consumption in Clustered Wireless Sensor Networks.
Yang, Liu; Lu, Yinzhi; Xiong, Lian; Tao, Yang; Zhong, Yuanchang
2017-11-17
Clustering is an effective topology control method in wireless sensor networks (WSNs), since it can enhance the network lifetime and scalability. To prolong the network lifetime in clustered WSNs, an efficient cluster head (CH) optimization policy is essential to distribute the energy among sensor nodes. Recently, game theory has been introduced to model clustering. Each sensor node is considered a rational and selfish player which plays a clustering game with an equilibrium strategy. It then decides whether to act as the CH according to this strategy, trading off between providing the required services and conserving energy. However, how to obtain the equilibrium strategy while maximizing the payoff of sensor nodes has rarely been addressed to date. In this paper, we present a game theoretic approach for balancing energy consumption in clustered WSNs. With our novel payoff function, realistic sensor behaviors can be captured well. The energy heterogeneity of nodes is considered by incorporating a penalty mechanism in the payoff function, so the nodes with more energy will compete for the CH role more actively. We have obtained the Nash equilibrium (NE) strategy of the clustering game through convex optimization. Specifically, each sensor node can achieve its own maximal payoff when it makes the decision according to this strategy. Extensive simulations show that the proposed game theoretic clustering achieves good energy balancing and consequently greatly enhances the network lifetime.
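For intuition, the mixed-strategy equilibrium of a volunteer's-dilemma-style clustering game has a closed form; the sketch below (with an assumed energy-dependent cost scaling standing in for the paper's penalty mechanism) shows how energy-rich nodes declare themselves CH more often:

```python
# Symmetric mixed-strategy NE of a volunteer's-dilemma-style clustering game:
# the cluster needs at least one head (benefit b to all, cost c to the head),
# giving declaration probability p = 1 - (c/b)**(1/(N-1)).
def ch_probability(b, c, n_neighbors, energy_ratio=1.0):
    """energy_ratio > 1 lowers the perceived cost for energy-rich nodes
    (an assumption sketching the penalty mechanism, not the paper's payoff)."""
    eff_cost = c / energy_ratio
    return 1.0 - (eff_cost / b) ** (1.0 / (n_neighbors - 1))

for e in (0.5, 1.0, 2.0):   # energy-poor, average, energy-rich node
    print(f"energy x{e}: declare-CH prob = {ch_probability(4.0, 1.0, 10, e):.3f}")
```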
Carabin, Hélène; Escalona, Marisela; Marshall, Clare; Vivas-Martínez, Sarai; Botto, Carlos; Joseph, Lawrence; Basáñez, María-Gloria
2003-01-01
OBJECTIVE: To develop a Bayesian hierarchical model for human onchocerciasis with which to explore the factors that influence prevalence of microfilariae in the Amazonian focus of onchocerciasis and predict the probability of any community being at least mesoendemic (>20% prevalence of microfilariae), and thus in need of priority ivermectin treatment. METHODS: Models were developed with data from 732 individuals aged > or =15 years who lived in 29 Yanomami communities along four rivers of the south Venezuelan Orinoco basin. The models' abilities to predict prevalences of microfilariae in communities were compared. The deviance information criterion, Bayesian P-values, and residual values were used to select the best model with an approximate cross-validation procedure. FINDINGS: A three-level model that acknowledged clustering of infection within communities performed best, with host age and sex included at the individual level, a river-dependent altitude effect at the community level, and additional clustering of communities along rivers. This model correctly classified 25/29 (86%) villages with respect to their need for priority ivermectin treatment. CONCLUSION: Bayesian methods are a flexible and useful approach for public health research and control planning. Our model acknowledges the clustering of infection within communities, allows investigation of links between individual- or community-specific characteristics and infection, incorporates additional uncertainty due to missing covariate data, and informs policy decisions by predicting the probability that a new community is at least mesoendemic. PMID:12973640
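A compressed sketch of such a three-level model, written with PyMC (assumed available) on simulated placeholder data rather than the Yanomami survey data:

```python
# Three-level Bayesian hierarchical logistic model: individuals within
# communities within rivers, with community intercepts shrunk toward a
# river-level mean. All data below are simulated placeholders.
import numpy as np
import pymc as pm

rng = np.random.default_rng(7)
n, n_comm, n_river = 732, 29, 4
comm = rng.integers(0, n_comm, n)               # community of each person
river = rng.integers(0, n_river, n_comm)        # river of each community
age = rng.uniform(15, 70, n)
sex = rng.integers(0, 2, n)
y = rng.integers(0, 2, n)                       # microfilariae status (toy)

with pm.Model():
    b_age = pm.Normal("b_age", 0, 1)
    b_sex = pm.Normal("b_sex", 0, 1)
    mu_river = pm.Normal("mu_river", 0, 1, shape=n_river)
    sd_comm = pm.HalfNormal("sd_comm", 1)
    a_comm = pm.Normal("a_comm", mu_river[river], sd_comm, shape=n_comm)
    logit_p = a_comm[comm] + b_age * (age - 40) / 10 + b_sex * sex
    pm.Bernoulli("obs", logit_p=logit_p, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

# Community-level posterior prevalence at the covariate reference point,
# turned into P(community is at least mesoendemic, i.e. prevalence > 20%).
post = idata.posterior["a_comm"].values.reshape(-1, n_comm)
prev = 1 / (1 + np.exp(-post))
print((prev > 0.20).mean(axis=0)[:5])
```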
Hong, H. L.; Wang, Q.; Dong, C.; Liaw, Peter K.
2014-01-01
Metallic alloys show complex chemistries that are not yet fully understood. It has been widely accepted that behind the composition selection lies a short-range-order mechanism for solid solutions. The present paper addresses this fundamental question by examining the face-centered-cubic Cu-Zn α-brasses. A new structural approach, the cluster-plus-glue-atom model, is introduced, which is specifically suited to the description of short-range-order structures in disordered systems. Two types of formulas are pointed out, [Zn-Cu12]Zn1~6 and [Zn-Cu12](Zn,Cu)6, which explain the α-brasses listed in the American Society for Testing and Materials (ASTM) specifications. In these formulas, the bracketed parts represent the 1st-neighbor cluster, and each cluster is matched with one to six 2nd-neighbor Zn atoms or with six mixed (Zn,Cu) atoms. Such a cluster-based formalism describes the 1st- and 2nd-neighbor local atomic units where the solute and solvent interactions are ideally satisfied. The Cu-Ni industrial alloys are also explained, thus proving the universality of the cluster-formula approach in understanding alloy selection. The revelation of the composition formulas for the Cu-(Zn,Ni) industrial alloys points to the common existence of simple composition rules behind the seemingly complex chemistries of industrial alloys, thus offering a fundamental and practical method towards composition interpretation of all kinds of alloys. PMID:25399835
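The composition arithmetic implied by the quoted formulas is easy to check; for [Zn-Cu12]Znx with x = 1..6, the Zn content runs from roughly 14 to 37 at.%:

```python
# Zn content of the cluster-plus-glue-atom formulas [Zn-Cu12]Zn_x:
# one Zn cluster centre, twelve Cu shell atoms, x Zn glue atoms.
for x in range(1, 7):
    zn, cu = 1 + x, 12
    total = zn + cu
    print(f"[Zn-Cu12]Zn{x}: Zn = {100 * zn / total:.1f} at.% "
          f"(formula unit of {total} atoms)")
```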
Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.
2010-01-01
Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is current methods for automated calling of genotypes are based on clustering approaches which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420
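The inbreeding-aware prior is the distinctive ingredient; a minimal sketch (cluster positions, intensity noise, and F values are illustrative assumptions, not ALCHEMY's fitted model) combines Wright's F-adjusted genotype frequencies with per-genotype Gaussian intensity models:

```python
# Genotype calling with an inbreeding-adjusted prior: P(AA) = p^2 + Fpq,
# P(AB) = 2pq(1 - F), P(BB) = q^2 + Fpq, combined with Gaussian likelihoods
# over a normalised intensity contrast theta in [0, 1].
import numpy as np
from scipy.stats import norm

def genotype_posteriors(theta, p, F):
    q = 1 - p
    priors = np.array([p * p + F * p * q,        # AA
                       2 * p * q * (1 - F),      # AB (suppressed when inbred)
                       q * q + F * p * q])       # BB
    means, sd = np.array([0.05, 0.50, 0.95]), 0.08   # assumed cluster model
    lik = norm.pdf(theta, means, sd)
    post = priors * lik
    return post / post.sum()

for F in (0.0, 0.95):   # outbred sample vs highly inbred line
    print(F, np.round(genotype_posteriors(0.45, p=0.6, F=F), 3))
```

With a mid-range intensity, the outbred prior favours a heterozygous call while the inbred prior pushes the posterior toward a homozygote, which is exactly why clustering-free, per-sample modeling helps with inbred panels.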
Alignment and integration of complex networks by hypergraph-based spectral clustering
NASA Astrophysics Data System (ADS)
Michoel, Tom; Nachtergaele, Bruno
2012-11-01
Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.
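A minimal sketch of spectral clustering on an undirected hypergraph using the standard incidence-matrix normalised Laplacian (the paper's Perron-Frobenius generalisation is not reproduced here); the small hypergraph is a toy example:

```python
# Hypergraph spectral clustering via the normalised Laplacian
# L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.
import numpy as np
from sklearn.cluster import KMeans

# incidence matrix H: vertices x hyperedges (1 if vertex belongs to edge)
H = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 1, 1, 0]], dtype=float)
w = np.ones(H.shape[1])                      # hyperedge weights

Dv = H @ w                                   # vertex degrees
De = H.sum(axis=0)                           # hyperedge sizes
A = (H * w / De) @ H.T                       # H W De^-1 H^T
A = A / np.sqrt(Dv)[:, None] / np.sqrt(Dv)[None, :]

vals, vecs = np.linalg.eigh(np.eye(len(Dv)) - A)
emb = vecs[:, 1:3]                           # skip the trivial eigenvector
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)                                # vertices 0-2 vs 3-4, 5 bridging
```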
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study, carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as the data collection tool. The results of the analyses indicated that students constructed the aliveness concept by associating it predominantly with humans. Motion appeared as the term most frequently associated with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test to train the SOM. Consequently, we propose a method for visualizing the cognitive structure of the aliveness concept. PMID:26819579
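A sketch of the SOM-plus-Ward pipeline on simulated concept-test responses, using the MiniSom package as one convenient stand-in implementation (an assumption; any SOM library would serve):

```python
# Train a SOM on response vectors, then apply Ward's clustering to the
# codebook vectors, and map each student to a meta-cluster via its best cell.
import numpy as np
from minisom import MiniSom
from scipy.cluster.hierarchy import fcluster, ward

rng = np.random.default_rng(8)
X = rng.integers(0, 2, size=(100, 12)).astype(float)   # 100 students x 12 items

som = MiniSom(6, 6, 12, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 2000)

codebook = som.get_weights().reshape(-1, 12)           # 36 prototype vectors
Z = ward(codebook)
cell_cluster = fcluster(Z, t=3, criterion="maxclust")  # 3 meta-clusters

wins = np.array([som.winner(x) for x in X])            # best cell per student
student_cluster = cell_cluster[wins[:, 0] * 6 + wins[:, 1]]
print(np.bincount(student_cluster)[1:])                # cluster sizes
```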
*K-means and cluster models for cancer signatures.
Kakushadze, Zura; Yu, Willie
2017-09-01
We present the *K-means clustering algorithm and source code, expanding the statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without requiring initial centers to be specified. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF); *K-means' computational cost is a fraction of NMF's. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
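Without reproducing the exact *K-means algorithm, the following sketch captures the spirit of removing initial-center dependence: pool centers from many random-start k-means runs, then cluster the centers themselves to obtain a reproducible final assignment:

```python
# Aggregating many k-means restarts to suppress initial-center dependence
# (a sketch of the idea, not the published *K-means procedure).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(12)
X = np.vstack([rng.normal(m, 0.5, (100, 5)) for m in (0, 3, 6)])

# pool centers from 50 single-start runs with different seeds
centers = np.vstack([KMeans(3, n_init=1, random_state=s).fit(X).cluster_centers_
                     for s in range(50)])
meta = KMeans(3, n_init=10, random_state=0).fit(centers)  # cluster the centers
labels = meta.predict(X)                                  # stable assignment
print(np.bincount(labels))
```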
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bansal, Artee; Asthagiri, D.; Cox, Kenneth R.
A mixture of solvent particles with short-range, directional interactions and solute particles with short-range, isotropic interactions that can bond multiple times is of fundamental interest in understanding liquids and colloidal mixtures. Because of multi-body correlations, predicting the structure and thermodynamics of such systems remains a challenge. Earlier, Marshall and Chapman [J. Chem. Phys. 139, 104904 (2013)] developed a theory wherein association effects due to interactions multiply the partition function for clustering of particles in a reference hard-sphere system. The multi-body effects are incorporated in the clustering process, which in their work was obtained in the absence of the bulk medium. The bulk solvent effects were then modeled approximately within a second order perturbation approach. However, their approach is inadequate at high densities and for large association strengths. Based on the idea that the clustering of solvent in a defined coordination volume around the solute is related to occupancy statistics in that defined coordination volume, we develop an approach to incorporate the complete information about hard-sphere clustering in a bulk solvent at the density of interest. The occupancy probabilities are obtained from enhanced sampling simulations, but we also develop a concise parametric form to model these probabilities using the quasichemical theory of solutions. We show that incorporating the complete reference information results in an approach that can predict the bonding state and thermodynamics of the colloidal solute for a wide range of system conditions.
Modular analysis of the probabilistic genetic interaction network.
Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting
2011-03-15
Epistatic Miniarray Profiles (EMAP) have enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited when the data are interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated to be an effective soft-threshold technique for EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical clustering and Markov clustering. The experimental results show that the Bayesian approach outperforms the others in efficiently recovering biologically significant modules.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rishi, Varun; Perera, Ajith; Bartlett, Rodney J., E-mail: bartlett@qtp.ufl.edu
2016-03-28
Obtaining the correct potential energy curves for the dissociation of multiple bonds is a challenging problem for ab initio methods which are affected by the choice of a spin-restricted reference function. Coupled cluster (CC) methods such as CCSD (coupled cluster singles and doubles model) and CCSD(T) (CCSD + perturbative triples) correctly predict the geometry and properties at equilibrium, but the process of bond dissociation, particularly when more than one bond is simultaneously broken, is much more complicated. New modifications of CC theory suggest that the deleterious role of the reference function can be diminished, provided a particular subset of terms is retained in the CC equations. The Distinguishable Cluster (DC) approach of Kats and Manby [J. Chem. Phys. 139, 021102 (2013)] seemingly overcomes the deficiencies for some bond-dissociation problems and might be of use in quasi-degenerate situations in general. DC, along with other approximate coupled cluster methods such as ACCD (approximate coupled cluster doubles), ACP-D45, ACP-D14, 2CC, and pCCSD(α, β) (all defined in text), falls under a category of methods that are basically obtained by the deletion of some quadratic terms in the double excitation amplitude equation for CCD/CCSD (coupled cluster doubles model/coupled cluster singles and doubles model). Here these approximate methods, particularly those based on the DC approach, are studied in detail for nitrogen molecule bond-breaking. The N₂ problem is further addressed with conventional single reference methods but based on spatial symmetry-broken restricted Hartree–Fock (HF) solutions to assess the use of these references for correlated calculations in the situation where CC methods using fully symmetry adapted SCF solutions fail. The distinguishable cluster method is generalized: 1) to different orbitals for different spins (unrestricted HF based DCD and DCSD), 2) by adding triples corrections perturbatively (DCSD(T)) and iteratively (DCSDT-n), and 3) via an excited state approximation through the equation of motion (EOM) approach (EOM-DCD, EOM-DCSD). The EOM-CC method is used to identify lower-energy CC solutions to overcome singularities in the CC potential energy curves. It is also shown that UHF-based CC and DC methods behave very similarly in bond-breaking of N₂, and that using spatially broken but spin-preserving SCF references makes the CCSD solutions better than those for DCSD.
Torheim, Turid; Groendahl, Aurora R; Andersen, Erlend K F; Lyng, Heidi; Malinen, Eirik; Kvaal, Knut; Futsaether, Cecilia M
2016-11-01
Solid tumors are known to be spatially heterogeneous. Detection of treatment-resistant tumor regions can improve clinical outcome, by enabling implementation of strategies targeting such regions. In this study, K-means clustering was used to group voxels in dynamic contrast enhanced magnetic resonance images (DCE-MRI) of cervical cancers. The aim was to identify clusters reflecting treatment resistance that could be used for targeted radiotherapy with a dose-painting approach. Eighty-one patients with locally advanced cervical cancer underwent DCE-MRI prior to chemoradiotherapy. The resulting image time series were fitted to two pharmacokinetic models, the Tofts model (yielding parameters Ktrans and ve) and the Brix model (ABrix, kep and kel). K-means clustering was used to group similar voxels based on either the pharmacokinetic parameter maps or the relative signal increase (RSI) time series. The associations between voxel clusters and treatment outcome (measured as locoregional control) were evaluated using the volume fraction or the spatial distribution of each cluster. One voxel cluster based on the RSI time series was significantly related to locoregional control (adjusted p-value 0.048). This cluster consisted of low-enhancing voxels. We found that tumors with poor prognosis had this RSI-based cluster gathered into few patches, making this cluster a potential candidate for targeted radiotherapy. None of the voxel clusters based on Tofts or Brix parameter maps were significantly related to treatment outcome. We identified one group of tumor voxels significantly associated with locoregional relapse that could potentially be used for dose painting. This tumor voxel cluster was identified using the raw MRI time series rather than the pharmacokinetic maps.
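A minimal sketch of the RSI clustering step (the array shapes, baseline window, and enhancement model are assumptions, not the study's acquisition protocol):

```python
# Convert a DCE-MRI series to relative signal increase (RSI) against the
# pre-contrast baseline, then cluster voxel time courses with K-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
n_voxels, n_time = 5000, 40
amp = rng.gamma(2.0, 20.0, (n_voxels, 1))          # per-voxel enhancement (toy)
S = 100 + amp * (1 - np.exp(-np.linspace(0, 4, n_time)))
S += rng.normal(0, 3, (n_voxels, n_time))          # noisy tumour voxel signals

baseline = S[:, :5].mean(axis=1, keepdims=True)    # pre-contrast frames
rsi = (S - baseline) / baseline                    # relative signal increase

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(rsi)
order = np.argsort(km.cluster_centers_.mean(axis=1))
print("plateau RSI per cluster (low to high):",
      np.round(km.cluster_centers_[order, -1], 2))
```

The low-enhancing cluster (smallest plateau RSI) corresponds to the prognostically interesting group described above.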
A comparison of heuristic and model-based clustering methods for dietary pattern analysis.
Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia
2016-02-01
Cluster analysis is widely applied to identify dietary patterns. A newer method based on Gaussian mixture models (GMM) is more flexible than the commonly applied k-means and Ward's methods. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72% up to 100% of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100% of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences for single foods, and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means approach seems to be a good alternative, being easier to use while giving similar results when applied to real data.
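The comparison logic is easy to reproduce in miniature: generate elongated Gaussian clusters that favour the more flexible GMM, then score all three methods against the known labels with the adjusted Rand index:

```python
# GMM vs k-means vs Ward's method on simulated elongated clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, ward
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
cov = np.array([[4.0, 1.8], [1.8, 1.0]])          # stretched, tilted clusters
X = np.vstack([rng.multivariate_normal(mu, cov, 200)
               for mu in ([0, 0], [6, 0], [3, 5])])
truth = np.repeat([0, 1, 2], 200)

pred = {
    "GMM": GaussianMixture(3, covariance_type="full",
                           random_state=0).fit_predict(X),
    "k-means": KMeans(3, n_init=10, random_state=0).fit_predict(X),
    "Ward": fcluster(ward(X), t=3, criterion="maxclust"),
}
for name, labels in pred.items():
    print(f"{name:8s} ARI = {adjusted_rand_score(truth, labels):.2f}")
```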
Brownian model of transcriptome evolution and phylogenetic network visualization between tissues.
Gu, Xun; Ruan, Hang; Su, Zhixi; Zou, Yangyun
2017-09-01
While phylogenetic analysis of transcriptomes of the same tissue is usually congruent with the species tree, the controversy emerges when multiple tissues are included, that is, whether species from the same tissue are clustered together, or different tissues from the same species are clustered together. Recent studies have suggested that phylogenetic network approach may shed some lights on our understanding of multi-tissue transcriptome evolution; yet the underlying evolutionary mechanism remains unclear. In this paper we develop a Brownian-based model of transcriptome evolution under the phylogenetic network that can statistically distinguish between the patterns of species-clustering and tissue-clustering. Our model can be used as a null hypothesis (neutral transcriptome evolution) for testing any correlation in tissue evolution, can be applied to cancer transcriptome evolution to study whether two tumors of an individual appeared independently or via metastasis, and can be useful to detect convergent evolution at the transcriptional level. Copyright © 2017. Published by Elsevier Inc.
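A toy simulation of the Brownian framework makes the species- versus tissue-clustering contrast visible: when tissue divergence predates speciation (the branch lengths below are arbitrary assumptions), profiles cluster by tissue:

```python
# Brownian-motion trait evolution on a tree with a deep tissue split followed
# by speciation within each tissue lineage; profiles then cluster by tissue.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(13)
g, sigma = 200, 1.0                       # genes, BM rate

def bm(x, t):                             # Brownian step of duration t
    return x + rng.normal(0, sigma * np.sqrt(t), size=x.shape)

root = rng.normal(0, 1, g)
t_a, t_b = bm(root, 3.0), bm(root, 3.0)   # deep tissue divergence
profiles = {
    "sp1-tisA": bm(t_a, 1.0), "sp2-tisA": bm(t_a, 1.0),
    "sp1-tisB": bm(t_b, 1.0), "sp2-tisB": bm(t_b, 1.0),
}
names = list(profiles)
M = np.array([profiles[n] for n in names])
D = 1 - np.corrcoef(M)                    # correlation distance
Z = linkage(D[np.triu_indices(4, 1)], method="average")
print(names)
print(Z)                                  # same-tissue pairs merge first
```

Reversing the branch lengths (speciation before tissue divergence) flips the result to species-clustering, which is the kind of null contrast the model is designed to test.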
Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol
2007-01-01
Background A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. In order to easily utilize biomedical information in the free text, document clustering and text summarization together are used as a solution for the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Results Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means as a leading document clustering approach. In addition, our approach provides concise but rich text summaries in key concepts and sentences. Conclusion Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries. PMID:18047705
Lu, Liqiang; Liu, Xiaowen; Li, Tingwen; ...
2017-08-12
For this study, gas–solids flow in a three-dimensional periodic domain was numerically investigated by direct numerical simulation (DNS), the computational fluid dynamics-discrete element method (CFD-DEM) and the two-fluid model (TFM). DNS data obtained by finely resolving the flow around every particle are used as a benchmark to assess the validity of the coarser DEM and TFM approaches. The CFD-DEM predicts the correct cluster size distribution and under-predicts the macro-scale slip velocity even with a grid size as small as twice the particle diameter. The TFM approach predicts larger cluster sizes and lower slip velocities with a homogeneous drag correlation. Although the slip velocity can be matched by a simple modification to the drag model, the predicted voidage distribution is still different from DNS: both CFD-DEM and TFM over-predict the fraction of particles in dense regions and under-predict the fraction of particles in regions of intermediate void fractions. Also, the cluster aspect ratio of DNS is smaller than that of CFD-DEM and TFM. Since a simple correction to the drag model can predict the correct slip velocity, it is hoped that drag corrections based on more elaborate theories that consider the voidage gradient and particle fluctuations may improve the current predictions of cluster distribution.
Liu, Jingxia; Colditz, Graham A
2018-05-01
There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, cluster sizes are often assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyzing a set of correlated data is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the treatment effect estimator under equal cluster sizes to that under unequal cluster sizes. We discuss a working correlation structure commonly used in CRTs, the exchangeable structure, and derive simpler formulas of RE for continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size to compensate for the efficiency loss. Additionally, we propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
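For the continuous-outcome case, the RE can be computed directly from the cluster-level information under an exchangeable correlation; a sketch under that standard GLS variance (the function and simulation settings below are ours, not the paper's):

```python
import numpy as np

def relative_efficiency(sizes, rho):
    """RE = Var(effect | equal sizes) / Var(effect | unequal sizes), where each
    cluster of size m contributes information w = m / (1 + (m - 1) * rho)."""
    sizes = np.asarray(sizes, dtype=float)
    w_unequal = np.sum(sizes / (1.0 + (sizes - 1.0) * rho))
    m_bar = sizes.mean()                      # equal-size design, same total N
    w_equal = sizes.size * m_bar / (1.0 + (m_bar - 1.0) * rho)
    return w_unequal / w_equal                # <= 1: unequal sizes lose efficiency

rng = np.random.default_rng(0)
sizes = rng.poisson(20, size=30) + 1          # 30 clusters with varying sizes
print(round(relative_efficiency(sizes, rho=0.05), 4))
```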
A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.
ERIC Educational Resources Information Center
Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald
2002-01-01
Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques with different cluster analysis approaches, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Because the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach that one clustering technique is significantly better is also corroborated by the Nemenyi post-hoc test.
Rhodes, Scott D.; McCoy, Thomas P.
2014-01-01
This study explored correlates of condom use within a respondent-driven sample of 190 Spanish-speaking immigrant Latino sexual minorities, including gay and bisexual men, other men who have sex with men (MSM), and transgender persons, in North Carolina. Five analytic approaches for modeling data collected using respondent-driven sampling (RDS) were compared. Across most approaches, knowledge of HIV and sexually transmitted infections (STIs) and increased condom use self-efficacy predicted consistent condom use, and increased homophobia predicted decreased consistent condom use. The same correlates were not significant in all analyses but were consistent in most. Clustering due to recruitment chains was low, while clustering due to recruiter was substantial. This highlights the importance of accounting for clustering when analyzing RDS data. PMID:25646728
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Junghyun; Gangwon, Jo; Jaehoon, Jung
Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
Statistical mechanics of high-density bond percolation
NASA Astrophysics Data System (ADS)
Timonin, P. N.
2018-05-01
High-density (HD) percolation describes the percolation of specific κ-clusters, which are compact sets of sites each connected to at least κ filled nearest-neighbor sites. It takes place in the classical patterns of independently distributed sites or bonds in which the ordinary percolation transition also exists. Hence, the study of the series of κ-type HD percolations amounts to describing the structure of classical clusters, for which κ-clusters constitute κ-cores nested one into another. Such data are needed for the description of a number of physical, biological, and information properties of complex systems on random lattices, graphs, and networks. They range from magnetic properties of semiconductor alloys to anomalies in supercooled water and clustering in biological and social networks. Here we present a statistical mechanics approach to study HD bond percolation on an arbitrary graph. It is shown that the generating function for the κ-clusters' size distribution can be obtained from the partition function of a specific q-state Potts-Ising model in the q → 1 limit. Using this approach we find exact κ-clusters' size distributions for the Bethe lattice and the Erdős-Rényi graph. The application of the method to Euclidean lattices is also discussed.
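Numerically, the κ-clusters of a bond-percolated lattice can be read off as connected components of the κ-core, matching the definition above; a small sketch using networkx (lattice size and bond probability are arbitrary illustrative choices):

```python
import random
import networkx as nx

def kappa_cluster_sizes(L=100, p=0.9, kappa=3, seed=1):
    """Bond percolation on an L x L grid, then the kappa-core: the maximal
    subgraph whose sites each retain at least kappa occupied bonds."""
    rng = random.Random(seed)
    G = nx.grid_2d_graph(L, L)
    G.remove_edges_from([e for e in list(G.edges()) if rng.random() > p])
    core = nx.k_core(G, k=kappa)
    return sorted((len(c) for c in nx.connected_components(core)), reverse=True)

print(kappa_cluster_sizes()[:5])   # sizes of the largest kappa-clusters
```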
Probing potential Li-ion battery electrolyte through first principles simulation of atomic clusters
NASA Astrophysics Data System (ADS)
Kushwaha, Anoop Kumar; Sahoo, Mihir Ranjan; Nayak, Saroj
2018-04-01
Li-ion batteries have a wide area of application, from low-power consumer electronics to high-power electric vehicles. However, their large-scale application in electric vehicles requires further improvement because of their low specific power density, an essential parameter that is closely related to the working potential window of the battery system. Several studies have found that these parameters can be addressed by considering different cathode/anode materials and electrolytes. Recently, a unique cluster-size-based approach has been reported in which the use of the Li3 cluster has been suggested as a potential component of the battery electrode material. The cluster-based approach significantly enhances the working electrode potential, by up to 0.6 V in the acetonitrile solvent. In the present work, using ab initio quantum chemical calculations and the dielectric continuum model, we have investigated various dielectric solvent media to find a suitable electrolyte for the Li3 cluster. This study suggests that high-dielectric electrolytic solvents (ethylene carbonate and propylene carbonate) could be better choices for the lithium cluster due to the improvement in the total electrode potential in comparison to solvents of lower dielectric constant.
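The qualitative role of the dielectric constant can be illustrated with the much simpler Born continuum model (not the authors' ab initio treatment): the solvation free energy of a charged sphere deepens as ε grows, with diminishing returns at high ε. The radius, charge, and approximate ε values below are illustrative.

```python
E2_OVER_4PIEPS0 = 14.3996   # e^2 / (4*pi*eps0) in eV * Angstrom

def born_energy(eps, q=1.0, a=2.0):
    """Born solvation free energy (eV) of charge q*e in a sphere of radius
    a (Angstrom) embedded in a continuum of dielectric constant eps."""
    return -(1.0 - 1.0 / eps) * q**2 * E2_OVER_4PIEPS0 / (2.0 * a)

for name, eps in [("acetonitrile", 35.7), ("propylene carbonate", 64.9),
                  ("ethylene carbonate", 89.8)]:   # approximate eps values
    print(f"{name:20s} eps={eps:5.1f}  dG={born_energy(eps):+.3f} eV")
```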
Metal Cluster Models for Heterogeneous Catalysis: A Matrix-Isolation Perspective.
Hübner, Olaf; Himmel, Hans-Jörg
2018-02-19
Metal cluster models are of high relevance for establishing new mechanistic concepts for heterogeneous catalysis. The high reactivity and particular selectivity of metal clusters are caused by the wealth of low-lying electronically excited states that are often thermally populated. The metal clusters are thereby flexible with regard to their electronic structure and can adjust their states to be appropriate for the reaction with a particular substrate. The matrix isolation technique is ideally suited for studying excited-state reactivity. The low matrix temperatures (generally 4-40 K) of the noble gas matrix host guarantee that all clusters are in their electronic ground state (with only very few exceptions). Electronically excited states can then be selectively populated and their reactivity probed. Unfortunately, systematic research in this direction has not been undertaken to date. The purpose of this review is to provide the grounds for a directed approach to understanding cluster reactivity through matrix-isolation studies combined with quantum chemical calculations. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Donoso, Roberto; Fuentealba, Patricio, E-mail: pfuentea@hotmail.es; Cárdenas, Carlos, E-mail: cardena@macul.ciencias.uchile.cl
In this work, a model to explain the unusual stability of atomic lithium clusters in their highest spin multiplicity is presented and used to describe the ferromagnetic bonding of high-spin Li10 and Li8 clusters. The model associates the (lack of) fitness of the Heisenberg Hamiltonian with the degree of (de)localization of the valence electrons in the cluster. It is shown that a regular Heisenberg Hamiltonian with four coupling constants cannot fully explain the energy of the different spin states. However, a simpler model in which electrons are located not at the positions of the nuclei but at the positions of the attractors of the electron localization function succeeds in explaining the energy spectrum and, at the same time, explains the ferromagnetic bond found by Shaik using arguments of valence bond theory. In this way, two different points of view, one more often used in physics, the Heisenberg model, and the other in chemistry, valence bond, come to the same answer to explain those atypical bonds.
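To make the fitting question concrete, here is a minimal exact-diagonalization sketch of a spin-1/2 Heisenberg Hamiltonian of the kind being tested against ab initio spin-state energies; the four-site ring and single coupling constant are illustrative, not the Li8/Li10 geometry:

```python
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-spin operator at site i of an n-spin Hilbert space."""
    return reduce(np.kron, [op if j == i else I2 for j in range(n)])

def heisenberg(n, bonds, J=1.0):
    """H = J * sum over bonds (i, j) of S_i . S_j for spin-1/2 sites."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i, j in bonds:
        for op in (sx, sy, sz):
            H += J * site_op(op, i, n) @ site_op(op, j, n)
    return H

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]          # 4-spin ring
print(np.round(np.linalg.eigvalsh(heisenberg(4, ring)), 4))
```

Comparing such a spectrum with computed spin-state energies is exactly the kind of (mis)fit the model above diagnoses.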
A stress sensitivity model for the permeability of porous media based on bi-dispersed fractal theory
NASA Astrophysics Data System (ADS)
Tan, X.-H.; Liu, C.-Y.; Li, X.-P.; Wang, H.-Q.; Deng, H.
A stress sensitivity model for the permeability of porous media based on bi-dispersed fractal theory is established, considering the change of the flow path, the fractal geometry approach and the mechanics of porous media. It is noted that the two fractal parameters of the porous media construction behave differently as the stress changes. The tortuosity fractal dimension of the solid cluster, DcTσ, becomes larger with an increase of stress, whereas the pore fractal dimension of the solid cluster, Dcfσ, and that of the capillary bundle, Dpfσ, remain unchanged as stress increases. The definition of normalized permeability is introduced for analyzing the impact of stress sensitivity on permeability. The normalized permeability is related to the solid cluster tortuosity dimension, the pore fractal dimension, the solid cluster maximum diameter, Young's modulus and Poisson's ratio. Every parameter has a clear physical meaning without the use of empirical constants. Permeability predictions of the model accord with the obtained experimental data. Thus, the proposed model can precisely depict the flow of fluid in porous media under stress.
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
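A bare-bones fuzzy c-means loop, the FCM core that the fuzzy LVQ network generalizes (the one-dimensional intensity "image" is synthetic, and this is plain FCM rather than the paper's network):

```python
import numpy as np

def fcm(X, c=3, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: alternate fuzzy-membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))       # memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        U = 1.0 / d2 ** (1.0 / (m - 1))              # u ~ d^(-2/(m-1))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(mu, 5, 500) for mu in (40, 128, 210)])
centers, U = fcm(pixels[:, None])
print(np.sort(centers.ravel()))      # roughly [40, 128, 210]
labels = U.argmax(axis=1)            # hard segmentation from fuzzy memberships
```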
MODEL-FREE MULTI-PROBE LENSING RECONSTRUCTION OF CLUSTER MASS PROFILES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umetsu, Keiichi
2013-05-20
Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-bias measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.
Helium segregation on surfaces of plasma-exposed tungsten
Maroudas, Dimitrios; Blondel, Sophie; Hu, Lin; ...
2016-01-21
Here we report a hierarchical multi-scale modeling study of implanted helium segregation on surfaces of tungsten, considered as a plasma facing component in nuclear fusion reactors. We employ a hierarchy of atomic-scale simulations based on a reliable interatomic interaction potential, including molecular-statics simulations to understand the origin of helium surface segregation, targeted molecular-dynamics (MD) simulations of near-surface cluster reactions, and large-scale MD simulations of implanted helium evolution in plasma-exposed tungsten. We find that small, mobile He-n (1 <= n <= 7) clusters in the near-surface region are attracted to the surface due to an elastic interaction force that provides the thermodynamic driving force for surface segregation. This elastic interaction force induces drift fluxes of the mobile He-n clusters, which increase substantially as the migrating clusters approach the surface, facilitating helium segregation on the surface. Moreover, the clusters' drift toward the surface enables cluster reactions, most importantly trap mutation, in the near-surface region at rates much higher than in the bulk material. These near-surface cluster dynamics have significant effects on the surface morphology, near-surface defect structures, and the amount of helium retained in the material upon plasma exposure. We integrate the findings of such atomic-scale simulations into a properly parameterized and validated spatially dependent, continuum-scale reaction-diffusion cluster dynamics model, capable of predicting implanted helium evolution, surface segregation, and its near-surface effects in tungsten. This cluster-dynamics model sets the stage for the development of fully atomistically informed coarse-grained models for computationally efficient simulation predictions of helium surface segregation, as well as helium retention and surface morphological evolution, toward the optimal design of plasma facing components.
Multilevel SEM Strategies for Evaluating Mediation in Three-Level Data
ERIC Educational Resources Information Center
Preacher, Kristopher J.
2011-01-01
Strategies for modeling mediation effects in multilevel data have proliferated over the past decade, keeping pace with the demands of applied research. Approaches for testing mediation hypotheses with 2-level clustered data were first proposed using multilevel modeling (MLM) and subsequently using multilevel structural equation modeling (MSEM) to…
Pakhomov, Serguei V.S.; Hemmy, Laura S.
2014-01-01
Generative semantic verbal fluency (SVF) tests show early and disproportionate decline relative to other abilities in individuals developing Alzheimer’s disease. Optimal performance on SVF tests depends on the efficiency of using clustered organization of semantically related items and the ability to switch between clusters. Traditional approaches to clustering and switching have relied on manual determination of clusters. We evaluated a novel automated computational linguistic approach for quantifying clustering behavior. Our approach is based on Latent Semantic Analysis (LSA) for computing strength of semantic relatedness between pairs of words produced in response to SVF test. The mean size of semantic clusters (MCS) and semantic chains (MChS) are calculated based on pairwise relatedness values between words. We evaluated the predictive validity of these measures on a set of 239 participants in the Nun Study, a longitudinal study of aging. All were cognitively intact at baseline assessment, measured with the CERAD battery, and were followed in 18 month waves for up to 20 years. The onset of either dementia or memory impairment were used as outcomes in Cox proportional hazards models adjusted for age and education and censored at follow up waves 5 (6.3 years) and 13 (16.96 years). Higher MCS was associated with 38% reduction in dementia risk at wave 5 and 26% reduction at wave 13, but not with the onset of memory impairment. Higher (+1 SD) MChS was associated with 39% dementia risk reduction at wave 5 but not wave 13, and association with memory impairment was not significant. Higher traditional SVF scores were associated with 22–29% memory impairment and 35–40% dementia risk reduction. SVF scores were not correlated with either MCS or MChS. Our study suggests that an automated approach to measuring clustering behavior can be used to estimate dementia risk in cognitively normal individuals. PMID:23845236
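A toy version of the pairwise-relatedness computation: random vectors stand in for LSA word vectors, and clusters are read as maximal runs of consecutive responses whose similarity exceeds a threshold, which is one simple operationalization of the MCS measure rather than the paper's exact rule.

```python
import numpy as np

def mean_cluster_size(vectors, threshold=0.5):
    """Group consecutive responses whose cosine similarity to the previous
    word exceeds `threshold`; return the mean size of the resulting clusters."""
    sizes, current = [], 1
    for a, b in zip(vectors, vectors[1:]):
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if cos > threshold:
            current += 1
        else:
            sizes.append(current)
            current = 1
    sizes.append(current)
    return float(np.mean(sizes))

rng = np.random.default_rng(0)
words = rng.normal(size=(12, 50))   # stand-ins for 50-dim LSA word vectors
print(mean_cluster_size(list(words)))
```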
The clustering of baryonic matter. I: a halo-model approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fedeli, C., E-mail: cosimo.fedeli@oabo.inaf.it
2014-04-01
In this paper I generalize the halo model for the clustering of dark matter in order to produce the power spectra of the two main baryonic matter components in the Universe: stars and hot gas. As a natural extension, this can also be used to describe the clustering of all mass. According to the design of the halo model, the large-scale power spectra of the various matter components are physically connected with the distribution of each component within bound structures and thus, ultimately, with the complete set of physical processes that drive the formation of galaxies and galaxy clusters. Besides being practical for cosmological and parametric studies, the semi-analytic model presented here has other advantages. Most importantly, it allows one to understand on physical grounds what the relative contribution of each matter component to the total clustering of mass is as a function of scale, and thus it opens an interesting new window to infer the distribution of baryons through high-precision cosmic shear measurements. This is particularly relevant for future wide-field photometric surveys such as Euclid. In this work the concept of the model and its uncertainties are illustrated in detail, while in a companion paper we use a set of numerical hydrodynamic simulations to show a practical application and to investigate where the model itself needs to be improved.
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.
Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain
2015-10-01
Clustering is a set of statistical learning techniques aimed at finding structure in heterogeneous data by partitioning it into groups of homogeneous observations called clusters. Clustering has been successfully applied in several fields, such as medicine, biology, finance, and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, problems in nature, especially in medicine, are often based on heterogeneous qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing quantitative and qualitative data to be handled simultaneously. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
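One plausible reading of the projection strategy, sketched with scikit-learn (the data, dimensions, and the regression-based projection are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_quant = rng.normal(size=(200, 6))                    # quantitative measurements
X_qual = (rng.random((200, 3)) < 0.4).astype(float)    # binary-coded qualitative

# Stage 1: PCA on the quantitative block only
scores = PCA(n_components=3).fit_transform(X_quant)

# Stage 2: project each qualitative variable onto the PCA subspace
proj = LinearRegression().fit(scores, X_qual)
X_qual_hat = proj.predict(scores)    # qualitative variables expressed in PC space
r2 = 1 - ((X_qual - X_qual_hat) ** 2).sum(0) / ((X_qual - X_qual.mean(0)) ** 2).sum(0)
print("R^2 per qualitative variable:", np.round(r2, 3))
```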
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
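A compact sketch of the cluster-based ensemble idea, using KMeans on random feature subsets and a majority vote; the synthetic data and the cluster-tagging rule are illustrative, and the paper's own cluster construction and distance handling are richer:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           random_state=0)
X_tr, y_tr, X_te = X[:500], y[:500], X[500:]

rng = np.random.default_rng(0)
members = []
for _ in range(7):                                  # one member per feature subset
    feats = rng.choice(X.shape[1], size=6, replace=False)
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_tr[:, feats])
    # tag each cluster with the majority training activity it contains
    tags = [np.bincount(y_tr[km.labels_ == c]).argmax() for c in range(4)]
    members.append((feats, km, tags))

def classify(x):
    votes = [tags[km.predict(x[feats][None, :])[0]] for feats, km, tags in members]
    return np.bincount(votes).argmax()              # majority vote across members

print([classify(x) for x in X_te[:10]])
```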
On simulations of rarefied vapor flows with condensation
NASA Astrophysics Data System (ADS)
Bykov, Nikolay; Gorbachev, Yuriy; Fyodorov, Stanislav
2018-05-01
Results of the direct simulation Monte Carlo of 1D spherical and 2D axisymmetric expansions into vacuum of condensing water vapor are presented. Two models, based on the kinetic approach and on the size-corrected classical nucleation theory, are employed for the simulations. The difference in the obtained results is discussed and the advantages of the kinetic approach in comparison with the modified classical theory are demonstrated. The impact of clusterization on flow parameters is observed when the volume fraction of clusters in the expansion region exceeds 5%. Comparison of the simulation data with the experimental results demonstrates good agreement.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an obstacle distance measure for spatial clustering under obstacle constraints. First, we present a path searching algorithm to approximate the obstacle distance between two points while dealing with obstacles and facilitators. Taking obstacle distance as the similarity metric, we then propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and classical clustering algorithms. Our clustering model based on an artificial immune system is also applied to a public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect.
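The obstacle-distance idea can be approximated on a discretized map by deleting obstacle cells from a grid graph and taking shortest paths; a minimal sketch (the grid size, wall, and 4-connectivity are illustrative, not the paper's path searching algorithm):

```python
import networkx as nx

def obstacle_distance(src, dst, size=20, obstacles=frozenset()):
    """Approximate obstacle distance: shortest path on a grid whose
    obstacle cells have been removed (4-connected moves, unit cost)."""
    G = nx.grid_2d_graph(size, size)
    G.remove_nodes_from(obstacles)
    return nx.shortest_path_length(G, src, dst)

wall = {(10, y) for y in range(0, 18)}       # a wall with a gap near the top
print(obstacle_distance((5, 5), (15, 5), obstacles=wall))   # detour: 36, not 10
```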
Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System
NASA Astrophysics Data System (ADS)
Demirdjian, L.; Zhou, Y.; Huffman, G. J.
2016-12-01
This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates and to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peaks-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground-based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
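The per-cluster POT step can be sketched with scipy's Generalized Pareto fit; the synthetic "daily rainfall", the 95th-percentile threshold, and the return-level formula below are standard illustrative choices, not the project's exact settings:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
daily = rng.gamma(shape=0.4, scale=8.0, size=18 * 365)   # mock daily rain (mm)

u = np.quantile(daily, 0.95)                   # POT threshold
exceed = daily[daily > u] - u
c, loc, scale = genpareto.fit(exceed, floc=0)  # fit GP to exceedances

def return_level(T, n_per_year=365):
    """Level exceeded on average once every T years."""
    p_exceed = len(exceed) / len(daily)        # rate of threshold exceedance
    m = T * n_per_year * p_exceed              # expected exceedances in T years
    return u + genpareto.ppf(1 - 1.0 / m, c, loc=0, scale=scale)

print([round(return_level(T), 1) for T in (2, 10, 50)])
```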
A transversal approach to predict gene product networks from ontology-based similarity
Chabalier, Julie; Mosser, Jean; Burgun, Anita
2007-01-01
Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to underline functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and data expression. Results The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity, and data expression. PMID:17605807
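The core of the transversal computation, cosine similarity between weighted annotation vectors, fits in a few lines; the binary gene-by-GO-term matrix and the IDF-like weighting below are stand-ins for the paper's representativity weighting scheme:

```python
import numpy as np

# toy annotation matrix: rows = gene products, columns = GO terms (binary)
A = np.array([[1, 1, 0, 0, 1],
              [1, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

# weight terms by rarity (IDF-like), so ubiquitous terms count less
idf = np.log(A.shape[0] / A.sum(axis=0))
V = A * idf

# cosine similarity between annotation vectors
norms = np.linalg.norm(V, axis=1, keepdims=True)
S = (V @ V.T) / (norms * norms.T)
print(np.round(S, 2))   # threshold S to draw edges of a functional network
```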
Mathematical modelling of complex contagion on clustered networks
NASA Astrophysics Data System (ADS)
O'sullivan, David J.; O'Keeffe, Gary; Fennell, Peter; Gleeson, James
2015-09-01
The spreading of behavior, such as the adoption of a new innovation, is influenced by the structure of the social networks that interconnect the population. In the experiments of Centola (Science, 2010), adoption of new behavior was shown to spread further and faster across clustered-lattice networks than across corresponding random networks. This implies that the "complex contagion" effects of social reinforcement are important in such diffusion, in contrast to "simple" contagion models of disease spread, which predict that epidemics would grow more efficiently on random networks than on clustered networks. Accurately modeling complex contagion on clustered networks remains a challenge because the usual assumptions (e.g. of mean-field theory) regarding tree-like networks are invalidated by the presence of triangles in the network; the triangles are, however, crucial to the social reinforcement mechanism, which posits an increased probability of a person adopting behavior that has been adopted by two or more neighbors. In this paper we modify the analytical approach that was introduced by Hebert-Dufresne et al. (Phys. Rev. E, 2010) to study disease spread on clustered networks. We show how the approximation method can be adapted to a complex contagion model, and confirm the accuracy of the method with numerical simulations. The analytical results of the model enable us to quantify the level of social reinforcement that is required to observe, as in Centola's experiments, faster diffusion on clustered topologies than on random networks.
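The qualitative contrast can be reproduced with a simple threshold-contagion simulation (networkx; seeding a whole neighborhood, a threshold of 2, and these particular graph choices are illustrative, not the paper's analytical model):

```python
import networkx as nx

def spread(G, seed_node=0, theta=2, steps=500):
    """Complex contagion: a node adopts once >= theta neighbors have adopted.
    Seeding a full neighborhood gives the reinforcement a foothold."""
    adopted = {seed_node} | set(G[seed_node])
    for _ in range(steps):
        new = {v for v in G if v not in adopted
               and sum(u in adopted for u in G[v]) >= theta}
        if not new:
            break
        adopted |= new
    return len(adopted) / G.number_of_nodes()

n, k = 1000, 6
lattice = nx.watts_strogatz_graph(n, k, p=0.0)         # clustered ring lattice
random_g = nx.gnm_random_graph(n, n * k // 2, seed=1)  # same mean degree
print(spread(lattice), spread(random_g))               # lattice should reach ~1.0
```

On the ring lattice the seeded neighborhood keeps supplying pairs of adopted neighbors, so adoption wraps around the whole network; on the random graph the seed's neighbors are scattered and the cascade stalls.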
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
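A stripped-down sketch of the local-regression idea, with KMeans standing in for fuzzy C-means and clustering done on the inputs rather than on the response surface (both simplifications relative to HC-PLSR):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(800, 4))                        # inputs (parameters)
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=800)   # non-monotone output
X_tr, y_tr, X_te, y_te = X[:600], y[:600], X[600:], y[600:]

def local_plsr(X_tr, y_tr, X_te, n_clusters=4):
    """Cluster the input space, then fit one PLS model per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_tr)
    models = [PLSRegression(n_components=2).fit(X_tr[km.labels_ == c],
                                                y_tr[km.labels_ == c])
              for c in range(n_clusters)]
    lab = km.predict(X_te)
    return np.array([models[c].predict(x[None, :])[0, 0]
                     for c, x in zip(lab, X_te)])

pred_local = local_plsr(X_tr, y_tr, X_te)
pred_global = PLSRegression(n_components=2).fit(X_tr, y_tr).predict(X_te).ravel()
for name, p in [("global PLSR", pred_global), ("local PLSR", pred_local)]:
    print(name, "RMSE:", round(float(np.sqrt(np.mean((p - y_te) ** 2))), 3))
```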
Dynamic structural disorder in supported nanoscale catalysts
NASA Astrophysics Data System (ADS)
Rehr, J. J.; Vila, F. D.
2014-04-01
We investigate the origin and physical effects of "dynamic structural disorder" (DSD) in supported nano-scale catalysts. DSD refers to the intrinsic fluctuating, inhomogeneous structure of such nano-scale systems. In contrast to bulk materials, nano-scale systems exhibit substantial fluctuations in structure, charge, temperature, and other quantities, as well as large surface effects. The DSD is driven largely by the stochastic librational motion of the center of mass and fluxional bonding at the nanoparticle surface due to thermal coupling with the substrate. Our approach for calculating and understanding DSD is based on a combination of real-time density functional theory/molecular dynamics simulations, transient coupled-oscillator models, and statistical mechanics. This approach treats thermal and dynamic effects over multiple time-scales, and includes bond-stretching and -bending vibrations, and transient tethering to the substrate at longer ps time-scales. Potential effects on the catalytic properties of these clusters are briefly explored. Model calculations of molecule-cluster interactions and molecular dissociation reaction paths are presented in which the reactant molecules are adsorbed on the surface of dynamically sampled clusters. This model suggests that DSD can affect both the prefactors and distribution of energy barriers in reaction rates, and thus can significantly affect catalytic activity at the nano-scale.
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara
2000-01-01
We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hartman, Joshua D.; Beran, Gregory J. O., E-mail: gregory.beran@ucr.edu; Monaco, Stephen
2015-09-14
We assess the quality of fragment-based ab initio isotropic ¹³C chemical shift predictions for a collection of 25 molecular crystals with eight different density functionals. We explore the relative performance of cluster, two-body fragment, combined cluster/fragment, and planewave gauge-including projector augmented wave (GIPAW) models relative to experiment. When electrostatic embedding is employed to capture many-body polarization effects, the simple and computationally inexpensive two-body fragment model predicts both isotropic ¹³C chemical shifts and the chemical shielding tensors as well as both cluster models and the GIPAW approach. Unlike the GIPAW approach, hybrid density functionals can be used readily in a fragment model, and all four hybrid functionals tested here (PBE0, B3LYP, B3PW91, and B97-2) predict chemical shifts in noticeably better agreement with experiment than the four generalized gradient approximation (GGA) functionals considered (PBE, OPBE, BLYP, and BP86). A set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts is provided based on these benchmark calculations. Statistical cross-validation procedures are used to demonstrate the robustness of these fits.
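The final regression step is ordinary least squares from computed shieldings to observed shifts; a worked sketch with invented numbers (for exact theory the slope would be -1 and the intercept the reference shielding):

```python
import numpy as np

# computed isotropic shieldings (ppm) vs. experimental shifts (ppm), mock data
sigma_calc = np.array([160.2, 140.5, 120.8, 55.1, 30.4, 18.7])
delta_exp = np.array([28.0, 47.0, 66.5, 131.0, 155.5, 167.0])

a, b = np.polyfit(sigma_calc, delta_exp, 1)   # delta = a * sigma + b
residuals = delta_exp - (a * sigma_calc + b)
print(f"slope={a:.3f} intercept={b:.1f} ppm  "
      f"RMSD={np.sqrt(np.mean(residuals ** 2)):.2f} ppm")
```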
SEMIPARAMETRIC EFFICIENT ESTIMATION FOR SHARED-FRAILTY MODELS WITH DOUBLY-CENSORED CLUSTERED DATA
Wang, Jane-Ling
2018-01-01
In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed "doubly-censored data". This model extends the current survival literature by broadening the application of frailty models from right-censoring to a more complicated situation with additional left censoring. Our approach is motivated by a recent Hepatitis B study where the sample consists of families. We adopt a likelihood approach that aims at the nonparametric maximum likelihood estimators (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves over the existing algorithm for independent and doubly-censored data, a special case in which the frailty variable is a constant equal to one. This special case is well known to be a computational challenge due to the left-censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively. Asymptotic properties of the NPMLE are established, along with the semiparametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators for the standard errors of the NPMLE is also discussed. We conducted simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data. PMID:29527068
Android Malware Classification Using K-Means Clustering Algorithm
NASA Astrophysics Data System (ADS)
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's knowledge, and attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the use of K-Means clustering: the VirusTotal and Malgenome datasets. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each type of dataset, such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results with high accuracy when tested using the Random Forest algorithm.
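A scikit-learn analogue of the two-step procedure (random numbers stand in for the nine features, and scikit-learn replaces the SPSS/WEKA tooling used in the paper):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((300, 9))       # stand-in for the nine per-sample features

# Step 1: unsupervised grouping into 3 clusters (ransomware/scareware/goodware)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: check how learnable the clusters are with a supervised model
acc = cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5)
print("mean CV accuracy:", acc.mean().round(3))
```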
Estimating Function Approaches for Spatial Point Processes
NASA Astrophysics Data System (ADS)
Deng, Chong
Spatial point pattern data consist of locations of events that are often of interest in biological and ecological studies. Such data are commonly viewed as a realization from a stochastic process called a spatial point process. To fit a parametric spatial point process model to such data, likelihood-based methods have been widely studied. However, while maximum likelihood estimation is often too computationally intensive for Cox and cluster processes, pairwise likelihood methods such as composite likelihood and Palm likelihood usually suffer from a loss of information due to ignoring the correlation among pairs. For many types of correlated data other than spatial point processes, when likelihood-based approaches are not desirable, estimating functions have been widely used for model fitting. In this dissertation, we explore estimating function approaches for fitting spatial point process models. These approaches, which are based on asymptotic optimal estimating function theories, can be used to incorporate the correlation among data and yield more efficient estimators. We conducted a series of studies to demonstrate that these estimating function approaches are good alternatives for balancing the trade-off between computational complexity and estimation efficiency. First, we propose a new estimating procedure that improves the efficiency of the pairwise composite likelihood method in estimating clustering parameters. Our approach combines estimating functions derived from pairwise composite likelihood estimation with estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate its efficacy through a simulation study and an application to the longleaf pine data. Second, we further explore the quasi-likelihood approach to fitting the second-order intensity function of spatial point processes. The original second-order quasi-likelihood is barely feasible due to the intense computation and high memory requirement needed to solve a large linear system. Motivated by the existence of geometric regular patterns in stationary point processes, we find a lower-dimensional representation of the optimal weight function and propose a reduced second-order quasi-likelihood approach. Through a simulation study, we show that the proposed method not only demonstrates superior performance in fitting the clustering parameter but also relaxes the constraint on the tuning parameter, H. Third, we studied the quasi-likelihood type estimating function that is optimal in a certain class of first-order estimating functions for estimating the regression parameter in spatial point process models. Then, by using a novel spectral representation, we construct an implementation that is computationally much more efficient and can be applied to a more general setup than the original quasi-likelihood method.
Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach.
Liang, Muxuan; Li, Zhizhong; Chen, Ting; Zeng, Jianyang
2015-01-01
Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are particularly designed to exploit both deep intrinsic statistical properties of each input modality and complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. These results indicate that our multimodal DBN based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized cancer therapy.
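As a building block, one contrastive-divergence (CD-1) update for a single Bernoulli RBM looks as follows; the full multimodal DBN stacks such layers per modality and fuses them with a joint latent layer, which this sketch does not attempt (sizes and mock binary data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b, c, lr=0.05):
    """One contrastive-divergence (CD-1) update for a Bernoulli RBM.
    V: batch of visible vectors; W, b, c: weights, visible, hidden biases."""
    ph = sigmoid(V @ W + c)                         # P(h=1 | v)
    h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden units
    pv = sigmoid(h @ W.T + b)                       # reconstruct visibles
    ph2 = sigmoid(pv @ W + c)                       # hidden probs of reconstruction
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)      # positive - negative phase
    b += lr * (V - pv).mean(axis=0)
    c += lr * (ph - ph2).mean(axis=0)
    return float(((V - pv) ** 2).mean())            # reconstruction error

n_vis, n_hid = 20, 8
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
V = (rng.random((200, n_vis)) < 0.3).astype(float)  # mock binary omics features
for epoch in range(200):
    err = cd1_step(V, W, b, c)
print("final reconstruction error:", round(err, 4))
```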
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.
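Fragments are cheap to compute: group points by their vector of labels across all input clusterings, a direct transcription of the definition above (the three toy clusterings are made up):

```python
from collections import defaultdict

# three clusterings of ten points, given as label lists
clusterings = [
    [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    [0, 0, 1, 1, 1, 2, 2, 2, 3, 3],
    [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
]

# a fragment = a maximal set of points that no clustering splits,
# i.e. points sharing the same label vector across all clusterings
fragments = defaultdict(list)
for point in range(10):
    key = tuple(c[point] for c in clusterings)
    fragments[key].append(point)

print(list(fragments.values()))
# aggregation algorithms can now operate on these few fragments
# instead of on all ten individual points
```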
Individual participant data meta-analyses should not ignore clustering
Abo-Zaid, Ghada; Guo, Boliang; Deeks, Jonathan J.; Debray, Thomas P.A.; Steyerberg, Ewout W.; Moons, Karel G.M.; Riley, Richard David
2013-01-01
Objectives Individual participant data (IPD) meta-analyses often analyze their IPD as if coming from a single study. We compare this approach with analyses that rather account for clustering of patients within studies. Study Design and Setting Comparison of effect estimates from logistic regression models in real and simulated examples. Results The estimated prognostic effect of age in patients with traumatic brain injury is similar, regardless of whether clustering is accounted for. However, a family history of thrombophilia is found to be a diagnostic marker of deep vein thrombosis [odds ratio, 1.30; 95% confidence interval (CI): 1.00, 1.70; P = 0.05] when clustering is accounted for but not when it is ignored (odds ratio, 1.06; 95% CI: 0.83, 1.37; P = 0.64). Similarly, the treatment effect of nicotine gum on smoking cessation is severely attenuated when clustering is ignored (odds ratio, 1.40; 95% CI: 1.02, 1.92) rather than accounted for (odds ratio, 1.80; 95% CI: 1.29, 2.52). Simulations show models accounting for clustering perform consistently well, but downwardly biased effect estimates and low coverage can occur when ignoring clustering. Conclusion Researchers must routinely account for clustering in IPD meta-analyses; otherwise, misleading effect estimates and conclusions may arise. PMID:23651765
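A minimal illustration of the contrast on simulated IPD, using study fixed effects as one simple way to account for clustering (statsmodels; the data-generating numbers are arbitrary):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for study in range(8):                        # 8 studies, varying baseline risk
    alpha = rng.normal(-1.0, 1.0)
    x = rng.integers(0, 2, 150)               # binary marker
    p = 1 / (1 + np.exp(-(alpha + 0.5 * x)))  # true within-study log OR = 0.5
    rows.append(pd.DataFrame({"study": study, "x": x,
                              "y": rng.binomial(1, p)}))
ipd = pd.concat(rows, ignore_index=True)

naive = smf.logit("y ~ x", data=ipd).fit(disp=0)                  # ignores clustering
stratified = smf.logit("y ~ x + C(study)", data=ipd).fit(disp=0)  # study fixed effects
print("naive log OR:", round(naive.params["x"], 3),
      " stratified log OR:", round(stratified.params["x"], 3))
```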
Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu
2017-04-01
The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe the natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate) was performed, and ion concentrations (Na+, K+, Ca2+, Mg2+, HCO3-, Cl-, F-, SO42-, PO43-, NO3-), pH, redox potential, conductivity and total dissolved substances were measured. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the mechanism of natural occurrence of As and F- species and the anthropogenic one for NO3-, SO42-, PO43- and K+ and (ii) a classification of groundwater based on the content of arsenic species. The HCO3- type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and the occurrence of F-, but by different paths. The PO43--AsO43- ion exchange and water-rock interaction (silicate hydrolysis and desorption from clay) were associated with arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na+-F--pH cluster as a marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater, supported by mineralogical analysis of rocks, was established. Copyright © 2016 Elsevier Ltd. All rights reserved.
Patterns of Dysmorphic Features in Schizophrenia
Scutt, L.E.; Chow, E.W.C.; Weksberg, R.; Honer, W.G.; Bassett, Anne S.
2011-01-01
Congenital dysmorphic features are prevalent in schizophrenia and may reflect underlying neurodevelopmental abnormalities. A cluster analysis approach delineating patterns of dysmorphic features has been used in genetics to classify individuals into more etiologically homogeneous subgroups. In the present study, this approach was applied to schizophrenia, using a sample with a suspected genetic syndrome as a testable model. Subjects (n = 159) with schizophrenia or schizoaffective disorder were ascertained from chronic patient populations (random, n = 123) or referred with possible 22q11 deletion syndrome (referred, n = 36). All subjects were evaluated for presence or absence of 70 reliably assessed dysmorphic features, which were used in a three-step cluster analysis. The analysis produced four major clusters with different patterns of dysmorphic features. Significant between-cluster differences were found for rates of 37 dysmorphic features (P < 0.05), median number of dysmorphic features (P = 0.0001), and validating features not used in the cluster analysis: mild mental retardation (P = 0.001) and congenital heart defects (P = 0.002). Two clusters (1 and 4) appeared to represent more developmental subgroups of schizophrenia with elevated rates of dysmorphic features and validating features. Cluster 1 (n = 27) comprised mostly referred subjects. Cluster 4 (n = 18) had a different pattern of dysmorphic features; one subject had a mosaic Turner syndrome variant. Two other clusters had lower rates and patterns of features consistent with those found in previous studies of schizophrenia. Delineating patterns of dysmorphic features may help identify subgroups that could represent neurodevelopmental forms of schizophrenia with more homogeneous origins. PMID:11803519
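A minimal sketch of this kind of analysis, assuming binary presence/absence codes: hierarchical clustering of subjects under Jaccard distance, cut into four clusters. The simulated feature matrix and linkage choice are stand-ins, not the study's three-step procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
features = rng.binomial(1, 0.15, size=(159, 70))      # subjects x binary features
d = pdist(features, metric="jaccard")                 # distance on presence/absence
tree = linkage(d, method="average")
clusters = fcluster(tree, t=4, criterion="maxclust")  # cut into 4 major clusters
print(np.bincount(clusters)[1:])                      # cluster sizes
```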
A Game Theoretic Approach for Balancing Energy Consumption in Clustered Wireless Sensor Networks
Lu, Yinzhi; Xiong, Lian; Tao, Yang; Zhong, Yuanchang
2017-01-01
Clustering is an effective topology control method in wireless sensor networks (WSNs), since it can enhance the network lifetime and scalability. To prolong the network lifetime in clustered WSNs, an efficient cluster head (CH) optimization policy is essential to distribute the energy among sensor nodes. Recently, game theory has been introduced to model clustering. Each sensor node is considered a rational and selfish player that plays a clustering game with an equilibrium strategy. It then decides whether to act as the CH according to this strategy, trading off the provision of required services against energy conservation. However, how to obtain the equilibrium strategy while maximizing the payoff of sensor nodes has rarely been addressed to date. In this paper, we present a game theoretic approach for balancing energy consumption in clustered WSNs. With our novel payoff function, realistic sensor behaviors can be captured well. The energy heterogeneity of nodes is considered by incorporating a penalty mechanism in the payoff function, so the nodes with more energy will compete for CHs more actively. We have obtained the Nash equilibrium (NE) strategy of the clustering game through convex optimization. Specifically, each sensor node can achieve its own maximal payoff when it makes the decision according to this strategy. Extensive simulations show that the proposed game theoretic clustering achieves good energy balance and consequently greatly extends the network lifetime. PMID:29149075
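As a loose toy model (not the paper's payoff function or equilibrium derivation), the sketch below lets each node volunteer as CH with a probability that grows with residual energy, mimicking the penalty mechanism; all symbols and rates are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
energy = rng.uniform(0.2, 1.0, n)         # residual energy per node
v, base_cost = 1.0, 0.6                   # service value and baseline CH cost
cost = base_cost / energy                 # penalty: low-energy nodes pay more to serve
p_ch = np.clip((v - cost) / v, 0.0, 1.0)  # volunteer probability rises with energy
is_ch = rng.random(n) < p_ch
print(int(is_ch.sum()), round(float(energy[is_ch].mean()), 2))  # CH count, mean CH energy
```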
NASA Astrophysics Data System (ADS)
Zhang, Ying; Moges, Semu; Block, Paul
2018-01-01
Prediction of seasonal precipitation can provide actionable information to guide management of various sectoral activities. For instance, it is often translated into hydrological forecasts for better water resources management. However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability. This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia. To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution, as compared with previous studies. The statistical model improves versus the non-clustered case or dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33 %, respectively. The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at a lower resolution and the raw predictions require bias correction to guarantee comparable skills.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pal, Ranjan; Chelmis, Charalampos; Aman, Saima
The advent of smart meters and advanced communication infrastructures catalyzes numerous smart grid applications such as dynamic demand response, and paves the way to solve challenging research problems in sustainable energy consumption. The space of solution possibilities is restricted primarily by the huge amount of generated data requiring considerable computational resources and efficient algorithms. To overcome this Big Data challenge, data clustering techniques have been proposed. Current approaches however do not scale in the face of the "increasing dimensionality" problem, where a cluster point is represented by the entire customer consumption time series. To overcome this we first rethink the way cluster points are created and designed, and then design an efficient online clustering technique for demand response (DR) in order to analyze high volume, high dimensional energy consumption time series data at scale, and on the fly. Our online algorithm is randomized in nature, and provides optimal performance guarantees in a computationally efficient manner. Unlike prior work we (i) study the consumption properties of the whole population simultaneously rather than developing individual models for each customer separately, claiming it to be a 'killer' approach that breaks the "curse of dimensionality" in online time series clustering, and (ii) provide tight performance guarantees in theory to validate our approach. Our insights are driven by the field of sociology, where collective behavior often emerges as the result of individual patterns and lifestyles.
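The flavor of a single-pass clustering of load profiles can be sketched with an online k-means update (a stand-in for the paper's randomized algorithm); the profile construction and parameters are assumptions.

```python
import numpy as np

def online_kmeans(stream, k, seed=0):
    """Single-pass k-means: update the nearest center incrementally."""
    rng = np.random.default_rng(seed)
    centers, counts = None, np.zeros(k)
    for x in stream:
        if centers is None:  # initialize centers near the first profile
            centers = np.tile(x, (k, 1)) + rng.normal(0, 0.01, (k, x.size))
        j = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]    # running-mean update
    return centers

rng = np.random.default_rng(3)
stream = (np.sin(np.linspace(0, 6, 24)) + rng.normal(0, 0.3, 24)
          for _ in range(500))                        # 500 daily 24-point load profiles
print(online_kmeans(stream, k=4).shape)
```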
Multiple imputation methods for bivariate outcomes in cluster randomised trials.
DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R
2016-09-10
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
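A rough sketch of the fixed-effects imputation strategy compared above: include cluster indicator dummies among the imputation predictors and draw several imputations. This mimics one evaluated approach using scikit-learn's IterativeImputer, not the authors' multilevel implementation; the data-generating values are invented.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(4)
cluster = np.repeat(np.arange(10), 30)               # 10 clusters of 30 residents
u = rng.normal(0, 1, 10)[cluster]                    # cluster-level effects
cost = 5 + u + rng.normal(0, 1, 300)
qaly = 0.3 + 0.2 * u + rng.normal(0, 0.2, 300)       # correlated bivariate outcome
cost[rng.random(300) < 0.2] = np.nan                 # ~20% missing at random
df = pd.DataFrame({"cost": cost, "qaly": qaly, "cl": pd.Categorical(cluster)})
X = pd.get_dummies(df, columns=["cl"], dtype=float)  # cluster dummies as predictors
imps = [IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
        for m in range(5)]                           # M = 5 imputed datasets
print(np.mean([imp[:, 0].mean() for imp in imps]))   # pooled mean cost
```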
Scaling behavior of ground-state energy cluster expansion for linear polyenes
NASA Astrophysics Data System (ADS)
Griffin, L. L.; Wu, Jian; Klein, D. J.; Schmalz, T. G.; Bytautas, L.
Ground-state energies for linear-chain polyenes are additively expanded in a sequence of terms for chemically relevant conjugated substructures of increasing size. The asymptotic behavior of the large-substructure limit (i.e., high-polymer limit) is investigated as a means of characterizing the rapidity of convergence and consequent utility of this energy cluster expansion. Consideration is directed to computations via: simple Hückel theory, a refined Hückel scheme with geometry optimization, restricted Hartree-Fock self-consistent field (RHF-SCF) solutions of fixed bond-length Pariser-Parr-Pople (PPP)/Hubbard models, and ab initio SCF approaches with and without geometry optimization. The cluster expansion in what might be described as the more "refined" approaches appears to lead to qualitatively more rapid convergence: exponentially fast as opposed to an inverse power at the simple Hückel or SCF-Hubbard levels. The substructural energy cluster expansion then seems to merit special attention. Its possible utility in making accurate extrapolations from finite systems to extended polymers is noted.
Cluster Analysis of Weighted Bipartite Networks: A New Copula-Based Approach
Chessa, Alessandro; Crimaldi, Irene; Riccaboni, Massimo; Trapin, Luca
2014-01-01
In this work we are interested in identifying clusters of "positionally equivalent" actors, i.e. actors who play a similar role in a system. In particular, we analyze weighted bipartite networks that describe the relationships between actors on one side and features or traits on the other, together with the intensity level to which actors show their features. We develop a methodological approach that takes into account the underlying multivariate dependence among groups of actors. The idea is that positions in a network could be defined on the basis of the similar intensity levels that the actors exhibit in expressing some features, instead of just considering the relationships that actors hold with each other. Moreover, we propose a new clustering procedure that exploits the potential of copula functions, a mathematical instrument for modelling the stochastic dependence structure. Our clustering algorithm can be applied both to binary and real-valued matrices. We validate it with simulations and applications to real-world data. PMID:25303095
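One way to see the copula idea, as a simplification of the proposed procedure: push each feature through its empirical marginal (probability integral transform), map to Gaussian copula scores, and cluster there. The data and cluster count below are invented.

```python
import numpy as np
from scipy.stats import rankdata, norm
from sklearn.cluster import KMeans

rng = np.random.default_rng(15)
X = np.exp(rng.normal(0, 1, (300, 4)))            # skewed actor-by-feature weights
U = (rankdata(X, axis=0) - 0.5) / len(X)          # empirical marginals -> (0, 1)
Z = norm.ppf(U)                                   # Gaussian copula scores
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))                        # cluster sizes in copula space
```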
Social aggregation as a cooperative game
NASA Astrophysics Data System (ADS)
Vilone, Daniele; Guazzini, Andrea
2011-07-01
A new approach for the description of phenomena of social aggregation is suggested. On the basis of psychological concepts (for instance, social norms and cultural coordinates), we deduce a general mechanism for social aggregation in which different clusters of individuals can merge according to cooperation among the agents. In turn, the agents can cooperate or defect according to the clusters' distribution inside the system. The fitness of an individual increases with the size of its cluster, but decreases with the work the individual had to do in order to join it. In order to test the reliability of this new approach, we introduce a couple of simple toy models with the features illustrated above. This preliminary study shows that cooperation is the most convenient strategy only in the presence of very large clusters, while full cooperation is not necessary to reach a totally ordered configuration with a single megacluster filling the whole system.
NASA Technical Reports Server (NTRS)
Li, Zhenlong; Hu, Fei; Schnase, John L.; Duffy, Daniel Q.; Lee, Tsengdar; Bowen, Michael K.; Yang, Chaowei
2016-01-01
Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for assessing global challenges such as climate change, natural disasters, and diseases. This is challenging not only because of the large data volume, but also because of the intrinsic high-dimensional nature of geoscience data. To tackle this challenge, we propose a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment. Using this approach, big climate data are directly stored in a Hadoop Distributed File System in their original, native file format. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval when performing spatiotemporal queries. Based on the index, a data-partitioning algorithm is applied to enable MapReduce to achieve high data locality, as well as balancing the workload. The proposed indexing approach is evaluated using the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. The experimental results show that the index can significantly accelerate querying and processing (10× speedup compared to the baseline test using the same computing cluster), while keeping the index-to-data ratio small (0.0328). The applicability of the indexing approach is demonstrated by a climate anomaly detection application deployed on a NASA Hadoop cluster. This approach is also able to support efficient processing of general array-based spatiotemporal data in various geoscience domains without special configuration on a Hadoop cluster.
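The core of such an index can be sketched as a lookup from logical (time, spatial block) coordinates to byte offsets, so a query seeks directly instead of scanning; the record layout and header size below are invented for illustration and do not reflect MERRA's actual format.

```python
def build_index(n_times, n_blocks, record_bytes, header=4096):
    """Map a logical (time step, spatial block) pair to a byte offset."""
    return {(t, b): header + (t * n_blocks + b) * record_bytes
            for t in range(n_times) for b in range(n_blocks)}

index = build_index(n_times=24, n_blocks=64, record_bytes=720 * 4)
offset = index[(12, 30)]   # a mapper seeks here instead of scanning the whole file
print(offset)
```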
Rogiers, Bart; Mallants, Dirk; Batelaan, Okke; Gedeon, Matej; Huysmans, Marijke; Dassargues, Alain
2017-01-01
Cone penetration testing (CPT) is one of the most efficient and versatile methods currently available for geotechnical, lithostratigraphic and hydrogeological site characterization. Currently available methods for soil behaviour type classification (SBT) of CPT data however have severe limitations, often restricting their application to a local scale. For parameterization of regional groundwater flow or geotechnical models, and delineation of regional hydro- or lithostratigraphy, regional SBT classification would be very useful. This paper investigates the use of model-based clustering for SBT classification, and the influence of different clustering approaches on the properties and spatial distribution of the obtained soil classes. We additionally propose a methodology for automated lithostratigraphic mapping of regionally occurring sedimentary units using SBT classification. The methodology is applied to a large CPT dataset, covering a groundwater basin of ~60 km2 with predominantly unconsolidated sandy sediments in northern Belgium. Results show that the model-based approach is superior in detecting the true lithological classes when compared to more frequently applied unsupervised classification approaches or literature classification diagrams. We demonstrate that automated mapping of lithostratigraphic units using advanced SBT classification techniques can provide a large gain in efficiency, compared to more time-consuming manual approaches and yields at least equally accurate results.
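A minimal model-based clustering sketch in this spirit: fit a Gaussian mixture to two CPT-derived features and inspect the classes and BIC. The synthetic features and two-class setup are illustrative assumptions, not the paper's workflow.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# synthetic CPT measurements: columns = log10(cone resistance), log10(friction ratio)
sand = rng.normal([1.2, -0.3], 0.15, (500, 2))
clay = rng.normal([0.3, 0.5], 0.20, (500, 2))
X = np.vstack([sand, clay])
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)
print(round(gmm.bic(X), 1), np.bincount(labels))  # model fit and class sizes
```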
Attempting to physically explain space-time correlation of extremes
NASA Astrophysics Data System (ADS)
Bernardara, Pietro; Gailhard, Joel
2010-05-01
Spatial and temporal clustering of hydro-meteorological extreme events is well-established scientific evidence. Moreover, the statistical parameters characterizing their local frequencies of occurrence show clear spatial patterns. Thus, in order to robustly assess hydro-meteorological hazard, statistical models need to be able to take into account spatial and temporal dependencies. Statistical models considering long-term correlation for quantifying and qualifying temporal and spatial dependencies are available, such as the multifractal approach. Furthermore, the development of regional frequency analysis techniques allows estimating the frequency of occurrence of extreme events taking into account spatial patterns in the behaviour of extreme quantiles. However, in order to understand the origin of spatio-temporal clustering, an attempt should be made to find a physical explanation. Here, statistical evidence of spatio-temporal correlation and spatial patterns of extreme behaviour is given on a large database of more than 400 rainfall and discharge series in France. In particular, the spatial distribution of multifractal and Generalized Pareto distribution parameters shows evident correlation patterns in the behaviour of the frequency of occurrence of extremes. It is then shown that the identification of atmospheric circulation patterns (weather types) can physically explain the temporal clustering of extreme rainfall events (seasonality) and the spatial pattern of the frequency of occurrence. Moreover, coupling this information with hydrological modelling of a watershed (as in the Schadex approach), an explanation of the spatio-temporal distribution of extreme discharge can also be provided. We finally show that a hydro-meteorological approach (such as the Schadex approach) can explain and take into account space and time dependencies of hydro-meteorological extreme events.
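The distributional building block mentioned above can be sketched by fitting a Generalized Pareto distribution to threshold exceedances and reading off a return level; the data, threshold choice, and return-period arithmetic are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
rain = rng.gamma(2.0, 8.0, 20000)                  # synthetic daily rainfall (mm)
u = np.quantile(rain, 0.98)                        # high threshold
exceed = rain[rain > u] - u
shape, loc, scale = stats.genpareto.fit(exceed, floc=0.0)
lam = 0.02                                         # exceedance rate per day
q = 1.0 - 1.0 / (100 * 365.25 * lam)               # conditional quantile for T = 100 yr
r100 = u + stats.genpareto.ppf(q, shape, loc, scale)
print(round(shape, 3), round(scale, 2), round(r100, 1))
```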
Impact of network topology on self-organized criticality
NASA Astrophysics Data System (ADS)
Hoffmann, Heiko
2018-02-01
The general mechanisms behind self-organized criticality (SOC) are still unknown. Several microscopic and mean-field theory approaches have been suggested, but they do not explain the dependence of the exponents on the underlying network topology of the SOC system. Here, we first report the phenomenon that in the Bak-Tang-Wiesenfeld (BTW) model, sites inside an avalanche area largely return to their original state after the passing of an avalanche, forming, effectively, critically arranged clusters of sites. Then, we hypothesize that SOC relies on the formation process of these clusters, and present a model of such formation. For low-dimensional networks, we show theoretically and in simulation that the exponent of the cluster-size distribution is proportional to the ratio of the fractal dimension of the cluster boundary and the dimensionality of the network. For the BTW model, in our simulations, the exponent of the avalanche-area distribution matched approximately our prediction based on this ratio for two-dimensional networks, but deviated for higher dimensions. We hypothesize a transition from cluster formation to the mean-field theory process with increasing dimensionality. This work sheds light onto the mechanisms behind SOC, particularly the impact of the network topology.
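For readers unfamiliar with the BTW model, a compact simulation of avalanches on a 2-D lattice with open boundaries (sites topple at height 4) reproduces the avalanche-area statistics discussed here; lattice size and drop count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
L = 32
grid = rng.integers(0, 4, (L, L))
areas = []
for _ in range(5000):
    grid[rng.integers(L), rng.integers(L)] += 1       # drop one grain
    touched = np.zeros((L, L), bool)
    while (grid >= 4).any():
        unstable = grid >= 4
        touched |= unstable
        grid[unstable] -= 4                           # topple: lose 4 grains
        shift = unstable.astype(int)                  # ...one to each neighbor
        grid[1:, :] += shift[:-1, :]; grid[:-1, :] += shift[1:, :]
        grid[:, 1:] += shift[:, :-1]; grid[:, :-1] += shift[:, 1:]
        # grains pushed past the edge are lost (open boundary)
    if touched.any():
        areas.append(int(touched.sum()))              # avalanche area
print(round(float(np.mean(areas)), 1), max(areas))
```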
NASA Astrophysics Data System (ADS)
Ferrari, Francesco; Parola, Alberto; Sorella, Sandro; Becca, Federico
2018-06-01
The dynamical spin structure factor is computed within a variational framework to study the one-dimensional J1-J2 Heisenberg model. Starting from Gutzwiller-projected fermionic wave functions, the low-energy spectrum is constructed from two-spinon excitations. The direct comparison with Lanczos calculations on small clusters demonstrates the excellent description of both gapless and gapped (dimerized) phases, including incommensurate structures for J2/J1>0.5. Calculations on large clusters show how the intensity evolves when increasing the frustrating ratio and give an unprecedentedly accurate characterization of the dynamical properties of (nonintegrable) frustrated spin models.
NASA Astrophysics Data System (ADS)
Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor
2017-05-01
Outlier detection has been used extensively in data analysis to detect anomalous observations in data and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we apply the hierarchical clustering procedure. Using a tree diagram, we illustrate the graphical approach to outlier detection. A simulation study is carried out to verify the accuracy of the proposed method, and an illustration with a real data set is given to show its practical applicability.
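A simplified sketch of the residual-clustering idea: hierarchically cluster residual magnitudes and flag the minority cluster at the top split. The circular-residual details of the Caires and Wyatt model are abstracted away; the planted outliers and single linkage are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(8)
resid = rng.vonmises(0.0, 8.0, 100)           # well-behaved circular residuals
resid[:3] += np.pi                            # three planted outliers
tree = linkage(np.abs(resid).reshape(-1, 1), method="single")
labels = fcluster(tree, t=2, criterion="maxclust")
minority = labels == (np.argmin(np.bincount(labels)[1:]) + 1)
print(np.where(minority)[0])                  # indices flagged as outliers
```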
Model Selection for Monitoring CO2 Plume during Sequestration
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-12-31
The model selection method developed as part of this project mainly includes four steps: (1) assessing the connectivity/dynamic characteristics of a large prior ensemble of models, (2) model clustering using multidimensional scaling coupled with k-means clustering, (3) model selection using Bayes' rule in the reduced model space, and (4) model expansion using iterative resampling of the posterior models. The fourth step expresses one of the advantages of the method: it provides a built-in means of quantifying the uncertainty in predictions made with the selected models. In our application to plume monitoring, by expanding the posterior space of models, the final ensemble of representations of the geological model can be used to assess the uncertainty in predicting the future displacement of the CO2 plume. The software implementation of this approach is attached here.
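Steps (2) and (3) in miniature, with synthetic stand-ins for model responses and likelihoods: embed a model-distance matrix with multidimensional scaling, k-means cluster the embedding, and weight clusters by Bayes' rule.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
responses = rng.normal(0, 1, (200, 5))                      # 200 prior models
D = np.linalg.norm(responses[:, None] - responses[None], axis=2)
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)               # step 2a: MDS embedding
labels = KMeans(n_clusters=5, n_init=10,
                random_state=0).fit_predict(coords)         # step 2b: k-means
loglik = -0.5 * ((responses - responses[0]) ** 2).sum(1)    # toy fit to "observed" data
post = np.exp(loglik - loglik.max())
weights = np.array([post[labels == k].sum() for k in range(5)])
print((weights / weights.sum()).round(3))                   # posterior cluster weights
```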
Hensman, James; Lawrence, Neil D; Rattray, Magnus
2013-08-20
Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schemas across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
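The hierarchical construction can be sketched in a few lines: each replicate of a gene is modelled as a shared draw f plus a replicate-level GP deviation, so replicates co-vary through the shared kernel. This numpy sampling sketch (with invented kernels and lengthscales) is not the authors' implementation.

```python
import numpy as np

def rbf(t, var, ls):
    """Squared-exponential kernel on a 1-D time grid."""
    d = t[:, None] - t[None, :]
    return var * np.exp(-0.5 * d ** 2 / ls ** 2)

rng = np.random.default_rng(16)
t = np.linspace(0, 10, 25)
jitter = 1e-8 * np.eye(len(t))                 # numerical stability
K_gene = rbf(t, 1.0, 2.0) + jitter             # shared, gene-level covariance
K_rep = rbf(t, 0.2, 1.0) + jitter              # replicate-level deviations
f = rng.multivariate_normal(np.zeros(len(t)), K_gene)        # latent gene profile
reps = [f + rng.multivariate_normal(np.zeros(len(t)), K_rep) for _ in range(3)]
print(round(float(np.corrcoef(reps[0], reps[1])[0, 1]), 2))  # replicates co-vary
```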
Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai
2016-09-07
Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% of drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications.
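A stand-in illustration with networkx: greedy modularity community detection on a toy interaction graph, plus a degree-centrality ranking of interaction potential. The paper's actual algorithms (modularity-based and energy-model layout) differ, and the drug names are placeholders.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
edges = [("drugA", "drugB"), ("drugA", "drugC"), ("drugB", "drugC"),
         ("drugD", "drugE"), ("drugE", "drugF"), ("drugD", "drugF"),
         ("drugC", "drugD")]                        # toy interaction network
G.add_edges_from(edges)
for i, community in enumerate(greedy_modularity_communities(G)):
    print(i, sorted(community))                     # detected drug communities
centrality = nx.degree_centrality(G)                # rank by interaction potential
print(max(centrality, key=centrality.get))
```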
Singlet-paired coupled cluster theory for open shells
NASA Astrophysics Data System (ADS)
Gomez, John A.; Henderson, Thomas M.; Scuseria, Gustavo E.
2016-06-01
Restricted single-reference coupled cluster theory truncated to single and double excitations accurately describes weakly correlated systems, but often breaks down in the presence of static or strong correlation. Good coupled cluster energies in the presence of degeneracies can be obtained by using a symmetry-broken reference, such as unrestricted Hartree-Fock, but at the cost of good quantum numbers. A large body of work has shown that modifying the coupled cluster ansatz allows for the treatment of strong correlation within a single-reference, symmetry-adapted framework. The recently introduced singlet-paired coupled cluster doubles (CCD0) method is one such model, which recovers correct behavior for strong correlation without requiring symmetry breaking in the reference. Here, we extend singlet-paired coupled cluster for application to open shells via restricted open-shell singlet-paired coupled cluster singles and doubles (ROCCSD0). The ROCCSD0 approach retains the benefits of standard coupled cluster theory and recovers correct behavior for strongly correlated, open-shell systems using a spin-preserving ROHF reference.
Statistical Significance for Hierarchical Clustering
Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.
2017-01-01
Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
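One split of such a Monte Carlo test can be sketched as follows: compare an observed two-cluster index against its null distribution under a single Gaussian fitted to the data. This ignores the sequential, family-wise aspects of the full procedure; all settings are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_index(X):
    lab = fcluster(linkage(X, "ward"), 2, "maxclust")
    within = sum(((X[lab == k] - X[lab == k].mean(0)) ** 2).sum() for k in (1, 2))
    total = ((X - X.mean(0)) ** 2).sum()
    return within / total          # small = strong two-cluster structure

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(-2, 1, (40, 5)), rng.normal(2, 1, (40, 5))])
obs = cluster_index(X)
mu, cov = X.mean(0), np.cov(X.T)   # null model: one multivariate Gaussian
null = [cluster_index(rng.multivariate_normal(mu, cov, len(X))) for _ in range(200)]
p = (np.sum(np.array(null) <= obs) + 1) / (len(null) + 1)
print(round(p, 3))                 # Monte Carlo p-value for this split
```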
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organizing maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another, higher-dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes); the granules are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, experimental results show a marked improvement in classification accuracy with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses
ERIC Educational Resources Information Center
Huang, Guan-Hua; Wang, Su-Mei; Hsu, Chung-Chu
2011-01-01
Statisticians typically estimate the parameters of latent class and latent profile models using the Expectation-Maximization algorithm. This paper proposes an alternative two-stage approach to model fitting. The first stage uses the modified k-means and hierarchical clustering algorithms to identify the latent classes that best satisfy the…
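scikit-learn's Gaussian mixture exposes this two-stage idea directly: k-means supplies the initial assignments and EM refines them. A minimal sketch with simulated profiles (not the paper's modified k-means or hierarchical step):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(11)
X = np.vstack([rng.normal(0, 1, (150, 4)), rng.normal(3, 1, (150, 4))])
gmm = GaussianMixture(n_components=2,
                      init_params="kmeans",   # stage 1: k-means assignments
                      max_iter=200,           # stage 2: EM refinement
                      random_state=0).fit(X)
print(gmm.means_.round(2), gmm.converged_)
```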
Features of asthma which provide meaningful insights for understanding the disease heterogeneity.
Deliu, M; Yavuz, T S; Sperrin, M; Belgrave, D; Sahiner, U M; Sackesen, C; Kalayci, O; Custovic, A
2018-01-01
Data-driven methods such as hierarchical clustering (HC) and principal component analysis (PCA) have been used to identify asthma subtypes, with inconsistent results. To develop a framework for the discovery of stable and clinically meaningful asthma subtypes. We performed HC in a rich data set from 613 asthmatic children, using 45 clinical variables (Model 1), and after PCA dimensionality reduction (Model 2). Clinical experts then identified a set of asthma features/domains which informed clusters in the two analyses. In Model 3, we reclustered the data using these features to ascertain whether this improved the discovery process. Cluster stability was poor in Models 1 and 2. Clinical experts highlighted four asthma features/domains which differentiated the clusters in two models: age of onset, allergic sensitization, severity, and recent exacerbations. In Model 3 (HC using these four features), cluster stability improved substantially. The cluster assignment changed, providing more clinically interpretable results. In a 5-cluster model, we labelled the clusters as: "Difficult asthma" (n = 132); "Early-onset mild atopic" (n = 210); "Early-onset mild non-atopic: (n = 153); "Late-onset" (n = 105); and "Exacerbation-prone asthma" (n = 13). Multinomial regression demonstrated that lung function was significantly diminished among children with "Difficult asthma"; blood eosinophilia was a significant feature of "Difficult," "Early-onset mild atopic," and "Late-onset asthma." Children with moderate-to-severe asthma were present in each cluster. An integrative approach of blending the data with clinical expert domain knowledge identified four features, which may be informative for ascertaining asthma endotypes. These findings suggest that variables which are key determinants of asthma presence, severity, or control may not be the most informative for determining asthma subtypes. Our results indicate that exacerbation-prone asthma may be a separate asthma endotype and that severe asthma is not a single entity, but an extreme end of the spectrum of several different asthma endotypes. © 2017 The Authors. Clinical & Experimental Allergy published by John Wiley & Sons Ltd.
Multilevel covariance regression with correlated random effects in the mean and variance structure.
Quintero, Adrian; Lesaffre, Emmanuel
2017-09-01
Multivariate regression methods generally assume a constant covariance matrix for the observations. When a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches available in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can differ across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oberreit, Derek; Fluid Measurement Technologies, Inc., Saint Paul, Minnesota 55110; Rawat, Vivek K.
The sorption of vapor molecules onto pre-existing nanometer-sized clusters is of importance in understanding particle formation and growth in gas phase environments and devising gas phase separation schemes. Here, we apply a differential mobility analyzer-mass spectrometer based approach to observe directly the sorption of vapor molecules onto iodide cluster ions of the form (MI)ₓM⁺ (x = 1-13, M = Na, K, Rb, or Cs) in air at 300 K and with water saturation ratios in the 0.01-0.64 range. The extent of vapor sorption is quantified in measurements by the shift in collision cross section (CCS) for each ion. We find that CCS measurements are sensitive enough to detect the transient binding of several vapor molecules to clusters, which shift CCSs by only several percent. At the same time, for the highest saturation ratios examined, we observed CCS shifts of up to 45%. For x < 4, cesium, rubidium, and potassium iodide cluster ions are found to uptake water to a similar extent, while sodium iodide clusters uptake less water. For x ≥ 4, sodium iodide cluster ions uptake proportionally more water vapor than rubidium and potassium iodide cluster ions, while cesium iodide ions exhibit less uptake. Measured CCS shifts are compared to predictions based upon a Kelvin-Thomson-Raoult (KTR) model as well as a Langmuir adsorption model. We find that the Langmuir adsorption model can be fit well to measurements. Meanwhile, KTR predictions deviate from measurements, which suggests that the earliest stages of vapor uptake by nanometer scale species are not well described by the KTR model.
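The Langmuir picture can be sketched as a saturating curve of fractional CCS shift versus saturation ratio, fit with scipy; the data points and parameter names below are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(S, dmax, K):
    """Fractional CCS shift saturating with saturation ratio S."""
    return dmax * K * S / (1.0 + K * S)

S = np.array([0.01, 0.05, 0.1, 0.2, 0.3, 0.45, 0.64])
shift = np.array([0.01, 0.04, 0.07, 0.12, 0.16, 0.20, 0.24])  # toy measurements
(dmax, K), _ = curve_fit(langmuir, S, shift, p0=[0.4, 2.0])
print(round(dmax, 3), round(K, 2))
```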
Helium segregation on surfaces of plasma-exposed tungsten
NASA Astrophysics Data System (ADS)
Maroudas, Dimitrios; Blondel, Sophie; Hu, Lin; Hammond, Karl D.; Wirth, Brian D.
2016-02-01
We report a hierarchical multi-scale modeling study of implanted helium segregation on surfaces of tungsten, considered as a plasma facing component in nuclear fusion reactors. We employ a hierarchy of atomic-scale simulations based on a reliable interatomic interaction potential, including molecular-statics simulations to understand the origin of helium surface segregation, targeted molecular-dynamics (MD) simulations of near-surface cluster reactions, and large-scale MD simulations of implanted helium evolution in plasma-exposed tungsten. We find that small, mobile Heₙ (1 ⩽ n ⩽ 7) clusters in the near-surface region are attracted to the surface due to an elastic interaction force that provides the thermodynamic driving force for surface segregation. This elastic interaction force induces drift fluxes of these mobile Heₙ clusters, which increase substantially as the migrating clusters approach the surface, facilitating helium segregation on the surface. Moreover, the clusters' drift toward the surface enables cluster reactions, most importantly trap mutation, in the near-surface region at rates much higher than in the bulk material. These near-surface cluster dynamics have significant effects on the surface morphology, near-surface defect structures, and the amount of helium retained in the material upon plasma exposure. We integrate the findings of such atomic-scale simulations into a properly parameterized and validated spatially dependent, continuum-scale reaction-diffusion cluster dynamics model, capable of predicting implanted helium evolution, surface segregation, and its near-surface effects in tungsten. This cluster-dynamics model sets the stage for development of fully atomistically informed coarse-grained models for computationally efficient simulation predictions of helium surface segregation, as well as helium retention and surface morphological evolution, toward optimal design of plasma facing components.
2n-emission from ²⁰⁵Pb* nucleus using clusterization approach at E_beam ∼ 14-20 MeV
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaur, Amandeep, E-mail: adeepkaur89@gmail.com; Sandhu, Kiran; Sharma, Manoj Kumar, E-mail: msharma@thapar.edu
2016-05-06
The dynamics involved in the n-induced reaction with a ²⁰⁴Pb target is analyzed and the decay of the composite system ²⁰⁵Pb* is governed within the collective clusterization approach of the Dynamical Cluster-decay Model (DCM). The experimental data for the 2n-evaporation channel are available for the neutron energy range of 14-20 MeV and are addressed by optimizing the only parameter of the model, the neck-length parameter (ΔR). The calculations are done taking into account the quadrupole (β₂) deformations of the decaying fragments, and the calculated 2n-emission cross-sections are in good agreement with the available data. An effort is made to study the role of the level density parameter in the decay of the hot rotating nucleus, and the mass dependence of the level density parameter is exercised for the first time in DCM-based calculations. It is to be noted that the effects of deformation, temperature, angular momentum, etc. are studied to extract a better description of the dynamics involved.
Brea, Oriana; Luna, Alberto; Díaz, Cristina; Corral, Inés
2018-06-05
Hydrogen has been proposed as a long-term non-fossil fuel to be used in a future ideal carbon-neutral energy economy. However, its low volumetric energy density hinders its storage and transportation. Metal-organic frameworks (MOFs) represent very promising materials for this purpose due to their very extended surface areas. Azolates, in particular tetrazolates, are, together with carboxylate functionalities, very common organic linkers connecting metallic secondary building units in MOFs. This study addresses, from a theoretical perspective, the H₂ adsorptive properties of tetrazolate linkers at the molecular level, following a size-progressive approach. Specifically, we have investigated how the physisorption energies and geometries are affected when changing the environment of the linker by considering the azolates in the gas phase, immersed in a finite cluster, or being part of an infinite extended crystal material. Furthermore, we also study the H₂ adsorptive capacity of these linkers within the cluster model. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Scalable clustering algorithms for continuous environmental flow cytometry.
Hyrkas, Jeremy; Clayton, Sophie; Ribalet, Francois; Halperin, Daniel; Armbrust, E Virginia; Howe, Bill
2016-02-01
Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency flow cytometry data from particles in aquatic environments on a scale far surpassing conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach in cytometric analysis and highlight the need for scalable clustering algorithms to extract population information from these large-scale, high-frequency flow cytometers. We explore how available algorithms commonly used for medical applications perform at classifying such large-scale environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data. Source code available for download at https://github.com/jhyrkas/seaflow_cluster, implemented in Java for use with Hadoop. hyrkas@cs.washington.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
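The partition-then-mixture strategy can be sketched by splitting a stream into contiguous segments assumed homogeneous and fitting a Gaussian mixture per segment; the populations, drift, and segment length below are invented (the paper's implementation runs on Hadoop, not in-memory Python).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(12)
# two synthetic phytoplankton populations, one drifting along the cruise track
t = np.repeat(np.arange(10), 1000)
X = np.where(rng.random(10000)[:, None] < 0.5,
             rng.normal(1 + 0.05 * t[:, None], 0.2, (10000, 2)),
             rng.normal(3.0, 0.3, (10000, 2)))
for seg in range(0, 10, 5):                  # two partitions of five time units
    Xi = X[(t >= seg) & (t < seg + 5)]
    g = GaussianMixture(2, random_state=0).fit(Xi)
    print(seg, g.means_.round(2).tolist())   # per-segment population centers
```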
Image quality guided approach for adaptive modelling of biometric intra-class variations
NASA Astrophysics Data System (ADS)
Abboud, Ali J.; Jassim, Sabah A.
2010-04-01
The high intra-class variability of acquired biometric data can be attributed to several factors, such as the quality of the acquisition sensor (e.g. thermal), environmental conditions (e.g. lighting), and behaviour (e.g. changes in face pose). Such variability can cause a large difference between acquired and stored biometric data that will eventually lead to reduced performance. Many systems store multiple templates in order to account for such variations in the biometric data during the enrolment stage. The number and typicality of these templates are the factors that affect system performance the most. In this paper, a novel offline approach is proposed for systematic modelling of intra-class variability and typicality in biometric data by regularly selecting new templates from a set of available biometric images. Our proposed technique is a two-stage algorithm whereby in the first stage image samples are clustered in terms of their image quality profile vectors, rather than their biometric feature vectors, and in the second stage a per-cluster template is selected from a small number of samples in each cluster to create the final template set. Experiments conducted on five face image databases demonstrate the effectiveness of the proposed quality-guided approach.
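A compact stand-in for the two-stage algorithm: k-means on image-quality vectors, then keep the sample nearest each centroid as that cluster's template. The quality measures and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(13)
quality = rng.random((40, 6))                 # 40 samples x 6 quality measures
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(quality)
templates = []
for k in range(4):
    idx = np.where(km.labels_ == k)[0]
    d = np.linalg.norm(quality[idx] - km.cluster_centers_[k], axis=1)
    templates.append(int(idx[np.argmin(d)])) # sample closest to the centroid
print(templates)                              # indices of the selected template set
```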
Patterns of Childhood Abuse and Neglect in a Representative German Population Sample
Schilling, Christoph; Weidner, Kerstin; Brähler, Elmar; Glaesmer, Heide; Häuser, Winfried; Pöhlmann, Karin
2016-01-01
Background: Different types of childhood maltreatment, like emotional abuse, emotional neglect, physical abuse, physical neglect and sexual abuse are interrelated because of their co-occurrence. Different patterns of childhood abuse and neglect are associated with the degree of severity of mental disorders in adulthood. The purpose of this study was (a) to identify different patterns of childhood maltreatment in a representative German community sample, (b) to replicate the patterns of childhood neglect and abuse recently found in a clinical German sample, (c) to examine whether participants reporting exposure to specific patterns of child maltreatment would report different levels of psychological distress, and (d) to compare the results of the typological approach and the results of a cumulative risk model based on our data set. Methods: In a cross-sectional survey conducted in 2010, a representative random sample of 2504 German participants aged between 14 and 92 years completed the Childhood Trauma Questionnaire (CTQ). General anxiety and depression were assessed by standardized questionnaires (GAD-2, PHQ-2). Cluster analysis was conducted with the CTQ-subscales to identify different patterns of childhood maltreatment. Results: Three different patterns of childhood abuse and neglect could be identified by cluster analysis. Cluster one showed low values on all CTQ-scales. Cluster two showed high values in emotional and physical neglect. Only cluster three showed high values in physical and sexual abuse. The three patterns of childhood maltreatment showed different degrees of depression (PHQ-2) and anxiety (GAD-2). Cluster one showed lowest levels of psychological distress, cluster three showed highest levels of mental distress. Conclusion: The results show that different types of childhood maltreatment are interrelated and can be grouped into specific patterns of childhood abuse and neglect, which are associated with differing severity of psychological distress in adulthood. The results correspond to those recently found in a German clinical sample and support a typological approach in the research of maltreatment. While cumulative risk models focus on the number of maltreatment types, the typological approach takes the number as well as the severity of the maltreatment types into account. Thus, specific patterns of maltreatment can be examined with regard to specific long-term psychological consequences. PMID:27442446
A Hidden Markov Model for Urban-Scale Traffic Estimation Using Floating Car Data.
Wang, Xiaomeng; Peng, Ling; Chi, Tianhe; Li, Mengzhu; Yao, Xiaojing; Shao, Jing
2015-01-01
Urban-scale traffic monitoring plays a vital role in reducing traffic congestion. Owing to its low cost and wide coverage, floating car data (FCD) serves as a novel approach to collecting traffic data. However, sparse probe data represents the vast majority of the data available on arterial roads in most urban environments. In order to overcome the problem of data sparseness, this paper proposes a hidden Markov model (HMM)-based traffic estimation model, in which the traffic condition on a road segment is considered a hidden state that can be estimated according to the conditions of road segments having similar traffic characteristics. An algorithm based on clustering and pattern mining rather than on adjacency relationships is proposed to find clusters with road segments having similar traffic characteristics. A multi-clustering strategy is adopted to achieve a trade-off between clustering accuracy and coverage. Finally, the proposed model is designed and implemented on the basis of a real-time algorithm. Results of experiments based on real FCD confirm the applicability, accuracy, and efficiency of the model. In addition, the results indicate that the model is practicable for traffic estimation on urban arterials and works well even when more than 70% of the probe data are missing.
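One pared-down piece of the HMM machinery, Viterbi decoding of hidden traffic states from sparse speed observations, can be sketched in pure numpy; the states, transition matrix, and Gaussian emissions are invented, and the paper's clustering-based state sharing is not reproduced.

```python
import numpy as np

states = ["free", "steady", "congested"]
A = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.25, 0.70]])               # transition probabilities
means, sd = np.array([55.0, 35.0, 12.0]), 8.0    # speed emission model (km/h)

def viterbi(obs, pi=np.full(3, 1 / 3)):
    logB = -0.5 * ((obs[:, None] - means) / sd) ** 2  # Gaussian log-likelihoods
    V = np.log(pi) + logB[0]
    back = []
    for t in range(1, len(obs)):
        scores = V[:, None] + np.log(A)          # scores[i, j]: i -> j
        back.append(scores.argmax(0))
        V = scores.max(0) + logB[t]
    path = [int(V.argmax())]
    for b in reversed(back):                     # backtrack best predecessors
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi(np.array([52.0, 40.0, 30.0, 14.0, 10.0, 33.0])))
```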
McParland, D; Phillips, C M; Brennan, L; Roche, H M; Gormley, I C
2017-12-10
The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. Copyright © 2017 John Wiley & Sons, Ltd.
Fast Whole-Engine Stirling Analysis
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako
2006-01-01
This presentation discusses the simulation approach to whole-engine analysis for physical consistency, REV regenerator modeling, grid layering for smoothness and quality, conjugate heat transfer method adjustment, a high-speed low-cost parallel cluster, and debugging.
First principles calculations for interaction of tyrosine with (ZnO)₃ cluster
NASA Astrophysics Data System (ADS)
Singh, Satvinder; Singh, Gurinder; Kaura, Aman; Tripathi, S. K.
2018-04-01
First principles calculations have been performed to study the interactions of the phenol ring of tyrosine (C₆H₅OH) with a (ZnO)₃ atomic cluster. All the calculations have been performed within the Density Functional Theory (DFT) framework. Structural and electronic properties of (ZnO)₃/C₆H₅OH have been studied. A Gaussian basis set approach has been adopted for the calculations. The most stable, ring-type (ZnO)₃ atomic cluster has been modeled, analyzed and used for the calculations. The consistency of the results with previous studies is presented.
Wobbled electronic properties of lithium clusters: Deterministic approach through first principles
NASA Astrophysics Data System (ADS)
Kushwaha, Anoop Kumar; Nayak, Saroj Kumar
2018-03-01
The innate tendency toward dendritic growth, promoted through cluster formation and leading to the failure of Li-ion battery systems, has drawn significant attention from researchers towards the effective destabilization of cluster growth through selective implementation of electrolytic media such as acetonitrile (MeCN). In the present work, using first principles density functional theory and a continuum dielectric model, we have investigated the origin of the oscillatory nature of the binding energy per atom of Liₙ (n ≤ 8) under the influence of MeCN. In the gas phase, we found that the static mean polarizability is strongly correlated with the binding energy and shows an oscillatory nature with cluster size due to the open shell of the Liₙ cluster. However, in the acetonitrile medium, the binding energy is correlated with the electrostatic Liₙ-MeCN interaction, and both have been found to possess wobbled behavior characterized by the cluster size.
NASA Astrophysics Data System (ADS)
Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock
2017-01-01
The suites of numerical models used for simulating climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations on a commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Services (AWS) EC2, the cloud computing environment by Amazon.com, Inc. StarCluster is used to create a virtual computing cluster on the AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent for the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation can be efficiently scaled with the number of CPU cores on the AWS EC2 virtual cluster environment up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup becomes nearly unchanged.
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP), and the simulation results show convergence, learning, and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
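As an illustration of the learning step described above, the following minimal sketch reduces the cluster-selection problem to a stateless Q-learning (multi-armed bandit) update; the cluster count, cost ranges, learning rate, and exploration rate are illustrative assumptions, not values from the paper, and the full MDP formulation would add state transitions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clusters = 4                                     # candidate clusters a member node can join
energy_cost = rng.uniform(0.2, 1.0, n_clusters)    # hypothetical per-slot energy costs
sensing_cost = rng.uniform(0.1, 0.8, n_clusters)   # hypothetical cooperative-sensing costs

Q = np.zeros(n_clusters)                           # action values, one per candidate cluster
alpha, eps, episodes = 0.1, 0.1, 2000

for _ in range(episodes):
    # epsilon-greedy exploration over candidate clusters
    a = rng.integers(n_clusters) if rng.random() < eps else int(np.argmax(Q))
    # stochastic reward: negative combined energy and sensing cost, plus noise
    r = -(energy_cost[a] + sensing_cost[a]) + rng.normal(0, 0.05)
    Q[a] += alpha * (r - Q[a])                     # stateless Q-learning update

print("learned action values:", np.round(Q, 3))
print("selected cluster:", int(np.argmax(Q)))
```

With enough episodes the node converges on the cluster with the lowest combined cost, mirroring how exploration trades off against exploiting the current best cluster.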
Cognitive Clusters in Specific Learning Disorder.
Poletti, Michele; Carretta, Elisa; Bonvicini, Laura; Giorgi-Rossi, Paolo
The heterogeneity among children with learning disabilities still represents a barrier and a challenge in their conceptualization. Although a dimensional approach has been gaining support, the categorical approach is still the most widely adopted, as in the recent fifth edition of the Diagnostic and Statistical Manual of Mental Disorders. The introduction of the single overarching diagnostic category of specific learning disorder (SLD) could underemphasize interindividual clinical differences regarding intracategory cognitive functioning and learning proficiency, according to current models of multiple cognitive deficits at the basis of neurodevelopmental disorders. The characterization of specific cognitive profiles associated with an already manifest SLD could help identify possible early cognitive markers of SLD risk and distinct trajectories of atypical cognitive development leading to SLD. In this perspective, we applied a cluster analysis to identify groups of children with a Diagnostic and Statistical Manual-based diagnosis of SLD with similar cognitive profiles and to describe the association between clusters and SLD subtypes. A sample of 205 children with a diagnosis of SLD was enrolled. Cluster analyses (agglomerative hierarchical and nonhierarchical iterative clustering techniques) were applied successively to 10 core subtests of the Wechsler Intelligence Scale for Children-Fourth Edition. The 4-cluster solution was adopted, and external validation found differences in terms of SLD subtype frequencies and learning proficiency among clusters. Clinical implications of these findings are discussed, tracing directions for further studies.
NASA Astrophysics Data System (ADS)
Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2008-11-01
Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.
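A minimal numerical sketch of the forward step described above: draw lognormal density fluctuations around an isothermal beta-model, project the squared density to mimic X-ray surface brightness, and inspect the log-fluctuations. All parameter values are illustrative, and the independent per-voxel draws ignore the spatial correlations present in real ICM turbulence.

```python
import numpy as np

rng = np.random.default_rng(1)

# 3D isothermal beta-model density profile on a cube (toy parameters)
n, rc, beta, sigma = 96, 10.0, 0.7, 0.3
ax = np.arange(n) - n / 2
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
r = np.sqrt(x**2 + y**2 + z**2)
rho_mean = (1.0 + (r / rc) ** 2) ** (-1.5 * beta)

# lognormal density: multiply the mean profile by a mean-preserving lognormal factor
rho = rho_mean * rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma, size=rho_mean.shape)

# X-ray emissivity scales as density squared; project along the line of sight
Sx = (rho**2).sum(axis=2)
Sx_model = (rho_mean**2).sum(axis=2)

# fluctuations of surface brightness about the projected mean model
fluct = np.log(Sx / Sx_model)
skew = ((fluct - fluct.mean()) ** 3).mean() / fluct.std() ** 3
print("skewness of log surface-brightness fluctuations (near 0 if lognormal):", round(float(skew), 3))
```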
Review of methods for handling confounding by cluster and informative cluster size in clustered data
Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew
2014-01-01
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
Improving stability of prediction models based on correlated omics data by using network approaches.
Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar
2018-01-01
Building prediction models based on complex omics datasets such as transcriptomics, proteomics, metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.
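A hedged sketch of the three-step scheme: build a correlation-based network distance, cut a hierarchical clustering into modules, and fit a sparse model on module summaries. Using the module means with a Lasso is a simple stand-in for the group-based variable selection and group-specific penalization the authors compare; the data and module count are synthetic assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)

# toy omics matrix: 100 samples x 60 features generated from latent modules
n, p, k = 100, 60, 6
latent = rng.normal(size=(n, k))
X = latent[:, rng.integers(k, size=p)] + 0.5 * rng.normal(size=(n, p))
y = latent[:, 0] + 0.1 * rng.normal(size=n)

# steps 1-2: correlation network distance, hierarchical clustering into modules
dist = 1.0 - np.abs(np.corrcoef(X.T))            # p x p feature distance
Z = linkage(dist[np.triu_indices(p, 1)], method="average")
modules = fcluster(Z, t=k, criterion="maxclust")

# step 3: summarize each module by its mean and fit a sparse prediction model
M = np.column_stack([X[:, modules == m].mean(axis=1) for m in np.unique(modules)])
model = LassoCV(cv=5).fit(M, y)
print("module-level coefficients:", np.round(model.coef_, 2))
```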
A quasi-static approach to structure formation in black hole universes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Durk, Jessie; Clifton, Timothy, E-mail: j.durk@qmul.ac.uk, E-mail: t.clifton@qmul.ac.uk
Motivated by the existence of hierarchies of structure in the Universe, we present four new families of exact initial data for inhomogeneous cosmological models at their maximum of expansion. These data generalise existing black hole lattice models to situations that contain clusters of masses, and hence allow the consequences of cosmological structures to be considered in a well-defined and non-perturbative fashion. The degree of clustering is controlled by a parameter λ, in such a way that for λ ∼ 0 or 1 we have very tightly clustered masses, whilst for λ ∼ 0.5 all masses are separated by cosmological distance scales. We study the consequences of structure formation on the total net mass in each of our clusters, as well as calculating the cosmological consequences of the interaction energies both within and between clusters. The locations of the shared horizons that appear around groups of black holes, when they are brought sufficiently close together, are also identified and studied. We find that clustering can have surprisingly large effects on the scale of the cosmology, with models that contain thousands of black holes sometimes being as little as 30% of the size of comparable Friedmann models with the same total proper mass. This deficit is comparable to what might be expected to occur from neglecting gravitational interaction energies in Friedmann cosmology, and suggests that these quantities may have a significant influence on the properties of the large-scale cosmology.
Wang, H B; Wang, Q; Dong, C; Yuan, L; Xu, F; Sun, L X
2008-03-19
This paper analyzes the characteristics of alloy compositions with large hydrogen storage capacities in Laves phase-related body-centered cubic (bcc) solid solution alloy systems using the cluster line approach. Since a dense-packed icosahedral cluster A6B7 characterizes the local structure of AB2 Laves phases, in an A-B-C ternary system, such as Ti-Cr (Mn, Fe)-V, where A-B forms AB2 Laves phases while A-C and B-C tend to form solid solutions, a cluster line A6B7-C is constructed by linking A6B7 to C. The alloy compositions with large hydrogen storage capacities are generally located near this line and are approximately expressed with the cluster-plus-glue-atom model. The cluster line alloys (Ti6Cr7)100-xVx (x = 2.5-70 at.%) exhibit different structures and hence different hydrogen storage capacities with increasing V content. The alloys (Ti6Cr7)95V5 and Ti30Cr40V30 with bcc solid solution structure satisfy the cluster-plus-glue-atom model.
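For concreteness, the atomic fractions along the cluster line follow by simple proportion; a small sketch under the usual reading that (Ti6Cr7)100-xVx allots (100-x) at.% to the Ti6Cr7 cluster composition and x at.% to V (the exact notation convention is an assumption here).

```python
def cluster_line_composition(x):
    """Atomic fractions of (Ti6Cr7)_(100-x) V_x, with x the at.% of V."""
    base = (100.0 - x) / 100.0        # fraction belonging to the Ti6Cr7 cluster
    return {"Ti": base * 6.0 / 13.0,  # 6 of 13 cluster atoms are Ti
            "Cr": base * 7.0 / 13.0,  # 7 of 13 cluster atoms are Cr
            "V": x / 100.0}

for x in (5, 30, 70):
    comp = cluster_line_composition(x)
    print(f"x = {x} at.%:", {el: round(f, 3) for el, f in comp.items()})
```

At x = 5 this reproduces (Ti6Cr7)95V5 as roughly Ti44Cr51V5, and at x = 30 it lands near the separately listed Ti30Cr40V30 alloy.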
Construction of ground-state preserving sparse lattice models for predictive materials simulations
NASA Astrophysics Data System (ADS)
Huang, Wenxuan; Urban, Alexander; Rong, Ziqin; Ding, Zhiwei; Luo, Chuan; Ceder, Gerbrand
2017-08-01
First-principles based cluster expansion models are the dominant approach in ab initio thermodynamics of crystalline mixtures enabling the prediction of phase diagrams and novel ground states. However, despite recent advances, the construction of accurate models still requires a careful and time-consuming manual parameter tuning process for ground-state preservation, since this property is not guaranteed by default. In this paper, we present a systematic and mathematically sound method to obtain cluster expansion models that are guaranteed to preserve the ground states of their reference data. The method builds on the recently introduced compressive sensing paradigm for cluster expansion and employs quadratic programming to impose constraints on the model parameters. The robustness of our methodology is illustrated for two lithium transition metal oxides with relevance for Li-ion battery cathodes, i.e., Li2xFe2(1-x)O2 and Li2xTi2(1-x)O2, for which the construction of cluster expansion models with compressive sensing alone has proven to be challenging. We demonstrate that our method not only guarantees ground-state preservation on the set of reference structures used for the model construction, but also show that out-of-sample ground-state preservation up to relatively large supercell size is achievable through a rapidly converging iterative refinement. This method provides a general tool for building robust, compressed and constrained physical models with predictive power.
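A hedged sketch of the constrained fit: least squares with an L1 penalty (the compressive-sensing flavor) subject to linear inequalities that force designated ground-state structures below their competitors. The correlation matrix, hull membership, and margin are synthetic stand-ins, and cvxpy is used here purely for illustration; the paper's actual quadratic-programming formulation may differ in detail.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)

# toy cluster expansion: E ≈ X @ J, with X the correlation matrix and J the ECIs
n_struct, n_clusters = 40, 10
X = rng.normal(size=(n_struct, n_clusters))
J_true = rng.normal(size=n_clusters) * np.exp(-np.arange(n_clusters))
E = X @ J_true + 0.01 * rng.normal(size=n_struct)

J = cp.Variable(n_clusters)

# hypothetical ground-state preservation: each hull structure must sit below
# every competing structure by a small margin, (X_gs - X_other) @ J <= -margin
hull, margin = [0, 1], 1e-3
constraints = [(X[g] - X[s]) @ J <= -margin
               for g in hull for s in range(n_struct) if s not in hull]

objective = cp.Minimize(cp.sum_squares(X @ J - E) + 0.1 * cp.norm1(J))
cp.Problem(objective, constraints).solve()
print("fitted ECIs:", np.round(J.value, 3))
```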
NASA Astrophysics Data System (ADS)
Alagha, Jawad S.; Seyam, Mohammed; Md Said, Md Azlin; Mogheir, Yunes
2017-12-01
Artificial intelligence (AI) techniques have increasingly become efficient alternative modeling tools in the water resources field, particularly when the modeled process is influenced by complex and interrelated variables. In this study, two AI techniques—artificial neural networks (ANNs) and support vector machine (SVM)—were employed to achieve deeper understanding of the salinization process (represented by chloride concentration) in complex coastal aquifers influenced by various salinity sources. Both models were trained using 11 years of groundwater quality data from 22 municipal wells in Khan Younis Governorate, Gaza, Palestine. Both techniques showed satisfactory prediction performance, where the mean absolute percentage error (MAPE) and correlation coefficient (R) for the test data set were, respectively, about 4.5 and 99.8% for the ANNs model, and 4.6 and 99.7% for the SVM model. The performances of the developed models were further noticeably improved through preprocessing the wells data set using a k-means clustering method, then conducting AI techniques separately for each cluster. The developed models with clustered data were associated with higher performance, easiness and simplicity. They can be employed as an analytical tool to investigate the influence of input variables on coastal aquifer salinity, which is of great importance for understanding salinization processes, leading to more effective water-resources-related planning and decision making.
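The cluster-then-model idea above is easy to prototype; this sketch uses scikit-learn with k-means followed by one regressor per cluster (SVR standing in for the ANN/SVM models), on synthetic data rather than the groundwater records. Feature meanings, cluster count, and the error metric (MAE instead of MAPE, to avoid division by near-zero targets) are assumptions of the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(4)

# synthetic stand-in for well records: features -> concentration-like target
X = rng.normal(size=(400, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# step 1: k-means partitions the records into clusters
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)

# step 2: fit a separate regressor on each cluster's records
models = {c: SVR().fit(X_tr[km.labels_ == c], y_tr[km.labels_ == c])
          for c in range(km.n_clusters)}

# predict each test record with the model of its nearest cluster
labels_te = km.predict(X_te)
pred = np.array([models[c].predict(x[None, :])[0] for c, x in zip(labels_te, X_te)])
print("test MAE:", round(float(np.abs(y_te - pred).mean()), 3))
```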
Marginal regression approach for additive hazards models with clustered current status data.
Su, Pei-Fang; Chi, Yunchan
2014-01-15
Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of the survival function converges at a rate of n^(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive test statistics, whose distributions converge at a rate of n^(1/2), for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.
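For readers unfamiliar with the setting, the additive hazards model referenced here has the standard form below (a textbook formulation, not this paper's specific estimating functions), with current status data supplying only a monitoring time and a binary indicator per subject:

```latex
% Additive hazards model for member j of cluster i:
\lambda_{ij}(t \mid Z_{ij}) = \lambda_0(t) + \beta^{\top} Z_{ij}(t),
% with current status data recording only the monitoring time C_{ij}
% and the censoring indicator
\delta_{ij} = \mathbf{1}\{\, T_{ij} \le C_{ij} \,\}.
```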
Activity-induced clustering in model dumbbell swimmers: the role of hydrodynamic interactions.
Furukawa, Akira; Marenduzzo, Davide; Cates, Michael E
2014-08-01
Using a fluid-particle dynamics approach, we numerically study the effects of hydrodynamic interactions on the collective dynamics of active suspensions within a simple model for bacterial motility: each microorganism is modeled as a stroke-averaged dumbbell swimmer with prescribed dipolar force pairs. Using both simulations and qualitative arguments, we show that, when the separation between swimmers is comparable to their size, the swimmers' motions are strongly affected by activity-induced hydrodynamic forces. To further understand these effects, we investigate semidilute suspensions of swimmers in the presence of thermal fluctuations. A direct comparison between simulations with and without hydrodynamic interactions shows these to enhance the dynamic clustering at a relatively small volume fraction; with our chosen model the key ingredient for this clustering behavior is hydrodynamic trapping of one swimmer by another, induced by the active forces. Furthermore, the density dependence of the motility (of both the translational and rotational motions) exhibits distinctly different behaviors with and without hydrodynamic interactions; we argue that this is linked to the clustering tendency. Our study illustrates the fact that hydrodynamic interactions not only affect kinetic pathways in active suspensions, but also cause major changes in their steady state properties.
Zhang, X; Patel, L A; Beckwith, O; Schneider, R; Weeden, C J; Kindt, J T
2017-11-14
Micelle cluster distributions from molecular dynamics simulations of a solvent-free coarse-grained model of sodium octyl sulfate (SOS) were analyzed using an improved method to extract equilibrium association constants from small-system simulations containing one or two micelle clusters at equilibrium with free surfactants and counterions. The statistical-thermodynamic and mathematical foundations of this partition-enabled analysis of cluster histograms (PEACH) approach are presented. A dramatic reduction in computational time for analysis was achieved through a strategy similar to the selector variable method to circumvent the need for exhaustive enumeration of the possible partitions of surfactants and counterions into clusters. Using statistics from a set of small-system (up to 60 SOS molecules) simulations as input, equilibrium association constants for micelle clusters were obtained as a function of both number of surfactants and number of associated counterions through a global fitting procedure. The resulting free energies were able to accurately predict micelle size and charge distributions in a large (560 molecule) system. The evolution of micelle size and charge with SOS concentration as predicted by the PEACH-derived free energies and by a phenomenological four-parameter model fit, along with the sensitivity of these predictions to variations in cluster definitions, are analyzed and discussed.
Qualitative mechanism models and the rationalization of procedures
NASA Technical Reports Server (NTRS)
Farley, Arthur M.
1989-01-01
A qualitative, cluster-based approach to the representation of hydraulic systems is described and its potential for generating and explaining procedures is demonstrated. Many ideas are formalized and implemented as part of an interactive, computer-based system. The system allows for designing, displaying, and reasoning about hydraulic systems. The interactive system has an interface consisting of three windows: a design/control window, a cluster window, and a diagnosis/plan window. A qualitative mechanism model for the ORS (Orbital Refueling System) is presented to coordinate with ongoing research on this system being conducted at NASA Ames Research Center.
Andridge, Rebecca R.
2011-01-01
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309
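The MI variance estimator whose bias is analyzed above is the standard Rubin's-rules combination of within- and between-imputation variance (the formula itself is standard; the paper's contribution is characterizing its bias under fixed-cluster-effect imputation):

```latex
% Pooled estimate and total variance over m imputed data sets:
\bar{\theta} = \frac{1}{m} \sum_{l=1}^{m} \hat{\theta}_{l}, \qquad
T = \bar{W} + \left(1 + \frac{1}{m}\right) B,
% where \bar{W} is the average within-imputation variance and
B = \frac{1}{m-1} \sum_{l=1}^{m} \left(\hat{\theta}_{l} - \bar{\theta}\right)^{2}.
```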
Evaluating Data Clustering Approach for Life-Cycle Facility Control
2013-04-01
...produce 90% matching accuracy with noise/variations up to 55%. KEYWORDS: Building Information Modelling (BIM), machine learning, pattern detection... reconciled to building information model elements and ultimately to an expected resource utilization schedule. The motivation for this integration is to... by interoperable data sources and building information models. Building performance modelling and simulation efforts such as those by Maile et al.
Efficient Deployment of Key Nodes for Optimal Coverage of Industrial Mobile Wireless Networks
Li, Xiaomin; Li, Di; Dong, Zhijie; Hu, Yage; Liu, Chengliang
2018-01-01
In recent years, industrial wireless networks (IWNs) have been transformed by the introduction of mobile nodes, and they now offer increased extensibility, mobility, and flexibility. Nevertheless, mobile nodes pose efficiency and reliability challenges. Efficient node deployment and management of channel interference directly affect network system performance, particularly for key node placement in clustered wireless networks. This study analyzes this system model, considering both industrial properties of wireless networks and their mobility. Then, static and mobile node coverage problems are unified and simplified to target coverage problems. We propose a novel strategy for the deployment of clustered heads in grouped industrial mobile wireless networks (IMWNs) based on the improved maximal clique model and the iterative computation of new candidate cluster head positions. The maximal cliques are obtained via a double-layer Tabu search. Each cluster head updates its new position via an improved virtual force while moving with full coverage to find the minimal inter-cluster interference. Finally, we develop a simulation environment. The simulation results, based on a performance comparison, show the efficacy of the proposed strategies and their superiority over current approaches. PMID:29439439
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are discussed using several medical examples. We present two clustering examples on ordinal variables, a more challenging variable type to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: using an appropriate clustering algorithm based on the measurement scale of the variables in the study grants high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565
Substructures in DAFT/FADA survey clusters based on XMM and optical data
NASA Astrophysics Data System (ADS)
Durret, F.; DAFT/FADA Team
2014-07-01
The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.
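The beta-model fit-and-subtract step is straightforward to sketch; here is a minimal version with scipy on a synthetic radial profile. The surface-brightness form S(r) = S0 [1 + (r/rc)^2]^(0.5 - 3*beta) is the standard isothermal beta-model; all numbers are illustrative, not values from the survey.

```python
import numpy as np
from scipy.optimize import curve_fit

def beta_model(r, s0, rc, beta):
    """Standard isothermal beta-model for X-ray surface brightness."""
    return s0 * (1.0 + (r / rc) ** 2) ** (0.5 - 3.0 * beta)

rng = np.random.default_rng(5)
r = np.linspace(1, 100, 50)                      # radii in arbitrary units
truth = beta_model(r, 200.0, 15.0, 0.65)         # toy cluster profile
obs = rng.poisson(truth).astype(float)           # Poisson-like counts

popt, _ = curve_fit(beta_model, r, obs, p0=(obs[0], 10.0, 0.6))
residual = obs - beta_model(r, *popt)            # candidate substructure signal
print("fitted S0, rc, beta:", np.round(popt, 2))
```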
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of autonomous coprocessors interconnected together, called Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
Asymptotic stability of spectral-based PDF modeling for homogeneous turbulent flows
NASA Astrophysics Data System (ADS)
Campos, Alejandro; Duraisamy, Karthik; Iaccarino, Gianluca
2015-11-01
Engineering models of turbulence, based on one-point statistics, neglect spectral information inherent in a turbulence field. It is well known, however, that the evolution of turbulence is dictated by a complex interplay between the spectral modes of velocity. For example, for homogeneous turbulence, the pressure-rate-of-strain depends on the integrated energy spectrum weighted by components of the wave vectors. The Interacting Particle Representation Model (IPRM) (Kassinos & Reynolds, 1996) and the Velocity/Wave-Vector PDF model (Van Slooten & Pope, 1997) emulate spectral information in an attempt to improve the modeling of turbulence. We investigate the evolution and asymptotic stability of the IPRM using three different approaches. The first approach considers the Lagrangian evolution of individual realizations (idealized as particles) of the stochastic process defined by the IPRM. The second solves Lagrangian evolution equations for clusters of realizations conditional on a given wave vector. The third evolves the solution of the Eulerian conditional PDF corresponding to the aforementioned clusters. This last method avoids issues related to discrete particle noise and slow convergence associated with Lagrangian particle-based simulations.
NASA Astrophysics Data System (ADS)
Li, Xiwang
Buildings consume about 41.1% of primary energy and 74% of the electricity in the U.S. Moreover, it is estimated by the National Energy Technology Laboratory that more than 1/4 of the 713 GW of U.S. electricity demand in 2010 could be dispatchable if only buildings could respond to that dispatch through advanced building energy control and operation strategies and smart grid infrastructure. In this study, it is envisioned that neighboring buildings will have the tendency to form a cluster, an open cyber-physical system to exploit the economic opportunities provided by a smart grid, distributed power generation, and storage devices. Through optimized demand management, these building clusters will then reduce overall primary energy consumption and peak time electricity consumption, and be more resilient to power disruptions. Therefore, this project seeks to develop a Net-zero building cluster simulation testbed and high fidelity energy forecasting models for adaptive and real-time control and decision making strategy development that can be used in a Net-zero building cluster. The following research activities are summarized in this thesis: 1) development of a building cluster emulator for building cluster control and operation strategy assessment; 2) development of a novel building energy forecasting methodology using active system identification and data fusion techniques, including a systematic approach for building energy system characteristic evaluation, system excitation and model adaptation, with the developed methodology compared against other literature-reported building energy forecasting methods; 3) development of high fidelity on-line building cluster energy forecasting models, including energy forecasting models for buildings, PV panels, batteries and ice tank thermal storage systems; and 4) a small scale real building validation study to verify the performance of the developed building energy forecasting methodology. The outcomes of this thesis can be used for building cluster energy forecasting model development and model-based control and operation optimization. The thesis concludes with a summary of the key outcomes of this research, as well as a list of recommendations for future work.
NASA Astrophysics Data System (ADS)
Bonatto, C.; Lima, E. F.; Bica, E.
2012-04-01
Context. Usually, important parameters of young, low-mass star clusters are very difficult to obtain by means of photometry, especially when differential reddening and/or binaries occur in large amounts. Aims: We present a semi-analytical approach (ASAmin) that, when applied to the Hess diagram of a young star cluster, is able to retrieve the values of mass, age, star-formation spread, distance modulus, foreground and differential reddening, and binary fraction. Methods: The global optimisation method known as adaptive simulated annealing (ASA) is used to minimise the residuals between the observed and simulated Hess diagrams of a star cluster. The simulations are realistic and take the most relevant parameters of young clusters into account. Important features of the simulations are a normal (Gaussian) differential reddening distribution, a time-decreasing star-formation rate, the unresolved binaries, and the smearing effect produced by photometric uncertainties on Hess diagrams. Free parameters are cluster mass, age, distance modulus, star-formation spread, foreground and differential reddening, and binary fraction. Results: Tests with model clusters built with parameters spanning a broad range of values show that ASAmin retrieves the input values with high precision for cluster mass, distance modulus, and foreground reddening, though precision is somewhat lower for the remaining parameters. Given the statistical nature of the simulations, several runs should be performed to obtain significant convergence patterns. Specifically, we find that the retrieved (absolute minimum) parameters converge to mean values with a low dispersion as the Hess residuals decrease. When applied to actual young clusters, the retrieved parameters follow convergence patterns similar to the models. We show how the stochasticity associated with the early phases may affect the results, especially in low-mass clusters. This effect can be minimised by averaging out several twin clusters in the simulated Hess diagrams. Conclusions: Even for low-mass star clusters, ASAmin is sensitive to the values of cluster mass, age, distance modulus, star-formation spread, foreground and differential reddening, and to a lesser degree, binary fraction. Compared with simpler approaches, including binaries, a decaying star-formation rate, and a normally distributed differential reddening appears to yield more constrained parameters, especially the mass, age, and distance from the Sun. A robust determination of cluster parameters may have a positive impact on many fields. For instance, age, mass, and binary fraction are important for establishing the dynamical state of a cluster or for deriving a more precise star-formation rate in the Galaxy.
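A toy sketch of the minimization loop: scipy's dual_annealing (a generalized simulated-annealing routine, standing in for the ASA implementation the authors use) minimizes the squared residuals between an "observed" Hess diagram and simulated ones. Only two free parameters are kept, and the mock CMD generator is entirely fabricated for illustration; fixing the random seed inside the simulator makes the objective deterministic, whereas the paper's realistic simulations are stochastic and averaged over runs.

```python
import numpy as np
from scipy.optimize import dual_annealing

def simulate_hess(params, n_stars=4000):
    """Toy Hess diagram: mock CMD shifted by a distance modulus and a reddening."""
    dm, ebv = params
    rng = np.random.default_rng(42)              # fixed stars -> deterministic objective
    colour = rng.normal(0.8, 0.3, n_stars) + ebv
    mag = rng.normal(18.0, 1.5, n_stars) + dm
    H, _, _ = np.histogram2d(colour, mag, bins=(30, 40), range=[[0, 3], [12, 26]])
    return H / H.sum()

observed = simulate_hess((0.5, 0.2))             # pretend this is the observed cluster

def residuals(params):
    return float(((simulate_hess(params) - observed) ** 2).sum())

res = dual_annealing(residuals, bounds=[(-1.0, 2.0), (0.0, 1.0)], maxiter=200)
print("recovered (distance-modulus, reddening) shifts:", np.round(res.x, 2))
```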
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.
Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun
2017-01-01
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
Graphical Representations and Cluster Algorithms for Ice Rule Vertex Models.
NASA Astrophysics Data System (ADS)
Shtengel, Kirill; Chayes, L.
2002-03-01
We introduce a new class of polymer models which is closely related to loop models, recently a topic of intensive study. These particular models arise as graphical representations for ice-rule vertex models. The associated cluster algorithms provide a unification and generalisation of most of the existing algorithms. For many lattices, percolation in the polymer models evidently indicates first-order phase transitions in the vertex models. Critical phases can be understood as being susceptible to colour symmetry breaking in the polymer models. The analysis includes, but is certainly not limited to, the square lattice six-vertex model. In particular, analytic criteria can be found for low-temperature phases in other even-coordinated 2D lattices such as the triangular lattice, or higher-dimensional lattices such as the hyper-cubic lattices of arbitrary dimensionality. Finally, our approach can be generalised to vertex models that do not obey the ice rule, such as the eight-vertex model.
Hybrid approach of selecting hyperparameters of support vector machine for regression.
Jeng, Jin-Tsong
2006-06-01
To select the hyperparameters of the support vector machine for regression (SVR), a hybrid approach is proposed to determine the kernel parameter of the Gaussian kernel function and the epsilon value of Vapnik's epsilon-insensitive loss function. The proposed hybrid approach includes a competitive agglomeration (CA) clustering algorithm and a repeated SVR (RSVR) approach. Since the CA clustering algorithm is used to find the nearly "optimal" number of clusters and the centers of clusters in the clustering process, the CA clustering algorithm is applied to select the Gaussian kernel parameter. Additionally, an RSVR approach that relies on the standard deviation of the training error is proposed to obtain the epsilon in the loss function. Finally, two functions, one real data set (i.e., a time series of the quarterly unemployment rate for West Germany), and the identification of a nonlinear plant are used to verify the usefulness of the hybrid approach.
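The RSVR step lends itself to a compact sketch: refit the SVR repeatedly, each time setting epsilon from the standard deviation of the current training residuals, until it stabilizes. The kernel parameter is fixed here, whereas the paper obtains it from the CA clustering step; the data and stopping tolerance are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.1 * rng.normal(size=200)

eps = 0.5                                  # initial guess for epsilon
for _ in range(10):
    model = SVR(kernel="rbf", gamma=1.0, epsilon=eps).fit(X, y)
    resid = y - model.predict(X)
    eps_new = float(resid.std())           # epsilon from training-error spread
    if abs(eps_new - eps) < 1e-3:          # stop once epsilon stabilizes
        break
    eps = eps_new

print("selected epsilon:", round(eps, 3))
```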
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gomez, John A.; Henderson, Thomas M.; Scuseria, Gustavo E.
Restricted single-reference coupled cluster theory truncated to single and double excitations accurately describes weakly correlated systems, but often breaks down in the presence of static or strong correlation. Good coupled cluster energies in the presence of degeneracies can be obtained by using a symmetry-broken reference, such as unrestricted Hartree-Fock, but at the cost of good quantum numbers. A large body of work has shown that modifying the coupled cluster ansatz allows for the treatment of strong correlation within a single-reference, symmetry-adapted framework. The recently introduced singlet-paired coupled cluster doubles (CCD0) method is one such model, which recovers correct behavior for strong correlation without requiring symmetry breaking in the reference. Here, we extend singlet-paired coupled cluster for application to open shells via restricted open-shell singlet-paired coupled cluster singles and doubles (ROCCSD0). The ROCCSD0 approach retains the benefits of standard coupled cluster theory and recovers correct behavior for strongly correlated, open-shell systems using a spin-preserving ROHF reference.
NASA Astrophysics Data System (ADS)
Yannopapas, V.; Paspalakis, E.
2018-05-01
We study theoretically the optical response of a hybrid spherical cluster containing quantum emitters and metallic nanoparticles. The quantum emitters are modeled as two-level quantum systems whose dielectric function is obtained via a density matrix approach wherein the modified spontaneous emission decay rate at the position of each quantum emitter is calculated via the electromagnetic Green's tensor. The problem of light scattering off the hybrid cluster is solved by employing the coupled-dipole method. We find, in particular, that the presence of the quantum emitters in the cluster, even in small fractions, can significantly alter the absorption and extinction spectra relative to those of the bare cluster of metallic nanoparticles, and that the corresponding electromagnetic modes can acquire a weak plexcitonic character under suitable conditions.
Combinatoric analysis of heterogeneous stochastic self-assembly.
D'Orsogna, Maria R; Zhao, Bingyu; Berenji, Bijan; Chou, Tom
2013-09-28
We analyze a fully stochastic model of heterogeneous nucleation and self-assembly in a closed system with a fixed total particle number M, and a fixed number of seeds Ns. Each seed can bind a maximum of N particles. A discrete master equation for the probability distribution of the cluster sizes is derived and the corresponding cluster concentrations are found using kinetic Monte-Carlo simulations in terms of the density of seeds, the total mass, and the maximum cluster size. In the limit of slow detachment, we also find new analytic expressions and recursion relations for the cluster densities at intermediate times and at equilibrium. Our analytic and numerical findings are compared with those obtained from classical mass-action equations and the discrepancies between the two approaches analyzed.
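The kinetic Monte-Carlo side of the analysis can be sketched with a small Gillespie loop for Ns seeds competing for M monomers, in the slow-detachment regime the authors highlight; the rates and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

M, Ns, N = 60, 5, 8          # total monomers, number of seeds, max cluster size
k_on, k_off = 1.0, 0.05      # attachment/detachment rates (slow detachment)

free = M                     # free monomer count
sizes = np.zeros(Ns, int)    # monomers bound to each seed
t, t_end = 0.0, 200.0

while t < t_end:
    # propensities: attachment needs free monomers and room on the seed,
    # detachment needs at least one bound monomer
    a_on = k_on * free * (sizes < N)
    a_off = k_off * (sizes > 0)
    a = np.concatenate([a_on, a_off])
    total = a.sum()
    if total == 0:
        break
    t += rng.exponential(1.0 / total)         # Gillespie waiting time
    i = int(rng.choice(a.size, p=a / total))  # which event fires
    if i < Ns:
        sizes[i] += 1
        free -= 1
    else:
        sizes[i - Ns] -= 1
        free += 1

print("final cluster sizes:", sizes, "| free monomers:", free)
```

Histogramming `sizes` over many such runs gives the cluster-size distributions that the master-equation and mass-action treatments are compared against.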
Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V
2016-11-01
There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.
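A minimal sketch of the two-step pipeline on synthetic data: k-means summarizes the exposure matrix into profile classes, and a regression with class-by-genotype product terms tests G × E. Note the paper advocates an additive main effects and multiplicative interaction (AMMI) model in step two; the conventional product-term model below is just the simpler stand-in, and all data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
n = 500
E = rng.normal(size=(n, 4))                 # four environmental exposures
g = rng.integers(0, 2, n)                   # binary genotype
y = 0.5 * g * (E[:, 0] > 0) + rng.normal(size=n)

# step 1: summarize the multivariate exposure into profile classes
cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(E)

# step 2: test the genotype-by-profile interaction in one model
df = pd.DataFrame({"y": y, "g": g, "cluster": cluster})
fit = smf.ols("y ~ C(cluster) * g", data=df).fit()
print(fit.params.filter(like=":g"))         # interaction coefficients
```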
Roca, Josep; Vargas, Claudia; Cano, Isaac; Selivanov, Vitaly; Barreiro, Esther; Maier, Dieter; Falciani, Francesco; Wagner, Peter; Cascante, Marta; Garcia-Aymerich, Judith; Kalko, Susana; De Mas, Igor; Tegnér, Jesper; Escarrabill, Joan; Agustí, Alvar; Gomez-Cabrero, David
2014-11-28
Heterogeneity in clinical manifestations and disease progression in Chronic Obstructive Pulmonary Disease (COPD) has consequences for patient health risk assessment, stratification and management. Implicit in the classical "spill over" hypothesis is that COPD heterogeneity is driven by the pulmonary events of the disease. Alternatively, we hypothesized that COPD heterogeneities result from the interplay of mechanisms governing three conceptually different phenomena: 1) pulmonary disease, 2) systemic effects of COPD and 3) co-morbidity clustering, each of them with their own dynamics. We aimed to explore the potential of a systems analysis of COPD heterogeneity, focused on skeletal muscle dysfunction and on co-morbidity clustering, to generate predictive modeling with impact on patient management. To this end, strategies combining deterministic modeling and network medicine analyses of the Biobridge dataset were used to investigate the mechanisms of skeletal muscle dysfunction. An independent data-driven analysis of co-morbidity clustering examining associated genes and pathways was performed using a large dataset (ICD9-CM data from Medicare, 13 million people). Finally, a targeted network analysis using the outcomes of the two approaches (skeletal muscle dysfunction and co-morbidity clustering) explored shared pathways between these phenomena. (1) Evidence of abnormal regulation of skeletal muscle bioenergetics and skeletal muscle remodeling showing a significant association with nitroso-redox disequilibrium was observed in COPD; (2) COPD patients presented a higher risk for co-morbidity clustering than non-COPD patients, increasing with ageing; and (3) the ongoing targeted network analyses suggest shared pathways between skeletal muscle dysfunction and co-morbidity clustering. The results indicate the high potential of a systems approach to address COPD heterogeneity. Significant knowledge gaps were identified that are relevant to shape strategies aiming at fostering 4P Medicine for patients with COPD.
Influence of birth cohort on age of onset cluster analysis in bipolar I disorder.
Bauer, M; Glenn, T; Alda, M; Andreassen, O A; Angelopoulos, E; Ardau, R; Baethge, C; Bauer, R; Bellivier, F; Belmaker, R H; Berk, M; Bjella, T D; Bossini, L; Bersudsky, Y; Cheung, E Y W; Conell, J; Del Zompo, M; Dodd, S; Etain, B; Fagiolini, A; Frye, M A; Fountoulakis, K N; Garneau-Fournier, J; Gonzalez-Pinto, A; Harima, H; Hassel, S; Henry, C; Iacovides, A; Isometsä, E T; Kapczinski, F; Kliwicki, S; König, B; Krogh, R; Kunz, M; Lafer, B; Larsen, E R; Lewitzka, U; Lopez-Jaramillo, C; MacQueen, G; Manchia, M; Marsh, W; Martinez-Cengotitabengoa, M; Melle, I; Monteith, S; Morken, G; Munoz, R; Nery, F G; O'Donovan, C; Osher, Y; Pfennig, A; Quiroz, D; Ramesar, R; Rasgon, N; Reif, A; Ritter, P; Rybakowski, J K; Sagduyu, K; Scippa, A M; Severus, E; Simhandl, C; Stein, D J; Strejilevich, S; Hatim Sulaiman, A; Suominen, K; Tagata, H; Tatebayashi, Y; Torrent, C; Vieta, E; Viswanath, B; Wanchoo, M J; Zetin, M; Whybrow, P C
2015-01-01
Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
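The mixture-analysis step can be sketched as follows: after covariate adjustment (the GEE step is reduced here to simple centering, purely as a placeholder), fit Gaussian mixtures with varying component counts to the onset residuals and let BIC choose among them. The onset distributions are fabricated for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)

# toy age-of-onset data: an early-onset and a late-onset subgroup
onset = np.concatenate([rng.normal(19, 3, 300), rng.normal(32, 8, 200)])

# placeholder for the GEE adjustment (country median age, birth cohort):
# here we only center the data before clustering
resid = (onset - onset.mean()).reshape(-1, 1)

fits = [GaussianMixture(k, random_state=0).fit(resid) for k in (1, 2, 3)]
bics = [m.bic(resid) for m in fits]
best = fits[int(np.argmin(bics))]
print("BIC by components:", np.round(bics, 1))
print("chosen number of subgroups:", best.n_components)
```

This mirrors the paper's observation that the adjustment applied before clustering can change the number of subgroups the mixture analysis detects.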
Clusters in nonsmooth oscillator networks
NASA Astrophysics Data System (ADS)
Nicks, Rachel; Chambon, Lucie; Coombes, Stephen
2018-03-01
For coupled oscillator networks with Laplacian coupling, the master stability function (MSF) has proven a particularly powerful tool for assessing the stability of the synchronous state. Using tools from group theory, this approach has recently been extended to treat more general cluster states. However, the MSF and its generalizations require the determination of a set of Floquet multipliers from variational equations obtained by linearization around a periodic orbit. Since closed form solutions for periodic orbits are invariably hard to come by, the framework is often explored using numerical techniques. Here, we show that further insight into network dynamics can be obtained by focusing on piecewise linear (PWL) oscillator models. Not only do these allow for the explicit construction of periodic orbits, their variational analysis can also be explicitly performed. The price for adopting such nonsmooth systems is that many of the notions from smooth dynamical systems, and in particular linear stability, need to be modified to take into account possible jumps in the components of Jacobians. This is naturally accommodated with the use of saltation matrices. By augmenting the variational approach for studying smooth dynamical systems with such matrices we show that, for a wide variety of networks that have been used as models of biological systems, cluster states can be explicitly investigated. By way of illustration, we analyze an integrate-and-fire network model with event-driven synaptic coupling as well as a diffusively coupled network built from planar PWL nodes, including a reduction of the popular Morris-Lecar neuron model. We use these examples to emphasize that the stability of network cluster states can depend as much on the choice of single node dynamics as it does on the form of network structural connectivity. Importantly, the procedure that we present here, for understanding cluster synchronization in networks, is valid for a wide variety of systems in biology, physics, and engineering that can be described by PWL oscillators.
Identification of different geologic units using fuzzy constrained resistivity tomography
NASA Astrophysics Data System (ADS)
Singh, Anand; Sharma, S. P.
2018-01-01
Different geophysical inversion strategies are utilized as components of an interpretation process that tries to separate geologic units based on the resistivity distribution. In the present study, we present the results of separating different geologic units using fuzzy constrained resistivity tomography. This was accomplished using fuzzy c-means, a clustering procedure, to improve the 2D resistivity image and the geologic separation within the iterative minimization of the inversion. First, we developed a Matlab-based inversion technique to obtain a reliable resistivity image using different geophysical data sets (electrical resistivity and electromagnetic data). Following this, the recovered resistivity model was converted into a fuzzy constrained resistivity model by assigning each model cell to the cluster with the highest membership value, obtained with the fuzzy c-means clustering procedure, during the iterative process. The efficacy of the algorithm is demonstrated using three synthetic plane-wave electromagnetic data sets and one electrical resistivity field dataset. The presented approach improves on the conventional inversion approach in differentiating between geologic units, provided the correct number of geologic units is identified. Further, fuzzy constrained resistivity tomography was performed to examine the augmentation of uranium mineralization in the Beldih open cast mine as a case study. We also compared the geologic units identified by fuzzy constrained resistivity tomography with those interpreted from borehole information.
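Since the constraint hinges on fuzzy c-means memberships, here is a self-contained numpy implementation of the standard FCM updates, applied to toy log-resistivities of two units; the data, unit count, and fuzzifier m = 2 are assumptions of the sketch, not values from the study.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns memberships U (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))       # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None] # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))                  # standard membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# toy model cells: log-resistivities of a conductive and a resistive unit
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(1.0, 0.2, (50, 1)),
                    rng.normal(3.0, 0.3, (50, 1))])
U, centers = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)   # highest-membership cluster per model cell
print("cluster centers (log-resistivity):", np.round(centers.ravel(), 2))
```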
The symptom cluster-based approach to individualize patient-centered treatment for major depression.
Lin, Steven Y; Stevens, Michael B
2014-01-01
Unipolar major depressive disorder is a common, disabling, and costly disease that is the leading cause of ill health, early death, and suicide in the United States. Primary care doctors, in particular family physicians, are the first responders in this silent epidemic. Although more than a dozen different antidepressants in 7 distinct classes are widely used to treat depression in primary care, there is no evidence that one drug is superior to another. Comparative effectiveness studies have produced mixed results, and no specialty organization has published recommendations on how to choose antidepressants in a rational, evidence-based manner. In this article we present the theory and evidence for an individualized, patient-centered treatment model for major depression designed around a targeted symptom cluster-based approach to antidepressant selection. When using this model for healthy adults with major depressive disorder, the choice of antidepressants should be guided by the presence of 1 of 4 common symptom clusters: anxiety, fatigue, insomnia, and pain. This model was built to foster future research, provide a logical framework for teaching residents how to select antidepressants, and equip primary care doctors with a structured treatment strategy to deliver optimal patient-centered care in the management of a debilitating disease: major depressive disorder.
NASA Astrophysics Data System (ADS)
Vathsala, H.; Koolagudi, Shashidhar G.
2017-01-01
In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include Southern Oscillation and Indian Ocean Dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
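A minimal sketch of the pipeline shape (soft cluster memberships as reduced features feeding a multilayer perceptron) might look as follows; a Gaussian mixture is one possible choice of soft clustering, and the random data stand in for the IITM predictors.

```python
# Sketch of the pipeline shape: reduce predictors to soft cluster
# memberships, then classify rainfall category with an MLP. Synthetic data
# stand in for the real predictors; labels are the paper's five classes.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))              # 12 local/global predictors (synthetic)
y = rng.integers(0, 5, size=200)            # Flood/Excess/Normal/Deficit/Drought

gm = GaussianMixture(n_components=4, random_state=0).fit(X)
memberships = gm.predict_proba(X)           # dimension reduction: 12 -> 4
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(memberships, y)
print("training accuracy:", clf.score(memberships, y))
```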
Fast Geometric Consensus Approach for Protein Model Quality Assessment
Adamczak, Rafal; Pillardy, Jaroslaw; Vallat, Brinda K.
2011-01-01
Abstract Model quality assessment (MQA) is an integral part of protein structure prediction methods that typically generate multiple candidate models. The challenge lies in ranking and selecting the best models using a variety of physical, knowledge-based, and geometric consensus (GC)-based scoring functions. In particular, 3D-Jury and related GC methods assume that well-predicted (sub-)structures are more likely to occur frequently in a population of candidate models, compared to incorrectly folded fragments. While this approach is very successful in the context of diversified sets of models, identifying similar substructures is computationally expensive since all pairs of models need to be superimposed using MaxSub or related heuristics for structure-to-structure alignment. Here, we consider a fast alternative, in which structural similarity is assessed using 1D profiles, e.g., consisting of relative solvent accessibilities and secondary structures of equivalent amino acid residues in the respective models. We show that the new approach, dubbed 1D-Jury, makes it possible to implicitly compare and rank N models in O(N) time, as opposed to the quadratic complexity of 3D-Jury and related clustering-based methods. In addition, 1D-Jury avoids computationally expensive 3D superposition of pairs of models. At the same time, structural similarity scores based on 1D profiles are shown to correlate strongly with those obtained using MaxSub. In terms of the ability to select the best models as top candidates, 1D-Jury performs on par with other GC methods. Other potential applications of the new approach, including fast clustering of large numbers of intermediate structures generated by folding simulations, are discussed as well. PMID:21244273
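The O(N) bookkeeping rests on a single counting pass: once per-position state counts are tallied, each model can be scored against the whole population without any pairwise superposition. A toy sketch over secondary-structure strings (one of several 1D profile channels such a method could use):

```python
# Sketch of the 1D-Jury idea: score each model against the whole population
# in O(N*L) using per-position state counts, instead of O(N^2) pairwise 3D
# superpositions. Models are 1D profiles (here: toy secondary-structure strings).
import numpy as np

models = ["HHHEECCHH", "HHHEECCHC", "CCCEECCHH", "HHHEECCHH"]  # toy profiles
L = len(models[0])
states = sorted(set("".join(models)))
counts = {(i, s): 0 for i in range(L) for s in states}
for m in models:                       # one counting pass over all models
    for i, s in enumerate(m):
        counts[(i, s)] += 1

# A model's consensus score: how often its local states recur in the population
scores = [sum(counts[(i, s)] for i, s in enumerate(m)) / (L * len(models))
          for m in models]
for m, sc in zip(models, scores):
    print(m, round(sc, 3))             # higher = more population support
```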
Jacquez, Geoffrey M; Meliker, Jaymie R; Avruskin, Gillian A; Goovaerts, Pierre; Kaufmann, Andy; Wilson, Mark L; Nriagu, Jerome
2006-08-03
Methods for analyzing space-time variation in risk in case-control studies typically ignore residential mobility. We develop an approach for analyzing case-control data for mobile individuals and apply it to study bladder cancer in 11 counties in southeastern Michigan. At this time data collection is incomplete and no inferences should be drawn - we analyze these data to demonstrate the novel methods. Global, local and focused clustering of residential histories for 219 cases and 437 controls is quantified using time-dependent nearest neighbor relationships. Business address histories for 268 industries that release known or suspected bladder cancer carcinogens are analyzed. A logistic model accounting for smoking, gender, age, race and education specifies the probability of being a case, and is incorporated into the cluster randomization procedures. Sensitivity of clustering to the definition of the proximity metric is assessed for k = 1 to 75 nearest neighbors. Global clustering is partly explained by the covariates but remains statistically significant at 12 of the 14 levels of k considered. After accounting for the covariates, 26 local clusters are found in Lapeer, Ingham, Oakland and Jackson counties, with the clusters in Ingham and Oakland counties appearing in 1950 and persisting to the present. Statistically significant focused clusters are found about the business address histories of 22 industries located in Oakland (19 clusters), Ingham (2) and Jackson (1) counties. Clusters in central and southeastern Oakland County appear in the 1930s and persist to the present day. These methods provide a systematic approach for evaluating a series of increasingly realistic alternative hypotheses regarding the sources of excess risk. So long as selection of cases and controls is population-based and not geographically biased, these tools can provide insights into geographic risk factors that were not specifically assessed in the case-control study design.
Send, Robert; Kaila, Ville R I; Sundholm, Dage
2011-06-07
We investigate how the reduction of the virtual space affects coupled-cluster excitation energies at the approximate singles and doubles coupled-cluster level (CC2). In this reduced-virtual-space (RVS) approach, all virtual orbitals above a certain energy threshold are omitted in the correlation calculation. The effects of the RVS approach are assessed by calculations on the two lowest excitation energies of 11 biochromophores using different sizes of the virtual space. Our set of biochromophores consists of common model systems for the chromophores of the photoactive yellow protein, the green fluorescent protein, and rhodopsin. The RVS calculations show that most of the high-lying virtual orbitals can be neglected without significantly affecting the accuracy of the obtained excitation energies. Omitting all virtual orbitals above 50 eV in the correlation calculation introduces errors in the excitation energies that are smaller than 0.1 eV. By using a RVS energy threshold of 50 eV, the CC2 calculations using triple-ζ basis sets (TZVP) on protonated Schiff base retinal are accelerated by a factor of 6. We demonstrate the applicability of the RVS approach by performing CC2/TZVP calculations on the lowest singlet excitation energy of a rhodopsin model consisting of 165 atoms using RVS thresholds between 20 eV and 120 eV. The calculations on the rhodopsin model show that the RVS errors determined in the gas-phase are a very good approximation to the RVS errors in the protein environment. The RVS approach thus renders purely quantum mechanical treatments of chromophores in protein environments feasible and offers an ab initio alternative to quantum mechanics/molecular mechanics separation schemes. © 2011 American Institute of Physics
Review of Recent Development of Dynamic Wind Farm Equivalent Models Based on Big Data Mining
NASA Astrophysics Data System (ADS)
Wang, Chenggen; Zhou, Qian; Han, Mingzhe; Lv, Zhan’ao; Hou, Xiao; Zhao, Haoran; Bu, Jing
2018-04-01
Recently, big data mining methods have been applied to dynamic wind farm equivalent modeling. In this paper, their recent development is reviewed, covering research both domestic and overseas. Firstly, studies of wind speed prediction, equivalence, and distribution within the wind farm are summarized. Secondly, two typical approaches used in big data mining are introduced. For single wind turbine equivalent modeling, the focus is on how to choose and identify equivalent parameters. For multiple wind turbine equivalent modeling, three aspects are addressed: aggregation of wind turbines into clusters, equivalencing of parameters within the same cluster, and equivalencing of the collector system. Thirdly, an outlook on the future development of dynamic wind farm equivalent models is discussed.
Ekins, Sean; Freundlich, Joel S.; Hobrath, Judith V.; White, E. Lucile; Reynolds, Robert C
2013-01-01
Purpose Tuberculosis treatments need to be shorter and overcome drug resistance. Our previous large scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) has identified 737 active compounds and thousands that are inactive. We have used this data for building computational models as an approach to minimize the number of compounds tested. Methods A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that application of these models for screening set selections can enrich the hit rate. Results In order to explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules was selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10 fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. Conclusion Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents. PMID:24132686
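A hedged sketch of the enrichment workflow, with a Bernoulli naive Bayes classifier standing in for the Bayesian models and random bit vectors standing in for real fingerprints (so no actual enrichment is expected from this toy data):

```python
# Sketch of the screening-set enrichment idea: train a naive Bayes model on
# binary fingerprints of active/inactive compounds, rank a candidate library
# by predicted activity, and compare the hit rate in the top slice against
# random selection. Fingerprints and labels here are random placeholders;
# real fingerprints (e.g., substructure keys) are needed for enrichment.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(1000, 256))     # binary fingerprints
y_train = rng.integers(0, 2, size=1000)            # 1 = Mtb active
X_lib = rng.integers(0, 2, size=(1924, 256))       # candidate library
y_lib = rng.integers(0, 2, size=1924)              # held-out truth (toy)

clf = BernoulliNB().fit(X_train, y_train)
rank = np.argsort(-clf.predict_proba(X_lib)[:, 1])
top = rank[: len(rank) // 100]                     # top 1% screened
enrichment = y_lib[top].mean() / y_lib.mean()
print("fold enrichment over random:", round(enrichment, 2))
```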
NASA Astrophysics Data System (ADS)
Shvartsburg, Alexandre A.; Siu, K. W. Michael
2001-06-01
Modeling the delayed dissociation of clusters has been a frontline development area in chemical physics over the last decade. It is of fundamental interest how statistical kinetics methods previously validated for regular molecules and atomic nuclei may apply to clusters, as this would help to understand the transferability of statistical models for disintegration of complex systems across various classes of physical objects. From a practical perspective, accurate simulation of unimolecular decomposition is critical for the extraction of true thermochemical values from measurements on the decay of energized clusters. Metal clusters are particularly challenging because of the multitude of low-lying electronic states that are coupled to vibrations. This has previously been accounted for by assuming an average electronic structure of a conducting cluster, approximated by the levels of an electron in a cavity. While this provides a reasonable time-averaged description, it ignores the distribution of instantaneous electronic structures in a "boiling" cluster around that average. Here we set up a new treatment that incorporates the statistical distribution of electronic levels around the average picture using random matrix theory. This approach faithfully reflects the completely chaotic "vibronic soup" nature of hot metal clusters. We found that the consideration of electronic level statistics significantly promotes electronic excitation and thus increases the magnitude of its effect. As this excitation always depresses the decay rates, the inclusion of level statistics results in slower dissociation of metal clusters.
Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods
NASA Technical Reports Server (NTRS)
Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan
2016-01-01
The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.
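The advantage of distributed testing is easiest to see when the assumed response model omits structure that the true response contains; replicates clustered at a few conditions cannot reveal the discrepancy, while the same budget spread across many conditions can. A small simulation sketch (illustrative model and noise level, not the study's cases):

```python
# Simulation sketch of the distributed-vs-clustered comparison: when the
# true response contains structure the fitted model omits (here a cubic
# term), points spread over many conditions expose and average over it,
# while replicates clustered at a few conditions cannot.
import numpy as np

rng = np.random.default_rng(0)
true = lambda x: 1.0 + 2.0 * x - 0.5 * x**2 + 0.8 * x**3       # hidden cubic term
design = lambda x: np.column_stack([np.ones_like(x), x, x**2])  # fitted: quadratic

clustered = np.repeat([-1.0, 0.0, 1.0], 8)        # 8 replicates at 3 conditions
distributed = np.linspace(-1.0, 1.0, 24)          # 24 distinct conditions
grid = np.linspace(-1.0, 1.0, 101)                # where predictions are judged

for name, xs in [("clustered", clustered), ("distributed", distributed)]:
    X = design(xs)
    errs = []
    for _ in range(2000):
        y = true(xs) + rng.normal(0, 0.3, size=len(xs))
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        errs.append(np.mean((design(grid) @ b - true(grid)) ** 2))
    print(name, "mean squared prediction error:", round(float(np.mean(errs)), 4))
```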
Algebraic approach to small-world network models
NASA Astrophysics Data System (ADS)
Rudolph-Lilith, Michelle; Muller, Lyle E.
2014-01-01
We introduce an analytic model for directed Watts-Strogatz small-world graphs and deduce an algebraic expression of its defining adjacency matrix. The latter is then used to calculate the small-world digraph's asymmetry index and clustering coefficient in an analytically exact fashion, valid nonasymptotically for all graph sizes. The proposed approach is general and can be applied to all algebraically well-defined graph-theoretical measures, thus allowing for an analytical investigation of finite-size small-world graphs.
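Although the paper derives such quantities analytically, the algebraic viewpoint is easy to illustrate numerically: build the adjacency matrix of a small-world graph and evaluate a clustering coefficient as a matrix expression. The construction below is a simplified rewiring scheme, and a symmetrized global clustering coefficient is used in place of the paper's exact directed measures.

```python
# Sketch of the algebraic viewpoint: build the adjacency matrix A of a
# (directed) Watts-Strogatz-style graph, then evaluate a global clustering
# coefficient as a matrix expression, trace(S^3) over the number of 2-paths.
import numpy as np

rng = np.random.default_rng(0)
N, k, p = 200, 3, 0.1                 # nodes, neighbors per side, rewiring prob
A = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(1, k + 1):         # directed ring lattice
        A[i, (i + j) % N] = 1
        A[i, (i - j) % N] = 1
for i, j in np.argwhere(A):           # rewire each edge with probability p
    if rng.random() < p:
        A[i, j] = 0
        A[i, rng.integers(N)] = 1
np.fill_diagonal(A, 0)

S = ((A + A.T) > 0).astype(float)     # symmetrized for the undirected measure
S2 = S @ S
C = np.trace(S2 @ S) / (S2.sum() - np.trace(S2))   # closed / connected triads
print("global clustering coefficient:", round(C, 3))
```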
Bahlmann, Claus; Burkhardt, Hans
2004-03-01
In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
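At the heart of CSDTW sits the standard dynamic time warping recursion over variable-length sequences. A minimal sketch with a plain Euclidean ground distance (CSDTW derives its distance statistically from the cluster models, which this toy version omits):

```python
# Classic O(n*m) dynamic time warping between two variable-length feature
# sequences; CSDTW replaces the Euclidean ground distance below with a
# statistically derived one.
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two toy pen trajectories of different lengths (2D point sequences)
s1 = np.array([[0, 0], [1, 1], [2, 1], [3, 0]], dtype=float)
s2 = np.array([[0, 0], [1, 1], [1.5, 1.1], [2, 1], [3, 0]], dtype=float)
print("DTW distance:", round(dtw(s1, s2), 3))
```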
Worldwide Topology of the Scientific Subject Profile: A Macro Approach in the Country Level
Moya-Anegón, Félix; Herrero-Solana, Víctor
2013-01-01
Background Models for the production of knowledge and systems of innovation and science are key elements for characterizing a country in view of its scientific thematic profile. With regard to scientific output and publication in journals of international visibility, the countries of the world may be classified into three main groups according to their thematic bias. Methodology/Principal Findings This paper aims to classify the countries of the world in several broad groups, described in terms of behavioural models that attempt to sum up the characteristics of their systems of knowledge and innovation. We perceive three clusters in our analysis: 1) the biomedical cluster, 2) the basic science & engineering cluster, and 3) the agricultural cluster. The countries are conceptually associated with the clusters via Principal Component Analysis (PCA), and a Multidimensional Scaling (MDS) map with all the countries is presented. Conclusions/Significance As we have seen, insofar as scientific output and publication in journals of international visibility is concerned, the countries of the world may be classified into three main groups according to their thematic profile. These groups can be described in terms of behavioral models that attempt to sum up the characteristics of their systems of knowledge and innovation. PMID:24349467
Spatiotemporal clusters of malaria cases at village level, northwest Ethiopia.
Alemu, Kassahun; Worku, Alemayehu; Berhane, Yemane; Kumie, Abera
2014-06-06
Malaria attacks are not evenly distributed in space and time. In highland areas with low endemicity, malaria transmission is highly variable and malaria acquisition risk for individuals is unevenly distributed even within a neighbourhood. Characterizing the spatiotemporal distribution of malaria cases in high-altitude villages is necessary to prioritize the risk areas and facilitate interventions. Spatial scan statistics using the Bernoulli method were employed to identify spatial and temporal clusters of malaria in high-altitude villages. Daily malaria data were collected, using a passive surveillance system, from patients visiting local health facilities. Georeference data were collected at villages using hand-held global positioning system devices and linked to patient data. A Bernoulli model using Bayesian approaches and Markov chain Monte Carlo (MCMC) methods was used to identify the effects of factors on spatial clusters of malaria cases. The deviance information criterion (DIC) was used to assess the goodness-of-fit of the different models; the smaller the DIC, the better the model fit. Malaria cases were clustered in both space and time in high-altitude villages. Spatial scan statistics identified a total of 56 spatial clusters of malaria in high-altitude villages. Of these, 39 were the most likely clusters (LLR = 15.62, p < 0.00001) and 17 were secondary clusters (LLR = 7.05, p < 0.03). The significant most likely temporal malaria clusters were detected between August and December (LLR = 17.87, p < 0.001). Travel away from home, male sex and age above 15 years had statistically significant effects on malaria clusters at high-altitude villages. The study identified spatial clusters of malaria cases occurring at high elevation villages within the district. A patient who travelled away from home to a malaria-endemic area might be the most probable source of malaria infection in a high-altitude village. Malaria interventions in high altitude villages should address factors associated with malaria clustering.
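For reference, the Bernoulli scan statistic behind this kind of analysis scores a candidate zone by a log likelihood ratio comparing case prevalence inside and outside the zone. A hedged sketch with hypothetical counts (SaTScan-style software additionally maximizes this statistic over zones and calibrates it by Monte Carlo):

```python
# Kulldorff-style Bernoulli scan log likelihood ratio (LLR) for a candidate
# zone containing c cases among its n subjects, against C total cases among
# N subjects overall. The counts below are hypothetical, not study data.
import numpy as np

def bernoulli_llr(c, n, C, N):
    def xlogy(x, y):
        return 0.0 if x == 0 else x * np.log(y)
    inside = c / n
    outside = (C - c) / (N - n)
    if inside <= outside:          # only excess-risk zones count as clusters
        return 0.0
    return (xlogy(c, inside) + xlogy(n - c, 1 - inside)
            + xlogy(C - c, outside) + xlogy(N - n - (C - c), 1 - outside)
            - xlogy(C, C / N) - xlogy(N - C, 1 - C / N))

# Hypothetical village zone: 24 of its 80 residents are cases,
# against 150 cases among 2000 residents district-wide
print("LLR:", round(bernoulli_llr(24, 80, 150, 2000), 2))
```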
Scalable cluster administration - Chiba City I approach and lessons learned.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Navarro, J. P.; Evard, R.; Nurmi, D.
2002-07-01
Systems administrators of large clusters often need to perform the same administrative activity hundreds or thousands of times. Often such activities are time-consuming, especially the tasks of installing and maintaining software. By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with remote hardware control, cluster administrators can automate all administrative tasks. Scalable cluster administration addresses the following challenge: What systems design techniques can cluster builders use to automate cluster administration on very large clusters? We describe the approach used in the Mathematics and Computer Science Division of Argonne National Laboratory on Chiba City I, a 314-node Linux cluster; and we analyze the scalability, flexibility, and reliability benefits and limitations from that approach.
Berry, Jack W; Schwebel, David C
2009-10-01
This study used two configural approaches to understand how temperament factors (surgency/extraversion, negative affect, and effortful control) might predict child injury risk. In the first approach, clustering procedures were applied to trait dimensions to identify discrete personality prototypes. In the second approach, two- and three-way trait interactions were considered dimensionally in regression models predicting injury outcomes. Injury risk was assessed through four measures: lifetime prevalence of injuries requiring professional medical attention, scores on the Injury Behavior Checklist, and frequency and severity of injuries reported in a 2-week injury diary. In the prototype analysis, three temperament clusters were obtained, which resembled resilient, overcontrolled, and undercontrolled types found in previous research. Undercontrolled children had greater risk of injury than children in the other groups. In the dimensional interaction analyses, an interaction between surgency/extraversion and negative affect tended to predict injury, especially when children lacked capacity for effortful control.
MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.
Hedeker, D; Gibbons, R D
1996-05-01
MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.
Nielsen, J D; Dean, C B
2008-09-01
A flexible semiparametric model for analyzing longitudinal panel count data arising from mixtures is presented. Panel count data refers here to count data on recurrent events collected as the number of events that have occurred within specific follow-up periods. The model assumes that the counts for each subject are generated by mixtures of nonhomogeneous Poisson processes with smooth intensity functions modeled with penalized splines. Time-dependent covariate effects are also incorporated into the process intensity using splines. Discrete mixtures of these nonhomogeneous Poisson process spline models extract functional information from underlying clusters representing hidden subpopulations. The motivating application is an experiment to test the effectiveness of pheromones in disrupting the mating pattern of the cherry bark tortrix moth. Mature moths arise from hidden, but distinct, subpopulations and monitoring the subpopulation responses was of interest. Within-cluster random effects are used to account for correlation structures and heterogeneity common to this type of data. An estimating equation approach to inference requiring only low moment assumptions is developed and the finite sample properties of the proposed estimating functions are investigated empirically by simulation.
Exact hierarchical clustering in one dimension. [in universe
NASA Technical Reports Server (NTRS)
Williams, B. G.; Heavens, A. F.; Peacock, J. A.; Shandarin, S. F.
1991-01-01
The present adhesion model-based one-dimensional simulations of gravitational clustering have yielded bound-object catalogs applicable in tests of analytical approaches to cosmological structure formation. Attention is given to Press-Schechter (1974) type functions, as well as to their density peak-theory modifications and the two-point correlation function estimated from peak theory. The extent to which individual collapsed-object locations can be predicted by linear theory is significant only for objects of near-characteristic nonlinear mass.
NASA Astrophysics Data System (ADS)
Tohsaki, Akihiro; Itagaki, Naoyuki
2018-01-01
We study α-cluster structure based on geometric configurations within a microscopic framework, which takes full account of the Pauli principle and employs an effective internucleon force including finite-range three-body terms suitable for microscopic α-cluster models. Here, special attention is focused upon α clustering with a hollow structure: all the α clusters are put on the surface of a sphere. All the Platonic solids (five regular polyhedra) and the fullerene-shaped polyhedron derived from the icosahedral structure are considered. Furthermore, two configurations with dual polyhedra, hexahedron-octahedron and dodecahedron-icosahedron, are also scrutinized. When approaching each other from large distances with these symmetries, α clusters create certain local energy pockets. As a consequence, we argue for the possible existence of α clustering with a geometric shape and hollow structure, which is favored from the Coulomb-energy point of view. In particular, two configurations, the dodecahedron-icosahedron dual polyhedra and the fullerene, have a prominent hollow structure compared with the other six configurations.
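The Coulomb side of the argument can be illustrated with a point-charge toy model: alpha clusters (charge 2e) placed on the vertices of a symmetric polyhedron inscribed in a sphere have lower electrostatic energy than less symmetric placements. The sketch below compares the icosahedron with a random placement in reduced units; the paper's microscopic energetics are of course far richer.

```python
# Point-charge Coulomb energy, in units of e^2/(4*pi*eps0*R), of charge-2
# alpha clusters on a unit sphere: icosahedron vs a random placement.
import numpy as np

phi = (1 + np.sqrt(5)) / 2
ico = np.array([[0, 1, phi], [0, 1, -phi], [0, -1, phi], [0, -1, -phi],
                [1, phi, 0], [1, -phi, 0], [-1, phi, 0], [-1, -phi, 0],
                [phi, 0, 1], [-phi, 0, 1], [phi, 0, -1], [-phi, 0, -1]], float)
ico /= np.linalg.norm(ico, axis=1, keepdims=True)    # 12 vertices on unit sphere

def coulomb(points, q=2.0):
    e = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            e += q * q / np.linalg.norm(points[i] - points[j])
    return e

rng = np.random.default_rng(0)
rand = rng.normal(size=(12, 3))
rand /= np.linalg.norm(rand, axis=1, keepdims=True)  # 12 random surface points
print("icosahedron:", round(coulomb(ico), 2), " random:", round(coulomb(rand), 2))
```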
Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.
You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary
2011-02-01
The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the numbers of individuals within clusters (cluster size). Variable cluster sizes are common and this variation alone may have significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using the t-test, use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes, and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters. Comparison indicated that the relative efficiency defined here is greater than the measure in the literature under some conditions and may be smaller under others, in which case it underestimates the relative efficiency. The relative efficiency of unequal versus equal cluster sizes defined using the noncentrality parameter thus suggests a sample size approach that is a flexible alternative and a useful complement to existing methods.
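Under the standard variance model for cluster means, var = sigma^2 (1 + (m_i - 1) rho) / m_i, an arm's effective sample size is sum_i m_i / (1 + (m_i - 1) rho), and a noncentrality-based relative efficiency follows directly. The sketch below is an illustrative approximation in that spirit, not the paper's exact formulas:

```python
# Illustrative noncentrality-based relative efficiency for variable cluster
# sizes, assuming the standard variance model for cluster means. The sizes
# and ICC below are made-up example values.
import numpy as np

def eff_n(sizes, rho):
    """Effective sample size of one arm: sum_i m_i / (1 + (m_i - 1) * rho)."""
    sizes = np.asarray(sizes, float)
    return np.sum(sizes / (1 + (sizes - 1) * rho))

rho = 0.05
unequal = np.array([5, 10, 15, 20, 30, 40])       # variable cluster sizes
equal = np.full(6, unequal.mean())                # same total, equal sizes
re = eff_n(unequal, rho) / eff_n(equal, rho)
print("relative efficiency (unequal vs equal):", round(re, 3))

# Mean cluster size needed so the unequal design matches the equal one,
# holding the number of clusters fixed (simple scaling of all sizes):
scale = 1.0
while eff_n(unequal * scale, rho) < eff_n(equal, rho):
    scale += 0.01
print("required mean cluster size:", round(unequal.mean() * scale, 1))
```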
Using "big data" to optimally model hydrology and water quality across expansive regions
Roehl, E.A.; Cook, J.B.; Conrads, P.A.
2009-01-01
This paper describes a new divide and conquer approach that leverages big environmental data, utilizing all available categorical and time-series data without subjectivity, to empirically model hydrologic and water-quality behaviors across expansive regions. The approach decomposes large, intractable problems into smaller ones that are optimally solved; decomposes complex signals into behavioral components that are easier to model with "sub-models"; and employs a sequence of numerically optimizing algorithms that include time-series clustering, nonlinear, multivariate sensitivity analysis and predictive modeling using multi-layer perceptron artificial neural networks, and classification for selecting the best sub-models to make predictions at new sites. This approach has many advantages over traditional modeling approaches, including being faster and less expensive, more comprehensive in its use of available data, and more accurate in representing a system's physical processes. This paper describes the application of the approach to model groundwater levels in Florida, stream temperatures across Western Oregon and Wisconsin, and water depths in the Florida Everglades. © 2009 ASCE.
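A minimal sketch of the divide-and-conquer shape (cluster sites by simple time-series signatures, then fit one neural-network sub-model per cluster) is given below; the real system adds signal decomposition, sensitivity analysis, and sub-model selection, and the rainfall/level data here are synthetic.

```python
# Divide-and-conquer sketch: cluster monitoring sites by time-series
# signatures, then train one MLP "sub-model" per cluster. Data are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_sites, n_days = 40, 365
rain = rng.gamma(2.0, 2.0, size=(n_sites, n_days))
lag = rng.integers(1, 4, size=n_sites)               # site-specific response lag
level = np.array([np.convolve(rain[i], np.ones(lag[i]) / lag[i], "same")
                  for i in range(n_sites)])          # synthetic water levels

signatures = np.column_stack([level.mean(1), level.std(1),
                              [np.corrcoef(rain[i], level[i])[0, 1]
                               for i in range(n_sites)]])
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(signatures)

for g in range(3):                                   # one sub-model per cluster
    idx = np.where(groups == g)[0]
    X = rain[idx].reshape(-1, 1)
    y = level[idx].ravel()
    m = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                     random_state=0).fit(X, y)
    print(f"cluster {g}: {len(idx)} sites, R^2 = {m.score(X, y):.2f}")
```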
Siudem, Grzegorz; Fronczak, Agata; Fronczak, Piotr
2016-10-10
In this paper, we provide the exact expression for the coefficients in the low-temperature series expansion of the partition function of the two-dimensional Ising model on the infinite square lattice. This is equivalent to exact determination of the number of spin configurations at a given energy. With these coefficients, we show that the ferromagnetic-to-paramagnetic phase transition in the square lattice Ising model can be explained through equivalence between the model and the perfect gas of energy clusters model, in which the passage through the critical point is related to the complete change in the thermodynamic preferences on the size of clusters. The combinatorial approach reported in this article is very general and can be easily applied to other lattice models.
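The combinatorial object in question, the number of spin configurations g(E) at each energy, can be cross-checked by brute force on a tiny lattice (the series coefficients encode g(E) for the infinite lattice, which enumeration obviously cannot reach):

```python
# Brute-force density of states g(E) for a 4x4 periodic square-lattice
# Ising model: enumerate all 2^16 spin configurations and tally energies.
from collections import Counter
from itertools import product

Lx = Ly = 4
dos = Counter()
for bits in product((-1, 1), repeat=Lx * Ly):
    s = [[bits[i * Ly + j] for j in range(Ly)] for i in range(Lx)]
    E = 0
    for i in range(Lx):
        for j in range(Ly):
            # count right and down bonds once each (periodic boundaries)
            E -= s[i][j] * (s[(i + 1) % Lx][j] + s[i][(j + 1) % Ly])
    dos[E] += 1

for E in sorted(dos):
    print(f"E = {E:4d}: g(E) = {dos[E]}")
```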
NASA Astrophysics Data System (ADS)
Chen, C. W.; Chung, H. Y.; Chiang, H.-P.; Lu, J. Y.; Chang, R.; Tsai, D. P.; Leung, P. T.
2010-10-01
The optical properties of composites with metallic nanoparticles are studied, taking into account the effects due to the nonlocal dielectric response of the metal and the coalescing of the particles to form clusters. An approach based on various effective medium theories is followed, and the modeling results are compared with those from the cases with local response and particles randomly distributed through the host medium. Possible observations of our modeling results are illustrated via a calculation of the transmission of light through a thin film made of these materials. It is found that the nonlocal effects are particularly significant when the particles coalesce, leading to blue-shifted resonances and slightly lower values in the dielectric functions. The dependence of these effects on the volume fraction and fractal dimension of the metal clusters is studied in detail.
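As a point of reference for the effective-medium machinery, the local Maxwell-Garnett mixing rule for Drude-metal spheres in a host reads as below; the nonlocal response and cluster coalescence studied in the paper enter as corrections to this baseline, which the sketch does not include. Parameter values are illustrative.

```python
# Local effective-medium baseline: Maxwell-Garnett mixing rule for metal
# spheres (Drude model) embedded in a dielectric host. Units: eV.
import numpy as np

def drude(omega, omega_p=9.0, gamma=0.1):
    return 1.0 - omega_p**2 / (omega * (omega + 1j * gamma))

def maxwell_garnett(eps_i, eps_h, f):
    """Effective permittivity for inclusion fraction f in host eps_h."""
    num = eps_i + 2 * eps_h + 2 * f * (eps_i - eps_h)
    den = eps_i + 2 * eps_h - f * (eps_i - eps_h)
    return eps_h * num / den

omega = np.linspace(1.0, 6.0, 6)                       # photon energies (eV)
eps_eff = maxwell_garnett(drude(omega), 2.25, f=0.1)   # glass host, 10% metal
for w, e in zip(omega, eps_eff):
    print(f"{w:.1f} eV  eps_eff = {e:.3f}")
```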
Catalysis applications of size-selected cluster deposition
Vajda, Stefan; White, Michael G.
2015-10-23
In this Perspective, we review recent studies of size-selected cluster deposition for catalysis applications performed at the U.S. DOE National Laboratories, with emphasis on work at Argonne National Laboratory (ANL) and Brookhaven National Laboratory (BNL). The focus is on the preparation of model supported catalysts in which the number of atoms in the deposited clusters is precisely controlled using a combination of gas-phase cluster ion sources, mass spectrometry, and soft-landing techniques. This approach is particularly effective for investigations of small nanoclusters, 0.5-2 nm (<200 atoms), where the rapid evolution of the atomic and electronic structure makes it essential to have precise control over cluster size. Cluster deposition allows for independent control of cluster size, coverage, and stoichiometry (e.g., the metal-to-oxygen ratio in an oxide cluster) and can be used to deposit on any substrate without constraints of nucleation and growth. Examples are presented for metal, metal oxide, and metal sulfide cluster deposition on a variety of supports (metals, oxides, carbon/diamond) where the reactivity, cluster-support electronic interactions, and cluster stability and morphology are investigated. Both UHV and in situ/operando studies are presented that also make use of surface-sensitive X-ray characterization tools from synchrotron radiation facilities. Novel applications of cluster deposition to electrochemistry and batteries are also presented. This review also highlights the application of modern ab initio electronic structure calculations (density functional theory), which can essentially model the exact experimental system used in the laboratory (i.e., cluster and support) to provide insight on atomic and electronic structure, reaction energetics, and mechanisms. As amply demonstrated in this review, the powerful combination of atomically precise cluster deposition and theory is able to address fundamental aspects of size-effects, cluster-support interactions, and reaction mechanisms of cluster materials that are central to how catalysts function. Lastly, the insight gained from such studies can be used to further the development of novel nanostructured catalysts with high activity and selectivity.
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin
2014-04-15
Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy.
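Given a precomputed pairwise similarity matrix (obtaining it, e.g., via TM-score or GDT-TS comparisons, is the expensive step), both consensus scores described above are one-liners. A toy numpy sketch, with made-up similarities and single-model scores:

```python
# Toy sketch of the two consensus quality scores: plain average pairwise
# similarity, and the same average weighted by single-model QA scores.
import numpy as np

S = np.array([[1.00, 0.80, 0.75, 0.30],
              [0.80, 1.00, 0.70, 0.25],
              [0.75, 0.70, 1.00, 0.35],
              [0.30, 0.25, 0.35, 1.00]])     # pairwise similarity, 4 models
single = np.array([0.7, 0.6, 0.8, 0.2])      # single-model QA scores (toy)

n = len(S)
off = ~np.eye(n, dtype=bool)                 # mask out self-similarity
plain = np.where(off, S, 0).sum(axis=1) / (n - 1)
weighted = (np.where(off, S, 0) @ single) / (single.sum() - single)
print("plain consensus:   ", plain.round(3))
print("weighted consensus:", weighted.round(3))
```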
Economic 3D-printing approach for transplantation of human stem cell-derived β-like cells
Song, Jiwon; Millman, Jeffrey R.
2016-01-01
Transplantation of human pluripotent stem cells (hPSC) differentiated into insulin-producing β cells is a regenerative medicine approach being investigated for diabetes cell replacement therapy. This report presents a multifaceted transplantation strategy that combines differentiation into stem cell-derived β (SC-β) cells with 3D printing. By modulating the parameters of a low-cost 3D printer, we created a macroporous device composed of polylactic acid (PLA) that houses SC-β cell clusters within a degradable fibrin gel. Using finite element modeling of cellular oxygen diffusion-consumption and an in vitro culture system that allows for culture of devices at physiological oxygen levels, we identified cluster sizes that avoid severe hypoxia within 3D-printed devices and developed a microwell-based technique for resizing clusters within this range. Upon transplantation into mice, SC-β cell-embedded 3D-printed devices function for 12 weeks, are retrievable, and maintain structural integrity. Here, we demonstrate a novel 3D-printing approach that advances the use of differentiated hPSC for regenerative medicine applications and serves as a platform for future transplantation strategies. PMID:27906687
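The cluster-sizing logic can be illustrated with a closed-form simplification: for zero-order oxygen consumption in a sphere (a common simplification of the Michaelis-Menten kinetics used in diffusion-consumption modeling), the steady-state profile gives the largest radius that keeps the core above a hypoxia threshold. Parameter values below are order-of-magnitude placeholders, not the paper's fitted values.

```python
# Zero-order diffusion-consumption in a sphere: the steady-state profile is
# C(r) = C_surface - (V / 6D) * (R^2 - r^2), so the largest non-hypoxic
# radius follows in closed form. Parameters are rough literature-scale guesses.
import numpy as np

D = 2.0e-9          # O2 diffusivity in tissue, m^2/s (typical scale)
V = 2.0e-2          # consumption rate, mol/(m^3 s) (assumed)
C_surface = 0.04    # O2 at cluster surface, mol/m^3 (assumed)
C_hypoxic = 0.001   # threshold below which the core is scored hypoxic

R_max = np.sqrt(6 * D * (C_surface - C_hypoxic) / V)
print(f"max cluster radius avoiding hypoxia: {R_max * 1e6:.0f} um")

R = 100e-6          # center concentration for a 100 um cluster
print(f"center O2 at R = 100 um: {C_surface - V * R**2 / (6 * D):.4f} mol/m^3")
```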
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
Disparities in urban/rural environmental quality
Individuals experience simultaneous exposure to many pollutants and social factors, which cluster to affect human health outcomes. Because the optimal approach to combining these factors is unknown, we developed a method to model simultaneous exposure using criteria air pollutant...
Theoretical microbial ecology without species
NASA Astrophysics Data System (ADS)
Tikhonov, Mikhail
2017-09-01
Ecosystems are commonly conceptualized as networks of interacting species. However, partitioning natural diversity of organisms into discrete units is notoriously problematic and mounting experimental evidence raises the intriguing question whether this perspective is appropriate for the microbial world. Here an alternative formalism is proposed that does not require postulating the existence of species as fundamental ecological variables and provides a naturally hierarchical description of community dynamics. This formalism allows approaching the species problem from the opposite direction. While the classical models treat a world of imperfectly clustered organism types as a perturbation around well-clustered species, the presented approach allows gradually adding structure to a fully disordered background. The relevance of this theoretical construct for describing highly diverse natural ecosystems is discussed.
A multi-approach to the optical depth of a contrail cirrus cluster
NASA Astrophysics Data System (ADS)
Vazquez-Navarro, Margarita; Bugliaro, Luca; Schumann, Ulrich; Strandgren, Johan; Wirth, Martin; Voigt, Christiane
2017-04-01
Amongst the individual aviation emissions, contrail cirrus contribute the largest fraction to the aviation effects on climate. To investigate the optical depth from contrail cirrus, we selected a cirrus and contrail cloud outbreak on 10 April 2014 between the North Sea and Switzerland detected during the ML-CIRRUS experiment (Voigt et al., 2017). The outbreak was not forecast by weather prediction models. We describe its origin and evolution using a combination of in-situ measurements, remote sensing approaches and contrail prediction model prognosis. The in-situ and lidar measurements were carried out with the HALO aircraft, where the cirrus was first identified. Model predictions from the contrail prediction model CoCiP (Schumann et al., 2012) point to an anthropogenic origin. The satellite pictures from the SEVIRI imager on MSG combined with the use of a contrail cluster tracking algorithm enable the automatic assessment of the origin, displacement and growth of the cloud and the correct labeling of cluster pixels. The evolution of the optical depth and particle size of the selected cluster pixels was derived using the CiPS algorithm, a neural network primarily based on SEVIRI images. The CoCiP forecast of the cluster compared to the actual cluster tracking shows that the model correctly predicts the occurrence of the cluster and its advection direction, although the cluster spreads faster than simulated. The optical depth derived from CiPS and from the airborne high spectral resolution lidar WALES are compared and show a remarkably good agreement. This confirms that the new CiPS algorithm is a very powerful tool for the assessment of the optical depth of even optically thinner cirrus clouds. References: Schumann, U.: A contrail cirrus prediction model, Geosci. Model Dev., 5, 543-580, doi: 10.5194/gmd-5-543-2012, 2012. Voigt, C., Schumann, U., Minikin, A., Abdelmonem, A., Afchine, A., Borrmann, S., Boettcher, M., Buchholz, B., Bugliaro, L., Costa, A., Curtius, J., Dollner, M., Dörnbrack, A., Dreiling, V., Ebert, V., Ehrlich, A., Fix, A., Forster, L., Frank, F., Fütterer, D., Giez, A., Graf, K., Grooß, J.-U., Groß, S., Heimerl, K., Heinold, B., Hüneke, T., Järvinen, E., Jurkat, T., Kaufmann, S., Kenntner, M., Klingebiel, M., Klimach, T., Kohl, R., Krämer, M., Krisna, T. C., Luebke, A., Mayer, B., Mertes, S., Molleker, S., Petzold, A., Pfeilsticker, K., Port, M., Rapp, M., Reutter, P., Rolf, C., Rose, D., Sauer, D., Schäfler, A., Schlage, R., Schnaiter, M., Schneider, J., Spelten, N., Spichtinger, P., Stock, P., Walser, A., Weigel, R., Weinzierl, B., Wendisch, M., Werner, F., Wernli, H., Wirth, M., Zahn, A., Ziereis, H., and Zöger, M.: ML-CIRRUS - The airborne experiment on natural cirrus and contrail cirrus with the high-altitude long-range research aircraft HALO, Bull. Amer. Meteorol. Soc., in press, doi: 10.1175/BAMS-D-15-00213.1, 2017.
McGrath, L M; Mustanski, B; Metzger, A; Pine, D S; Kistner-Griffin, E; Cook, E; Wakschlag, L S
2012-08-01
This study illustrates the application of a latent modeling approach to genotype-phenotype relationships and gene × environment interactions, using a novel, multidimensional model of adult female problem behavior, including maternal prenatal smoking. The gene of interest is the monoamine oxidase A (MAOA) gene which has been well studied in relation to antisocial behavior. Participants were adult women (N = 192) who were sampled from a prospective pregnancy cohort of non-Hispanic, white individuals recruited from a neighborhood health clinic. Structural equation modeling was used to model a female problem behavior phenotype, which included conduct problems, substance use, impulsive-sensation seeking, interpersonal aggression, and prenatal smoking. All of the female problem behavior dimensions clustered together strongly, with the exception of prenatal smoking. A main effect of MAOA genotype and a MAOA × physical maltreatment interaction were detected with the Conduct Problems factor. Our phenotypic model showed that prenatal smoking is not simply a marker of other maternal problem behaviors. The risk variant in the MAOA main effect and interaction analyses was the high activity MAOA genotype, which is discrepant from consensus findings in male samples. This result contributes to an emerging literature on sex-specific interaction effects for MAOA.
2012-01-01
Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When small clustering effect presented in DCE data, results remained robust across statistical models. However, results varied when larger clustering effect presented. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526
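One of the compared specifications, a logistic GEE with exchangeable within-participant correlation, can be sketched with statsmodels as follows; the synthetic choices and two-attribute coding below stand in for the actual survey data.

```python
# Logistic GEE with exchangeable within-participant correlation for
# clustered binary choice data. Data are synthetic; attribute coding and
# effect sizes are illustrative placeholders, not the study's estimates.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_subj, n_tasks = 468, 10
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_subj), n_tasks),
    "cost": rng.choice([0, 1], n_subj * n_tasks),          # high vs low cost
    "sensitivity": rng.choice([0, 1], n_subj * n_tasks),
})
logit = -0.2 - 1.0 * df["cost"] + 0.6 * df["sensitivity"]
df["choice"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(df[["cost", "sensitivity"]])
model = sm.GEE(df["choice"], X, groups=df["id"],
               family=sm.families.Binomial(),
               cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary().tables[1])
```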
Tremblay, Marlène; Hess, Justin P; Christenson, Brock M; McIntyre, Kolby K; Smink, Ben; van der Kamp, Arjen J; de Jong, Lisanne G; Döpfer, Dörte
2016-07-01
Automatic milking systems (AMS) are implemented in a variety of situations and environments. Consequently, there is a need to characterize individual farming practices and regional challenges to streamline management advice and objectives for producers. Benchmarking is often used in the dairy industry to compare farms by computing percentile ranks of the production values of groups of farms. Grouping for conventional benchmarking is commonly limited to the use of a few factors such as farms' geographic region or breed of cattle. We hypothesized that herds' production data and management information could be clustered in a meaningful way using cluster analysis and that this clustering approach would yield better peer groups of farms than benchmarking methods based on criteria such as country, region, breed, or breed and region. By applying mixed latent-class model-based cluster analysis to 529 North American AMS dairy farms with respect to 18 significant risk factors, 6 clusters were identified. Each cluster (i.e., peer group) represented unique management styles, challenges, and production patterns. When compared with peer groups based on criteria similar to the conventional benchmarking standards, the 6 clusters better predicted milk produced (kilograms) per robot per day. Each cluster represented a unique management and production pattern that requires specialized advice. For example, cluster 1 farms were those that recently installed AMS robots, whereas cluster 3 farms (the most northern farms) fed high amounts of concentrates through the robot to compensate for low-energy feed in the bunk. In addition to general recommendations for farms within a cluster, individual farms can generate their own specific goals by comparing themselves to farms within their cluster. This is very comparable to benchmarking but adds the specific characteristics of the peer group, resulting in better farm management advice. The improvement offered by cluster analysis lies in its multivariable approach and in the fact that production units can be compared either within a cluster or between clusters. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Classifying Correlation Matrices into Relatively Homogeneous Subgroups: A Cluster Analytic Approach
ERIC Educational Resources Information Center
Cheung, Mike W.-L.; Chan, Wai
2005-01-01
Researchers are becoming interested in combining meta-analytic techniques and structural equation modeling to test theoretical models from a pool of studies. Most existing procedures are based on the assumption that all correlation matrices are homogeneous. Few studies have addressed what the next step should be when studies being analyzed are…
Modeling Dynamic Regulatory Processes in Stroke.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDermott, Jason E.; Jarman, Kenneth D.; Taylor, Ronald C.
2012-10-11
The ability to examine in silico the behavior of biological systems can greatly accelerate the pace of discovery in disease pathologies, such as stroke, where in vivo experimentation is lengthy and costly. In this paper we describe an approach to in silico examination of blood genomic responses to neuroprotective agents and subsequent stroke through the development of dynamic models of the regulatory processes observed in the experimental gene expression data. First, we identified functional gene clusters from these data. Next, we derived ordinary differential equations (ODEs) relating regulators and functional clusters from the data. These ODEs were used to develop dynamic models that simulate the expression of regulated functional clusters using system dynamics as the modeling paradigm. The dynamic model has the considerable advantage of only requiring an initial starting state, and does not require measurement of regulatory influences at each time point in order to make accurate predictions. The manipulation of input model parameters, such as changing the magnitude of gene expression, made it possible to assess the behavior of the networks through time under varying conditions. We report that an optimized dynamic model can provide accurate predictions of overall system behavior under several different preconditioning paradigms.
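A toy sketch in the modeling style described above, not the published stroke model: one regulator R decays after a stimulus and drives one functional cluster C through a Hill-type ODE, integrated forward from an initial state only. The functional form and all parameter values are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k_act=1.0, K=0.5, n=2, d=0.3):
    r, c = y
    dr = -0.1 * r                               # regulator decay after stimulus
    dc = k_act * r**n / (K**n + r**n) - d * c   # cluster induction minus turnover
    return [dr, dc]

sol = solve_ivp(rhs, (0.0, 24.0), y0=[1.0, 0.0], t_eval=np.linspace(0, 24, 9))
for t, c in zip(sol.t, sol.y[1]):
    print(f"t = {t:4.1f} h   cluster expression = {c:.3f}")
```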
NASA Astrophysics Data System (ADS)
Walz, Michael; Leckebusch, Gregor C.
2016-04-01
Extratropical wind storms pose one of the most dangerous and loss-intensive natural hazards for Europe. However, with only 50 years of high-quality observational data, it is difficult to assess the statistical uncertainty of these sparse events based on observations alone. Over the last decade seasonal ensemble forecasts have become indispensable in quantifying the uncertainty of weather prediction on seasonal timescales. In this study seasonal forecasts are used in a climatological context: by making use of the up to 51 ensemble members, a broad and physically consistent statistical base can be created. This base can then be used to assess the statistical uncertainty of extreme wind storm occurrence more accurately. In order to determine the statistical uncertainty of storms with different paths of progression, a probabilistic clustering approach using regression mixture models is used to objectively assign storm tracks (based either on core pressure or on extreme wind speeds) to different clusters. The advantage of this technique is that the entire lifetime of a storm is considered in the clustering algorithm. Quadratic curves are found to describe the storm tracks most accurately. Three main clusters (diagonal, horizontal or vertical progression of the storm track) can be identified, each of which has its own particular features. Basic storm features like average velocity and duration are calculated and compared for each cluster. The main benefit of this clustering technique, however, is to evaluate whether the clusters show different degrees of uncertainty, e.g. more (less) spread for tracks approaching Europe horizontally (diagonally). This statistical uncertainty is compared across different seasonal forecast products.
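A simplified stand-in for the regression-mixture clustering described above: each synthetic track is summarized by its fitted quadratic coefficients, and the coefficient vectors are then clustered with a Gaussian mixture. The paper fits the mixture to the tracks directly; this two-step fit-then-cluster variant only illustrates the idea, and the tracks below are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
tracks = []
for _ in range(60):
    lon = np.sort(rng.uniform(-40, 20, 12))                 # along-track longitudes
    a, b, c = rng.normal([45, 0.3, 0.005], [5, 0.2, 0.01])  # per-track quadratic
    lat = a + b * lon + c * lon**2 + rng.normal(0, 0.5, lon.size)
    tracks.append((lon, lat))

coeffs = np.array([np.polyfit(lon, lat, deg=2) for lon, lat in tracks])
gmm = GaussianMixture(n_components=3, random_state=0).fit(coeffs)
labels = gmm.predict(coeffs)
print("tracks per cluster:", np.bincount(labels))
```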
Managing distance and covariate information with point-based clustering.
Whigham, Peter A; de Graaf, Brandon; Srivastava, Rashmi; Glue, Paul
2016-09-01
Geographic perspectives of disease and the human condition often involve point-based observations and questions of clustering or dispersion within a spatial context. These problems involve a finite set of point observations and are constrained by a larger, but finite, set of locations where the observations could occur. Developing a rigorous method for pattern analysis in this context requires handling spatial covariates, a method for constrained finite spatial clustering, and addressing bias in geographic distance measures. An approach based on Ripley's K, applied to the problem of clustering of deliberate self-harm (DSH), is presented. Point-based Monte-Carlo simulation of Ripley's K, accounting for socio-economic deprivation and sources of distance measurement bias, was developed to estimate clustering of DSH at a range of spatial scales. A rotated Minkowski L1 distance metric allowed variation in physical distance and clustering to be assessed. Self-harm data were derived from an audit of 2 years' emergency hospital presentations (n = 136) in a New Zealand town (population ~50,000). The study area was defined by residential (housing) land parcels representing a finite set of possible point addresses. Area-based deprivation was spatially correlated. Accounting for deprivation and distance bias showed evidence for clustering of DSH at spatial scales up to 500 m with a one-sided 95% CI, suggesting that social contagion may be present for this urban cohort. Many problems involve finite locations in geographic space that require estimates of distance-based clustering at many scales. A Monte-Carlo approach to Ripley's K, incorporating covariates and models for distance bias, is crucial when assessing health-related clustering. The case study showed that social network structure defined at the neighbourhood level may account for aspects of neighbourhood clustering of DSH. Accounting for covariate measures that exhibit spatial clustering, such as deprivation, is crucial when assessing point-based clustering.
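A minimal sketch of the constrained Monte-Carlo logic, with the deprivation weighting and rotated Minkowski metric omitted: Ripley's K for the observed cases is compared against repeated random draws of the same number of points from a finite set of candidate addresses. All coordinates below are synthetic, though n = 136 mirrors the study size.

```python
import numpy as np

def ripley_k(pts, r, area):
    """Naive Ripley's K estimate at radius r (no edge correction)."""
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    pairs = (d < r).sum() - n          # drop the n self-pairs on the diagonal
    return area * pairs / (n * (n - 1))

rng = np.random.default_rng(2)
addresses = rng.uniform(0, 5000, size=(2000, 2))   # candidate parcels (metres)
cases = addresses[rng.choice(2000, 136, replace=False)]

r, area = 500.0, 5000.0 ** 2
k_obs = ripley_k(cases, r, area)
k_null = [ripley_k(addresses[rng.choice(2000, 136, replace=False)], r, area)
          for _ in range(199)]
print(f"K(500 m) observed={k_obs:.0f}, null 95th pct={np.quantile(k_null, 0.95):.0f}")
```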
Adamczak, Rafal; Meller, Jarek
2016-12-28
Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust . uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
NASA Astrophysics Data System (ADS)
Quang Nguyen, Sang; Kong, Hyung Yun
2016-11-01
In this article, the presence of multi-hop relaying, eavesdropper and co-channel interference (CCI) in the same system model is investigated. Specifically, the effect of CCI on a secured multi-hop relaying network is studied, in which the source communicates with the destination via multi-relay-hopping under the presence of an eavesdropper and CCI at each node. The optimal relay at each cluster is selected to help forward the message from the source to the destination. We apply two relay selection approaches to such a system model, i.e. the optimal relay is chosen based on (1) the maximum channel gain from the transmitter to all relays in the desired cluster and (2) the minimum channel gain from the eavesdropper to all relays in each cluster. For the performance evaluation and comparison, we derived the exact closed form of the secrecy outage probability of the two approaches. That analysis is verified by Monte Carlo simulation. Finally, the effects of the number of hops, the transmit power at the source, relays and the external sources, the distance between the external sources and each node in the system, and the location of the eavesdropper are presented and discussed.
A possibilistic approach to clustering
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Keller, James M.
1993-01-01
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
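The possibilistic membership update at the heart of this scheme can be sketched in a few lines. This is a bare-bones possibilistic c-means iteration on synthetic data, with the bandwidth parameter eta held fixed rather than estimated per cluster as in the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
m, eta = 2.0, 1.0
centers = X[rng.choice(len(X), 2, replace=False)]

for _ in range(50):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # squared distances
    u = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))           # typicality, no sum-to-one
    w = u ** m
    centers = (w.T @ X) / w.sum(0)[:, None]                     # weighted prototypes

print("centers:\n", centers.round(2))
```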
Medium-induced change of the optical response of metal clusters in rare-gas matrices
NASA Astrophysics Data System (ADS)
Xuan, Fengyuan; Guet, Claude
2017-10-01
Interaction with the surrounding medium modifies the optical response of embedded metal clusters. For clusters from about ten to a few hundreds of silver atoms, embedded in rare-gas matrices, we study the environment effect within the matrix random phase approximation with exact exchange (RPAE) quantum approach, which has proved successful for free silver clusters. The polarizable surrounding medium screens the residual two-body RPAE interaction, adds a polarization term to the one-body potential, and shifts the vacuum energy of the active delocalized valence electrons. Within this model, we calculate the dipole oscillator strength distribution for Ag clusters embedded in helium droplets, neon, argon, krypton, and xenon matrices. The main contribution to the dipole surface plasmon red shift originates from the rare-gas polarization screening of the two-body interaction. The large size limit of the dipole surface plasmon agrees well with the classical prediction.
Chiu, Ya-Fang; Sugden, Arthur U.
2017-01-01
Genetic elements that replicate extrachromosomally are rare in mammals; however, several human tumor viruses, including the papillomaviruses and the gammaherpesviruses, maintain their plasmid genomes by tethering them to cellular chromosomes. We have uncovered an unprecedented mechanism of viral replication: Kaposi’s sarcoma–associated herpesvirus (KSHV) stably clusters its genomes across generations to maintain itself extrachromosomally. To identify and characterize this mechanism, we developed two complementary, independent approaches: live-cell imaging and a predictive computational model. The clustering of KSHV requires the viral protein, LANA1, to bind viral genomes to nucleosomes arrayed on both cellular and viral DNA. Clustering affects both viral partitioning and viral genome numbers of KSHV. The clustering of KSHV plasmids provides it with an effective evolutionary strategy to rapidly increase copy numbers of genomes per cell at the expense of the total numbers of cells infected. PMID:28696226
López-Carr, David; Davis, Jason; Jankowska, Marta; Grant, Laura; López-Carr, Anna Carla; Clark, Matthew
2013-01-01
The relative role of space and place has long been debated in geography. Yet modeling efforts applied to coupled human-natural systems seemingly favor models assuming continuous spatial relationships. We examine the relative importance of place-based hierarchical versus spatial clustering influences in tropical land use/cover change (LUCC). Guatemala was chosen as our study site given its high rural population growth and deforestation in recent decades. We test predictors of 2009 forest cover and forest cover change from 2001-2009 across Guatemala's 331 municipalities and 22 departments using spatial and multi-level statistical models. Our results indicate the emergence of several socio-economic predictors of LUCC regardless of model choice. Hierarchical model results suggest that significant differences exist at the municipal and departmental levels but largely maintain the magnitude and direction of single-level model coefficient estimates. They are also intervention-relevant since policies tend to be applicable to distinct political units rather than to continuous space. Spatial models complement hierarchical approaches by indicating where and to what magnitude significant negative and positive clustering associations emerge. Appreciating the comparative advantages and limitations of spatial and nested models enhances a holistic approach to geographical analysis of tropical LUCC and human-environment interactions. PMID:24013908
Deckersbach, Thilo; Peters, Amy T.; Sylvia, Louisa G.; Gold, Alexandra K.; da Silva Magalhaes, Pedro Vieira; Henry, David B.; Frank, Ellen; Otto, Michael W.; Berk, Michael; Dougherty, Darin D.; Nierenberg, Andrew A.; Miklowitz, David J.
2016-01-01
Background We sought to address how predictors and moderators of psychotherapy for bipolar depression – identified individually in prior analyses – can inform the development of a metric for prospectively classifying treatment outcome in intensive psychotherapy (IP) versus collaborative care (CC) adjunctive to pharmacotherapy in the Systematic Treatment Enhancement Program (STEP-BD) study. Methods We conducted post-hoc analyses on 135 STEP-BD participants using cluster analysis to identify subsets of participants with similar clinical profiles and investigated this combined metric as a moderator and predictor of response to IP. We used agglomerative hierarchical cluster analyses and k-means clustering to determine the content of the clinical profiles. Logistic regression and Cox proportional hazard models were used to evaluate whether the resulting clusters predicted or moderated likelihood of recovery or time until recovery. Results The cluster analysis yielded a two-cluster solution: 1) “less-recurrent/severe” and 2) “chronic/recurrent.” Rates of recovery in IP were similar for less-recurrent/severe and chronic/recurrent participants. Less-recurrent/severe patients were more likely than chronic/recurrent patients to achieve recovery in CC (p = .040, OR = 4.56). IP yielded a faster recovery for chronic/recurrent participants, whereas CC led to recovery sooner in the less-recurrent/severe cluster (p = .034, OR = 2.62). Limitations Cluster analyses require list-wise deletion of cases with missing data so we were unable to conduct analyses on all STEP-BD participants. Conclusions A well-powered, parametric approach can distinguish patients based on illness history and provide clinicians with symptom profiles of patients that confer differential prognosis in CC vs. IP. PMID:27289316
Cluster ensemble based on Random Forests for genetic data.
Alhusain, Luluah; Hafez, Alaaeldin M
2017-01-01
Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets of exceptional size. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means of handling such data desirable. Random Forests (RF) has emerged as an efficient algorithm capable of handling high-dimensional data. RF provides a proximity measure that can capture different levels of co-occurring relationships between variables. RF has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, an RF-derived proximity measure combined with a clustering technique may be well suited to determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RF. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, a comparison of RFcluE's performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE's performance. The paper demonstrates the effectiveness of the approach for population structure analysis and illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of the high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
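A rough sketch of one RF-clustering ingredient behind an RFcluE-style pipeline, not the authors' code: unsupervised RF via the real-versus-permuted-data trick, a proximity matrix from shared leaf membership, and hierarchical clustering of 1 − proximity. A full ensemble would repeat this over many RF runs and combine the resulting partitions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (80, 50)), rng.normal(1.5, 1, (80, 50))])
X_perm = np.column_stack([rng.permutation(col) for col in X.T])  # synthetic class

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(np.vstack([X, X_perm]), np.r_[np.zeros(len(X)), np.ones(len(X))])

leaves = rf.apply(X)                                  # (n_samples, n_trees) leaf ids
prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=-1)
Z = linkage(squareform(1 - prox, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```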
Photoabsorption spectra of small HeN+ clusters (N = 3, 4, 10). A quantum Monte Carlo modeling
NASA Astrophysics Data System (ADS)
Ćosić, Rajko; Karlický, František; Kalus, René
2018-05-01
Photoabsorption cross-sections have been calculated for HeN+ clusters of selected sizes (N = 3, 4, 10) over a broad range of photon energies (Ephot = 2 - 14 eV) and compared with available experimental data. Semiempirical electronic Hamiltonians derived from the diatomics-in-molecules approach have been used for electronic structure calculations, and a quantum path-integral Monte Carlo method has been employed to model the delocalization of helium nuclei. While quantitative agreement has been achieved between theory and experiment for He3+ and He4+, only qualitative correspondence is seen for He10+.
A genetic graph-based approach for partitional clustering.
Menéndez, Héctor D; Barrero, David F; Camacho, David
2014-05-01
Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem of remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: it initially generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the clustering. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach, introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases the robustness of SC and has competitive performance in comparison with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
NASA Astrophysics Data System (ADS)
Kim, SungKun; Lee, Hunpyo
2017-06-01
Via a dynamical cluster approximation with Nc = 4 in combination with a semiclassical approximation (DCA+SCA), we study the doped two-dimensional Hubbard model. We obtain a plaquette antiferromagnetic (AF) Mott insulator, a plaquette AF ordered metal, a pseudogap (or d-wave superconductor) and a paramagnetic metal by tuning the doping concentration. These features are similar to the behaviors observed in copper-oxide superconductors and are in qualitative agreement with the results calculated by the cluster dynamical mean field theory with the continuous-time quantum Monte Carlo (CDMFT+CTQMC) approach. The results of our DCA+SCA differ from those of the CDMFT+CTQMC approach in that d-wave superconducting order parameters appear even in the highly doped region. We think that the strong plaquette AF orderings in the dynamical cluster approximation (DCA) with Nc = 4 suppress superconducting states with increasing doping up to the strongly doped region, because frozen dynamical fluctuations in the semiclassical approximation (SCA) are unable to destroy those orderings. Our calculation with short-range spatial fluctuations is an initial step, since the SCA can handle long-range spatial fluctuations in feasible computational times, beyond the reach of the CDMFT+CTQMC tool. We believe that our future DCA+SCA calculations should supply information on fully momentum-resolved physical properties, which could be compared with the results measured by angle-resolved photoemission spectroscopy experiments.
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms, meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. However, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value = 0.79). The resulting clusters were used to develop an unsupervised minimum-distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
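A condensed sketch of the pipeline described above, with the public digits dataset standing in for multimodal neuroimaging features: t-SNE embedding, K-means on the embedding, and the silhouette index as the cluster-quality measure.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

X = load_digits().data                                     # stand-in feature matrix
emb = TSNE(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(emb)
print("silhouette on t-SNE embedding:", round(silhouette_score(emb, labels), 2))
```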
Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets
NASA Astrophysics Data System (ADS)
Tonkova, V.; Paulus, D.; Neeb, H.
2013-02-01
We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, composed of a short-range attractive term (strong interaction) and a long-range repulsive term (Coulomb force), is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters), whereas single-point clusters repel each other. The formation of clusters is completed when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets, showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical for MRI-derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Govind, Niranjan; Sushko, Petr V.; Hess, Wayne P.
2009-03-05
We present a study of the electronic excitations in insulating materials using an embedded-cluster method. The excited states of the embedded cluster are studied systematically using time-dependent density functional theory (TDDFT) and high-level equation-of-motion coupled cluster (EOMCC) methods. In particular, we have used EOMCC models with singles and doubles (EOMCCSD) and two approaches which account for the effect of triply excited configurations in non-iterative and iterative fashions. We present calculations of the lowest surface excitations of the well-studied potassium bromide (KBr) system and compare our results with experiment. The bulk-surface exciton shift is also calculated at the TDDFT level and compared with experiment.
Reaction-diffusion basis of retroviral infectivity
NASA Astrophysics Data System (ADS)
Sadiq, S. Kashif
2016-11-01
Retrovirus particle (virion) infectivity requires diffusion and clustering of multiple transmembrane envelope proteins (Env3) on the virion exterior, yet is triggered by protease-dependent degradation of a partially occluding, membrane-bound Gag polyprotein lattice on the virion interior. The physical mechanism underlying such coupling is unclear and only indirectly accessible via experiment. Modelling stands to provide insight but the required spatio-temporal range far exceeds current accessibility by all-atom or even coarse-grained molecular dynamics simulations. Nor do such approaches account for chemical reactions, while conversely, reaction kinetics approaches handle neither diffusion nor clustering. Here, a recently developed multiscale approach is considered that applies an ultra-coarse-graining scheme to treat entire proteins at near-single particle resolution, but which also couples chemical reactions with diffusion and interactions. A model is developed of Env3 molecules embedded in a truncated Gag lattice composed of membrane-bound matrix proteins linked to capsid subunits, with freely diffusing protease molecules. Simulations suggest that in the presence of Gag but in the absence of lateral lattice-forming interactions, Env3 diffuses comparably to Gag-absent Env3. Initial immobility of Env3 is conferred through lateral caging by matrix trimers vertically coupled to the underlying hexameric capsid layer. Gag cleavage by protease vertically decouples the matrix and capsid layers, induces both matrix and Env3 diffusion, and permits Env3 clustering. Spreading across the entire membrane surface reduces crowding, in turn, enhancing the effect and promoting infectivity. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
Wang, Juan; Nishikawa, Robert M; Yang, Yongyi
2016-01-01
In computer-aided detection of microcalcifications (MCs), the detection accuracy is often compromised by frequent occurrence of false positives (FPs), which can be attributed to a number of factors, including imaging noise, inhomogeneity in tissue background, linear structures, and artifacts in mammograms. In this study, the authors investigated a unified classification approach for combating the adverse effects of these heterogeneous factors to achieve accurate MC detection. To accommodate FPs caused by different factors in a mammogram image, the authors developed a classification model in which the input features were adapted according to the image context at a detection location. For this purpose, the input features were defined in two groups, of which one group was derived from the image intensity pattern in a local neighborhood of a detection location, and the other group was used to characterize how an MC differs from its structural background. Owing to the distinctive effect of linear structures in the detector response, the authors introduced a dummy variable into the unified classifier model, which allowed the input features to be adapted according to the image context at a detection location (i.e., presence or absence of linear structures). To suppress the effect of inhomogeneity in tissue background, the input features were extracted from different domains aimed at enhancing MCs in a mammogram image. To demonstrate the flexibility of the proposed approach, the authors implemented the unified classifier model with two widely used machine learning algorithms, namely, a support vector machine (SVM) classifier and an Adaboost classifier. In the experiment, the proposed approach was tested for two representative MC detectors in the literature [difference-of-Gaussians (DoG) detector and SVM detector]. The detection performance was assessed using free-response receiver operating characteristic (FROC) analysis on a set of 141 screen-film mammogram (SFM) images (66 cases) and a set of 188 full-field digital mammogram (FFDM) images (95 cases). The FROC analysis results show that the proposed unified classification approach can significantly improve the detection accuracy of the two MC detectors on both SFM and FFDM images. Despite the difference in performance between the two detectors, the unified classifiers can reduce their FP rates to a similar level in the output of the two detectors. In particular, with the true-positive rate at 85%, the FP rate on SFM images for the DoG detector was reduced from 1.16 to 0.33 clusters/image (unified SVM) and 0.36 clusters/image (unified Adaboost), respectively; similarly, for the SVM detector, the FP rate was reduced from 0.45 clusters/image to 0.30 clusters/image (unified SVM) and 0.25 clusters/image (unified Adaboost), respectively. Similar FP reduction results were also achieved on FFDM images for the two MC detectors. The proposed unified classification approach can be effective for discriminating MCs from FPs caused by different factors (such as MC-like noise patterns and linear structures) in MC detection. The framework is general and can be applicable for further improving the detection accuracy of existing MC detectors.
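A small sketch of the dummy-variable idea, not the authors' implementation: an indicator for "linear structure present" and its interactions with the intensity features are appended to the input vector, letting a single SVM adapt its decision rule to the image context. Features and labels below are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(11)
n = 600
feats = rng.normal(size=(n, 5))                    # local intensity-pattern features
on_line = rng.binomial(1, 0.3, n)                  # dummy: linear structure present
y = (feats[:, 0] + 0.8 * on_line * feats[:, 1] > 0.5).astype(int)

# Augment inputs with the dummy and its feature interactions (context adaptation)
X = np.column_stack([feats, on_line, feats * on_line[:, None]])
print("CV accuracy:", cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean().round(2))
```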
Evolvement of preformation probability of alpha cluster decay of parent nuclei 84≤Z≤92 having N=126
NASA Astrophysics Data System (ADS)
Kaur, Rupinder; Singh, Bir Bikram; Kaur, Mandeep; Sandhu, B. S.; Kaur, Maninder
2018-05-01
The preformed cluster decay model (PCM), based on the collective clusterisation approach of quantum mechanical fragmentation theory (QMFT), has been applied to study the ground-state decay of trans-lead parent nuclei 84≤Z≤92 with N=126 emitting an α cluster. Within the PCM, the α cluster is assumed to be preborn with a certain preformation probability P0α before tunneling through the potential barrier with penetrability Pα. The nuclear structure information of the emitted α cluster is carried by P0α. The present work reveals that the relative P0α is found to increase as the Z number of the parent nucleus moves away from the magic proton shell closure, i.e. Z=82. It is observed that Pα also increases and, consequently, the half-life T1/2α of α-cluster decay becomes shorter with increasing Z. The PCM-calculated T1/2α values for the parent nuclei under study compare very well with available experimental data.
Longo, Dario Livio; Dastrù, Walter; Consolino, Lorena; Espak, Miklos; Arigoni, Maddalena; Cavallo, Federica; Aime, Silvio
2015-07-01
The objective of this study was to compare a clustering approach to conventional analysis methods for assessing changes in pharmacokinetic parameters obtained from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) during antiangiogenic treatment in a breast cancer model. BALB/c mice bearing established transplantable her2+ tumors were treated with a DNA-based antiangiogenic vaccine or with an empty plasmid (untreated group). DCE-MRI was carried out by administering a dose of 0.05 mmol/kg of Gadocoletic acid trisodium salt, a Gd-based blood pool contrast agent (CA) at 1T. Changes in pharmacokinetic estimates (K(trans) and vp) in a nine-day interval were compared between treated and untreated groups on a voxel-by-voxel analysis. The tumor response to therapy was assessed by a clustering approach and compared with conventional summary statistics, with sub-regions analysis and with histogram analysis. Both the K(trans) and vp estimates, following blood-pool CA injection, showed marked and spatial heterogeneous changes with antiangiogenic treatment. Averaged values for the whole tumor region, as well as from the rim/core sub-regions analysis were unable to assess the antiangiogenic response. Histogram analysis resulted in significant changes only in the vp estimates (p<0.05). The proposed clustering approach depicted marked changes in both the K(trans) and vp estimates, with significant spatial heterogeneity in vp maps in response to treatment (p<0.05), provided that DCE-MRI data are properly clustered in three or four sub-regions. This study demonstrated the value of cluster analysis applied to pharmacokinetic DCE-MRI parametric maps for assessing tumor response to antiangiogenic therapy. Copyright © 2015 Elsevier Inc. All rights reserved.
Delineation of gravel-bed clusters via factorial kriging
NASA Astrophysics Data System (ADS)
Wu, Fu-Chun; Wang, Chi-Kuei; Huang, Guo-Hao
2018-05-01
Gravel-bed clusters are the most prevalent microforms that affect local flows and sediment transport. A growing consensus is that the practice of cluster delineation should be based primarily on bed topography rather than grain sizes. Here we present a novel approach for cluster delineation using patch-scale high-resolution digital elevation models (DEMs). We use a geostatistical interpolation method, i.e., factorial kriging, to decompose the short- and long-range (grain- and microform-scale) DEMs. The required parameters are determined directly from the scales of the nested variograms. The short-range DEM exhibits a flat bed topography, yet individual grains are sharply outlined, making the short-range DEM a useful aid for grain segmentation. The long-range DEM exhibits a smoother topography than the original full DEM, yet groupings of particles emerge as small-scale bedforms, making the contour percentile levels of the long-range DEM a useful tool for cluster identification. Individual clusters are delineated using the segmented grains and identified clusters via a range of contour percentile levels. Our results reveal that the density and total area of delineated clusters decrease with increasing contour percentile level, while the mean grain size of clusters and average size of anchor clast (i.e., the largest particle in a cluster) increase with the contour percentile level. These results support the interpretation that larger particles group as clusters and protrude higher above the bed than other smaller grains. A striking feature of the delineated clusters is that anchor clasts are invariably greater than the D90 of the grain sizes even though a threshold anchor size was not adopted herein. The average areal fractal dimensions (Hausdorff-Besicovich dimensions of the projected areas) of individual clusters, however, demonstrate that clusters delineated with different contour percentile levels exhibit similar planform morphologies. Comparisons with a compilation of existing field data show consistency with the cluster properties documented in a wide variety of settings. This study thus points toward a promising, alternative DEM-based approach to characterizing sediment structures in gravel-bed rivers.
NASA Astrophysics Data System (ADS)
Granade, Christopher; Wiebe, Nathan
2017-08-01
A major challenge facing existing sequential Monte Carlo methods for parameter estimation in physics stems from their inability to robustly deal with experiments that have different mechanisms that yield the results with equivalent probability. We address this problem here by proposing a form of particle filtering that clusters the particles that comprise the sequential Monte Carlo approximation to the posterior before applying a resampler. Through a new graphical approach to thinking about such models, we are able to devise an artificial-intelligence based strategy that automatically learns the shape and number of the clusters in the support of the posterior. We demonstrate the power of our approach by applying it to randomized gap estimation and a form of low circuit-depth phase estimation where existing methods from the physics literature either exhibit much worse performance or even fail completely.
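A toy sketch of the cluster-then-resample idea for a bimodal one-dimensional posterior; a fixed two-component Gaussian mixture stands in for the paper's adaptive cluster learning. Resampling within each cluster keeps a global resampler from collapsing separated modes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
particles = np.r_[rng.normal(-3, 0.3, 500), rng.normal(3, 0.3, 500)][:, None]
weights = np.full(len(particles), 1.0 / len(particles))

labels = GaussianMixture(n_components=2, random_state=0).fit_predict(particles)

resampled = []
for k in range(2):
    idx = np.flatnonzero(labels == k)
    w = weights[idx] / weights[idx].sum()
    n_k = int(round(len(particles) * weights[idx].sum()))   # per-cluster budget
    resampled.append(particles[rng.choice(idx, size=n_k, p=w)])
print("particles per cluster after resampling:", [len(r) for r in resampled])
```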
Origins and modeling of many-body exchange effects in van der Waals clusters
NASA Astrophysics Data System (ADS)
Chałasiński, Grzegorz; Rak, Janusz; Szcześniak, Małgorzata M.; Cybulski, Sławomir M.
1997-02-01
We analyze the many-body exchange interactions in atomic and molecular clusters as they arise in the supermolecular SCF and MP2 approaches. A rigorous formal setting is provided by the symmetry-adapted perturbation theory. Particular emphasis is put on the decomposition into the single exchange (SE) and triple exchange (TE) terms, at the SCF and correlated levels. We also propose a novel approach, whereby selected SE nonadditive exchange terms are evaluated indirectly, as differences of the two-body SAPT corrections arising between the components of the trimer treated as a complex of a dimer and a monomer (pseudodimer approach). This provides additional insights into the nature of various nonadditive effects, an interpretation of supermolecular interaction energies, and may serve as a viable alternative for the calculation of some SE terms.
Comparison of four statistical and machine learning methods for crash severity prediction.
Iranitalab, Amirfarrokh; Khattak, Aemal
2017-11-01
Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods, including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash-costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods, comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset, and the correct prediction rates for each crash severity level, the overall correct prediction rate, and a proposed crash-costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed that NNC had the best prediction performance overall and for more severe crashes. RF and SVM had the next best performances, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost exactly the opposite results compared with the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
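A sketch of the crash-costs-based scoring idea on synthetic data; the per-severity cost figures below are placeholders, not the paper's values. The point is that a classifier is credited for the crash costs it classifies correctly, so severe crashes dominate the metric in a way raw accuracy does not.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cost = {0: 4_000, 1: 125_000, 2: 1_400_000}     # hypothetical cost per severity level
X, y = make_classification(n_samples=3000, n_informative=6, n_classes=3,
                           weights=[0.8, 0.15, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

pred = RandomForestClassifier(random_state=0).fit(Xtr, ytr).predict(Xte)
costs = np.array([cost[c] for c in yte])
cost_score = costs[pred == yte].sum() / costs.sum()   # share of costs classified right
print(f"accuracy={np.mean(pred == yte):.3f}  cost-weighted score={cost_score:.3f}")
```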
Discrete bivariate population balance modelling of heteroaggregation processes.
Rollié, Sascha; Briesen, Heiko; Sundmacher, Kai
2009-08-15
Heteroaggregation in binary particle mixtures was simulated with a discrete population balance model in terms of two internal coordinates describing the particle properties. The considered particle species are of different size and zeta-potential. Property space is reduced with a semi-heuristic approach to enable an efficient solution. Aggregation rates are based on deterministic models for Brownian motion and stability, under consideration of DLVO interaction potentials. A charge-balance kernel is presented, relating the electrostatic surface potential to the property space by a simple charge balance. Parameter sensitivity with respect to the fractal dimension, aggregate size, hydrodynamic correction, ionic strength and absolute particle concentration was assessed. Results were compared to simulations with the literature kernel based on geometric coverage effects for clusters with heterogeneous surface properties. In both cases electrostatic phenomena, which dominate the aggregation process, show identical trends: impeded cluster-cluster aggregation at low particle mixing ratio (1:1), restabilisation at high mixing ratios (100:1) and formation of complex clusters for intermediate ratios (10:1). The particle mixing ratio controls the surface coverage extent of the larger particle species. Simulation results are compared to experimental flow cytometric data and show very satisfactory agreement.
Assessing environmental quality: the implications for social justice
Individuals experience simultaneous exposure to pollutants and social factors, which cluster to affect human health outcomes. The optimal approach to combining these factors is unknown, therefore we developed a method to model simultaneous exposure using criteria air pollutants, ...
The development of structure in the expanding universe
NASA Technical Reports Server (NTRS)
Silk, J.; White, S. D.
1978-01-01
A model for clustering in an expanding universe is developed based on an application of the coagulation equation to the collision and aggregation of bound condensations. While the growth rate of clustering is determined by the rate at which density fluctuations reach the nonlinear regime and therefore depends on the initial fluctuation spectrum, the mass spectrum rapidly approaches a self-similar limiting form. This form is determined by the tidal processes which lead to the merging of condensations, and is not dependent on initial conditions.
Dissipation and Rheology of Sheared Soft-Core Frictionless Disks Below Jamming
NASA Astrophysics Data System (ADS)
Vågberg, Daniel; Olsson, Peter; Teitel, S.
2014-05-01
We use numerical simulations to investigate the effect that different models of energy dissipation have on the rheology of soft-core frictionless disks, below jamming in two dimensions. We find that it is not necessarily the mass of the particles that determines whether a system has Bagnoldian or Newtonian rheology, but rather the presence or absence of large connected clusters of particles. We demonstrate the key role that tangential dissipation plays in the formation of such clusters and in several models find a transition from Bagnoldian to Newtonian rheology as the packing fraction ϕ is varied. For each model, we show that appropriately scaled rheology curves approach a well defined limit as the mass of the particles decreases and collisions become strongly inelastic.
NASA Astrophysics Data System (ADS)
Kamer, Yavor; Ouillon, Guy; Sornette, Didier; Wössner, Jochen
2014-05-01
We present applications of a new clustering method for fault network reconstruction based on the spatial distribution of seismicity. Unlike common approaches that start from the simplest large scale and gradually increase the complexity trying to explain the small scales, our method uses a bottom-up approach: it initially samples the small scales and then reduces the complexity. The new approach also exploits the location uncertainty associated with each event in order to obtain a more accurate representation of the spatial probability distribution of the seismicity. For a given dataset, we first construct an agglomerative hierarchical cluster (AHC) tree based on Ward's minimum variance linkage. Such a tree starts out with one cluster and progressively branches out into an increasing number of clusters. To atomize the structure into its constitutive protoclusters, we initialize a Gaussian Mixture Model (GMM) at a given level of the hierarchical clustering tree. We then let the GMM converge using an Expectation Maximization (EM) algorithm. The kernels that become ill defined (fewer than 4 points) at the end of the EM are discarded. By incrementing the number of initialization clusters (by atomizing at increasingly populated levels of the AHC tree) and repeating the procedure above, we are able to determine the maximum number of Gaussian kernels the structure can hold. The kernels in this configuration constitute our protoclusters. In this setting, merging any pair will lessen the likelihood (calculated over the pdf of the kernels) but will in turn reduce the model's complexity. The information loss/gain of any possible merging can thus be quantified based on the Minimum Description Length (MDL) principle. Similar to an inter-distance matrix, where the matrix element di,j gives the distance between points i and j, we can construct an MDL gain/loss matrix where mi,j gives the information gain/loss resulting from the merging of kernels i and j. Based on this matrix, merging events resulting in MDL gain are performed in descending order until no gainful merging remains possible. We envision that the results of this study could lead to a better understanding of the complex interactions within the Californian fault system, and we hope to use the acquired insights for earthquake forecasting.
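A compact sketch of the pipeline described above, on synthetic data and with two simplifications: BIC stands in for the MDL criterion, and a greedy reduction in the number of kernels stands in for the pairwise merge matrix. Ward AHC supplies the initialization and a Gaussian mixture is refined by EM at each candidate complexity.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(m, 0.4, (150, 2)) for m in ((0, 0), (3, 0), (0, 3))])

def fitted_gmm(k):
    """EM-refined GMM initialized from the k-cluster cut of the Ward AHC tree."""
    init = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    means = np.array([X[init == c].mean(0) for c in range(1, k + 1)])
    return GaussianMixture(n_components=k, means_init=means, random_state=0).fit(X)

k = 8                       # "atomize" with a generous number of kernels ...
best = fitted_gmm(k)
while k > 1 and fitted_gmm(k - 1).bic(X) < best.bic(X):
    k -= 1                  # ... then keep merging while the criterion improves
    best = fitted_gmm(k)
print("selected number of kernels:", k)
```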
NASA Astrophysics Data System (ADS)
Kaiser, Olga; Martius, Olivia; Horenko, Illia
2017-04-01
Regression-based Generalized Pareto Distribution (GPD) models are often used to describe the dynamics of hydrological threshold excesses, relying on the explicit availability of all of the relevant covariates. However, in real applications the complete set of relevant covariates might not be available. In this context, it has been shown that, under weak assumptions, the influence of systematically missing covariates can be reflected by nonstationary and nonhomogeneous dynamics. We present a data-driven, semiparametric, and adaptive approach for spatio-temporal regression-based clustering of threshold excesses in the presence of systematically missing covariates. The nonstationary and nonhomogeneous behavior of threshold excesses is described by a set of local stationary GPD models, in which the parameters are expressed as regression models, together with a nonparametric spatio-temporal hidden switching process. By exploiting the nonparametric Finite Element time-series analysis Methodology (FEM) with Bounded Variation of the model parameters (BV) to resolve the spatio-temporal switching process, the approach goes beyond the strong a priori assumptions made in standard latent class models such as Mixture Models and Hidden Markov Models. Additionally, the presented FEM-BV-GPD provides a pragmatic description of the corresponding spatial dependence structure by grouping together all locations that exhibit similar behavior of the switching process. The performance of the framework is demonstrated on daily accumulated precipitation series over 17 different locations in Switzerland from 1981 to 2013, showing that the introduced approach allows for a better description of the historical data.
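A small sketch of the local building block behind a FEM-BV-GPD-style analysis: stationary GPD fits on threshold excesses within each regime, with two hard-assigned synthetic regimes standing in for the hidden switching process. The threshold and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import genpareto

u = 20.0                                             # precipitation threshold (mm)
excess_a = genpareto.rvs(c=0.1, scale=5.0, size=400, random_state=1)
excess_b = genpareto.rvs(c=0.3, scale=8.0, size=400, random_state=2)

for name, exc in [("regime A", excess_a), ("regime B", excess_b)]:
    c, loc, scale = genpareto.fit(exc, floc=0.0)     # local stationary GPD fit
    rl = u + genpareto.ppf(0.99, c, scale=scale)     # 1-in-100 exceedance level
    print(f"{name}: shape={c:.2f} scale={scale:.2f} 99% return level={rl:.1f} mm")
```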
Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla
2013-12-01
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
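For contrast, a minimal sketch of the simplest relative of subspace K-means, the tandem PCA-then-K-means approach; subspace K-means additionally models within-cluster residuals in their own subspaces, which this sketch does not attempt. Data are synthetic, with centroids generated in a low-dimensional subspace.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
centroids = rng.normal(0, 3, (3, 2)) @ rng.normal(0, 1, (2, 10))  # low-dim centers
X = np.repeat(centroids, 100, axis=0) + rng.normal(0, 1, (300, 10))

scores = PCA(n_components=2).fit_transform(X)                     # reduced space
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print("recovered cluster sizes:", np.bincount(labels))
```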
Kim, Seonah; Robichaud, David J; Beckham, Gregg T; Paton, Robert S; Nimlos, Mark R
2015-04-16
Dehydration over acidic zeolites is an important reaction class for the upgrading of biomass pyrolysis vapors to hydrocarbon fuels or to precursors for myriad chemical products. Here, we examine the dehydration of ethanol at a Brønsted acid site, T12, found in HZSM-5 using density functional theory (DFT). The geometries of both cluster and mixed quantum mechanics/molecular mechanics (QM:MM) models are prepared from the ZSM-5 crystal structure. Comparisons between these models and among different DFT methods show that the models and methods used give similar results. Inclusion of the full catalyst cavity through a QM:MM approach is found to be important, since activation barriers are computed on average as 7 kcal mol(-1) lower than those obtained with a smaller cluster model. Two different pathways, concerted and stepwise, have been considered when examining the dehydration and deprotonation steps. The current study shows that a concerted dehydration process is possible with a lower (4-5 kcal mol(-1)) activation barrier, while previous literature studies have focused on a stepwise mechanism. Overall, this work demonstrates that fairly high activation energies (∼50 kcal mol(-1)) are required for ethanol dehydration. A concerted mechanism is favored over a stepwise mechanism because charge separation in the transition state is minimized. QM:MM approaches appear to provide superior results to cluster calculations due to a more accurate representation of charges on framework oxygen atoms.
NASA Astrophysics Data System (ADS)
Ban, Sang-Woo; Lee, Minho
2008-04-01
Knowledge-based clustering and autonomous mental development remain high-priority research topics, in which the learning techniques of neural networks are used to achieve optimal performance. In this paper, we present a new framework that can automatically generate a relevance map from sensory data, representing knowledge about objects, and that can infer new knowledge about novel objects. The proposed model is based on an understanding of the visual 'what' pathway in the brain. A stereo saliency map model can selectively identify salient object areas by additionally considering a local symmetry feature. The incremental object perception model builds clusters for the construction of an ontology map in the color and form domains in order to perceive an arbitrary object; this is implemented by the growing fuzzy topology adaptive resonance theory (GFTART) network. Log-polar transformed color and form features for a selected object are used as inputs to the GFTART. The clustered information is relevant for describing specific objects, and the proposed model can automatically infer an unknown object by using the learned information. Experimental results with real data have demonstrated the validity of this approach.
Exploring multicollinearity using a random matrix theory approach.
Feher, Kristen; Whelan, James; Müller, Samuel
2012-01-01
Clustering of gene expression data is often done with the latent aim of dimension reduction, by finding groups of genes that have a common response to potentially unknown stimuli. However, what is poorly understood to date is the behaviour of a low dimensional signal embedded in high dimensions. This paper introduces a multicollinear model which is based on random matrix theory results, and shows potential for the characterisation of a gene cluster's correlation matrix. This model projects a one dimensional signal into many dimensions and is based on the spiked covariance model, but rather characterises the behaviour of the corresponding correlation matrix. The eigenspectrum of the correlation matrix is empirically examined by simulation, under the addition of noise to the original signal. The simulation results are then used to propose a dimension estimation procedure of clusters from data. Moreover, the simulation results warn against considering pairwise correlations in isolation, as the model provides a mechanism whereby a pair of genes with 'low' correlation may simply be due to the interaction of high dimension and noise. Instead, collective information about all the variables is given by the eigenspectrum.
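A short simulation in the spirit of the model described above: a one-dimensional signal projected into p dimensions plus noise, followed by the eigenspectrum of the empirical correlation matrix, which shows a single spike above a noise bulk. Dimensions and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, sigma = 200, 50, 1.0
signal = rng.normal(size=(n, 1))                    # latent 1-d stimulus response
loadings = rng.normal(size=(1, p))                  # projection into p dimensions
X = signal @ loadings + sigma * rng.normal(size=(n, p))

corr = np.corrcoef(X, rowvar=False)
eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
print("top (spike) eigenvalue:", round(eig[0], 2), "| bulk edge:", round(eig[1], 2))
```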
Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks.
Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao
2017-01-13
Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs' demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method uses the maximum consensus approach to realize intra-cluster time synchronization, and adjacent clusters exchange time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between connected nodes, because traditional stochastic models of the communication delays would be distorted in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate compared to existing works, and adapts to uncertain bounded communication delays.
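As a rough illustration of the maximum-consensus idea behind CMTS (not the paper's protocol; the topology, round count, and clock values below are all hypothetical), each node repeatedly adopts the largest clock value among itself and its neighbours, so all intra-cluster clocks converge to the cluster maximum:

    import random

    clocks = {i: random.uniform(0.0, 10.0) for i in range(6)}    # hypothetical node clocks
    neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

    for _ in range(10):                       # synchronisation rounds
        snapshot = dict(clocks)
        for node, nbrs in neighbours.items():
            clocks[node] = max([snapshot[node]] + [snapshot[j] for j in nbrs])

    print(clocks)                             # all entries now equal the initial maximum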
On hierarchical solutions to the BBGKY hierarchy
NASA Technical Reports Server (NTRS)
Hamilton, A. J. S.
1988-01-01
It is thought that the gravitational clustering of galaxies in the universe may approach a scale-invariant, hierarchical form in the small separation, large-clustering regime. Past attempts to solve the Born-Bogoliubov-Green-Kirkwood-Yvon (BBGKY) hierarchy in this regime have assumed a certain separable hierarchical form for the higher order correlation functions of galaxies in phase space. It is shown here that such separable solutions to the BBGKY equations must satisfy the condition that the clustered component of the solution has cluster-cluster correlations equal to galaxy-galaxy correlations to all orders. The solutions also admit the presence of an arbitrary unclustered component, which plays no dynamical role in the large-clustering regime. These results are a particular property of the specific separable model assumed for the correlation functions in phase space, not an intrinsic property of spatially hierarchical solutions to the BBGKY hierarchy. The observed distribution of galaxies does not satisfy the required conditions. The disagreement between theory and observation may be traced, at least in part, to initial conditions which, if Gaussian, already have cluster correlations greater than galaxy correlations.
Torres, Edmanuel; DiLabio, Gino A
2013-08-13
Large clusters of noncovalently bonded molecules can only be efficiently modeled by classical mechanics simulations. One prominent challenge associated with this approach is obtaining force-field parameters that accurately describe noncovalent interactions. High-level correlated wave function methods, such as CCSD(T), are capable of correctly predicting noncovalent interactions and are widely used to produce reference data. However, high-level correlated methods are generally too computationally costly to generate the critical reference data required for good force-field parameter development. In this work we present an approach to generate Lennard-Jones force-field parameters that accurately account for noncovalent interactions. We propose the use of a computational step intermediate between CCSD(T) and classical molecular mechanics that can bridge the accuracy and computational efficiency gap between them, and demonstrate the efficacy of our approach with methane clusters. On the basis of CCSD(T)-level binding energy data for a small set of methane clusters, we develop methane-specific, atom-centered, dispersion-correcting potentials (DCPs) for use with the PBE0 density functional and 6-31+G(d,p) basis sets. We then use the PBE0-DCP approach to compute a detailed map of the interaction forces associated with the removal of a single methane molecule from a cluster of eight methane molecules and use this map to optimize the Lennard-Jones parameters for methane. The quality of the binding energies given by our Lennard-Jones parameters is assessed on a set of methane clusters containing from 2 to 40 molecules. Our Lennard-Jones parameters, used in combination with the intramolecular parameters of the CHARMM force field, are found to closely reproduce the results of our dispersion-corrected density-functional calculations. The approach outlined can be used to develop Lennard-Jones parameters for any kind of molecular system.
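The final fitting step can be pictured as ordinary least squares on reference energies. A minimal sketch in Python, with entirely synthetic stand-in data in place of the CCSD(T)/DFT reference values used in the paper:

    import numpy as np
    from scipy.optimize import curve_fit

    def lj(r, eps, sigma):
        """12-6 Lennard-Jones pair energy."""
        sr6 = (sigma / r) ** 6
        return 4.0 * eps * (sr6 ** 2 - sr6)

    r_ref = np.linspace(3.5, 8.0, 20)                  # pair separations (assumed, in angstrom)
    e_ref = lj(r_ref, 0.30, 3.8) + 0.01 * np.random.default_rng(1).normal(size=20)
    (eps_fit, sigma_fit), _ = curve_fit(lj, r_ref, e_ref, p0=(0.2, 3.5))
    print(eps_fit, sigma_fit)                          # recovered well-depth and size parameters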
Model-based Clustering of High-Dimensional Data in Astrophysics
NASA Astrophysics Data System (ADS)
Bouveyron, C.
2016-05-01
The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase in measurement capabilities. As a consequence, data are nowadays frequently high-dimensional and available in bulk or as streams. Model-based techniques for clustering are popular tools renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces, mainly due to their dramatic over-parametrization. Recent developments in model-based classification overcome these drawbacks and make it possible to efficiently classify high-dimensional data, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.
Stochastic competitive learning in complex networks.
Silva, Thiago Christiano; Zhao, Liang
2012-03-01
Competitive learning is an important machine learning approach that is widely employed in artificial neural networks. In this paper, we present a rigorous definition of a new type of competitive learning scheme realized on large-scale networks. The model consists of several particles walking within the network and competing with each other to occupy as many nodes as possible, while attempting to reject intruder particles. The particle's walking rule is composed of a stochastic combination of random and preferential movements. The model has been applied to solve community detection and data clustering problems. Computer simulations reveal that the proposed technique achieves high precision in community and cluster detection, as well as low computational complexity. Moreover, we have developed an efficient method for estimating the most likely number of clusters by using an evaluator index that monitors the information generated by the competition process itself. We hope this paper will provide an alternative way to the study of competitive learning.
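The walking rule can be sketched as follows (Python; the mixing probability, toy graph, and domination bookkeeping are hypothetical simplifications of the paper's dynamics): with probability p_pref the particle moves preferentially toward nodes it already dominates, otherwise it moves at random.

    import random

    def next_node(graph, particle, current, ownership, p_pref=0.6):
        nbrs = list(graph[current])
        if random.random() < p_pref:          # preferential movement
            weights = [1.0 + ownership.get((particle, v), 0.0) for v in nbrs]
            return random.choices(nbrs, weights=weights, k=1)[0]
        return random.choice(nbrs)            # random movement

    graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
    ownership = {}                             # (particle, node) -> domination level (hypothetical)
    print(next_node(graph, "p1", 0, ownership))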
Re-entrant phase behavior for systems with competition between phase separation and self-assembly
NASA Astrophysics Data System (ADS)
Reinhardt, Aleks; Williamson, Alexander J.; Doye, Jonathan P. K.; Carrete, Jesús; Varela, Luis M.; Louis, Ard A.
2011-03-01
In patchy particle systems where there is a competition between the self-assembly of finite clusters and liquid-vapor phase separation, re-entrant phase behavior can be observed, with the system passing from a monomeric vapor phase to a region of liquid-vapor phase coexistence and then to a vapor phase of clusters as the temperature is decreased at constant density. Here, we present a classical statistical mechanical approach to the determination of the complete phase diagram of such a system. We model the system as a van der Waals fluid, but one where the monomers can assemble into monodisperse clusters that have no attractive interactions with any of the other species. The resulting phase diagrams show a clear region of re-entrance. However, for the most physically reasonable parameter values of the model, this behavior is restricted to a certain range of density, with phase separation still persisting at high densities.
Martin, Jean-Charles; Berton, Amélie; Ginies, Christian; Bott, Romain; Scheercousse, Pierre; Saddi, Alessandra; Gripois, Daniel; Landrier, Jean-François; Dalemans, Daniel; Alessi, Marie-Christine; Delplanque, Bernadette
2015-09-01
We assessed the atheroprotective efficiency of modified dairy fats in hyperlipidemic hamsters. A systems biology approach was implemented to reveal and quantify the dietary fat-related components of the disease. Three modified dairy fats (40% energy) were prepared from regular butter by mixing with a plant oil mixture, by removing cholesterol alone, or by removing cholesterol in combination with reducing saturated fatty acids. A plant oil mixture and a regular butter were used as control diets. The atherosclerosis severity (aortic cholesteryl-ester level) was higher in the regular butter-fed hamsters than in the other four groups (P < 0.05). Eighty-seven of the 1,666 variables measured from multiplatform analysis were found to be strongly associated with the disease. When aggregated into 10 biological clusters combined into a multivariate predictive equation, these 87 variables explained 81% of the disease variability. The biological cluster "regulation of lipid transport and metabolism" appeared central to atherogenic development relative to diets. The "vitamin E metabolism" cluster was the main driver of atheroprotection with the best performing transformed dairy fat. Under conditions that promote atherosclerosis, the impact of dairy fats on atherogenesis could be greatly ameliorated by technological modifications. Our modeling approach allowed for identifying and quantifying the contribution of complex factors to atherogenic development in each dietary setup.
Alternative Methods for Assessing Mediation in Multilevel Data: The Advantages of Multilevel SEM
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Zhang, Zhen; Zyphur, Michael J.
2011-01-01
Multilevel modeling (MLM) is a popular way of assessing mediation effects with clustered data. Two important limitations of this approach have been identified in prior research and a theoretical rationale has been provided for why multilevel structural equation modeling (MSEM) should be preferred. However, to date, no empirical evidence of MSEM's…
Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks
Mall, Raghvendra; Langone, Rocco; Suykens, Johan A. K.
2014-01-01
Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows one to build a model on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have a multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks.
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), and other fields where experiments are necessary but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality as the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a master-worker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and a varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
Semantic Clustering of Search Engine Results
Soliman, Sara Saad; El-Sayed, Maged F.; Hassan, Yasser F.
2015-01-01
This paper presents a novel approach for search engine results clustering that relies on the semantics of the retrieved documents rather than the terms in those documents. The proposed approach takes into consideration both lexical and semantic similarities among documents and applies an activation spreading technique in order to generate semantically meaningful clusters. This approach allows documents that are semantically similar to be clustered together rather than clustering documents based on similar terms. A prototype is implemented and several experiments are conducted to test the proposed solution. The results of the experiments confirm that the proposed solution achieves remarkable results in terms of precision.
A Stationary Wavelet Entropy-Based Clustering Approach Accurately Predicts Gene Expression
Nguyen, Nha; Vo, An; Choi, Inchan
2015-01-01
Studying epigenetic landscapes is important to understand the condition for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches that often use a representative value of the signals in a fixed-sized window do not fully use the information written in the epigenetic landscapes. Clustering approaches that maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of the stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in the assessment using gene expression and achieved a correlation coefficient above 0.9 without using any training procedure. Our results show that the changes of the epigenetic signals are useful to study gene regulation.
Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H; Sun, Fengzhu
2016-04-01
Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. A plausible model for this underlying distribution of word counts is given through modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate them using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that clustering with an MC of the estimated order gives a plausible grouping of the species. Our implementation of the statistics developed here is available as the R package 'NGS.MC' at http://www-rcf.usc.edu/∼fsun/Programs/NGS-MC/NGS-MC.html.
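For intuition, a toy version of the estimation task in Python: pooling transition counts across short reads to estimate an order-1 transition matrix (the paper's order-selection statistics and the normal/gamma approximations are not reproduced here; the reads are stand-ins):

    from collections import Counter

    reads = ["ACGTAC", "CGTACG", "TTACGA"]    # stand-in for NGS short reads
    counts = Counter()
    for read in reads:
        for a, b in zip(read, read[1:]):      # dinucleotide (order-1) transitions
            counts[(a, b)] += 1

    bases = "ACGT"
    trans = {}
    for a in bases:
        row = {b: counts[(a, b)] for b in bases}
        total = sum(row.values())
        trans[a] = {b: c / total for b, c in row.items()} if total else row
    print(trans["A"])                          # estimated transition probabilities from A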
NASA Astrophysics Data System (ADS)
Malpetti, Daniele; Roscilde, Tommaso
2017-02-01
The mean-field approximation is at the heart of our understanding of complex systems, despite its fundamental limitation of completely neglecting correlations between the elementary constituents. In a recent work [Phys. Rev. Lett. 117, 130401 (2016), 10.1103/PhysRevLett.117.130401], we have shown that in quantum many-body systems at finite temperature, two-point correlations can be formally separated into a thermal part and a quantum part and that quantum correlations are generically found to decay exponentially at finite temperature, with a characteristic, temperature-dependent quantum coherence length. The existence of these two different forms of correlation in quantum many-body systems suggests the possibility of formulating an approximation which affects quantum correlations only, without preventing the correct description of classical fluctuations at all length scales. Focusing on lattice boson and quantum Ising models, we make use of the path-integral formulation of quantum statistical mechanics to introduce such an approximation, which we dub the quantum mean-field (QMF) approach, and which can be readily generalized to a cluster form (cluster QMF or cQMF). The cQMF approximation reduces to cluster mean-field theory at T = 0, while at any finite temperature it produces a family of systematically improved, semi-classical approximations to the quantum statistical mechanics of the lattice theory at hand. Contrary to standard MF approximations, the correct nature of thermal critical phenomena is captured by any cluster size. In the two exemplary cases of the two-dimensional quantum Ising model and of two-dimensional quantum rotors, we study systematically the convergence of the cQMF approximation towards the exact result, and show that the convergence is typically linear or sublinear in the boundary-to-bulk ratio of the clusters as T → 0, while it becomes faster than linear as T grows. These results pave the way towards the development of semiclassical numerical approaches based on an approximate, yet systematically improved account of quantum correlations.
Gholami, Mohammad; Brennan, Robert W
2016-01-06
In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.
Network based approaches reveal clustering in protein point patterns
NASA Astrophysics Data System (ADS)
Parker, Joshua; Barr, Valarie; Aldridge, Joshua; Samelson, Lawrence E.; Losert, Wolfgang
2014-03-01
Recent advances in super-resolution imaging have allowed for the sub-diffraction measurement of the spatial location of proteins on the surfaces of T-cells. The challenge is to connect these complex point patterns to the internal processes and interactions, both protein-protein and protein-membrane. We begin analyzing these patterns by forming a geometric network amongst the proteins and looking at network measures, such as the degree distribution. This allows us to compare experimentally observed patterns to models. Specifically, we find that the experimental patterns differ from heterogeneous Poisson processes, highlighting an internal clustering structure. Future work will compare our results to simulated protein-protein interactions to determine clustering mechanisms.
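A minimal sketch of the network construction (Python/NumPy; the point count and connection radius are assumed): link all pairs of points closer than a threshold and read off the degree distribution, which can then be compared against a Poisson process.

    import numpy as np

    rng = np.random.default_rng(2)
    pts = rng.uniform(0.0, 1.0, size=(300, 2))     # hypothetical protein positions
    radius = 0.06                                   # connection radius (assumed)

    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = (d < radius) & (d > 0)                    # geometric network adjacency
    degrees = adj.sum(axis=1)
    print(np.bincount(degrees))                     # degree distribution vs. Poisson expectation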
Nuclear structure studies performed using the (18O,16O) two-neutron transfer reactions
NASA Astrophysics Data System (ADS)
Carbone, D.; Agodi, C.; Cappuzzello, F.; Cavallaro, M.; Ferreira, J. L.; Foti, A.; Gargano, A.; Lenzi, S. M.; Linares, R.; Lubian, J.; Santagati, G.
2018-02-01
Excitation energy spectra and absolute cross section angular distributions were measured for the 13C(18O,16O)15C two-neutron transfer reaction at 84 MeV incident energy. This reaction selectively populates two-neutron configurations in the states of the residual nucleus. Exact finite-range coupled reaction channel calculations are used to analyse the data. Two approaches are discussed: the extreme cluster and the newly introduced microscopic cluster. The latter makes use of spectroscopic amplitudes in the centre of mass reference frame, derived from shell-model calculations using the Moshinsky transformation brackets. The results describe well the experimental cross section and highlight cluster configurations in the involved wave functions.
NASA Astrophysics Data System (ADS)
Bokhan, Denis; Trubnikov, Dmitrii N.; Perera, Ajith; Bartlett, Rodney J.
2018-04-01
An explicitly correlated method for the calculation of excited states with spin-orbit couplings has been formulated and implemented. The developed approach utilizes the left and right eigenvectors of an equation-of-motion coupled-cluster model based on the linearly approximated explicitly correlated coupled-cluster singles and doubles [CCSD(F12)] method. The spin-orbit interactions are introduced by using the spin-orbit mean field (SOMF) approximation of the Breit-Pauli Hamiltonian. Numerical tests for several atoms and molecules show good agreement between the explicitly correlated results and the corresponding values calculated in the complete basis set (CBS) limit; highly accurate excitation energies can be obtained already at the triple-ζ level.
NASA Astrophysics Data System (ADS)
Titantah, John T.; Karttunen, Mikko
2016-05-01
Electronic and optical properties of silver clusters were calculated using two different ab initio approaches: (1) an all-electron full-potential linearized-augmented plane-wave method and (2) a local basis function pseudopotential approach. Agreement is found between the two methods for small and intermediate-sized clusters, for which the former method is limited due to its all-electron formulation. The latter, due to non-periodic boundary conditions, is the more natural approach to simulate small clusters. The effect of cluster size is then explored using the local basis function approach. We find that as the cluster size increases, the electronic structure undergoes a transition from molecular behavior to nanoparticle behavior at a cluster size of 140 atoms (diameter ~1.7 nm). Above this cluster size the step-like electronic structure, evident as several features in the imaginary part of the polarizability of all clusters smaller than Ag147, gives way to a dominant plasmon peak localized at wavelengths 350 nm ≤ λ ≤ 600 nm. It is, thus, at this length-scale that the collective oscillations of the conduction electrons that are responsible for plasmonic resonances begin to dominate the opto-electronic properties of silver nanoclusters.
NASA Astrophysics Data System (ADS)
Mohaghegh, Shahab
2010-05-01
Surrogate Reservoir Model (SRM) is a new solution for fast-track, comprehensive reservoir analysis (solving both direct and inverse problems) using existing reservoir simulation models. SRM is defined as a replica of the full field reservoir simulation model that runs and provides accurate results in real time (one simulation run takes only a fraction of a second). SRM mimics the capabilities of a full field model with high accuracy. Reservoir simulation is the industry standard for reservoir management. It is used in all phases of field development in the oil and gas industry. The routine of simulation studies calls for integration of static and dynamic measurements into the reservoir model. Full field reservoir simulation models have become the major source of information for analysis, prediction and decision making. Large prolific fields usually go through several versions (updates) of their model. Each new version usually is a major improvement over the previous version. The updated model includes the latest available information incorporated along with adjustments that usually are the result of single-well or multi-well history matching. As the number of reservoir layers (thickness of the formations) increases, the number of cells representing the model approaches several millions. As reservoir models grow in size, so does the time required for each run. Schemes such as grid computing and parallel processing help to a certain degree but do not provide the required speed for tasks such as: field development strategies using comprehensive reservoir analysis, solving the inverse problem for injection/production optimization, quantifying uncertainties associated with the geological model, and real-time optimization and decision making. These types of analyses require hundreds or thousands of runs. Furthermore, with the new push for smart fields in the oil and gas industry, a natural outgrowth of smart completions and smart wells, the need for real-time reservoir modeling becomes more pronounced. SRM is developed using the state of the art in neural computing and fuzzy pattern recognition to address the ever-growing need in the oil and gas industry to perform accurate, but high-speed, simulation and modeling. Unlike conventional geo-statistical approaches (response surfaces, proxy models …) that require hundreds of simulation runs for development, SRM is developed with only a few (10 to 30) simulation runs. SRM can be developed regularly (as new versions of the full field model become available) off-line and can be put online for real-time processing to guide important decisions. SRM has proven its value in the field. An SRM was developed for a giant oil field in the Middle East. The model included about one million grid blocks with more than 165 horizontal wells and took ten hours for a single run on 12 parallel CPUs. Using only 10 simulation runs, an SRM was developed that was able to accurately mimic the behavior of the reservoir simulation model. Performing a comprehensive reservoir analysis that included making millions of SRM runs, wells in the field were divided into five clusters. It was predicted that wells in clusters one and two are the best candidates for rate relaxation with minimal, long-term water production, while wells in clusters four and five are susceptible to high water cuts. Two and a half years and 20 wells later, rate relaxation results from the field proved that all the predictions made by the SRM analysis were correct.
While incremental oil production increased in all wells (wells in cluster 1 produced the most, followed by wells in clusters 2, 3 …), the percent change in average monthly water cut for wells in each cluster clearly demonstrated the analytic power of SRM. As was correctly predicted, wells in clusters 1 and 2 actually experienced a reduction in water cut, while a substantial increase in water cut was observed in wells classified into clusters 4 and 5. Performing these analyses would have been impossible using the original full field simulation model.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
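A simplified view of the accelerated assignment step, in Python with SciPy (note this differs from the paper, which stores the data points themselves in a kd-tree and uses a filtering strategy; indexing the centers is the shortest way to show the idea):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(3)
    points = rng.normal(size=(10000, 4))       # e.g. multispectral pixels (hypothetical)
    centers = rng.normal(size=(12, 4))         # current cluster centers

    tree = cKDTree(centers)                    # spatial index over the centers
    _, label = tree.query(points)              # nearest center for every point in one call
    for k in range(len(centers)):              # standard centroid update, skipping empty clusters
        members = points[label == k]
        if len(members):
            centers[k] = members.mean(axis=0)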
A Fast Implementation of the Isodata Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.
2007-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
On Learning Cluster Coefficient of Private Networks
Wang, Yue; Wu, Xintao; Zhu, Jun; Xiang, Yang
2013-01-01
Enabling accurate analysis of social network data while preserving differential privacy has been challenging since graph features such as clustering coefficient or modularity often have high sensitivity, which is different from traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function f and develop a divide and conquer approach to enforce differential privacy. The basic procedure of this approach is to first decompose the target computation f into several less complex unit computations f1, …, fm connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each fi with Laplace noise derived from its own sensitivity value and the distributed privacy threshold εi, and finally combine those perturbed fi as the perturbed output of computation f. We examine how various operations affect the accuracy of complex computations. When unit computations have large global sensitivity values, we enforce the differential privacy by calibrating noise based on the smooth sensitivity, rather than the global sensitivity. By doing this, we achieve the strict differential privacy guarantee with smaller magnitude noise. We illustrate our approach by using the clustering coefficient, which is a popular statistic used in social network analysis. Empirical evaluations on five real social networks and various synthetic graphs generated from three random graph models show that the developed divide and conquer approach outperforms the direct approach.
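The basic perturb-and-combine step can be illustrated in a few lines of Python (the unit values, sensitivities, and an even privacy-budget split are hypothetical; the paper's smooth-sensitivity refinement is omitted):

    import numpy as np

    def laplace_perturb(value, sensitivity, epsilon, rng):
        """Add Laplace noise calibrated to a unit computation's sensitivity."""
        return value + rng.laplace(scale=sensitivity / epsilon)

    rng = np.random.default_rng(4)
    eps_total = 1.0                            # overall privacy budget (assumed)
    f1, f2 = 240.0, 60.0                       # hypothetical unit computations f1, f2
    s1, s2 = 4.0, 2.0                          # their sensitivities (assumed)
    f1_priv = laplace_perturb(f1, s1, eps_total / 2, rng)
    f2_priv = laplace_perturb(f2, s2, eps_total / 2, rng)
    print(f1_priv / f2_priv)                   # combined, perturbed output of f = f1 / f2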
NASA Astrophysics Data System (ADS)
Salditch, L.; Brooks, E. M.; Stein, S.; Spencer, B. D.; Campbell, M. R.
2017-12-01
A challenge for earthquake hazard assessment is that geologic records often show large earthquakes occurring in temporal clusters separated by periods of quiescence. For example, in Cascadia, a paleoseismic record going back 10,000 years shows four to five clusters separated by approximately 1,000 year gaps. If we are still in the cluster that began 1700 years ago, a large earthquake is likely to happen soon. If the cluster has ended, a great earthquake is less likely. For a Gaussian distribution of recurrence times, the probability of an earthquake in the next 50 years is six times larger if we are still in the most recent cluster. Earthquake hazard assessments typically employ one of two recurrence models, neither of which directly incorporate clustering. In one, earthquake probability is time-independent and modeled as Poissonian, so an earthquake is equally likely at any time. The fault has no "memory" because when a prior earthquake occurred has no bearing on when the next will occur. The other common model is a time-dependent earthquake cycle in which the probability of an earthquake increases with time until one happens, after which the probability resets to zero. Because the probability is reset after each earthquake, the fault "remembers" only the last earthquake. This approach can be used with any assumed probability density function for recurrence times. We propose an alternative, Long-Term Fault Memory (LTFM), a modified earthquake cycle model where the probability of an earthquake increases with time until one happens, after which it decreases, but not necessarily to zero. Hence the probability of the next earthquake depends on the fault's history over multiple cycles, giving "long-term memory". Physically, this reflects an earthquake releasing only part of the elastic strain stored on the fault. We use the LTFM to simulate earthquake clustering along the San Andreas Fault and Cascadia. In some portions of the simulated earthquake history, events would appear quasiperiodic, while at other times, the events can appear more Poissonian. Hence a given paleoseismic or instrumental record may not reflect the long-term seismicity of a fault, which has important implications for hazard assessment.
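One way to picture the LTFM idea in code (Python; the functional form, rates, and release fraction are invented for illustration and are not the authors' calibration): accumulated "strain" raises the event probability, and each event releases only part of it, so the hazard drops but not to zero.

    import numpy as np

    rng = np.random.default_rng(5)
    strain, rate, release = 0.0, 1.0, 0.6      # loading rate and partial release (assumed)
    events = []
    for year in range(5000):
        strain += rate                          # probability grows with stored strain
        hazard = 0.01 * min(1.0, strain / 500.0)
        if rng.random() < hazard:
            events.append(year)
            strain *= (1.0 - release)           # decreases, but not necessarily to zero
    print(np.diff(events))                      # inter-event times show clustered behaviour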
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup.
Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome
NASA Astrophysics Data System (ADS)
Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.
2018-03-01
The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spreads from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.
The Quantitative Analysis of Chennai Automotive Industry Cluster
NASA Astrophysics Data System (ADS)
Bhaskaran, Ethirajan
2016-07-01
Chennai is also called the Detroit of India due to the presence of an automotive industry producing over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam industrial estates of Chennai faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai automotive industry cluster before (2001-2002) and after (2008-2009) the Cluster Development Approach (CDA). The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT), and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals a high degree of relationship between the variables studied. The RA models constructed establish a strong relationship between the dependent variable and a host of independent variables; the models proposed here capture that relationship in approximate, closed form. The KWT shows no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Cost, Procurement Cost and Gross Output, supporting the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows no significant difference between industrial units with respect to costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the automotive industries have fully utilized the physical infrastructure and centralized facilities by adopting the CDA and now export their products to North America, South America, Europe, Australia, Africa and Asia. Value chain analysis models have been implemented in all the cluster units. This CDA model can be implemented in industries of underdeveloped and developing countries for cost reduction and productivity increases.
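The two non-parametric tests named above are available directly in SciPy; a minimal sketch with simulated stand-in data (the real study uses questionnaire data from 100 ACI):

    import numpy as np
    from scipy.stats import kruskal, friedmanchisquare

    rng = np.random.default_rng(6)
    ambattur, thirumalizai, thirumudivakkam = (rng.normal(100, 15, 30) for _ in range(3))
    print(kruskal(ambattur, thirumalizai, thirumudivakkam))      # across the three locations

    prod, infra, tech, mktg, profit = (rng.normal(50, 8, 30) for _ in range(5))
    print(friedmanchisquare(prod, infra, tech, mktg, profit))    # across the cost measures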
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.
A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically for large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
NASA Astrophysics Data System (ADS)
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
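The alternation between clustering and subspace selection can be sketched with scikit-learn (a loose simplification with synthetic "waveforms"; the paper's outlier handling and its statistical test for the number of clusters are omitted, and the cluster count is fixed here):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(7)
    waveforms = np.vstack([rng.normal(m, 1.0, size=(200, 32)) for m in (0.0, 1.5, 3.0)])

    features = PCA(n_components=3).fit_transform(waveforms)       # initial subspace
    for _ in range(5):                                            # alternate clustering <-> subspace
        labels = GaussianMixture(n_components=3, random_state=0).fit_predict(features)
        features = LinearDiscriminantAnalysis(n_components=2).fit_transform(waveforms, labels)
    print(np.bincount(labels))                                    # cluster sizes after refinement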
Developing a model for effective leadership in healthcare: a concept mapping approach.
Hargett, Charles William; Doty, Joseph P; Hauck, Jennifer N; Webb, Allison Mb; Cook, Steven H; Tsipis, Nicholas E; Neumann, Julie A; Andolsek, Kathryn M; Taylor, Dean C
2017-01-01
Despite increasing awareness of the importance of leadership in healthcare, our understanding of the competencies of effective leadership remains limited. We used a concept mapping approach (a blend of qualitative and quantitative analysis of group processes to produce a visual composite of the group's ideas) to identify stakeholders' mental model of effective healthcare leadership, clarifying the underlying structure and importance of leadership competencies. Literature review, focus groups, and consensus meetings were used to derive a representative set of healthcare leadership competency statements. Study participants subsequently sorted and rank-ordered these statements based on their perceived importance in contributing to effective healthcare leadership in real-world settings. Hierarchical cluster analysis of individual sortings was used to develop a coherent model of effective leadership in healthcare. A diverse group of 92 faculty and trainees individually rank-sorted 33 leadership competency statements. The highest rated statements were "Acting with Personal Integrity", "Communicating Effectively", "Acting with Professional Ethical Values", "Pursuing Excellence", "Building and Maintaining Relationships", and "Thinking Critically". Combining the results from hierarchical cluster analysis with our qualitative data led to a healthcare leadership model based on the core principle of Patient Centeredness and the core competencies of Integrity, Teamwork, Critical Thinking, Emotional Intelligence, and Selfless Service. Using a mixed qualitative-quantitative approach, we developed a graphical representation of a shared leadership model derived in the healthcare setting. This model may enhance learning, teaching, and patient care in this important area, as well as guide future research.
Gene prioritization and clustering by multi-view text mining
2010-01-01
Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. Conclusions In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification.
Magnification Bias in Gravitational Arc Statistics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caminha, G. B.; Estrada, J.; Makler, M.
2013-08-29
The statistics of gravitational arcs in galaxy clusters is a powerful probe of cluster structure and may provide complementary cosmological constraints. Despite recent progress, discrepancies still remain between modelling and observations of arc abundance, especially regarding the redshift distribution of strong lensing clusters. Besides, fast "semi-analytic" methods still have to incorporate the success obtained with simulations. In this paper we discuss the contribution of magnification to gravitational arc statistics. Although lensing conserves surface brightness, the magnification increases the signal-to-noise ratio of the arcs, enhancing their detectability. We present an approach to include this and other observational effects in semi-analytic calculations for arc statistics. The cross section for arc formation (σ) is computed through a semi-analytic method based on the ratio of the eigenvalues of the magnification tensor. Using this approach we obtained the scaling of σ with respect to the magnification and other parameters, allowing for a fast computation of the cross section. We apply this method to evaluate the expected number of arcs per cluster using an elliptical Navarro-Frenk-White matter distribution. Our results show that the magnification has a strong effect on the arc abundance, enhancing the fraction of arcs, moving the peak of the arc fraction to higher redshifts, and softening its decrease at high redshifts. We argue that the effect of magnification should be included in arc statistics modelling and that it could help to reconcile arc statistics predictions with the observational data.
Bertamini, Marco; Guest, Martin; Vallortigara, Giorgio; Rugani, Rosa; Regolin, Lucia
2018-04-30
Animals can perceive the numerosity of sets of visual elements. Qualitative and quantitative similarities in different species suggest the existence of a shared system (approximate number system). Biases associated with sensory properties are informative about the underlying mechanisms. In humans, regular spacing increases perceived numerosity (regular-random numerosity illusion). This has led to a model that predicts numerosity based on occupancy (a measure that decreases when elements are close together). We used a procedure in which observers selected one of two stimuli and were given feedback with respect to whether the choice was correct. One configuration had 20 elements and the other 40, randomly placed inside a circular region. Participants had to discover the rule based on feedback. Because density and clustering covaried with numerosity, different dimensions could be used. After reaching a criterion, test trials presented two types of configurations with 30 elements. One type had a larger interelement distance than the other (high or low clustering). If observers had adopted a numerosity strategy, they would choose low clustering (if reinforced with 40) and high clustering (if reinforced with 20). A clustering or density strategy predicts the opposite. Human adults used a numerosity strategy. Chicks were tested using a similar procedure. There were two behavioral measures: first approach response and final circumnavigation (walking behind the screen). The prediction based on numerosity was confirmed by the first approach data. For chicks, one clear pattern from both responses was a preference for the configurations with higher clustering.
Confidence intervals for a difference between lognormal means in cluster randomization trials.
Poirier, Julia; Zou, G Y; Koval, John
2017-04-01
Cluster randomization trials, in which intact social units are randomized to different interventions, have become popular in the last 25 years. Outcomes from these trials are in many cases positively skewed, following approximately lognormal distributions. When inference is focused on the difference between treatment arm arithmetic means, existing confidence interval procedures either make restrictive assumptions or are complex to implement. We approach this problem by assuming log-transformed outcomes from each treatment arm follow a one-way random effects model. The treatment arm means are functions of multiple parameters for which separate confidence intervals are readily available, suggesting that the method of variance estimates recovery may be applied to obtain closed-form confidence intervals. A simulation study showed that this simple approach performs well for small sample sizes in terms of empirical coverage, relatively balanced tail errors, and interval widths as compared to existing methods. The methods are illustrated using data arising from a cluster randomization trial investigating a critical pathway for the treatment of community acquired pneumonia.
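To make the method of variance estimates recovery (MOVER) concrete, here is a sketch for a single arm's lognormal mean, combining a t-interval for mu with a chi-square interval for sigma-squared on the log scale (a simplification: the paper's setting adds a cluster-level random effect and takes the difference between two arms):

    import numpy as np
    from scipy import stats

    def lognormal_mean_ci(logx, alpha=0.05):
        n = len(logx)
        mu, s2 = logx.mean(), logx.var(ddof=1)
        l1, u1 = stats.t.interval(1 - alpha, n - 1, loc=mu, scale=np.sqrt(s2 / n))
        chi_hi, chi_lo = stats.chi2.ppf([1 - alpha / 2, alpha / 2], n - 1)
        l2, u2 = (n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo   # CI for sigma^2
        theta = mu + s2 / 2                    # log of the lognormal arithmetic mean
        lower = theta - np.sqrt((mu - l1) ** 2 + (s2 - l2) ** 2 / 4)
        upper = theta + np.sqrt((u1 - mu) ** 2 + (u2 - s2) ** 2 / 4)
        return np.exp(lower), np.exp(upper)

    x = np.random.default_rng(8).lognormal(mean=1.0, sigma=0.5, size=40)
    print(lognormal_mean_ci(np.log(x)))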
Segmentation by fusion of histogram-based k-means clusters in different color spaces.
Mignotte, Max
2008-05-01
This paper presents a new, simple, and efficient segmentation approach based on a fusion procedure that combines several segmentation maps, each associated with a simpler partition model, to obtain a more reliable and accurate segmentation result. The label fields to be fused in our application are produced by the same simple (K-means-based) clustering technique applied to the input image expressed in different color spaces. Our fusion strategy combines these segmentation maps through a final clustering procedure whose input features are the local histograms of the class labels previously estimated and associated with each site, for all the initial partitions. This fusion framework remains simple to implement, fast, and general enough to be applied to various computer vision tasks (e.g., motion detection and segmentation), and it has been successfully applied to the Berkeley image database. The experiments reported in this paper illustrate the potential of this approach compared with state-of-the-art segmentation methods recently proposed in the literature.
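A minimal sketch of the fusion idea follows: the same K-means clustering is run on the image in several color spaces, and a final K-means is applied to the local histograms of the resulting class labels. The window size, number of clusters, color spaces, and test image are illustrative choices, not the paper's exact settings:

```python
import numpy as np
from skimage import data, color
from sklearn.cluster import KMeans
from scipy.ndimage import uniform_filter

K, WIN = 6, 7                              # clusters per map, histogram window
img = data.astronaut()[::4, ::4].astype(np.float64) / 255.0   # downscaled test image
spaces = [img, color.rgb2hsv(img), color.rgb2lab(img)]        # same image, 3 color spaces

h, w = img.shape[:2]
features = []
for s in spaces:
    # Simple K-means segmentation of this color-space representation
    labels = KMeans(n_clusters=K, n_init=4, random_state=0) \
        .fit_predict(s.reshape(-1, 3)).reshape(h, w)
    # Local histogram of class labels: box-filter each one-hot label map
    for k in range(K):
        features.append(uniform_filter((labels == k).astype(np.float64), WIN))

# Final clustering on the concatenated local label histograms
feat = np.stack(features, axis=-1).reshape(-1, len(features))
fused = KMeans(n_clusters=K, n_init=4, random_state=0).fit_predict(feat).reshape(h, w)
print(fused.shape, np.unique(fused))
```

The box filter over one-hot label maps is just a fast way to compute windowed label histograms; each pixel's feature vector is its local class-label distribution in every initial partition.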
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vasylkivska, Veronika S.; Huerta, Nicolas J.
Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. A detailed study of the Oklahoma earthquake catalog's inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation stands in stark contrast to California (also known for induced seismicity), where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.
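The sketch below illustrates a nearest-neighbor linkage of the kind described, using the widely used space-time-magnitude distance of Zaliapin and co-workers; the exact metric and parameter values (fractal dimension d_f, Gutenberg-Richter b-value) used in the study may differ, and the catalog here is synthetic:

```python
import numpy as np

def nearest_neighbor_distances(t, x, y, m, d_f=1.6, b=1.0):
    """For each event j, find the earlier event i minimizing the
    space-time-magnitude distance eta_ij = t_ij * r_ij**d_f * 10**(-b*m_i)
    (a Zaliapin-type metric; the paper's exact form may differ).
    t: occurrence times, x/y: epicenter coordinates (km), m: magnitudes."""
    n = len(t)
    parents = np.full(n, -1)
    eta = np.full(n, np.inf)
    for j in range(1, n):
        dt = t[j] - t[:j]
        r = np.hypot(x[j] - x[:j], y[j] - y[:j]) + 1e-6
        d = dt * r**d_f * 10.0 ** (-b * m[:j])
        d[dt <= 0] = np.inf          # a parent must precede its child
        i = int(np.argmin(d))
        parents[j], eta[j] = i, d[i]
    return parents, eta

rng = np.random.default_rng(2)
n = 300
t = np.sort(rng.uniform(0, 1000, n))                # event times (days)
x, y = rng.uniform(0, 200, n), rng.uniform(0, 200, n)
m = rng.exponential(1.0 / np.log(10), n) + 2.0      # GR-like magnitudes, b ~ 1
parents, eta = nearest_neighbor_distances(t, x, y, m)
# Small eta indicates a statistically correlated (clustered) pair
print("median log10(eta):", np.median(np.log10(eta[1:])))
```

Splitting eta into its rescaled time and rescaled distance components is what makes the spatial-versus-temporal contrast drawn above (Oklahoma versus California) visible in practice.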
NASA Technical Reports Server (NTRS)
Ponomarev, A. L.; Brenner, D.; Hlatky, L. R.; Sachs, R. K.
2000-01-01
DNA double-strand breaks (DSBs) produced by densely ionizing radiation are not located randomly in the genome: recent data indicate DSB clustering along chromosomes. Stochastic DSB clustering at large scales, from > 100 Mbp down to < 0.01 Mbp, is modeled using computer simulations and analytic equations. A random-walk, coarse-grained polymer model for chromatin is combined with a simple track-structure model in Monte Carlo software called DNAbreak and is applied to data on alpha-particle irradiation of V-79 cells. The chromatin model neglects molecular details but systematically incorporates an increase in the average spatial separation between two DNA loci as the number of base pairs between them increases. Fragment-size distributions obtained using DNAbreak match data on large fragments about as well as distributions previously obtained with a less mechanistic approach. Dose-response relations are linear at small doses of high linear energy transfer (LET) radiation but become non-linear when the dose is so large that there is a significant probability of overlapping or close juxtaposition, along one chromosome, of DSB clusters from different tracks; the non-linearity is more evident for large fragments than for small ones. The DNAbreak results furnish an example of the RLC (randomly located clusters) analytic formalism, which generalizes the broken-stick fragment-size distribution of the random-breakage model often applied to low-LET data.
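For context, the random-breakage ("broken-stick") model that the RLC formalism generalizes can be simulated in a few lines; the chromosome length and break count below are illustrative, not fitted to the V-79 data:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_breakage_fragments(chrom_mbp=100.0, n_breaks=50, n_runs=10_000):
    """Broken-stick baseline: n_breaks DSBs placed uniformly along a
    chromosome; fragment sizes are the gaps between adjacent breaks."""
    frags = []
    for _ in range(n_runs):
        breaks = np.sort(rng.uniform(0, chrom_mbp, n_breaks))
        edges = np.concatenate(([0.0], breaks, [chrom_mbp]))
        frags.append(np.diff(edges))
    return np.concatenate(frags)

frags = random_breakage_fragments()
# Under random breakage, fragment sizes are approximately exponential with
# mean chrom_mbp / (n_breaks + 1); DSB clustering along chromosomes shows
# up as an excess of very small fragments relative to this baseline.
print("mean fragment size (Mbp):", frags.mean())
print("fraction of fragments < 0.1 Mbp:", (frags < 0.1).mean())
```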
Fuzzy Document Clustering Approach using WordNet Lexical Categories
NASA Astrophysics Data System (ADS)
Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.
Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly, mainly because of the strong need for analysing the huge amount of textual data that resides on internal file systems and the Web. Text document clustering provides an effective navigation mechanism for organizing this large amount of data by grouping documents into a small number of meaningful classes. In this paper we propose a fuzzy text document clustering approach using WordNet lexical categories and the Fuzzy c-Means algorithm. Experiments are performed to compare the efficiency of the proposed approach with recently reported approaches. Experimental results show that fuzzy clustering achieves strong performance: the Fuzzy c-Means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running-time efficiency.
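A minimal sketch of the overall pipeline, not the authors' exact implementation: words are mapped to WordNet lexical categories (lexnames) via NLTK, documents become category-count vectors, and a plain fuzzy c-means loop produces soft cluster memberships. The toy documents and the use of each word's first synset are simplifying assumptions:

```python
import numpy as np
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def lexcat_vector(doc, categories):
    """Map a document to counts over WordNet lexical categories
    (lexnames such as 'noun.animal'), using each word's first synset."""
    vec = np.zeros(len(categories))
    for word in doc.lower().split():
        synsets = wn.synsets(word)
        if synsets and synsets[0].lexname() in categories:
            vec[categories.index(synsets[0].lexname())] += 1
    return vec

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))    # memberships, rows sum to 1
    p = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        inv = 1.0 / d ** p
        U = inv / inv.sum(axis=1, keepdims=True)  # standard FCM membership update
    return U, centers

docs = ["the dog chased the cat across the garden",
        "stocks and bonds fell as markets tumbled",
        "the bank raised interest rates on loans",
        "a kitten and a puppy played in the yard"]
categories = sorted({s.lexname() for d in docs
                     for w in d.split() for s in wn.synsets(w)[:1]})
X = np.array([lexcat_vector(d, categories) for d in docs], dtype=float)
U, _ = fuzzy_cmeans(X, c=2)
print(np.round(U, 2))   # soft membership of each document in each cluster
```

Mapping words to a few dozen lexical categories instead of a raw term space gives a compact, semantically smoothed document representation, which is what lets a small fuzzy c-means run compete with term-based k-means variants.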