A Model-Based Cluster Analysis of Maternal Emotion Regulation and Relations to Parenting Behavior.
Shaffer, Anne; Whitehead, Monica; Davis, Molly; Morelen, Diana; Suveg, Cynthia
2017-10-15
In a diverse community sample of mothers (N = 108) and their preschool-aged children (M age = 3.50 years), this study conducted person-oriented analyses of maternal emotion regulation (ER) based on a multimethod assessment incorporating physiological, observational, and self-report indicators. A model-based cluster analysis was applied to five indicators of maternal ER: maternal self-report, observed negative affect in a parent-child interaction, baseline respiratory sinus arrhythmia (RSA), and RSA suppression across two laboratory tasks. Model-based cluster analyses revealed four maternal ER profiles, including a group of mothers with average ER functioning, characterized by socioeconomic advantage and more positive parenting behavior. A dysregulated cluster demonstrated the greatest challenges with parenting and dyadic interactions. Two clusters of intermediate dysregulation were also identified. Implications for assessment and applications to parenting interventions are discussed. © 2017 Family Process Institute.
Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay
2015-09-01
The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.
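As a rough illustration of the cluster-alignment step described above, the sketch below matches the columns of one membership matrix to those of another by trying every permutation and keeping the one with the smallest squared difference. It is a toy stand-in only: it is not the similarity statistic or search strategy used by Clumpp or Clumpak, and the function name `align_clusters` and the example matrices are hypothetical.

```python
from itertools import permutations


def align_clusters(q_ref, q_run):
    """Return the column permutation of q_run that best matches q_ref.

    q_ref, q_run: lists of rows, one row per individual and one column
    per cluster. Brute force over K! permutations, so only practical
    for small K (an assumption made for this illustration).
    """
    k = len(q_ref[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        # Sum of squared differences after relabelling the columns of q_run.
        cost = sum(
            (row_ref[j] - row_run[perm[j]]) ** 2
            for row_ref, row_run in zip(q_ref, q_run)
            for j in range(k)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm


# Hypothetical membership coefficients from two runs at K = 2;
# the second run has the same solution with the labels swapped.
q_a = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
q_b = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
perm = align_clusters(q_a, q_b)
aligned_b = [[row[j] for j in perm] for row in q_b]
```

Once the labels are aligned, runs can be compared or averaged column by column, which is the kind of comparison the abstract describes within and across values of K.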
Optimal Partitioning of a Data Set Based on the "p"-Median Model
ERIC Educational Resources Information Center
Brusco, Michael J.; Kohn, Hans-Friedrich
2008-01-01
Although the "K"-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The "p"-median model is an especially well-studied clustering problem that requires the selection of "p" objects to serve as…
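For orientation, the sketch below solves a tiny instance of the standard p-median problem by exhaustive search: choose p of the objects as medians so that the total distance from every object to its nearest median is minimal. It illustrates the criterion only and is not the authors' solution method; the function name and the one-dimensional data are hypothetical.

```python
from itertools import combinations


def p_median_brute_force(dist, p):
    """Return (best_cost, best_medians) for a full distance matrix.

    Exhaustive search over all subsets of size p, so feasible only
    for very small data sets.
    """
    n = len(dist)
    best_cost, best_medians = float("inf"), None
    for medians in combinations(range(n), p):
        # Each object is assigned to its nearest selected median.
        cost = sum(min(dist[i][m] for m in medians) for i in range(n))
        if cost < best_cost:
            best_cost, best_medians = cost, medians
    return best_cost, best_medians


# Hypothetical one-dimensional objects forming two obvious groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in points] for a in points]
cost, medians = p_median_brute_force(dist, 2)
```

Unlike K-means centroids, the selected medians are actual objects from the data set, and the criterion uses distances rather than squared deviations.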
Ng, Edmond S-W; Diaz-Ordaz, Karla; Grieve, Richard; Nixon, Richard M; Thompson, Simon G; Carpenter, James R
2016-10-01
Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We find that the most useful criterion for model choice was the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data. © The Author(s) 2013.
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
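The two resampling schemes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: to keep the example self-contained, the statistic being bootstrapped is a simple overall mean rather than a Cox model coefficient, and all function names and data are hypothetical.

```python
import random


def cluster_bootstrap(clusters, rng):
    """Resample whole clusters with replacement; members are kept as they are."""
    k = len(clusters)
    return [list(clusters[rng.randrange(k)]) for _ in range(k)]


def two_step_bootstrap(clusters, rng):
    """Resample clusters, then resample individuals within each chosen cluster."""
    resampled = []
    for chosen in cluster_bootstrap(clusters, rng):
        resampled.append([chosen[rng.randrange(len(chosen))] for _ in chosen])
    return resampled


def bootstrap_se(clusters, statistic, resampler, n_boot=500, seed=0):
    """Standard deviation of the statistic over n_boot bootstrap samples."""
    rng = random.Random(seed)
    estimates = [statistic(resampler(clusters, rng)) for _ in range(n_boot)]
    mean = sum(estimates) / n_boot
    return (sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)) ** 0.5


def overall_mean(sample):
    """Stand-in statistic (assumption): mean over all individuals."""
    values = [x for cluster in sample for x in cluster]
    return sum(values) / len(values)


# Hypothetical clustered observations.
clusters = [[1.0, 1.2, 0.9], [3.1, 2.8], [5.0, 5.5, 4.9, 5.2]]
se_cluster = bootstrap_se(clusters, overall_mean, cluster_bootstrap)
se_two_step = bootstrap_se(clusters, overall_mean, two_step_bootstrap)
```

In an analysis like the one described, the stand-in statistic would be replaced by the coefficient estimate from a model refitted to each bootstrap sample.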
2007-01-01
including tree-based methods such as the unweighted pair group method of analysis (UPGMA) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ clustering methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA
Old, L.; Wojtak, R.; Mamon, G. A.; ...
2015-03-26
Our paper is the second in a series in which we perform an extensive comparison of various galaxy-based cluster mass estimation techniques that utilize the positions, velocities and colours of galaxies. Our aim is to quantify the scatter, systematic bias and completeness of cluster masses derived from a diverse set of 25 galaxy-based methods using two contrasting mock galaxy catalogues based on a sophisticated halo occupation model and a semi-analytic model. Analysing 968 clusters, we find a wide range in the rms errors in log M200c delivered by the different methods (0.18–1.08 dex, i.e. a factor of ~1.5–12), with abundance-matching and richness methods providing the best results, irrespective of the input model assumptions. In addition, certain methods produce a significant number of catastrophic cases where the mass is under- or overestimated by a factor greater than 10. Given the steeply falling high-mass end of the cluster mass function, we recommend that richness- or abundance-matching-based methods are used in conjunction with these methods as a sanity check for studies selecting high-mass clusters. We also see a stronger correlation of the recovered to input number of galaxies for both catalogues in comparison with the group/cluster mass; however, this does not guarantee that the correct member galaxies are being selected. Finally, we did not observe significantly higher scatter for either mock galaxy catalogue. These results have implications for cosmological analyses that utilize the masses, richnesses, or abundances of clusters, which have different uncertainties when different methods are used.
Density-based clustering analyses to identify heterogeneous cellular sub-populations
NASA Astrophysics Data System (ADS)
Heaster, Tiffany M.; Walsh, Alex J.; Landman, Bennett A.; Skala, Melissa C.
2017-02-01
Autofluorescence microscopy of NAD(P)H and FAD provides functional metabolic measurements at the single-cell level. Here, density-based clustering algorithms were applied to metabolic autofluorescence measurements to identify cell-level heterogeneity in tumor cell cultures. The performance of the density-based clustering algorithm, DENCLUE, was tested in samples with known heterogeneity (co-cultures of breast carcinoma lines). DENCLUE was found to better represent the distribution of cell clusters compared to Gaussian mixture modeling. Overall, DENCLUE is a promising approach to quantify cell-level heterogeneity, and could be used to understand single cell population dynamics in cancer progression and treatment.
MOCCA-SURVEY Database I: Is NGC 6535 a dark star cluster harbouring an IMBH?
NASA Astrophysics Data System (ADS)
Askar, Abbas; Bianchini, Paolo; de Vita, Ruggero; Giersz, Mirek; Hypki, Arkadiusz; Kamann, Sebastian
2017-01-01
We describe the dynamical evolution of a unique type of dark star cluster model in which the majority of the cluster mass at Hubble time is dominated by an intermediate-mass black hole (IMBH). We analysed results from about 2000 star cluster models (Survey Database I) simulated using the Monte Carlo code MOnte Carlo Cluster simulAtor and identified these dark star cluster models. Taking one of these models, we apply the method of simulating realistic 'mock observations' by utilizing the Cluster simulatiOn Comparison with ObservAtions (COCOA) and Simulating Stellar Cluster Observation (SISCO) codes to obtain the photometric and kinematic observational properties of the dark star cluster model at 12 Gyr. We find that the perplexing Galactic globular cluster NGC 6535 closely matches the observational photometric and kinematic properties of the dark star cluster model presented in this paper. Based on our analysis and currently observed properties of NGC 6535, we suggest that this globular cluster could potentially harbour an IMBH. If it exists, the presence of this IMBH can be detected robustly with proposed kinematic observations of NGC 6535.
Baars, Erik W; van der Hart, Onno; Nijenhuis, Ellert R S; Chu, James A; Glas, Gerrit; Draijer, Nel
2011-01-01
The purpose of this study was to develop an expertise-based prognostic model for the treatment of complex posttraumatic stress disorder (PTSD) and dissociative identity disorder (DID). We developed a survey in 2 rounds: In the first round we surveyed 42 experienced therapists (22 DID and 20 complex PTSD therapists), and in the second round we surveyed a subset of 22 of the 42 therapists (13 DID and 9 complex PTSD therapists). First, we drew on therapists' knowledge of prognostic factors for stabilization-oriented treatment of complex PTSD and DID. Second, therapists prioritized a list of prognostic factors by estimating the size of each variable's prognostic effect; we clustered these factors according to content and named the clusters. Next, concept mapping methodology and statistical analyses (including principal components analyses) were used to transform individual judgments into weighted group judgments for clusters of items. A prognostic model, based on consensually determined estimates of effect sizes, of 8 clusters containing 51 factors for both complex PTSD and DID was formed. It includes the clusters lack of motivation, lack of healthy relationships, lack of healthy therapeutic relationships, lack of other internal and external resources, serious Axis I comorbidity, serious Axis II comorbidity, poor attachment, and self-destruction. In addition, a set of 5 DID-specific items was constructed. The model is supportive of the current phase-oriented treatment model, emphasizing the strengthening of the therapeutic relationship and the patient's resources in the initial stabilization phase. Further research is needed to test the model's statistical and clinical validity.
A Cyber-Attack Detection Model Based on Multivariate Analyses
NASA Astrophysics Data System (ADS)
Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi
In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and the cluster analysis method. We quantify the observed qualitative audit event sequences via quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. Simulation experiments show that our model can improve cyber-attack detection accuracy in some realistic cases where normal and attack activities are intermingled.
Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael
2018-04-27
Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.
Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data
Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.
2003-01-01
Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r2 test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
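Of the agreement measures named above, the Adjusted Rand Index is easy to compute directly from two label vectors. The sketch below uses the standard pair-counting (Hubert and Arabie) form of the index; it is an illustration only, not code from the study, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    n = len(labels_a)
    # Pairs of items placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions are all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: the same grouping under different label names,
# and a grouping that cuts across the first one.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index does not depend on how clusters are labelled, which is why it suits comparisons between a clustering result and a known grouping.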
Intracluster age gradients in numerous young stellar clusters
NASA Astrophysics Data System (ADS)
Getman, K. V.; Feigelson, E. D.; Kuhn, M. A.; Bate, M. R.; Broos, P. S.; Garmire, G. P.
2018-05-01
The pace and pattern of star formation leading to rich young stellar clusters is quite uncertain. In this context, we analyse the spatial distribution of ages within 19 young (median t ≲ 3 Myr on the Siess et al. time-scale), morphologically simple, isolated, and relatively rich stellar clusters. Our analysis is based on young stellar object (YSO) samples from the Massive Young Star-Forming Complex Study in Infrared and X-ray and Star Formation in Nearby Clouds surveys, and a new estimator of pre-main sequence (PMS) stellar ages, AgeJX, derived from X-ray and near-infrared photometric data. Median cluster ages are computed within four annular subregions of the clusters. We confirm and extend the earlier result of Getman et al. (2014): 80 per cent of the clusters show age trends where stars in cluster cores are younger than in outer regions. Our cluster stacking analyses establish the existence of an age gradient to high statistical significance in several ways. Time-scales vary with the choice of PMS evolutionary model; the inferred median age gradient across the studied clusters ranges from 0.75 to 1.5 Myr pc⁻¹. The empirical finding reported in the present study - late or continuing formation of stars in the cores of star clusters with older stars dispersed in the outer regions - is well supported by other observational studies and by astrophysical models such as the global hierarchical collapse model of Vázquez-Semadeni et al.
Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine
2018-01-01
Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify the treatment responders and the non-responders. In the context of longitudinal cluster analyses, sample size and variability of the times of measurements are the main issues with the current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the clustering of longitudinal data by using an extended baseline method with the latent-class mixed model. The clustering of longitudinal data by using an extended baseline method with the two model-based algorithms was the more robust model. The clustering of longitudinal data by using an extended baseline method with all the non-parametric algorithms failed when there were unequal variances of treatment effect between clusters or when the subgroups had unbalanced sample sizes. The latent-class mixed model failed when the between-patients slope variability was high. Two real data sets on neurodegenerative disease and on obesity illustrate the clustering of longitudinal data by using an extended baseline method and show how clustering may help to identify the marker(s) of the treatment response. The application of the clustering of longitudinal data by using an extended baseline method in exploratory analysis as the first stage before setting up stratified designs can provide a better estimation of treatment effect in future clinical trials.
Grieve, Richard; Nixon, Richard; Thompson, Simon G
2010-01-01
Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
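To make the intracluster correlation coefficient mentioned above concrete, the sketch below computes the classical one-way ANOVA estimate for equal-sized clusters, together with the familiar design effect 1 + (m - 1) × ICC. This is a simple frequentist illustration under a balanced-design assumption, not the Bayesian hierarchical models developed in the study; the function names and cost values are hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters."""
    k = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand_mean = sum(sum(c) for c in clusters) / (k * m)
    means = [sum(c) / m for c in clusters]
    # Between-cluster and within-cluster mean squares.
    ms_between = m * sum((mu - grand_mean) ** 2 for mu in means) / (k - 1)
    ms_within = sum(
        (x - mu) ** 2 for c, mu in zip(clusters, means) for x in c
    ) / (k * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


def design_effect(mean_cluster_size, icc):
    """Variance inflation relative to an individually randomized design."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


# Hypothetical per-patient costs in three clusters of three patients.
toy_costs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
toy_icc = anova_icc(toy_costs)
```

Even a moderate ICC inflates variance considerably when clusters are large, which is consistent with the abstract's point that ignoring clustering understates uncertainty.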
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hadgu, Teklu; Appel, Gordon John
Sandia National Laboratories (SNL) continued evaluation of total system performance assessment (TSPA) computing systems for the previously considered Yucca Mountain Project (YMP). This was done to maintain the operational readiness of the computing infrastructure (computer hardware and software) and knowledge capability for TSPA-type analysis, as directed by the National Nuclear Security Administration (NNSA), DOE 2010. This work is a continuation of the ongoing readiness evaluation reported in Lee and Hadgu (2014) and Hadgu et al. (2015). The TSPA computing hardware (CL2014) and storage system described in Hadgu et al. (2015) were used for the current analysis. One floating license of GoldSim with Versions 9.60.300, 10.5 and 11.1.6 was installed on the cluster head node, and its distributed processing capability was mapped on the cluster processors. Other supporting software was tested and installed to support the TSPA-type analysis on the server cluster. The current tasks included verification of the TSPA-LA uncertainty and sensitivity analyses, and preliminary upgrade of the TSPA-LA from Version 9.60.300 to the latest version 11.1. All the TSPA-LA uncertainty and sensitivity analyses modeling cases were successfully tested and verified for the model reproducibility on the upgraded 2014 server cluster (CL2014). The uncertainty and sensitivity analyses used TSPA-LA modeling cases output generated in FY15 based on GoldSim Version 9.60.300 documented in Hadgu et al. (2015). The model upgrade task successfully converted the Nominal Modeling case to GoldSim Version 11.1. Upgrade of the remaining modeling cases and distributed processing tasks will continue. The 2014 server cluster and supporting software systems are fully operational to support TSPA-LA type analysis.
Tait, Luke; Wedgwood, Kyle; Tsaneva-Atanasova, Krasimira; Brown, Jon T; Goodfellow, Marc
2018-07-14
The entorhinal cortex is a crucial component of our memory and spatial navigation systems and is one of the first areas to be affected in dementias featuring tau pathology, such as Alzheimer's disease and frontotemporal dementia. Electrophysiological recordings from principal cells of medial entorhinal cortex (layer II stellate cells, mEC-SCs) demonstrate a number of key identifying properties including subthreshold oscillations in the theta (4-12 Hz) range and clustered action potential firing. These single cell properties are correlated with network activity such as grid firing and coupling between theta and gamma rhythms, suggesting they are important for spatial memory. As such, experimental models of dementia have revealed disruption of organised dorsoventral gradients in clustered action potential firing. To better understand the mechanisms underpinning these different dynamics, we study a conductance based model of mEC-SCs. We demonstrate that the model, driven by extrinsic noise, can capture quantitative differences in clustered action potential firing patterns recorded from experimental models of tau pathology and healthy animals. The differential equation formulation of our model allows us to perform numerical bifurcation analyses in order to uncover the dynamic mechanisms underlying these patterns. We show that clustered dynamics can be understood as subcritical Hopf/homoclinic bursting in a fast-slow system where the slow sub-system is governed by activation of the persistent sodium current and inactivation of the slow A-type potassium current. In the full system, we demonstrate that clustered firing arises via flip bifurcations as conductance parameters are varied. Our model analyses confirm the experimentally suggested hypothesis that the breakdown of clustered dynamics in disease occurs via increases in AHP conductance. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Deckersbach, Thilo; Peters, Amy T.; Sylvia, Louisa G.; Gold, Alexandra K.; da Silva Magalhaes, Pedro Vieira; Henry, David B.; Frank, Ellen; Otto, Michael W.; Berk, Michael; Dougherty, Darin D.; Nierenberg, Andrew A.; Miklowitz, David J.
2016-01-01
Background: We sought to address how predictors and moderators of psychotherapy for bipolar depression – identified individually in prior analyses – can inform the development of a metric for prospectively classifying treatment outcome in intensive psychotherapy (IP) versus collaborative care (CC) adjunctive to pharmacotherapy in the Systematic Treatment Enhancement Program (STEP-BD) study. Methods: We conducted post-hoc analyses on 135 STEP-BD participants using cluster analysis to identify subsets of participants with similar clinical profiles and investigated this combined metric as a moderator and predictor of response to IP. We used agglomerative hierarchical cluster analyses and k-means clustering to determine the content of the clinical profiles. Logistic regression and Cox proportional hazard models were used to evaluate whether the resulting clusters predicted or moderated likelihood of recovery or time until recovery. Results: The cluster analysis yielded a two-cluster solution: 1) “less-recurrent/severe” and 2) “chronic/recurrent.” Rates of recovery in IP were similar for less-recurrent/severe and chronic/recurrent participants. Less-recurrent/severe patients were more likely than chronic/recurrent patients to achieve recovery in CC (p = .040, OR = 4.56). IP yielded a faster recovery for chronic/recurrent participants, whereas CC led to recovery sooner in the less-recurrent/severe cluster (p = .034, OR = 2.62). Limitations: Cluster analyses require list-wise deletion of cases with missing data so we were unable to conduct analyses on all STEP-BD participants. Conclusions: A well-powered, parametric approach can distinguish patients based on illness history and provide clinicians with symptom profiles of patients that confer differential prognosis in CC vs. IP. PMID:27289316
Substructures in DAFT/FADA survey clusters based on XMM and optical data
NASA Astrophysics Data System (ADS)
Durret, F.; DAFT/FADA Team
2014-07-01
The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.
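The fit-and-subtract idea described above can be illustrated with the commonly used isothermal beta-model for X-ray surface brightness, S(r) = S0 [1 + (r/rc)^2]^(-3β + 1/2). The sketch below evaluates that profile and subtracts it from an observed radial profile; it assumes this standard parametrization (the abstract does not state which form was used), takes the model parameters as given rather than fitting them, and uses hypothetical names and values throughout.

```python
def beta_model(r, s0, r_core, beta):
    """Surface brightness of the standard isothermal beta-model at radius r."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (-3.0 * beta + 0.5)


def residual_profile(radii, observed, s0, r_core, beta):
    """Observed minus model brightness; positive values hint at excess emission."""
    return [obs - beta_model(r, s0, r_core, beta) for r, obs in zip(radii, observed)]


# Hypothetical radial profile (arbitrary units) with an excess injected
# at one radius to mimic a substructure.
radii = [0.0, 50.0, 100.0, 200.0, 400.0]
smooth = [beta_model(r, 10.0, 100.0, 2.0 / 3.0) for r in radii]
observed = list(smooth)
observed[3] += 0.5
residuals = residual_profile(radii, observed, 10.0, 100.0, 2.0 / 3.0)
```

In practice the subtraction is done on two-dimensional images after fitting the parameters, but the principle is the same: whatever remains after removing the smooth component is a candidate substructure.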
Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses
ERIC Educational Resources Information Center
Huang, Guan-Hua; Wang, Su-Mei; Hsu, Chung-Chu
2011-01-01
Statisticians typically estimate the parameters of latent class and latent profile models using the Expectation-Maximization algorithm. This paper proposes an alternative two-stage approach to model fitting. The first stage uses the modified k-means and hierarchical clustering algorithms to identify the latent classes that best satisfy the…
Cerón-Muñoz, M F; Tonhati, H; Costa, C N; Rojas-Sarmiento, D; Echeverri Echeverri, D M
2004-08-01
Descriptive herd variables (DVHE) were used to explain genotype by environment interactions (G x E) for milk yield (MY) in Brazilian and Colombian production environments and to develop a herd-cluster model to estimate covariance components and genetic parameters for each herd environment group. Data consisted of 180,522 lactation records of 94,558 Holstein cows from 937 Brazilian and 400 Colombian herds. Herds in both countries were jointly grouped in thirds according to 8 DVHE: production level, phenotypic variability, age at first calving, calving interval, percentage of imported semen, lactation length, and herd size. For each DVHE, REML bivariate animal model analyses were used to estimate genetic correlations for MY between upper and lower thirds of the data. Based on estimates of genetic correlations, weights were assigned to each DVHE to group herds in a cluster analysis using the FASTCLUS procedure in SAS. Three clusters were defined, and genetic and residual variance components were heterogeneous among herd clusters. Estimates of heritability in clusters 1 and 3 were 0.28 and 0.29, respectively, but the estimate was larger (0.39) in Cluster 2. The genetic correlations of MY from different clusters ranged from 0.89 to 0.97. The herd-cluster model based on DVHE properly takes into account G x E by grouping similar environments accordingly and seems to be an alternative to simply considering country borders to distinguish between environments.
Simultaneous Co-Clustering and Classification in Customers Insight
NASA Astrophysics Data System (ADS)
Anggistia, M.; Saefuddin, A.; Sartono, B.
2017-04-01
Building a predictive model on a heterogeneous dataset can cause problems such as imprecise parameter estimates and reduced prediction accuracy. Such problems can be addressed by segmenting the data into relatively homogeneous groups and then building a predictive model for each cluster. This strategy usually yields models that are simpler, more interpretable, and more actionable, without loss of accuracy or reliability. This work concerns a marketing data set that records customer behaviour across products, with variables describing customers and products as attributes. The basic idea of the approach is to perform co-clustering and classification simultaneously. The objective of this research is to analyse customers across product characteristics so that the marketing strategy can be implemented precisely.
Truscott, James E; Werkman, Marleen; Wright, James E; Farrell, Sam H; Sarkar, Rajiv; Ásbjörnsdóttir, Kristjana; Anderson, Roy M
2017-06-30
There is an increased focus on whether mass drug administration (MDA) programmes alone can interrupt the transmission of soil-transmitted helminths (STH). Mathematical models can be used to model these interventions and are increasingly being implemented to inform investigators about expected trial outcome and the choice of optimum study design. One key factor is the choice of threshold for detecting elimination. However, there are currently no thresholds defined for STH regarding breaking transmission. We develop a simulation of an elimination study, based on the DeWorm3 project, using an individual-based stochastic disease transmission model in conjunction with models of MDA, sampling, diagnostics and the construction of study clusters. The simulation is then used to analyse the relationship between the study end-point elimination threshold and whether elimination is achieved in the long term within the model. We analyse the quality of a range of statistics in terms of the positive predictive values (PPV) and how they depend on a range of covariates, including threshold values, baseline prevalence, measurement time point and how clusters are constructed. End-point infection prevalence performs well in discriminating between villages that achieve interruption of transmission and those that do not, although the quality of the threshold is sensitive to baseline prevalence and threshold value. The optimal post-treatment prevalence threshold value for determining elimination is in the range of 2% or less when the baseline prevalence range is broad. For multiple clusters of communities, both the probability of elimination and the ability of thresholds to detect it are strongly dependent on the size of the cluster and the size distribution of the constituent communities. The number of communities in a cluster is a key indicator of probability of elimination and PPV. Extending the time, post-study endpoint, at which the threshold statistic is measured improves PPV in discriminating between eliminating clusters and those that bounce back. The probability of elimination and PPV are very sensitive to baseline prevalence for individual communities. However, most studies and programmes are constructed on the basis of clusters. Since elimination occurs within smaller population sub-units, the construction of clusters introduces new sensitivities for elimination threshold values to cluster size and the underlying population structure. Study simulation offers an opportunity to investigate key sources of sensitivity for elimination studies and programme designs in advance and to tailor interventions to prevailing local or national conditions.
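The positive predictive value used in this abstract to judge an elimination threshold can be made concrete with a small sketch. The village prevalences and outcomes below are invented purely for illustration, and the function name `ppv_for_threshold` is hypothetical; this is not the DeWorm3 simulation, only the PPV calculation applied to a toy set of end-point results.

```python
def ppv_for_threshold(endpoint_prevalence, eliminated, threshold):
    """PPV of the rule 'end-point prevalence <= threshold' as a predictor
    of long-term elimination.

    Returns None when no village falls at or below the threshold.
    """
    flagged = [elim for prev, elim in zip(endpoint_prevalence, eliminated)
               if prev <= threshold]
    if not flagged:
        return None
    # Fraction of flagged villages that really did eliminate.
    return sum(flagged) / len(flagged)


# Hypothetical end-point prevalences (as fractions) and long-term outcomes
# (True = transmission interrupted) for eight simulated villages.
prevalence = [0.00, 0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.12]
eliminated = [True, True, True, False, True, False, False, False]

for threshold in (0.01, 0.02, 0.05):
    print(threshold, ppv_for_threshold(prevalence, eliminated, threshold))
```

In this toy data a stricter threshold flags fewer villages but with a higher PPV, which is the kind of trade-off the simulation study examines at scale.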
NASA Astrophysics Data System (ADS)
Ji, Yu; Sheng, Wanxing; Jin, Wei; Wu, Ming; Liu, Haitao; Chen, Feng
2018-02-01
A coordinated optimal control method for the active and reactive power of a distribution network with distributed PV clusters, based on model predictive control, is proposed in this paper. The method divides the control process into long-time-scale optimal control and short-time-scale optimal control with multi-step optimization. Because the optimization models are non-convex and nonlinear, and therefore hard to solve, they are transformed into a second-order cone programming problem. An improved IEEE 33-bus distribution network system is used to analyse the feasibility and effectiveness of the proposed control method.
Li, Qing; Li, Xiaoming; Stanton, Bonita; Fang, Xiaoyi; Zhao, Ran
2010-01-01
Background: Multilevel analytical techniques are being applied in condom use research to ensure the validity of investigation on environmental/structural influences and clustered data from venue-based sampling. The literature contains reports of consistent associations between perceived gatekeeper support and condom use among entertainment establishment-based female sex workers (FSWs) in Guangxi, China. However, the clustering inherent in the data (FSWs being clustered within establishments) has not been accounted for in most of the analyses. We used multilevel analyses to examine perceived features of gatekeepers and individual correlates of consistent condom use among FSWs and to validate the findings in the existing literature. Methods: We analyzed cross-sectional data from 318 FSWs from 29 entertainment establishments in Guangxi, China in 2004, with a minimum of 5 FSWs per establishment. The Hierarchical Linear Models program with Laplace estimation was used to estimate the parameters in models containing random effects and binary outcomes. Results: About 11.6% of women reported consistent condom use with clients. The intraclass correlation coefficient indicated that 18.5% of the variance in condom use could be attributed to similarity between FSWs within the same establishments. Women’s perceived gatekeeper support and education remained positively associated with condom use (P < 0.05), after controlling for other individual characteristics and clustering. Conclusions: After adjusting for data clustering, perceived gatekeeper support remains associated with consistent condom use with clients among FSWs in China. The results imply that combined interventions targeting both gatekeepers and individual FSWs may effectively promote consistent condom use. PMID:20539262
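To see why an intraclass correlation of this size matters, the sketch below computes the Kish design effect and the implied effective sample size from the figures in the abstract (318 women, 29 establishments, ICC = 0.185). It assumes roughly equal cluster sizes. The second helper uses the latent-variable ICC definition for a random-intercept logistic model, with level-1 variance fixed at π²/3; whether the study computed its ICC that way is an assumption, and both function names are hypothetical.

```python
import math


def design_effect(icc, mean_cluster_size):
    """Kish design effect: variance inflation caused by cluster sampling."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def latent_between_variance(icc):
    """Between-cluster variance implied by an ICC under the latent-variable
    definition, icc = var_u / (var_u + pi^2 / 3). Assumed definition."""
    return icc * (math.pi ** 2 / 3.0) / (1.0 - icc)


# Figures taken from the abstract.
n_women, n_establishments, icc = 318, 29, 0.185

mean_size = n_women / n_establishments   # assumes roughly equal cluster sizes
deff = design_effect(icc, mean_size)
effective_n = n_women / deff

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {effective_n:.0f}")
print(f"implied between-establishment variance: {latent_between_variance(icc):.3f}")
```

Under these assumptions the design effect is about 2.8, so the 318 clustered observations carry roughly the information of 112 independent ones, which is why ignoring the clustering would overstate precision.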
Weak lensing calibration of mass bias in the REFLEX+BCS X-ray galaxy cluster catalogue
NASA Astrophysics Data System (ADS)
Simet, Melanie; Battaglia, Nicholas; Mandelbaum, Rachel; Seljak, Uroš
2017-04-01
The use of large, X-ray-selected galaxy cluster catalogues for cosmological analyses requires a thorough understanding of the X-ray mass estimates. Weak gravitational lensing is an ideal method to shed light on such issues, due to its insensitivity to the cluster dynamical state. We perform a weak lensing calibration of 166 galaxy clusters from the REFLEX and BCS cluster catalogue and compare our results to the X-ray masses based on scaled luminosities from that catalogue. To interpret the weak lensing signal in terms of cluster masses, we compare the lensing signal to simple theoretical Navarro-Frenk-White models and to simulated cluster lensing profiles, including complications such as cluster substructure, projected large-scale structure and Eddington bias. We find evidence of underestimation in the X-ray masses, as expected, with
Cluster Physics with Merging Galaxy Clusters
NASA Astrophysics Data System (ADS)
Molnar, Sandor
Collisions between galaxy clusters provide a unique opportunity to study matter in a parameter space which cannot be explored in our laboratories on Earth. In the standard ΛCDM model, where the total density is dominated by the cosmological constant (Λ) and the matter density by cold dark matter (CDM), structure formation is hierarchical, and clusters grow mostly by merging. Mergers of two massive clusters are the most energetic events in the universe after the Big Bang, hence they provide a unique laboratory to study cluster physics. The two main mass components in clusters behave differently during collisions: the dark matter is nearly collisionless, responding only to gravity, while the gas is subject to pressure forces and dissipation, and shocks and turbulence are developed during collisions. In the present contribution we review the different methods used to derive the physical properties of merging clusters. Different physical processes leave their signatures on different wavelengths, thus our review is based on a multifrequency analysis. In principle, the best way to analyze multifrequency observations of merging clusters is to model them using N-body/HYDRO numerical simulations. We discuss the results of such detailed analyses. New high spatial and spectral resolution ground and space based telescopes will come online in the near future. Motivated by these new opportunities, we briefly discuss methods which will be feasible in the near future in studying merging clusters.
Kéchichian, Razmig; Valette, Sébastien; Desvignes, Michel; Prost, Rémy
2013-11-01
We derive shortest-path constraints from graph models of structure adjacency relations and introduce them in a joint centroidal Voronoi image clustering and Graph Cut multiobject semiautomatic segmentation framework. The vicinity prior model thus defined is a piecewise-constant model incurring multiple levels of penalization capturing the spatial configuration of structures in multiobject segmentation. Qualitative and quantitative analyses and comparison with a Potts prior-based approach and our previous contribution on synthetic, simulated, and real medical images show that the vicinity prior allows for the correct segmentation of distinct structures having identical intensity profiles and improves the precision of segmentation boundary placement while being fairly robust to clustering resolution. The clustering approach we take to simplify images prior to segmentation strikes a good balance between boundary adaptivity and cluster compactness criteria, and furthermore allows the trade-off between them to be controlled. Compared with a direct application of segmentation on voxels, the clustering step improves the overall runtime and memory footprint of the segmentation process by up to an order of magnitude without compromising the quality of the result.
Characteristics of airflow and particle deposition in COPD current smokers
NASA Astrophysics Data System (ADS)
Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long
2017-11-01
A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.
Patterns of breast cancer mortality trends in Europe.
Amaro, Joana; Severo, Milton; Vilela, Sofia; Fonseca, Sérgio; Fontes, Filipa; La Vecchia, Carlo; Lunet, Nuno
2013-06-01
To identify patterns of variation in breast cancer mortality in Europe (1980-2010), using a model-based approach. Mortality data were obtained from the World Health Organization database and mixed models were used to describe the time trends in the age-standardized mortality rates (ASMR). Model-based clustering was used to identify clusters of countries with homogeneous variation in ASMR. Three patterns were identified. Patterns 1 and 2 are characterized by stable or slightly increasing trends in ASMR in the first half of the period analysed, and a clear decline is observed thereafter; in pattern 1 the median of the ASMR is higher, and the highest rates were achieved sooner. Pattern 3 is characterised by a rapid increase in mortality until 1999, declining slowly thereafter. This study provides a general model for the description and interpretation of the variation in breast cancer mortality in Europe, based on three main patterns. Copyright © 2013 Elsevier Ltd. All rights reserved.
Crowe, Michael L; LoPilato, Alexander C; Campbell, W Keith; Miller, Joshua D
2016-12-01
The present study hypothesized that there exist two distinct groups of entitled individuals: grandiose-entitled and vulnerable-entitled. Self-report scores of entitlement were collected for 916 individuals using an online platform. Model-based cluster analyses were conducted on the individuals with scores one standard deviation above the mean (n = 159) using the five-factor model dimensions as clustering variables. The results support the existence of two groups of entitled individuals categorized as emotionally stable and emotionally vulnerable. The emotionally stable cluster reported emotional stability, high self-esteem, more positive affect, and antisocial behavior. The emotionally vulnerable cluster reported low self-esteem and high levels of neuroticism, disinhibition, conventionality, psychopathy, negative affect, childhood abuse, intrusive parenting, and attachment difficulties. Compared to the control group, both clusters reported being more antagonistic, extraverted, Machiavellian, and narcissistic. These results suggest important differences are missed when simply examining the linear relationships between entitlement and various aspects of its nomological network.
Development of a model of the tobacco industry's interference with tobacco control programmes
Trochim, W; Stillman, F; Clark, P; Schmitt, C
2003-01-01
Objective: To construct a conceptual model of tobacco industry tactics to undermine tobacco control programmes for the purposes of: (1) developing measures to evaluate industry tactics, (2) improving tobacco control planning, and (3) supplementing current or future frameworks used to classify and analyse tobacco industry documents. Design: Web based concept mapping was conducted, including expert brainstorming, sorting, and rating of statements describing industry tactics. Statistical analyses used multidimensional scaling and cluster analysis. Interpretation of the resulting maps was accomplished by an expert panel during a face-to-face meeting. Subjects: 34 experts, selected because of their previous encounters with industry resistance or because of their research into industry tactics, took part in some or all phases of the project. Results: Maps with eight non-overlapping clusters in two dimensional space were developed, with importance ratings of the statements and clusters. Cluster and quadrant labels were agreed upon by the experts. Conclusions: The conceptual maps summarise the tactics used by the industry and their relationships to each other, and suggest a possible hierarchy for measures that can be used in statistical modelling of industry tactics and for review of industry documents. Finally, the maps enable hypothesis of a likely progression of industry reactions as public health programmes become more successful, and therefore more threatening to industry profits. PMID:12773723
Individual participant data meta-analyses should not ignore clustering
Abo-Zaid, Ghada; Guo, Boliang; Deeks, Jonathan J.; Debray, Thomas P.A.; Steyerberg, Ewout W.; Moons, Karel G.M.; Riley, Richard David
2013-01-01
Objectives Individual participant data (IPD) meta-analyses often analyze their IPD as if coming from a single study. We compare this approach with analyses that rather account for clustering of patients within studies. Study Design and Setting Comparison of effect estimates from logistic regression models in real and simulated examples. Results The estimated prognostic effect of age in patients with traumatic brain injury is similar, regardless of whether clustering is accounted for. However, a family history of thrombophilia is found to be a diagnostic marker of deep vein thrombosis [odds ratio, 1.30; 95% confidence interval (CI): 1.00, 1.70; P = 0.05] when clustering is accounted for but not when it is ignored (odds ratio, 1.06; 95% CI: 0.83, 1.37; P = 0.64). Similarly, the treatment effect of nicotine gum on smoking cessation is severely attenuated when clustering is ignored (odds ratio, 1.40; 95% CI: 1.02, 1.92) rather than accounted for (odds ratio, 1.80; 95% CI: 1.29, 2.52). Simulations show models accounting for clustering perform consistently well, but downwardly biased effect estimates and low coverage can occur when ignoring clustering. Conclusion Researchers must routinely account for clustering in IPD meta-analyses; otherwise, misleading effect estimates and conclusions may arise. PMID:23651765
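The attenuation described in this abstract can be illustrated with a toy calculation. The sketch below compares a naive odds ratio from collapsing all studies into one 2x2 table with a Mantel-Haenszel odds ratio that stratifies by study. Mantel-Haenszel is swapped in here as a simple stand-in for the logistic regression models with clustering that the paper actually compares; the two study tables are invented for illustration, and the function names are hypothetical.

```python
def pooled_odds_ratio(tables):
    """Odds ratio after collapsing all studies into a single 2x2 table,
    i.e. ignoring which study each participant came from."""
    a = sum(t[0] for t in tables)
    b = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables)
    d = sum(t[3] for t in tables)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel odds ratio, stratified by study."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical studies: (exposed cases, exposed non-cases,
#                        unexposed cases, unexposed non-cases).
# The within-study odds ratio is exactly 2 in both, but baseline risk
# and exposure prevalence differ between the studies.
studies = [
    (20, 80, 10, 80),
    (80, 20, 80, 40),
]

print("ignoring clustering:", round(pooled_odds_ratio(studies), 3))
print("stratified by study:", round(mantel_haenszel_odds_ratio(studies), 3))
```

With these made-up tables the stratified estimate recovers the within-study odds ratio of 2, while the collapsed table gives a value pulled toward 1, mirroring the direction of bias reported in the abstract.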
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS.
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
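The Durbin-Watson statistic mentioned in this abstract is simple enough to sketch directly. The authors report running it in SAS AUTOREG; the Python function below is only an illustration of the statistic itself, d = Σ(e_t − e_{t−1})² / Σ e_t², applied to made-up residual sequences, and the function name is hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered sequence of residuals.

    Values near 2 suggest no first-order autocorrelation, values toward 0
    suggest positive autocorrelation, values toward 4 suggest negative
    autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    return numerator / denominator


# Made-up residuals: neighbouring values move together in the first
# sequence and alternate in sign in the second.
positively_correlated = [1.0, 1.0, -1.0, -1.0]
negatively_correlated = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(positively_correlated))
print(durbin_watson(negatively_correlated))
```

The statistic only addresses first-order serial correlation in the order given; the spatially structured error effects discussed in the abstract require the spatial filtering steps it describes.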
Optimal integrated abundances for chemical tagging of extragalactic globular clusters
NASA Astrophysics Data System (ADS)
Sakari, Charli M.; Venn, Kim; Shetrone, Matthew; Dotter, Aaron; Mackey, Dougal
2014-09-01
High-resolution integrated light (IL) spectroscopy provides detailed abundances of distant globular clusters whose stars cannot be resolved. Abundance comparisons with other systems (e.g. for chemical tagging) require understanding the systematic offsets that can occur between clusters, such as those due to uncertainties in the underlying stellar population. This paper analyses high-resolution IL spectra of the Galactic globular clusters 47 Tuc, M3, M13, NGC 7006, and M15 to (1) quantify potential systematic uncertainties in Fe, Ca, Ti, Ni, Ba, and Eu and (2) identify the most stable abundance ratios that will be useful in future analyses of unresolved targets. When stellar populations are well modelled, uncertainties are ~0.1-0.2 dex based on sensitivities to the atmospheric parameters alone; in the worst-case scenarios, uncertainties can rise to 0.2-0.4 dex. The [Ca I/Fe I] ratio is identified as the optimal integrated [α/Fe] indicator (with offsets ≲ 0.1 dex), while [Ni I/Fe I] is also extremely stable to within ≲ 0.1 dex. The [Ba II/Eu II] ratios are also stable when the underlying populations are well modelled and may also be useful for chemical tagging.
NASA Astrophysics Data System (ADS)
Grieb, Jan Niklas; Sánchez, Ariel G.; Salazar-Albornoz, Salvador; Scoccimarro, Román; Crocce, Martín; Dalla Vecchia, Claudio; Montesano, Francesco; Gil-Marín, Héctor; Ross, Ashley J.; Beutler, Florian; Rodríguez-Torres, Sergio; Chuang, Chia-Hsun; Prada, Francisco; Kitaura, Francisco-Shu; Cuesta, Antonio J.; Eisenstein, Daniel J.; Percival, Will J.; Vargas-Magaña, Mariana; Tinker, Jeremy L.; Tojeiro, Rita; Brownstein, Joel R.; Maraston, Claudia; Nichol, Robert C.; Olmstead, Matthew D.; Samushia, Lado; Seo, Hee-Jong; Streblyanska, Alina; Zhao, Gong-bo
2017-05-01
We extract cosmological information from the anisotropic power-spectrum measurements from the recently completed Baryon Oscillation Spectroscopic Survey (BOSS), extending the concept of clustering wedges to Fourier space. Making use of new fast-Fourier-transform-based estimators, we measure the power-spectrum clustering wedges of the BOSS sample by filtering out the information of Legendre multipoles ℓ > 4. Our modelling of these measurements is based on novel approaches to describe non-linear evolution, bias and redshift-space distortions, which we test using synthetic catalogues based on large-volume N-body simulations. We are able to include smaller scales than in previous analyses, resulting in tighter cosmological constraints. Using three overlapping redshift bins, we measure the angular-diameter distance, the Hubble parameter and the cosmic growth rate, and explore the cosmological implications of our full-shape clustering measurements in combination with cosmic microwave background and Type Ia supernova data. Assuming a Λ cold dark matter (ΛCDM) cosmology, we constrain the matter density to $\Omega_\mathrm{M} = 0.311_{-0.010}^{+0.009}$ and the Hubble parameter to $H_0 = 67.6_{-0.6}^{+0.7}\,\mathrm{km\,s^{-1}\,Mpc^{-1}}$, at a confidence level of 68 per cent. We also allow for non-standard dark energy models and modifications of the growth rate, finding good agreement with the ΛCDM paradigm. For example, we constrain the equation-of-state parameter to $w = -1.019_{-0.039}^{+0.048}$. This paper is part of a set that analyses the final galaxy-clustering data set from BOSS. The measurements and likelihoods presented here are combined with others in Alam et al. to produce the final cosmological constraints from BOSS.
Psychopathic Traits in Youth: Is There Evidence for Primary and Secondary Subtypes?
ERIC Educational Resources Information Center
Lee, Zina; Salekin, Randall T.; Iselin, Anne-Marie R.
2010-01-01
The current study employed model-based cluster analysis in a sample of male adolescent offenders (n = 94) to examine subtypes based on psychopathic traits and anxiety. Using the Psychopathy Checklist: Youth Version (PCL:YV; Forth et al. 2003) and the self-report Antisocial Process Screening Device (APSD; Caputo et al. 1999), analyses identified…
Sasidharan, Lekshmi; Wu, Kun-Feng; Menendez, Monica
2015-12-01
One of the major challenges in traffic safety analyses is the heterogeneous nature of safety data, due to the sundry factors involved in it. This heterogeneity often leads to difficulties in interpreting results and conclusions due to unrevealed relationships. Understanding the underlying relationship between injury severities and influential factors is critical for the selection of appropriate safety countermeasures. A method commonly employed to address systematic heterogeneity is to focus on a subgroup of the data chosen according to the research purpose. However, this does not necessarily ensure homogeneity in the data. In this paper, latent class cluster analysis is applied to identify homogeneous subgroups for a specific crash type: pedestrian crashes. The manuscript employs data from police-reported pedestrian crashes (2009-2012) in Switzerland. The analyses demonstrate that dividing pedestrian severity data into seven clusters helps to reduce the systematic heterogeneity of the data and to understand the hidden relationships between crash severity levels and socio-demographic, environmental, vehicle, temporal, and traffic factors, and the main reason for the crash. The pedestrian crash injury severity models were developed for the whole data set and for individual clusters, and were compared using receiver operating characteristic curves, for which results favored clustering. Overall, the study suggests that the latent class clustered regression approach is suitable for reducing heterogeneity and revealing important hidden relationships in traffic safety analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo
2015-01-01
Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt the for-profit sector's research, marketing and targeting strategies. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better. If charities and not-for-profit organisations adopt these strategies, they will be more successful in today's competitive environment. PMID:25849547
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components were manually determined after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster taken from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new, low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
NASA Astrophysics Data System (ADS)
Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.
2014-06-01
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data.
These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
NASA Astrophysics Data System (ADS)
Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.
2014-11-01
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. 
These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
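The abstract above ranks clustering techniques by validation indices such as the silhouette width. As a rough illustration of that index only (not the authors' pipeline), the following minimal sketch computes the mean silhouette width for a given labelling; the function name, the toy one-dimensional points and both labellings are hypothetical.

```python
from math import dist


def mean_silhouette_width(points, labels):
    """Average silhouette width over all points.

    Assumes at least two distinct cluster labels. Values close to 1 indicate
    compact, well-separated clusters; values near or below 0 indicate overlap.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Hypothetical toy data: two obvious groups on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
sensible = [0, 0, 0, 1, 1, 1]      # labelling that matches the groups
scrambled = [0, 1, 0, 1, 0, 1]     # labelling that ignores them

sensible_width = mean_silhouette_width(points, sensible)
scrambled_width = mean_silhouette_width(points, scrambled)
```

In a comparison like the one described, the same index would be evaluated for each technique and each candidate number of clusters, and the combination with the highest value preferred.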
Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method.
Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels
2014-07-01
The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous studies. We also compared pairwise distances (between geographically separated samples) with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysis of the homogeneity in two different ways and calculating marginal STR distributions. We found that the Y-STR haplotypes from e.g. Finland were relatively homogeneous as opposed to the relatively heterogeneous Y-STR haplotypes from e.g. Lublin, Eastern Poland and Berlin, Germany. We demonstrated that the observed distributions of alleles at each locus were similar to the expected ones. We also compared pairwise distances between geographically separated samples from Africa with those obtained using the AMOVA method and found good agreement. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
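For readers unfamiliar with the distribution named above, the discrete Laplace probability mass function can be sketched as follows. This is only an illustration of the distribution itself; the per-locus independence within a cluster, the function names and all parameter values are assumptions made for the example and are not taken from the abstract.

```python
def discrete_laplace_pmf(x, mu, p):
    """P(X = x) for a discrete Laplace distribution centred on the integer mu.

    p (0 < p < 1) controls the spread: the probability decays geometrically
    with the absolute distance |x - mu| from the centre.
    """
    return (1 - p) / (1 + p) * p ** abs(x - mu)


def haplotype_probability(haplotype, centre, p_by_locus):
    """Probability of a haplotype within one cluster.

    Assumption for this sketch: loci are treated as independent within a
    cluster, so the per-locus probabilities are multiplied.
    """
    prob = 1.0
    for allele, mu, p in zip(haplotype, centre, p_by_locus):
        prob *= discrete_laplace_pmf(allele, mu, p)
    return prob


# Hypothetical three-locus cluster centre and dispersion parameters.
centre = (14, 30, 23)
p_by_locus = (0.3, 0.2, 0.25)

at_centre = haplotype_probability((14, 30, 23), centre, p_by_locus)
one_step_away = haplotype_probability((15, 30, 23), centre, p_by_locus)

# The pmf sums to 1 over the integers (checked here on a wide finite window).
total = sum(discrete_laplace_pmf(x, 14, 0.3) for x in range(14 - 200, 14 + 201))
```

A haplotype one repeat away from the cluster centre at a single locus is less probable than the centre by a factor equal to that locus's p.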
A Multivariate Model and Analysis of Competitive Strategy in the U.S. Hardwood Lumber Industry
Robert J. Bush; Steven A. Sinclair
1991-01-01
Business-level competitive strategy in the hardwood lumber industry was modeled through the identification of strategic groups among large U.S. hardwood lumber producers. Strategy was operationalized using a measure based on the variables developed by Dess and Davis (1984). Factor and cluster analyses were used to define strategic groups along the dimensions of cost...
Jacob, Benjamin J; Krapp, Fiorella; Ponce, Mario; Gottuzzo, Eduardo; Griffith, Daniel A; Novak, Robert J
2010-05-01
Spatial autocorrelation is problematic for classical hierarchical cluster detection tests commonly used in multi-drug resistant tuberculosis (MDR-TB) analyses as considerable random error can occur. Therefore, when MDR-TB clusters are spatially autocorrelated the assumption that the clusters are independently random is invalid. In this research, a product moment correlation coefficient (i.e., Moran's coefficient) was used to quantify local spatial variation in multiple clinical and environmental predictor variables sampled in San Juan de Lurigancho, Lima, Peru. Initially, QuickBird 0.61 m data, encompassing visible bands and the near infra-red bands, were selected to synthesize images of land cover attributes of the study site. Data of residential addresses of individual patients with smear-positive MDR-TB were geocoded, prevalence rates calculated and then digitally overlaid onto the satellite data within a 2 km buffer of 31 georeferenced health centers, using a 10 m² grid-based algorithm. Geographical information system (GIS)-gridded measurements of each health center were generated based on preliminary base maps of the georeferenced data aggregated to block groups and census tracts within each buffered area. A three-dimensional model of the study site was constructed based on a digital elevation model (DEM) to determine terrain covariates associated with the sampled MDR-TB covariates. Pearson's correlation was used to evaluate the linear relationship between the DEM and the sampled MDR-TB data. A SAS/GIS® module was then used to calculate univariate statistics and to perform linear and non-linear regression analyses using the sampled predictor variables. The estimates generated from a global autocorrelation analysis were then spatially decomposed into empirical orthogonal bases using a negative binomial regression with a non-homogeneous mean.
Results of the DEM analyses indicated a statistically non-significant, linear relationship between georeferenced health centers and the sampled covariate elevation. The data exhibited positive spatial autocorrelation and the decomposition of Moran's coefficient into uncorrelated, orthogonal map pattern components revealed global spatial heterogeneities necessary to capture latent autocorrelation in the MDR-TB model. It was thus shown that Poisson regression analyses and spatial eigenvector mapping can elucidate the mechanics of MDR-TB transmission by prioritizing clinical and environmental-sampled predictor variables for identifying high risk populations.
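Moran's coefficient, used above to quantify spatial autocorrelation, can be illustrated with a minimal sketch. The four-site chain, the binary adjacency weights and the values below are hypothetical; the study's own weighting scheme is not described in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's coefficient for values observed at n sites.

    weights[i][j] is the spatial weight between sites i and j
    (here binary adjacency, with zeros on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical example: four sites in a row, each adjacent to its neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_gradient = morans_i([1, 2, 3, 4], adjacency)   # similar neighbours
alternating = morans_i([1, 4, 1, 4], adjacency)       # dissimilar neighbours
```

Neighbouring sites with similar values give a positive coefficient, while an alternating pattern gives a negative one.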
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
The main goals of this research were to build a predictor that estimates the value of a turnaround time (TAT) indicator and to use a numerical clustering technique to find possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction, and insight characterisation. A multiple linear regression predictor of the TAT indicator and clustering techniques were used to improve the efficiency of corrective maintenance tasks in a clinical engineering department (CED). Multiple linear regression was used to build a predictive model of the TAT value. The variables contributing to this model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
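The reported coefficients can be arranged into a linear predictor to make the stated ordering of predictors concrete. This is a sketch only: the abstract gives neither the intercept nor the units or scaling of the variables, so the intercept of 0.0, the variable names and the example inputs below are assumptions.

```python
# Coefficients as reported in the abstract above.
coefficients = {
    "stock_rt": 0.734,      # stock service response time, Stock(rt)
    "ce_rt": 0.415,         # clinical engineering department response time, CE(rt)
    "priority": 0.21,       # priority level
    "service_time": 0.06,   # service time
}


def predict_tat(features, intercept=0.0):
    """Linear prediction of turnaround time.

    The intercept is not reported in the abstract; 0.0 is a placeholder.
    """
    return intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )


# Rank predictors by coefficient size, largest first.
ranked = sorted(coefficients, key=coefficients.get, reverse=True)

# Hypothetical inputs (one unit of each predictor).
example = predict_tat(
    {"stock_rt": 1.0, "ce_rt": 1.0, "priority": 1.0, "service_time": 1.0}
)
```

Sorting by coefficient reproduces the order given in the abstract (Stock(rt), CE(rt), priority), although comparing raw coefficients is only meaningful if the predictors are on comparable scales.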
Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil
2009-07-01
Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
Quark cluster model for deep-inelastic lepton-deuteron scattering
NASA Astrophysics Data System (ADS)
Yen, G.; Vary, J. P.; Harindranath, A.; Pirner, H. J.
1990-10-01
We evaluate the contribution of quasifree nucleon knockout and of inelastic lepton-nucleon scattering in inclusive electron-deuteron reactions at large momentum transfer. We examine the degree of quantitative agreement with deuteron wave functions from the Reid soft-core and Bonn realistic nucleon-nucleon interactions. For the range of data available there is strong sensitivity to the tensor correlations which are distinctively different in these two deuteron models. At this stage of the analyses the Reid soft-core wave function provides a reasonable description of the data while the Bonn wave function does not. We then include a six-quark cluster component whose relative contribution is based on an overlap criterion and obtain a good description of all the data with both interactions. The critical separation at which overlap occurs (formation of six-quark clusters) is taken to be 1.0 fm and the six-quark cluster probability is 4.7% for Reid and 5.4% for Bonn. As a consequence the quark cluster model with either the Reid or the Bonn wave function describes the SLAC inclusive electron-deuteron scattering data equally well. We then show how additional data would be decisive in resolving which model is ultimately more correct.
NASA Astrophysics Data System (ADS)
Quitadamo, Ian Joseph
Many higher education faculty perceive a deficiency in students' ability to reason, evaluate, and make informed judgments, skills that are deemed necessary for academic and job success in science and math. These skills, often collected within a domain called critical thinking (CT), have been studied and are thought to be influenced by teaching styles (the combination of beliefs, behavior, and attitudes used when teaching) and small group collaborative learning (SGCL). However, no existing studies show teaching styles and SGCL cause changes in student CT performance. This study determined how combinations of teaching styles called clusters and peer-facilitated SGCL (a specific form of SGCL) affect changes in undergraduate student CT performance using a quasi-experimental pre-test/post-test research design and valid and reliable CT performance indicators. Quantitative analyses of three teaching style cluster models (Grasha's cluster model, a weighted cluster model, and a student-centered/teacher-centered cluster model) and peer-facilitated SGCL were performed to evaluate their ability to cause measurable changes in student CT skills. Based on results that indicated weighted teaching style clusters and peer-facilitated SGCL are associated with significant changes in student CT, we conclude that teaching styles and peer-facilitated SGCL influence the development of undergraduate CT in higher education science and math.
Sauzet, Odile; Peacock, Janet L
2017-07-20
The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
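For reference, the logistic random intercept model compared above is usually written in the following form. The notation is an assumption for illustration (infant i born to mother j, covariate vector x_ij, coefficient vector β), not the authors' own.

```latex
\[
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j)
  = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad
u_j \sim \mathcal{N}\!\left(0, \sigma_u^{2}\right)
\]
```

Here \sigma_u^2 is the level-two variance mentioned in the abstract. A singleton contributes a single observation to its cluster, so when few clusters contain twins there is little information from which to estimate \sigma_u^2, which is consistent with the reported failure to capture the dependence between siblings.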
A Web service substitution method based on service cluster nets
NASA Astrophysics Data System (ADS)
Du, YuYue; Gai, JunJing; Zhou, MengChu
2017-11-01
Service substitution is an important research topic in the fields of Web services and service-oriented computing. This work presents a novel method to analyse and substitute Web services. A new concept, called a Service Cluster Net Unit, is proposed based on Web service clusters. A service cluster is converted into a Service Cluster Net Unit. Then it is used to analyse whether the services in the cluster can satisfy some service requests. Meanwhile, the substitution methods of an atomic service and a composite service are proposed. The correctness of the proposed method is proved, and the effectiveness is shown and compared with the state-of-the-art method via an experiment. It can be readily applied to e-commerce service substitution to meet the business automation needs.
Planck 2015 results. XXIV. Cosmology from Sunyaev-Zeldovich cluster counts
NASA Astrophysics Data System (ADS)
Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Battye, R.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. 
R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Weller, J.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.
2016-09-01
We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. Improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts
Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...
2016-09-20
In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ade, P. A. R.; Aghanim, N.; Arnaud, M.
In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Bayesian multivariate hierarchical transformation models for ROC analysis.
O'Malley, A James; Zou, Kelly H
2006-02-15
A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
Bayesian multivariate hierarchical transformation models for ROC analysis
O'Malley, A. James; Zou, Kelly H.
2006-01-01
SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
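Two ingredients mentioned above, the Box-Cox transformation and ROC analysis, can be illustrated independently of the Bayesian hierarchical model itself. The sketch below is not the BMHTM: it shows only the Box-Cox family and an empirical (rank-based) area under the ROC curve, with hypothetical test outcomes and a hypothetical transformation parameter.

```python
from math import log


def box_cox(y, lam):
    """Box-Cox transformation of a positive outcome y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive outcomes")
    return log(y) if lam == 0 else (y ** lam - 1) / lam


def empirical_auc(diseased, healthy):
    """Empirical area under the ROC curve.

    Equals the proportion of (diseased, healthy) pairs in which the diseased
    subject has the higher test outcome, counting ties as one half.
    """
    wins = sum(
        1.0 if d > h else 0.5 if d == h else 0.0
        for d in diseased
        for h in healthy
    )
    return wins / (len(diseased) * len(healthy))


# Hypothetical diagnostic outcomes for one cluster (e.g. one study centre).
diseased = [4.0, 6.5, 9.0, 12.0]
healthy = [1.5, 3.0, 5.0, 7.0]

auc_raw = empirical_auc(diseased, healthy)
auc_transformed = empirical_auc(
    [box_cox(y, 0.5) for y in diseased],
    [box_cox(y, 0.5) for y in healthy],
)
```

Because the transformation is monotone, the empirical AUC within a cluster is unchanged by it; in the parametric setting described in the abstract, the role of the transformation is to map outcomes to a common family of distributions.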
An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.
Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei
2013-05-01
Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
Implicit Priors in Galaxy Cluster Mass and Scaling Relation Determinations
NASA Technical Reports Server (NTRS)
Mantz, A.; Allen, S. W.
2011-01-01
Deriving the total masses of galaxy clusters from observations of the intracluster medium (ICM) generally requires some prior information, in addition to the assumptions of hydrostatic equilibrium and spherical symmetry. Often, this information takes the form of particular parametrized functions used to describe the cluster gas density and temperature profiles. In this paper, we investigate the implicit priors on hydrostatic masses that result from this fully parametric approach, and the implications of such priors for scaling relations formed from those masses. We show that the application of such fully parametric models of the ICM naturally imposes a prior on the slopes of the derived scaling relations, favoring the self-similar model, and argue that this prior may be influential in practice. In contrast, this bias does not exist for techniques which adopt an explicit prior on the form of the mass profile but describe the ICM non-parametrically. Constraints on the slope of the cluster mass-temperature relation in the literature show a separation based on the approach employed, with the results from fully parametric ICM modeling clustering nearer the self-similar value. Given that a primary goal of scaling relation analyses is to test the self-similar model, the application of methods subject to strong, implicit priors should be avoided. Alternative methods and best practices are discussed.
Huber, Heinrich J; Connolly, Niamh M C; Dussmann, Heiko; Prehn, Jochen H M
2012-03-01
We devised an approach to extract control principles of cellular bioenergetics for intact and impaired mitochondria from ODE-based models and applied it to a recently established bioenergetic model of cancer cells. The approach used two methods for varying ODE model parameters to determine those model components that, either alone or in combination with other components, most decisively regulated bioenergetic state variables. We found that, while polarisation of the mitochondrial membrane potential (ΔΨ(m)) and, therefore, the proton motive force were critically determined by respiratory complex I activity in healthy mitochondria, complex III activity was dominant for ΔΨ(m) during conditions of cytochrome-c deficiency. As a further important result, cellular bioenergetics in healthy, ATP-producing mitochondria was regulated by three parameter clusters that describe (1) mitochondrial respiration, (2) ATP production and consumption and (3) coupling of ATP-production and respiration. These parameter clusters resembled metabolic blocks and their intermediaries from top-down control analyses. However, parameter clusters changed significantly when cells changed from low to high ATP levels or when mitochondria were considered to be impaired by loss of cytochrome-c. This change suggests that the assumption of static metabolic blocks by conventional top-down control analyses is not valid under these conditions. Our approach is complementary to both ODE and top-down control analysis approaches and allows a better insight into cellular bioenergetics and its pathological alterations.
Coordinate based random effect size meta-analysis of neuroimaging studies.
Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J
2017-06-01
Low power in neuroimaging studies can make them difficult to interpret, and coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analysis of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.
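The abstract states that the FCDR is based on the false discovery rate but does not spell out the procedure. As background only, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule applied to a list of p-values. It is not the ClusterZ FCDR algorithm; the function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Standard Benjamini-Hochberg step-up procedure (not ClusterZ's FCDR).

    Returns a list of booleans, one per input p-value, marking which
    hypotheses are rejected while controlling the FDR at level q.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Illustrative (made-up) p-values, e.g. one per candidate cluster.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example))
```

The step-up rule rejects every hypothesis up to the largest rank that passes its threshold, so a p-value can be rejected even when it exceeds its own rank-specific threshold.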
Cluster Synchronization of Diffusively Coupled Nonlinear Systems: A Contraction-Based Approach
NASA Astrophysics Data System (ADS)
Aminzare, Zahra; Dey, Biswadip; Davison, Elizabeth N.; Leonard, Naomi Ehrich
2018-04-01
Finding the conditions that foster synchronization in networked nonlinear systems is critical to understanding a wide range of biological and mechanical systems. However, the conditions proved in the literature for synchronization in nonlinear systems with linear coupling, such as has been used to model neuronal networks, are in general not strict enough to accurately determine the system behavior. We leverage contraction theory to derive new sufficient conditions for cluster synchronization in terms of the network structure, for a network where the intrinsic nonlinear dynamics of each node may differ. Our result requires that network connections satisfy a cluster-input-equivalence condition, and we explore the influence of this requirement on network dynamics. For application to networks of nodes with FitzHugh-Nagumo dynamics, we show that our new sufficient condition is tighter than those found in previous analyses that used smooth or nonsmooth Lyapunov functions. Improving the analytical conditions for when cluster synchronization will occur based on network configuration is a significant step toward facilitating understanding and control of complex networked systems.
NASA Astrophysics Data System (ADS)
Jauzac, Mathilde; Harvey, David; Massey, Richard
2018-04-01
We assess how much unused strong lensing information is available in the deep Hubble Space Telescope imaging and VLT/MUSE spectroscopy of the Frontier Field clusters. As a pilot study, we analyse galaxy cluster MACS J0416.1-2403 (z = 0.397, M(R < 200 kpc) = 1.6 × 10^14 M⊙), which has 141 multiple images with spectroscopic redshifts. We find that many additional parameters in a cluster mass model can be constrained, and that adding even small amounts of extra freedom to a model can dramatically improve its figures of merit. We use this information to constrain the distribution of dark matter around cluster member galaxies, simultaneously with the cluster's large-scale mass distribution. We find tentative evidence that some galaxies' dark matter has surprisingly similar ellipticity to their stars (unlike in the field, where it is more spherical), but that its orientation is often misaligned. When non-coincident dark matter and stellar halos are allowed, the model improves by 35%. This technique may provide a new way to investigate the processes and timescales on which dark matter is stripped from galaxies as they fall into a massive cluster. Our preliminary conclusions will be made more robust by analysing the remaining five Frontier Field clusters.
NASA Astrophysics Data System (ADS)
Jauzac, Mathilde; Harvey, David; Massey, Richard
2018-07-01
We assess how much unused strong lensing information is available in the deep Hubble Space Telescope imaging and Very Large Telescope/Multi Unit Spectroscopic Explorer spectroscopy of the Frontier Field clusters. As a pilot study, we analyse galaxy cluster MACS J0416.1-2403 (z = 0.397, M(R < 200 kpc) = 1.6 × 10^14 M⊙), which has 141 multiple images with spectroscopic redshifts. We find that many additional parameters in a cluster mass model can be constrained, and that adding even small amounts of extra freedom to a model can dramatically improve its figures of merit. We use this information to constrain the distribution of dark matter around cluster member galaxies, simultaneously with the cluster's large-scale mass distribution. We find tentative evidence that some galaxies' dark matter has surprisingly similar ellipticity to their stars (unlike in the field, where it is more spherical), but that its orientation is often misaligned. When non-coincident dark matter and stellar haloes are allowed, the model improves by 35 per cent. This technique may provide a new way to investigate the processes and time-scales on which dark matter is stripped from galaxies as they fall into a massive cluster. Our preliminary conclusions will be made more robust by analysing the remaining five Frontier Field clusters.
NASA Astrophysics Data System (ADS)
Asa'd, Randa S.; Vazdekis, Alexandre; Cerviño, Miguel; Noël, Noelia E. D.; Beasley, Michael A.; Kassab, Mahmoud
2017-11-01
The optical integrated spectra of three Large Magellanic Cloud young stellar clusters (NGC 1984, NGC 1994 and NGC 2011) exhibit concave continua and prominent molecular bands which deviate significantly from the predictions of single stellar population (SSP) models. In order to understand the appearance of these spectra, we create a set of young stellar population (MILES) models, which we make available to the community. We use archival International Ultraviolet Explorer integrated UV spectra to independently constrain the cluster masses and extinction, and rule out strong stochastic effects in the optical spectra. In addition, we also analyse deep colour-magnitude diagrams of the clusters to provide independent age determinations based on isochrone fitting. We explore hypotheses, including age spreads in the clusters, a top-heavy initial mass function, different SSP models and the role of red supergiant stars (RSG). We find that the strong molecular features in the optical spectra can be only reproduced by modelling an increased fraction of about 20 per cent by luminosity of RSG above what is predicted by canonical stellar evolution models. Given the uncertainties in stellar evolution at Myr ages, we cannot presently rule out the presence of Myr age spreads in these clusters. Our work combines different wavelengths as well as different approaches (resolved data as well as integrated spectra for the same sample) in order to reveal the complete picture. We show that each approach provides important information but in combination we can better understand the cluster stellar populations.
Transformation and model choice for RNA-seq co-expression analysis.
Rau, Andrea; Maugis-Rabusseau, Cathy
2018-05-01
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
Cold dark energy constraints from the abundance of galaxy clusters
Heneka, Caroline; Rapetti, David; Cataneo, Matteo; ...
2017-10-05
We constrain cold dark energy of negligible sound speed using galaxy cluster abundance observations. In contrast to standard quasi-homogeneous dark energy, negligible sound speed implies clustering of the dark energy fluid at all scales, allowing us to measure the effects of dark energy perturbations at cluster scales. We compare those models and set the stage for using non-linear information from semi-analytical modelling in cluster growth data analyses. For this, we recalibrate the halo mass function with non-linear characteristic quantities, the spherical collapse threshold and virial overdensity, that account for model and redshift-dependent behaviours, as well as an additional mass contribution for cold dark energy. Here in this paper, we present the first constraints from this cold dark matter plus cold dark energy mass function using our cluster abundance likelihood, which self-consistently accounts for selection effects, covariances and systematic uncertainties. We combine cluster growth data with cosmic microwave background, supernovae Ia and baryon acoustic oscillation data, and find a shift between cold versus quasi-homogeneous dark energy of up to 1σ. We make a Fisher matrix forecast of constraints attainable with cluster growth data from the ongoing Dark Energy Survey (DES). For DES, we predict ~50 percent tighter constraints on (Ωm, w) for cold dark energy versus wCDM models, with the same free parameters. Overall, we show that cluster abundance analyses are sensitive to cold dark energy, an alternative, viable model that should be routinely investigated alongside the standard dark energy scenario.
Cold dark energy constraints from the abundance of galaxy clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heneka, Caroline; Rapetti, David; Cataneo, Matteo
We constrain cold dark energy of negligible sound speed using galaxy cluster abundance observations. In contrast to standard quasi-homogeneous dark energy, negligible sound speed implies clustering of the dark energy fluid at all scales, allowing us to measure the effects of dark energy perturbations at cluster scales. We compare those models and set the stage for using non-linear information from semi-analytical modelling in cluster growth data analyses. For this, we recalibrate the halo mass function with non-linear characteristic quantities, the spherical collapse threshold and virial overdensity, that account for model and redshift-dependent behaviours, as well as an additional mass contribution for cold dark energy. Here in this paper, we present the first constraints from this cold dark matter plus cold dark energy mass function using our cluster abundance likelihood, which self-consistently accounts for selection effects, covariances and systematic uncertainties. We combine cluster growth data with cosmic microwave background, supernovae Ia and baryon acoustic oscillation data, and find a shift between cold versus quasi-homogeneous dark energy of up to 1σ. We make a Fisher matrix forecast of constraints attainable with cluster growth data from the ongoing Dark Energy Survey (DES). For DES, we predict ~50 percent tighter constraints on (Ωm, w) for cold dark energy versus wCDM models, with the same free parameters. Overall, we show that cluster abundance analyses are sensitive to cold dark energy, an alternative, viable model that should be routinely investigated alongside the standard dark energy scenario.
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
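The abstract names the median odds ratio and the variance partition coefficient without giving formulas. The sketch below uses the expressions commonly quoted for a two-level random-intercept logistic model, which are assumed here rather than taken from the abstract: MOR = exp(sqrt(2·σ²)·Φ⁻¹(0.75)) and, under the latent-variable formulation, VPC = σ² / (σ² + π²/3), where σ² is the between-cluster variance of the random intercept. The function names are illustrative.

```python
import math
from statistics import NormalDist


def median_odds_ratio(cluster_variance):
    """Median odds ratio for a random-intercept logistic model.

    cluster_variance: estimated variance of the cluster-level random
    intercept on the log-odds scale (assumed to be supplied by a fitted model).
    """
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1), about 0.6745
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)


def variance_partition_coefficient(cluster_variance):
    """VPC under the latent-variable formulation.

    The subject-level residual variance is fixed at pi**2 / 3, the variance
    of the standard logistic distribution.
    """
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3.0)
```

With no between-cluster variation the MOR is 1 and the VPC is 0; both grow as the cluster-level variance grows.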
Swarm Intelligence for Urban Dynamics Modelling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghnemat, Rawan; Bertelle, Cyrille; Duchamp, Gerard H. E.
2009-04-16
In this paper, we propose swarm intelligence algorithms to deal with dynamical and spatial organization emergence. The goal is to model and simulate the development of spatial centers using multi-criteria. We combine a decentralized approach based on emergent clustering mixed with spatial constraints or attractions. We propose an extension of the ant nest building algorithm with multi-center and adaptive process. Typically, this model is suitable to analyse and simulate urban dynamics like gentrification or the dynamics of the cultural equipment in urban area.
Swarm Intelligence for Urban Dynamics Modelling
NASA Astrophysics Data System (ADS)
Ghnemat, Rawan; Bertelle, Cyrille; Duchamp, Gérard H. E.
2009-04-01
In this paper, we propose swarm intelligence algorithms to deal with dynamical and spatial organization emergence. The goal is to model and simulate the development of spatial centers using multi-criteria. We combine a decentralized approach based on emergent clustering mixed with spatial constraints or attractions. We propose an extension of the ant nest building algorithm with multi-center and adaptive process. Typically, this model is suitable to analyse and simulate urban dynamics like gentrification or the dynamics of the cultural equipment in urban area.
X-ray aspects of the DAFT/FADA clusters
NASA Astrophysics Data System (ADS)
Guennou, L.; Durret, F.; Lima Neto, G. B.; Adami, C.
2012-12-01
We have undertaken the DAFT/FADA survey with the aim of applying constraints on dark energy based on weak lensing tomography as well as obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range [0.4,0.9] for which there are HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. We present preliminary results on the coupled X-ray and dynamical analyses of these clusters.
Pan-genome and phylogeny of Bacillus cereus sensu lato.
Bazinet, Adam L
2017-08-02
Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. 
Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.
Sethi, Suresh; Linden, Daniel; Wenburg, John; Lewis, Cara; Lemons, Patrick R.; Fuller, Angela K.; Hare, Matthew P.
2016-01-01
Error-tolerant likelihood-based match calling presents a promising technique to accurately identify recapture events in genetic mark–recapture studies by combining probabilities of latent genotypes and probabilities of observed genotypes, which may contain genotyping errors. Combined with clustering algorithms to group samples into sets of recaptures based upon pairwise match calls, these tools can be used to reconstruct accurate capture histories for mark–recapture modelling. Here, we assess the performance of a recently introduced error-tolerant likelihood-based match-calling model and sample clustering algorithm for genetic mark–recapture studies. We assessed both biallelic (i.e. single nucleotide polymorphisms; SNP) and multiallelic (i.e. microsatellite; MSAT) markers using a combination of simulation analyses and case study data on Pacific walrus (Odobenus rosmarus divergens) and fishers (Pekania pennanti). A novel two-stage clustering approach is demonstrated for genetic mark–recapture applications. First, repeat captures within a sampling occasion are identified. Subsequently, recaptures across sampling occasions are identified. The likelihood-based matching protocol performed well in simulation trials, demonstrating utility for use in a wide range of genetic mark–recapture studies. Moderately sized SNP (64+) and MSAT (10–15) panels produced accurate match calls for recaptures and accurate non-match calls for samples from closely related individuals in the face of low to moderate genotyping error. Furthermore, matching performance remained stable or increased as the number of genetic markers increased, genotyping error notwithstanding.
An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems to current bioinformatics genomic discovery science. This paper is a feasibility study of an original model enabling novel "in-silico" clinical-genomic discovery science and that demonstrates its feasibility. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
Childhood antecedents of adolescent personality disorders.
Bernstein, D P; Cohen, P; Skodol, A; Bezirganian, S; Brook, J S
1996-07-01
The purpose of this study was to investigate the childhood antecedents of personality disorders that are diagnosed in adolescence. A randomly selected community sample of 641 youths was assessed initially in childhood and followed longitudinally over 10 years. Childhood behavior ratings were based on maternal report; diagnoses of adolescent personality disorders were based on data obtained from both maternal and youth informants. Four composite measures of childhood behavior problems were used: conduct problems, depressive symptoms, anxiety/fear, and immaturity. Adolescent personality disorders were considered present only if the disorders persisted over a 2-year period. For all analyses, personality disorders were grouped into the three clusters (A, B, and C) of DSM-III-R. Logistic regression analyses indicated that all four of the putative childhood antecedents were associated with greater odds of an adolescent personality disorder 10 years later. Childhood conduct problems remained an independent predictor of personality disorders in all three clusters, even when other childhood problems were included in the same regression model. Additionally, depressive symptoms emerged as an independent predictor of cluster A personality disorders in boys, while immaturity was an independent predictor of cluster B personality disorders in girls. No moderating effects of age at time of childhood assessment were found. These results support the view that personality disorders can be traced to childhood emotional and behavioral disturbances and suggest that these problems have both general and specific relationships to adolescent personality functioning.
NASA Astrophysics Data System (ADS)
Mehmood, S.; Ashfaq, M.; Evans, K. J.; Black, R. X.; Hsu, H. H.
2017-12-01
Extreme precipitation during the summer season has shown an increasing trend across South Asia in recent decades, causing an exponential increase in weather related losses. Here we combine a cluster analysis technique (Agglomerative Hierarchical Clustering) with a Lagrangian based moisture analysis technique to investigate potential commonalities in the characteristics of the large scale meteorological patterns (LSMP) and moisture anomalies associated with the observed extreme precipitation events, and their representation in the Department of Energy model ACME. Using precipitation observations from the Indian Meteorological Department (IMD) and Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE), and atmospheric variables from ERA-Interim Reanalysis, we first identify LSMP in both the upper and lower troposphere that are responsible for widespread precipitation extreme events during the 1980-2015 period. For each of the selected extreme events, we perform moisture source analyses to identify major evaporative sources that sustain anomalous moisture supply during the course of the event, with a particular focus on local terrestrial moisture recycling. Further, we perform similar analyses on two sets of five-member ensembles of the ACME model (1-degree and ¼ degree) to investigate the ability of the ACME model to simulate precipitation extremes associated with each of the LSMP patterns and the associated anomalous moisture sourcing from each of the terrestrial and oceanic evaporative regions. Comparison of low and high-resolution model configurations provides insight about the influence of horizontal grid spacing in the simulation of extreme precipitation and the governing mechanisms.
Percolation analyses of observed and simulated galaxy clustering
NASA Astrophysics Data System (ADS)
Bhavsar, S. P.; Barrow, J. D.
1983-11-01
A percolation cluster analysis is performed on equivalent regions of the CFA redshift survey of galaxies and the 4000 body simulations of gravitational clustering made by Aarseth, Gott and Turner (1979). The observed and simulated percolation properties are compared and, unlike correlation and multiplicity function analyses, favour high density (Omega = 1) models with n = - 1 initial data. The present results show that the three-dimensional data are consistent with the degree of filamentary structure present in isothermal models of galaxy formation at the level of percolation analysis. It is also found that the percolation structure of the CFA data is a function of depth. Percolation structure does not appear to be a sensitive probe of intrinsic filamentary structure.
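The percolation idea in this abstract can be illustrated with a minimal sketch: points are linked whenever they lie within a chosen linking length, and one tracks how the largest connected cluster grows as that length increases. The function name, the toy coordinates, and the linking lengths below are illustrative assumptions, not the survey data or the authors' code.

```python
from itertools import combinations
from math import dist


def percolation_clusters(points, linking_length):
    """Return clusters (lists of point indices), largest first.

    Two points belong to the same cluster if they are connected by a chain
    of neighbours, each step no longer than linking_length.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Toy "galaxy" positions (hypothetical): two small groups and one isolated point.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (30, 0, 0)]

for r in (0.5, 1.0, 9.0, 20.0):
    largest = percolation_clusters(points, r)[0]
    print(f"linking length {r}: largest cluster has {len(largest)} points")
```

Comparing how quickly the largest cluster spans the sample for observed versus simulated catalogues is the kind of comparison the abstract describes; a realistic analysis would also need to account for survey depth and selection effects.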
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.
van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim
2017-01-01
In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
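For readers unfamiliar with the "silhouette measure" cited above, the following is a minimal sketch of how a mean silhouette score is usually computed: for each point, a is the mean distance to its own cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b). The function name and the toy points are assumptions for illustration; the study's variables and software are not given in the abstract.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette score over all points (ranges from -1 to 1)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or a single cluster: score set to 0
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four points forming two tight pairs.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # labels follow the pairs: close to 1
print(mean_silhouette(points, [0, 1, 0, 1]))  # labels cut across the pairs: negative
```

A value near 0.2, as reported in the abstract, sits much closer to the "no structure" end of this scale than to the well-separated case.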
Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H
2017-04-20
Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. 
Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.
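For reference, the Getis-Ord Gi* statistic mentioned in this abstract is commonly written as below. Here x_j is the value (for example, the HCV rate) in census tract j, w_ij is a spatial weight between tracts i and j, and n is the number of tracts. The weighting scheme and distance band used in the study are not stated in the abstract, so this is the general form only.

```latex
G_i^{*} =
\frac{\sum_{j=1}^{n} w_{ij}\, x_j \;-\; \bar{X} \sum_{j=1}^{n} w_{ij}}
     {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^{2} - \left( \sum_{j=1}^{n} w_{ij} \right)^{2}}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
```

The result is a z-score: large positive values mark a tract surrounded by high values (a hotspot), large negative values a cold spot.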
Catchment classification by runoff behaviour with self-organizing maps (SOM)
NASA Astrophysics Data System (ADS)
Ley, R.; Casper, M. C.; Hellebrand, H.; Merz, R.
2011-09-01
Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.
Catchment classification by runoff behaviour with self-organizing maps (SOM)
NASA Astrophysics Data System (ADS)
Ley, R.; Casper, M. C.; Hellebrand, H.; Merz, R.
2011-03-01
Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.
NASA Astrophysics Data System (ADS)
Sakata, Katsumi; Ohyanagi, Hajime; Sato, Shinji; Nobori, Hiroya; Hayashi, Akiko; Ishii, Hideshi; Daub, Carsten O.; Kawai, Jun; Suzuki, Harukazu; Saito, Toshiyuki
2015-02-01
We present a system-wide transcriptional network structure that controls cell types in the context of expression pattern transitions that correspond to cell type transitions. Co-expression based analyses uncovered a system-wide, ladder-like transcription factor cluster structure composed of nearly 1,600 transcription factors in a human transcriptional network. Computer simulations based on a transcriptional regulatory model deduced from the system-wide, ladder-like transcription factor cluster structure reproduced expression pattern transitions when human THP-1 myelomonocytic leukaemia cells cease proliferation and differentiate under phorbol myristate acetate stimulation. The behaviour of MYC, a reprogramming Yamanaka factor that was suggested to be essential for induced pluripotent stem cells during dedifferentiation, could be interpreted based on the transcriptional regulation predicted by the system-wide, ladder-like transcription factor cluster structure. This study introduces a novel system-wide structure to transcriptional networks that provides new insights into network topology.
Nolan, Jim
2014-01-01
This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area. PMID:24778585
Doubly stochastic Poisson process models for precipitation at fine time-scales
NASA Astrophysics Data System (ADS)
Ramesh, Nadarajah I.; Onof, Christian; Xie, Dichao
2012-09-01
This paper considers a class of stochastic point process models, based on doubly stochastic Poisson processes, in the modelling of rainfall. We examine the application of this class of models, a neglected alternative to the widely-known Poisson cluster models, in the analysis of fine time-scale rainfall intensity. These models are mainly used to analyse tipping-bucket raingauge data from a single site but an extension to multiple sites is illustrated which reveals the potential of this class of models to study the temporal and spatial variability of precipitation at fine time-scales.
Dishion, Thomas J.; Ha, Thao; Véronneau, Marie-Hélène
2012-01-01
This study proposes the inclusion of peer relationships in a life history perspective on adolescent problem behavior. Longitudinal analyses were used to examine deviant peer clustering as the mediating link between attenuated family ties, peer marginalization, and social disadvantage in early adolescence and sexual promiscuity in middle adolescence and childbearing by early adulthood. Specifically, 998 youth and their families were assessed at age 11 years and periodically through age 24 years. Structural equation modeling revealed that the peer-enhanced life history model provided a good fit to the longitudinal data, with deviant peer clustering strongly predicting adolescent sexual promiscuity and other correlated problem behaviors. Sexual promiscuity, as expected, also strongly predicted the number of children by age 22–24 years. Consistent with a life history perspective, family social disadvantage directly predicted deviant peer clustering and number of children in early adulthood, controlling for all other variables in the model. These data suggest that deviant peer clustering is a core dimension of a fast life history strategy, with strong links to sexual activity and childbearing. The implications of these findings are discussed with respect to the need to integrate an evolutionary-based model of self-organized peer groups in developmental and intervention science. PMID:22409765
Dishion, Thomas J; Ha, Thao; Véronneau, Marie-Hélène
2012-05-01
The authors propose that peer relationships should be included in a life history perspective on adolescent problem behavior. Longitudinal analyses were used to examine deviant peer clustering as the mediating link between attenuated family ties, peer marginalization, and social disadvantage in early adolescence and sexual promiscuity in middle adolescence and childbearing by early adulthood. Specifically, 998 youths, along with their families, were assessed at age 11 years and periodically through age 24 years. Structural equation modeling revealed that the peer-enhanced life history model provided a good fit to the longitudinal data, with deviant peer clustering strongly predicting adolescent sexual promiscuity and other correlated problem behaviors. Sexual promiscuity, as expected, also strongly predicted the number of children by ages 22-24 years. Consistent with a life history perspective, family social disadvantage directly predicted deviant peer clustering and number of children in early adulthood, controlling for all other variables in the model. These data suggest that deviant peer clustering is a core dimension of a fast life history strategy, with strong links to sexual activity and childbearing. The implications of these findings are discussed with respect to the need to integrate an evolutionary-based model of self-organized peer groups in developmental and intervention science.
Bias and inference from misspecified mixed-effect models in stepped wedge trial analysis.
Thompson, Jennifer A; Fielding, Katherine L; Davey, Calum; Aiken, Alexander M; Hargreaves, James R; Hayes, Richard J
2017-10-15
Many stepped wedge trials (SWTs) are analysed by using a mixed-effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common-to-all or varied-between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within-cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within-cluster comparisons in the standard model. In the SWTs simulated here, mixed-effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within-cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
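The "standard model" described in this abstract can be written out as a sketch. The notation is an assumption chosen for illustration (the abstract gives no symbols): Y_ijk is the outcome for individual k in cluster i during period j, X_ij indicates whether cluster i receives the intervention in period j, and u_i is the random cluster intercept.

```latex
% Standard model: random intercept, fixed period and intervention effects
Y_{ijk} = \mu + \beta_j + \theta X_{ij} + u_i + \varepsilon_{ijk},
\qquad u_i \sim N(0, \tau^{2}), \quad \varepsilon_{ijk} \sim N(0, \sigma^{2})

% Extension with a random period effect (cluster-by-period term v_{ij})
Y_{ijk} = \mu + \beta_j + \theta X_{ij} + u_i + v_{ij} + \varepsilon_{ijk},
\qquad v_{ij} \sim N(0, \gamma^{2})

% Extension with a random intervention effect (cluster-specific slope w_i)
Y_{ijk} = \mu + \beta_j + (\theta + w_i) X_{ij} + u_i + \varepsilon_{ijk},
\qquad w_i \sim N(0, \omega^{2})
```

In these terms, the misspecification studied is fitting the first model when the data were generated with a non-zero variance for v_ij or w_i.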
Bias and inference from misspecified mixed‐effect models in stepped wedge trial analysis
Fielding, Katherine L.; Davey, Calum; Aiken, Alexander M.; Hargreaves, James R.; Hayes, Richard J.
2017-01-01
Many stepped wedge trials (SWTs) are analysed by using a mixed‐effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common‐to‐all or varied‐between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within‐cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within‐cluster comparisons in the standard model. In the SWTs simulated here, mixed‐effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within‐cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28556355
Efficient generation of low-energy folded states of a model protein
NASA Astrophysics Data System (ADS)
Gordon, Heather L.; Kwan, Wai Kei; Gong, Chunhang; Larrass, Stefan; Rothstein, Stuart M.
2003-01-01
A number of short simulated annealing runs are performed on a highly-frustrated 46-"residue" off-lattice model protein. We perform, in an iterative fashion, a principal component analysis of the 946 nonbonded interbead distances, followed by two varieties of cluster analyses: hierarchical and k-means clustering. We identify several distinct sets of conformations with reasonably consistent cluster membership. Nonbonded distance constraints are derived for each cluster and are employed within a distance geometry approach to generate many new conformations, previously unidentified by the simulated annealing experiments. Subsequent analyses suggest that these new conformations are members of the parent clusters from which they were generated. Furthermore, several novel, previously unobserved structures with low energy were uncovered, augmenting the ensemble of simulated annealing results, and providing a complete distribution of low-energy states. The computational cost of this approach to generating low-energy conformations is small when compared to the expense of further Monte Carlo simulated annealing runs.
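The figure of 946 nonbonded interbead distances for a 46-bead chain can be reproduced by a simple count, under the assumption (not stated in the abstract) that pairs one or two beads apart along the chain are excluded as bonded or bond-angle neighbours. The helper names below are hypothetical; the resulting vector is the kind of input a principal component analysis would take.

```python
from itertools import combinations
from math import dist

N_BEADS = 46
MIN_SEPARATION = 3  # assumption: (i, i+1) and (i, i+2) pairs are treated as bonded


def nonbonded_pairs(n_beads=N_BEADS, min_separation=MIN_SEPARATION):
    """Index pairs (i, j) at least min_separation apart along the chain."""
    return [(i, j) for i, j in combinations(range(n_beads), 2)
            if j - i >= min_separation]


def distance_features(coords):
    """Vector of nonbonded interbead distances for one conformation."""
    return [dist(coords[i], coords[j]) for i, j in nonbonded_pairs(len(coords))]


# 46*45/2 = 1035 pairs in total, minus 45 (separation 1) and 44 (separation 2).
print(len(nonbonded_pairs()))
```

Each conformation from a simulated annealing run would then contribute one such 946-component vector to the data matrix that is reduced and clustered.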
NASA Astrophysics Data System (ADS)
Liu, Fang; Cao, San-xing; Lu, Rui
2012-04-01
This paper proposes a user credit assessment model based on a clustering ensemble, aimed at the problem of users illegally spreading pirated and pornographic media content on self-service broadband new media platforms. The idea is to assess new media users' credit by establishing an index system based on user credit behaviors, so that illegal users can be identified from the assessment results and the transmission of harmful video and audio on the network can be curbed. The proposed clustering ensemble combines two advantages: swarm intelligence clustering is well suited to analysing user credit behavior, and K-means clustering can eliminate the scattered users left in the result of swarm intelligence clustering; together they classify all users' credit automatically. The model was validated in experiments on the standard credit application dataset in the UCI machine learning repository. A comparative experiment against a single swarm intelligence clustering model indicates that the clustering ensemble model distinguishes creditworthiness more strongly, especially in identifying the user clusters with the best and worst credit, which will help operators apply incentive or punitive measures accurately. In addition, compared with a logistic regression based model under the same conditions, the clustering ensemble model is robust and has better prediction accuracy.
Orsi, Rebecca
2017-02-01
Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.
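As an illustration of how an internal validity index can be used to choose among candidate cluster solutions, here is a minimal sketch of one common form of the Dunn index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. This is not the authors' R code; the function name and toy data are assumptions, and other variants of the index exist.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum within-cluster diameter.

    Higher is better: clusters are compact and far apart.
    Assumes at least two clusters and at least one cluster with two points.
    """
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(dist(p, q))
    return min(between) / max(within)


# Hypothetical toy data: two tight pairs five units apart.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
candidates = {
    "pairs kept together": [0, 0, 1, 1],
    "pairs split apart": [0, 1, 0, 1],
}
for name, labels in candidates.items():
    print(name, dunn_index(points, labels))
```

The solution with the higher Dunn index would be preferred; the Davies-Bouldin index mentioned in the abstract works in the opposite direction, with lower values indicating better separation.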
Exploring spatial evolution of economic clusters: A case study of Beijing
NASA Astrophysics Data System (ADS)
Yang, Zhenshan; Sliuzas, Richard; Cai, Jianming; Ottens, Henk F. L.
2012-10-01
Identifying economic clusters and analysing their changing spatial patterns is important for understanding urban economic space dynamics. Previous studies, however, suffer from limitations as a consequence of using fixed geographical areas and not combining functional and spatial dynamics. The paper presents an approach, based on local spatial statistics and the case of Beijing, to understand the spatial clustering of industries that are functionally interconnected by common or complementary patterns of demand or supply relations. Using register data of business establishments, it identifies economic clusters and analyses their pattern based on postcodes at different time slices during the period 1983-2002. The study shows how the advanced services occupy the urban centre and key sub-centres. The Information and Communication Technology (ICT) cluster is mainly concentrated in the northern part of the city and circles the urban centre, and the main manufacturing clusters have evolved in the key sub-centres. Outcomes of this type improve understanding of urban-economic dynamics, which can support spatial and economic planning.
Hawkins, Amy L; Haskett, Mary E
2014-01-01
Abused children's internal working models (IWM) of relationships are known to relate to their socioemotional adjustment, but mechanisms through which negative representations increase vulnerability to maladjustment have not been explored. We sought to expand the understanding of individual differences in IWM of abused children and investigate the mediating role of self-regulation in links between IWM and adjustment. Cluster analysis was used to subgroup 74 physically abused children based on their IWM. Internal working models were identified by children's representations, as measured by a narrative story stem task. Self-regulation was assessed by teacher report and a behavioral task, and adjustment was measured by teacher report. Cluster analyses indicated two subgroups of abused children with distinct patterns of IWMs. Cluster membership predicted internalizing and externalizing problems. Associations between cluster membership and adjustment were mediated by children's regulation, as measured by teacher reports of many aspects of regulation. There was no support for mediation when regulation was measured by a behavioral task that tapped more narrow facets of regulation. Abused children exhibit clinically relevant individual differences in their IWMs; these models are linked to adjustment in the school setting, possibly through children's self-regulation. © 2013 The Authors. Journal of Child Psychology and Psychiatry © 2013 Association for Child and Adolescent Mental Health.
Sloshing Gas in the Core of the Most Luminous Galaxy Cluster RXJ1347.5-1145
NASA Technical Reports Server (NTRS)
Johnson, Ryan E.; Zuhone, John; Jones, Christine; Forman, William R.; Markevitch, Maxim
2011-01-01
We present new constraints on the merger history of the most X-ray luminous cluster of galaxies, RXJ1347.5-1145, based on its unique multiwavelength morphology. Our X-ray analysis confirms the core gas is undergoing "sloshing" resulting from a prior, large scale, gravitational perturbation. In combination with extensive multiwavelength observations, the sloshing gas points to the primary and secondary clusters having had at least two prior strong gravitational interactions. The evidence supports a model in which the secondary subcluster with mass M = (4.8 ± 2.4) × 10^14 solar masses has previously (≳0.6 Gyr ago) passed by the primary cluster, and has now returned for a subsequent crossing where the subcluster's gas has been completely stripped from its dark matter halo. RXJ1347 is a prime example of how core gas sloshing may be used to constrain the merger histories of galaxy clusters through multiwavelength analyses.
An order statistics approach to the halo model for galaxies
NASA Astrophysics Data System (ADS)
Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.
2017-04-01
We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
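The order statistics idea can be made concrete with the standard results for independent draws. As a sketch, assume a group contains a fixed number N of galaxies whose luminosities are drawn independently from p(L), with cumulative distribution P(<L); the symbols p_(1) and p_(2) are introduced here for illustration, and a full halo model calculation would additionally average over the distribution of N at fixed halo mass.

```latex
% Cumulative distribution of the underlying luminosity function
P(<L) = \int_{0}^{L} p(L')\, \mathrm{d}L'

% Brightest of N draws (the "central" in this picture)
p_{(1)}(L \mid N) = N\, p(L)\, \left[ P(<L) \right]^{N-1}

% Second brightest of N draws
p_{(2)}(L \mid N) = N (N-1)\, p(L)\, \left[ 1 - P(<L) \right] \left[ P(<L) \right]^{N-2}
```

Because the brightest-member distribution shifts to higher L as N grows, richer (more massive) groups naturally host brighter centrals, which is the monotonic trend the abstract refers to.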
Berwanger, Otávio; Guimarães, Hélio P; Laranjeira, Ligia N; Cavalcanti, Alexandre B; Kodama, Alessandra; Zazula, Ana Denise; Santucci, Eliana; Victor, Elivane; Flato, Uri A; Tenuta, Marcos; Carvalho, Vitor; Mira, Vera Lucia; Pieper, Karen S; Mota, Luiz Henrique; Peterson, Eric D; Lopes, Renato D
2012-03-01
Translating evidence into clinical practice in the management of acute coronary syndromes (ACS) is challenging. Few ACS quality improvement interventions have been rigorously evaluated to determine their impact on patient care and clinical outcomes. We designed a pragmatic, 2-arm, cluster-randomized trial involving 34 clusters (Brazilian public hospitals). Clusters were randomized to receive a multifaceted quality improvement intervention (experimental group) or routine practice (control group). The 6-month educational intervention included reminders, care algorithms, a case manager, and distribution of educational materials to health care providers. The primary end point was a composite of evidence-based post-ACS therapies within 24 hours of admission, with the secondary measure of major cardiovascular clinical events (death, nonfatal myocardial infarction, nonfatal cardiac arrest, and nonfatal stroke). Prescription of evidence-based therapies at hospital discharge was also evaluated as part of the secondary outcomes. All analyses were performed by the intention-to-treat principle and took the cluster design into account using individual-level regression modeling (generalized estimating equations). If proven effective, this multifaceted intervention would have wide use as a means of promoting optimal use of evidence-based interventions for the management of ACS. Copyright © 2012 Mosby, Inc. All rights reserved.
The use of hierarchical clustering for the design of optimized monitoring networks
NASA Astrophysics Data System (ADS)
Soares, Joana; Makar, Paul Andrew; Aklilu, Yayne; Akingunola, Ayodeji
2018-05-01
Associativity analysis is a powerful tool to deal with large-scale datasets by clustering the data on the basis of (dis)similarity and can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1 - R, R being the Pearson correlation coefficient, and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter timescales (hourly to daily) due to short-term variation of concentrations and that longer-term averages in data collection may lose the information needed to identify local sources. However, the methodology identifies stations mainly influenced by seasonality, if larger timescales (weekly to monthly) are considered. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. 
We have also shown that our approach is capable of identifying different sampling methodologies as well as outliers (stations' time series which are markedly different from all others in a given dataset).
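The two dissimilarity metrics named in the abstract above can be illustrated with a minimal sketch. The function names and the toy station series below are assumptions for illustration only and are not taken from the study. The sketch shows one reason both metrics may be needed: two series can be perfectly correlated (1 - R = 0) and still differ greatly in magnitude.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_dissimilarity(x, y):
    """1 - R: small when the series rise and fall together."""
    return 1.0 - pearson_r(x, y)

def euclidean_distance(x, y):
    """Euclidean distance: small when the series have similar magnitudes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical concentration series at two stations (same shape, 10x magnitude).
station_a = [1.0, 2.0, 3.0, 4.0]
station_b = [10.0, 20.0, 30.0, 40.0]

d_corr = correlation_dissimilarity(station_a, station_b)
d_eucl = euclidean_distance(station_a, station_b)
```

Either function could then feed a pairwise dissimilarity matrix for a hierarchical clustering routine; the choice of linkage is not specified here.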
Inglin, Raffael C; Meile, Leo; Stevens, Marc J A
2018-04-24
Bacterial taxonomy aims to classify bacteria based on true evolutionary events and relies on a polyphasic approach that includes phenotypic, genotypic and chemotaxonomic analyses. Until now, complete genomes have been largely ignored in taxonomy. The genus Lactobacillus consists of 173 species and many genomes are available to study taxonomy and evolutionary events. We analyzed and clustered 98 completely sequenced genomes of the genus Lactobacillus and 234 draft genomes of 5 different Lactobacillus species, i.e. L. reuteri, L. delbrueckii, L. plantarum, L. rhamnosus and L. helveticus. The core-genome of the genus Lactobacillus contains 266 genes and the pan-genome 20,800 genes. Clustering of the Lactobacillus pan- and core-genome resulted in two highly similar trees. This shows that evolutionary history is traceable in the core-genome and that clustering of the core-genome is sufficient to explore relationships. Clustering of core- and pan-genomes at the species level resulted in similar trees as well. Detailed analyses of the core-genomes showed that the functional class "genetic information processing" is conserved in the core-genome but that "signaling and cellular processes" is not. The latter class encodes functions that are involved in environmental interactions. Evolution of lactobacilli therefore seems to be directed by the environment. The type species L. delbrueckii was analyzed in detail and its pan-genome based tree contained two major clades whose members contained different genes yet identical functions. In addition, evidence for horizontal gene transfer between strains of L. delbrueckii, L. plantarum, and L. rhamnosus, and between species of the genus Lactobacillus is presented. Our data provide evidence for evolution of some lactobacilli according to a parapatric-like model for species differentiation. Core-genome trees are useful to detect evolutionary relationships in lactobacilli and might be useful in taxonomic analyses.
Lactobacillus' evolution is directed by the environment and HGT.
ERIC Educational Resources Information Center
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
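As a rough illustration of comparing nonnested mixture models by the Bayesian information criterion, the sketch below fits one- and two-component univariate Gaussian mixtures with a basic EM loop. Everything here is an assumption made for illustration: the data are invented, the model is one-dimensional rather than multivariate, and the BIC convention is the "lower is better" form, BIC = -2 ln L + p ln n (some software reports the negated quantity).

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture_loglik(data, k, iters=200):
    """Fit a 1-D Gaussian mixture with k components by EM; return the log-likelihood."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: split the sorted data into k equal chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    mus = [sum(c) / len(c) for c in chunks]
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: update weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid degenerate components
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)))
        for x in xs
    )

def bic(loglik, n_params, n):
    """BIC in the 'lower is better' convention."""
    return -2.0 * loglik + n_params * math.log(n)

# Invented data with two well-separated groups.
data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 9.6, 9.8, 10.0, 10.1, 10.3, 10.5]
n = len(data)
bic_one = bic(fit_mixture_loglik(data, 1), 2, n)  # mean + variance
bic_two = bic(fit_mixture_loglik(data, 2), 5, n)  # 2 means + 2 variances + 1 weight
```

With these data the two-component model should obtain the lower BIC, which is the sense in which the criterion identifies population heterogeneity.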
NASA Astrophysics Data System (ADS)
Leckebusch, G. C.; Kirchner-Bossi, N. O.; Befort, D. J.; Ulbrich, U.
2015-12-01
Time-clustered mid-latitude winter storms are responsible for a large portion of the overall windstorm-related damage in Europe. Their study is therefore of high meteorological interest, and its outcome can be of crucial utility for the (re)insurance industry. In addition to existing cyclone-based studies, here we use an event identification approach based on near-surface wind speeds only, to investigate windstorm clustering and compare it to cyclone clustering. Specifically, cyclone and windstorm tracks are identified for winter 1979-2013 (Oct-Mar), to perform two sensitivity analyses on event-clustering in the North Atlantic using ERA-Interim Reanalysis. First, the link between clustering and cyclone intensity is analysed and compared to windstorms. Secondly, the sensitivity of clustering on intra-seasonal time scales is investigated, for both cyclones and windstorms. The wind-based approach reveals additional regions of clustering over Western Europe, which could be related to extreme damage, showing the added value of investigating wind-field-derived tracks in addition to cyclone tracks. Previous studies indicate a higher degree of clustering for stronger cyclones. However, our results show that this assumption is not always met. Although a positive relationship is confirmed for the clustering centre located over Iceland, clustering off the coast of the Iberian Peninsula behaves in the opposite way. Even though this region shows the highest clustering, most of its signal is due to cyclones with intensities below the 70th percentile of the Laplacian of MSLP. Results on the sensitivity of clustering to the time of the winter season (Oct-Mar) show a temporal evolution of the clustering patterns, for both windstorms and cyclones. Clustering of windstorms and of the strongest cyclones culminates around February, while clustering of all cyclones peaks in December to January.
X-ray and optical substructures of the DAFT/FADA survey clusters
NASA Astrophysics Data System (ADS)
Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.
2013-04-01
We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.
NASA Astrophysics Data System (ADS)
Bharatham, Kavitha; Bharatham, Nagakumar; Kwon, Yong Jung; Lee, Keun Woo
2008-12-01
Allosteric inhibition of protein tyrosine phosphatase 1B (PTP1B) has paved a new path to design specific inhibitors for PTP1B, which is an important drug target for the treatment of type II diabetes and obesity. The PTP1B₁₋₂₈₂-allosteric inhibitor complex crystal structure lacks α7 (287-298), and moreover there is no available 3D structure of PTP1B₁₋₂₉₈ in open form. As the interaction between α7 and α6-α3 helices plays a crucial role in allosteric inhibition, α7 was modeled onto PTP1B₁₋₂₈₂ in open form complexed with an allosteric inhibitor (compound-2) and a 5 ns MD simulation was performed to investigate the relative orientation of the α7-α6-α3 helices. The simulation conformational space was statistically sampled by clustering analyses. This approach was helpful in revealing certain clues about PTP1B allosteric inhibition. The simulation was also utilized in the generation of receptor-based pharmacophore models to include the conformational flexibility of the protein-inhibitor complex. Three cluster representative structures of the highly populated clusters were selected for pharmacophore model generation. The three pharmacophore models were subsequently utilized for screening databases to retrieve molecules containing the features that complement the allosteric site. The retrieved hits were filtered based on certain drug-like properties and molecular docking simulations were performed in two different conformations of the protein. Thus, performing MD simulation with α7 to investigate the changes at the allosteric site, then developing receptor-based pharmacophore models and finally docking the retrieved hits into two distinct conformations will be a reliable methodology for identifying PTP1B allosteric inhibitors.
Intermediate and advanced topics in multilevel logistic regression analysis
Merlo, Juan
2017-01-01
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
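Two of the measures named above have simple closed forms for a random-intercept logistic model, and the sketch below computes them. It assumes the commonly used latent-variable formulation, in which the level-1 residual variance is π²/3; the function names and the example variance are illustrative and do not come from the article.

```python
import math
from statistics import NormalDist

def variance_partition_coefficient(sigma2_u):
    """VPC under the latent-variable formulation: between-cluster variance
    divided by total variance, with level-1 variance fixed at pi^2 / 3."""
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)

def median_odds_ratio(sigma2_u):
    """MOR = exp( sqrt(2 * sigma2_u) * z_0.75 ), where z_0.75 is the
    75th percentile of the standard normal distribution."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2.0 * sigma2_u) * z75)

# Hypothetical between-cluster (random intercept) variance.
sigma2_u = 1.0
vpc = variance_partition_coefficient(sigma2_u)
mor = median_odds_ratio(sigma2_u)
```

A MOR of 1 corresponds to no between-cluster variation; larger values indicate stronger general contextual effects.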
ERIC Educational Resources Information Center
King, Wayne M.; Giess, Sally A.; Lombardino, Linda J.
2007-01-01
Background: The marked degree of heterogeneity in persons with developmental dyslexia has motivated the investigation of possible subtypes. Attempts have proceeded both from theoretical models of reading and the application of unsupervised learning (clustering) methods. Previous cluster analyses of data obtained from persons with reading…
NASA Astrophysics Data System (ADS)
Carraro, G.; Villanova, S.; Demarque, P.; Moni Bidin, C.; McSwain, M. V.
2008-05-01
We report on a new, wide-field (20 × 20 arcmin²), multicolour (UBVI), photometric campaign in the area of the nearby old open cluster NGC 2112. At the same time, we provide medium-resolution spectroscopy of 35 (and high-resolution spectroscopy of an additional 5) red giant and turn-off stars. This material is analysed with the aim of updating the fundamental parameters of this traditionally difficult cluster, which is very sparse and suffers from heavy field star contamination. Among the 40 stars with spectra, we identified 21 bona fide radial velocity members, which allow us to put more solid constraints on the cluster's metal abundance, long suggested to be as low as the metallicity of globulars. As indicated earlier by us on a purely photometric basis, the cluster [Fe/H] abundance is slightly supersolar ([Fe/H] = 0.16 ± 0.03) and close to the Hyades value, as inferred from a detailed abundance analysis of three of the five stars with higher resolution spectra. Abundance ratios are also marginally supersolar. Based on this result, we revise the properties of NGC 2112 using stellar models from the Padova and Yale-Yonsei groups. For this metal abundance, we find that the cluster's age, reddening and distance values are 1.8 Gyr, 0.60 mag and 940 pc, respectively. Both the Yale-Yonsei and Padova models predict the same values for the fundamental parameters within the errors. Overall, NGC 2112 is a typical solar neighbourhood, thin-disc star cluster, sharing the same chemical properties as F-G stars and open clusters close to the Sun. This investigation highlights the importance of a detailed membership analysis in the study of disc star clusters. This paper includes data gathered with the 6.5 m Magellan Telescopes, located at Las Campanas Observatory, Chile. The data discussed in this paper will be made available at the WEBDA open cluster data base http://www.univie.ac.at/webda, which is maintained by E. Paunzen and J.-C. Mermilliod.
Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.
Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G
2012-01-01
Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
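The incremental net benefit and the idea of resampling at two levels can be sketched as follows. This is a simplified illustration under stated assumptions: INB is taken as λ·ΔE − ΔC for a willingness-to-pay threshold λ, the data are invented, and the resampling omits the shrinkage correction mentioned in the abstract, so it is not the authors' TSB procedure.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def incremental_net_benefit(treat_arm, control_arm, wtp):
    """INB = wtp * (difference in mean effect) - (difference in mean cost).
    Each arm is a list of clusters; each cluster is a list of (cost, effect) pairs."""
    def arm_means(arm):
        rows = [row for cluster in arm for row in cluster]
        return _mean([c for c, _ in rows]), _mean([e for _, e in rows])
    cost_t, eff_t = arm_means(treat_arm)
    cost_c, eff_c = arm_means(control_arm)
    return wtp * (eff_t - eff_c) - (cost_t - cost_c)

def two_stage_resample(arm, rng):
    """Stage 1: resample clusters with replacement.
    Stage 2: resample individuals with replacement within each drawn cluster.
    (No shrinkage correction is applied in this sketch.)"""
    drawn = [rng.choice(arm) for _ in arm]
    return [[rng.choice(cluster) for _ in cluster] for cluster in drawn]

def bootstrap_inb(treat_arm, control_arm, wtp, reps=200, seed=0):
    rng = random.Random(seed)
    return [
        incremental_net_benefit(
            two_stage_resample(treat_arm, rng),
            two_stage_resample(control_arm, rng),
            wtp,
        )
        for _ in range(reps)
    ]

# Invented (cost, effect) data: two clusters per arm, two individuals per cluster.
treatment = [[(100.0, 0.8), (120.0, 0.7)], [(90.0, 0.9), (110.0, 0.6)]]
control = [[(80.0, 0.6), (70.0, 0.5)], [(60.0, 0.7), (90.0, 0.6)]]

point_estimate = incremental_net_benefit(treatment, control, wtp=1000.0)
replicates = bootstrap_inb(treatment, control, wtp=1000.0)
```

The spread of `replicates` gives a rough picture of the uncertainty that would be understated if clusters were ignored and individuals resampled as if independent.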
Suveg, Cynthia; Jacob, Marni L; Whitehead, Monica; Jones, Anna; Kingery, Julie Newman
2014-01-01
Social difficulties are commonly associated with anxiety disorders in youth, yet are not well specified in the literature. The aim of this study was to identify patterns of social experiences in clinically anxious children and examine the associations with indices of emotional functioning. A model-based cluster analysis was conducted on parent-, teacher-, and child-reports of social experiences with 64 children, ages 7-12 years (M = 8.86 years, SD = 1.59 years; 60.3% boys; 85.7% Caucasian) with a primary diagnosis of separation anxiety disorder, social phobia, and/or generalized anxiety disorder. Follow-up analyses examined cluster differences on indices of emotional functioning. Findings yielded three clusters of social experiences that were unrelated to diagnosis: (1) Unaware Children (elevated scores on parent- and teacher-reports of social difficulties but relatively low scores on child-reports, n = 12), (2) Average Functioning (relatively average scores across all informants, n = 44), and (3) Victimized and Lonely (elevated child-reports of overt and relational victimization and loneliness and relatively low scores on parent- and teacher-reports of social difficulties, n = 8). Youth in the Unaware Children cluster were rated as more emotionally dysregulated by teachers and had a greater number of diagnoses than youth in the Average Functioning group. In contrast, the Victimized and Lonely group self-reported greater frequency of negative affect and reluctance to share emotional experiences than the Average Functioning cluster. Overall, this study demonstrates that social maladjustment in clinically anxious children can manifest in a variety of ways and assessment should include multiple informants and methods.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
Feder, Stephan; Sundermann, Benedikt; Wersching, Heike; Teuber, Anja; Kugel, Harald; Teismann, Henning; Heindel, Walter; Berger, Klaus; Pfleiderer, Bettina
2017-11-01
Combinations of resting-state fMRI and machine-learning techniques are increasingly employed to develop diagnostic models for mental disorders. However, little is known about the neurobiological heterogeneity of depression, and diagnostic machine learning has mainly been tested in homogeneous samples. Our main objective was to explore the inherent structure of a diverse unipolar depression sample. The secondary objective was to assess whether such information can improve diagnostic classification. We analyzed data from 360 patients with unipolar depression and 360 non-depressed population controls, who were subdivided into two independent subsets. Cluster analyses (unsupervised learning) of functional connectivity were used to generate hypotheses about potential patient subgroups from the first subset. The relationship of clusters with demographic and clinical measures was assessed. Subsequently, diagnostic classifiers (supervised learning), which incorporated information about these putative depression subgroups, were trained. Exploratory cluster analyses revealed two weakly separable subgroups of depressed patients. These subgroups differed in the average duration of depression and in the proportion of patients with concurrently severe depression and anxiety symptoms. The diagnostic classification models performed at chance level. It remains unresolved whether subgroups represent distinct biological subtypes, variability of continuous clinical variables or in part an overfitting of sparsely structured data. Functional connectivity in unipolar depression is associated with general disease effects. Cluster analyses provide hypotheses about potential depression subtypes. Diagnostic models did not benefit from this additional information regarding heterogeneity. Copyright © 2017 Elsevier B.V. All rights reserved.
Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira; Heo, Min-Suk
2013-09-01
To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. We integrated our newly proposed HAC algorithm with our previously designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width were used as inputs to the radial basis function (RBF) SVM classifier. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal female patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. The accuracy, sensitivity, and specificity of our HAC-SVM model in identifying women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Our experimental results suggest that the proposed HAC-SVM model combination applied to DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis.
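For readers less familiar with the three reported performance measures, the sketch below shows how they are computed from a 2 × 2 confusion table. The counts are hypothetical and chosen only for easy arithmetic; they are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-table counts.
    tp/fn: patients with the condition classified correctly/incorrectly.
    tn/fp: patients without the condition classified correctly/incorrectly."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts for 100 patients.
accuracy, sensitivity, specificity = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
```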
Group sequential designs for stepped-wedge cluster randomised trials
Grayling, Michael J; Wason, James MS; Mander, Adrian P
2017-01-01
Background/Aims: The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Methods: Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. Results: We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial’s type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. Conclusion: The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. 
In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial. PMID:28653550
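The error spending approach allocates the overall type-I error across interim analyses as a function of the information fraction t in (0, 1]. The abstract does not state which spending function was used, so the sketch below simply shows two widely used Lan-DeMets forms (an O'Brien-Fleming-type and a Pocock-type function) as an assumption for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def obrien_fleming_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending: alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    Spends very little error early and most of it at the final analysis."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))

def pocock_spending(t, alpha=0.05):
    """Pocock-type spending: alpha(t) = alpha * ln(1 + (e - 1) * t).
    Spends error more evenly across analyses."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

# Cumulative error spent at a hypothetical interim analysis halfway through.
obf_half = obrien_fleming_spending(0.5)
pocock_half = pocock_spending(0.5)
```

Both functions spend the full alpha at t = 1; they differ in how much is available for early stopping at the interim looks.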
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods perform well and are applicable to circular regression models.
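The two circular summary statistics used in the stopping rule can be computed as in the following sketch. It uses the standard definitions (mean direction from the resultant vector; circular standard deviation as sqrt(-2 ln R̄), where R̄ is the mean resultant length). The example angles are invented, and the sketch does not reproduce the authors' similarity distance or stopping rule.

```python
import math

def mean_direction(angles):
    """Mean direction in radians, in [0, 2*pi), from the resultant vector."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2.0 * math.pi)

def circular_sd(angles):
    """Circular standard deviation: sqrt(-2 * ln(R_bar))."""
    n = len(angles)
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    r_bar = math.hypot(s, c) / n
    return math.sqrt(-2.0 * math.log(r_bar))

# Two directions either side of 0 degrees. Their arithmetic mean is 180 degrees,
# which points the wrong way; the circular mean direction is 0 degrees.
angles = [math.radians(350.0), math.radians(10.0)]
direction = mean_direction(angles)
spread = circular_sd(angles)
```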
Jacquez, Geoffrey M; Shi, Chen; Meliker, Jaymie R
2015-01-01
In case-control studies, disease risk not explained by the significant risk factors is the unexplained risk. Considering unexplained risk for specific populations, places and times can reveal the signature of unidentified risk factors and risk factors not fully accounted for in the case-control study. This can potentially lead to new hypotheses regarding disease causation. Global, local and focused Q-statistics are applied to data from a population-based case-control study of 11 southeast Michigan counties. Analyses were conducted using both year- and age-based measures of time. The analyses were adjusted for arsenic exposure, education, smoking, family history of bladder cancer, occupational exposure to bladder cancer carcinogens, age, gender, and race. Significant global clustering of cases was not found. Such a finding would have indicated large-scale clustering of cases relative to controls through time. However, highly significant local clusters were found in Ingham County near Lansing, in Oakland County, and in the City of Jackson, Michigan. The Jackson City cluster was observed at working ages and is thus consistent with occupational causes. The Ingham County cluster persists over time, suggesting a broad-based geographically defined exposure. Focused clusters were found for 20 industrial sites engaged in manufacturing activities associated with known or suspected bladder cancer carcinogens. Set-based tests that adjusted for multiple testing were not significant, although local clusters persisted through time and temporal trends in probability of local tests were observed. Q analyses provide a powerful tool for unpacking unexplained disease risk from case-control studies. This is particularly useful when the effect of risk factors varies spatially, through time, or through both space and time.
For bladder cancer in Michigan, the next step is to investigate causal hypotheses that may explain the excess bladder cancer risk localized to areas of Oakland and Ingham counties, and to the City of Jackson.
Hebels, Dennie G A J; Rasche, Axel; Herwig, Ralf; van Westen, Gerard J P; Jennen, Danyel G J; Kleinjans, Jos C S
2016-01-01
When evaluating compound similarity, addressing multiple sources of information to reach conclusions about common pharmaceutical and/or toxicological mechanisms of action is a crucial strategy. In this chapter, we describe a systems biology approach that incorporates analyses of hepatotoxicant data for 33 compounds from three different sources: a chemical structure similarity analysis based on the 3D Tanimoto coefficient, a chemical structure-based protein target prediction analysis, and a cross-study/cross-platform meta-analysis of in vitro and in vivo human and rat transcriptomics data derived from public resources (i.e., the diXa data warehouse). Hierarchical clustering of the outcome scores of the separate analyses did not result in a satisfactory grouping of compounds considering their known toxic mechanism as described in literature. However, a combined analysis of multiple data types may hypothetically compensate for missing or unreliable information in any of the single data types. We therefore performed an integrated clustering analysis of all three data sets using the R-based tool iClusterPlus. This indeed improved the grouping results. The compound clusters that were formed by means of iClusterPlus represent groups that show similar gene expression while simultaneously integrating a similarity in structure and protein targets, which corresponds much better with the known mechanism of action of these toxicants. Using an integrative systems biology approach may thus overcome the limitations of the separate analyses when grouping liver toxicants sharing a similar mechanism of toxicity.
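As a small illustration of the similarity measure named above, the sketch below computes the Tanimoto coefficient in its common set-based form, |A ∩ B| / |A ∪ B|, on sets of fingerprint features. This is an assumption made for illustration: the chapter uses a 3D Tanimoto coefficient, which is defined on molecular shape rather than on bit sets, and the feature sets here are invented.

```python
def tanimoto(features_a, features_b):
    """Set-based Tanimoto coefficient: shared features / all features.
    Returns 0.0 when both sets are empty (a convention chosen for this sketch)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical fingerprint feature indices for two compounds.
compound_x = {1, 2, 3}
compound_y = {2, 3, 4}
similarity = tanimoto(compound_x, compound_y)
```

A matrix of 1 - similarity values for all compound pairs could serve as the dissimilarity input to a hierarchical clustering step.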
Adamczak, Rafal; Meller, Jarek
2016-12-28
Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust . uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
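The general idea of avoiding explicit all-pairs comparison by hashing profiles can be sketched in a few lines. This is a deliberately simplified illustration and not uQlust's algorithm: the profile encoding (one secondary-structure letter per residue), the model names, and the rule of grouping only identical profiles are all assumptions.

```python
from collections import defaultdict

def group_by_profile(profiles):
    """Group models whose profile strings are identical.
    Each profile is used as a dictionary key, so it is hashed once and the
    whole pass is linear in the number of models, with no pairwise comparison."""
    buckets = defaultdict(list)
    for name, profile in profiles.items():
        buckets[profile].append(name)
    # Largest groups first, so the most populated cluster ranks at the top.
    return sorted(buckets.values(), key=len, reverse=True)

# Hypothetical per-residue secondary-structure profiles for three models.
profiles = {
    "model_1": "HHHEEC",
    "model_2": "HHHEEC",
    "model_3": "CCCEEH",
}
groups = group_by_profile(profiles)
```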
Biomarker clusters are differentially associated with longitudinal cognitive decline in late midlife
Racine, Annie M.; Koscik, Rebecca L.; Berman, Sara E.; Nicholas, Christopher R.; Clark, Lindsay R.; Okonkwo, Ozioma C.; Rowley, Howard A.; Asthana, Sanjay; Bendlin, Barbara B.; Blennow, Kaj; Zetterberg, Henrik; Gleason, Carey E.; Carlsson, Cynthia M.
2016-01-01
The ability to detect preclinical Alzheimer’s disease is of great importance, as this stage of the Alzheimer’s continuum is believed to provide a key window for intervention and prevention. As Alzheimer’s disease is characterized by multiple pathological changes, a biomarker panel reflecting co-occurring pathology will likely be most useful for early detection. Towards this end, 175 late middle-aged participants (mean age 55.9 ± 5.7 years at first cognitive assessment, 70% female) were recruited from two longitudinally followed cohorts to undergo magnetic resonance imaging and lumbar puncture. Cluster analysis was used to group individuals based on biomarkers of amyloid pathology (cerebrospinal fluid amyloid-β42/amyloid-β40 assay levels), magnetic resonance imaging-derived measures of neurodegeneration/atrophy (cerebrospinal fluid-to-brain volume ratio, and hippocampal volume), neurofibrillary tangles (cerebrospinal fluid phosphorylated tau181 assay levels), and a brain-based marker of vascular risk (total white matter hyperintensity lesion volume). Four biomarker clusters emerged consistent with preclinical features of (i) Alzheimer’s disease; (ii) mixed Alzheimer’s disease and vascular aetiology; (iii) suspected non-Alzheimer’s disease aetiology; and (iv) healthy ageing. Cognitive decline was then analysed between clusters using longitudinal assessments of episodic memory, semantic memory, executive function, and global cognitive function with linear mixed effects modelling. Cluster 1 exhibited a higher intercept and greater rates of decline on tests of episodic memory. Cluster 2 had a lower intercept on a test of semantic memory and both Cluster 2 and Cluster 3 had steeper rates of decline on a test of global cognition. Additional analyses on Cluster 3, which had the smallest hippocampal volume, suggest that its biomarker profile is more likely due to hippocampal vulnerability and not to detectable specific volume loss exceeding the rate of normal ageing. 
Our results demonstrate that pathology, as indicated by biomarkers, in a preclinical timeframe is related to patterns of longitudinal cognitive decline. Such biomarker patterns may be useful for identifying at-risk populations to recruit for clinical trials. PMID:27324877
Racine, Annie M; Koscik, Rebecca L; Berman, Sara E; Nicholas, Christopher R; Clark, Lindsay R; Okonkwo, Ozioma C; Rowley, Howard A; Asthana, Sanjay; Bendlin, Barbara B; Blennow, Kaj; Zetterberg, Henrik; Gleason, Carey E; Carlsson, Cynthia M; Johnson, Sterling C
2016-08-01
The ability to detect preclinical Alzheimer's disease is of great importance, as this stage of the Alzheimer's continuum is believed to provide a key window for intervention and prevention. As Alzheimer's disease is characterized by multiple pathological changes, a biomarker panel reflecting co-occurring pathology will likely be most useful for early detection. Towards this end, 175 late middle-aged participants (mean age 55.9 ± 5.7 years at first cognitive assessment, 70% female) were recruited from two longitudinally followed cohorts to undergo magnetic resonance imaging and lumbar puncture. Cluster analysis was used to group individuals based on biomarkers of amyloid pathology (cerebrospinal fluid amyloid-β42/amyloid-β40 assay levels), magnetic resonance imaging-derived measures of neurodegeneration/atrophy (cerebrospinal fluid-to-brain volume ratio, and hippocampal volume), neurofibrillary tangles (cerebrospinal fluid phosphorylated tau181 assay levels), and a brain-based marker of vascular risk (total white matter hyperintensity lesion volume). Four biomarker clusters emerged consistent with preclinical features of (i) Alzheimer's disease; (ii) mixed Alzheimer's disease and vascular aetiology; (iii) suspected non-Alzheimer's disease aetiology; and (iv) healthy ageing. Cognitive decline was then analysed between clusters using longitudinal assessments of episodic memory, semantic memory, executive function, and global cognitive function with linear mixed effects modelling. Cluster 1 exhibited a higher intercept and greater rates of decline on tests of episodic memory. Cluster 2 had a lower intercept on a test of semantic memory and both Cluster 2 and Cluster 3 had steeper rates of decline on a test of global cognition. Additional analyses on Cluster 3, which had the smallest hippocampal volume, suggest that its biomarker profile is more likely due to hippocampal vulnerability and not to detectable specific volume loss exceeding the rate of normal ageing. 
Our results demonstrate that pathology, as indicated by biomarkers, in a preclinical timeframe is related to patterns of longitudinal cognitive decline. Such biomarker patterns may be useful for identifying at-risk populations to recruit for clinical trials. © The Author (2016). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Double Cluster Heads Model for Secure and Accurate Data Fusion in Wireless Sensor Networks
Fu, Jun-Song; Liu, Yun
2015-01-01
Secure and accurate data fusion is an important issue in wireless sensor networks (WSNs) and has been extensively researched in the literature. In this paper, by combining clustering techniques, reputation and trust systems, and data fusion algorithms, we propose a novel cluster-based data fusion model called Double Cluster Heads Model (DCHM) for secure and accurate data fusion in WSNs. Different from traditional clustering models in WSNs, two cluster heads are selected after clustering for each cluster based on the reputation and trust system and they perform data fusion independently of each other. Then, the results are sent to the base station where the dissimilarity coefficient is computed. If the dissimilarity coefficient of the two data fusion results exceeds the threshold preset by the users, the cluster heads will be added to blacklist, and the cluster heads must be reelected by the sensor nodes in a cluster. Meanwhile, feedback is sent from the base station to the reputation and trust system, which can help us to identify and delete the compromised sensor nodes in time. Through a series of extensive simulations, we found that the DCHM performed very well in data fusion security and accuracy. PMID:25608211
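The base-station check described above can be sketched as follows. The abstract does not define the dissimilarity coefficient, so the relative absolute difference used here is an assumption, as are the function names and the choice to average the two results when they agree.

```python
def dissimilarity(a, b):
    """Hypothetical dissimilarity coefficient: relative absolute difference."""
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0 else abs(a - b) / denom

def base_station_check(fused_a, fused_b, threshold):
    """Compare the fusion results reported by the two cluster heads.

    Returns (accepted_value, reelect). If the results disagree by more than
    the user-set threshold, nothing is accepted and the cluster heads should
    be blacklisted and re-elected. Averaging on agreement is an assumption.
    """
    if dissimilarity(fused_a, fused_b) > threshold:
        return None, True
    return (fused_a + fused_b) / 2, False

# Two heads report nearly the same fused temperature: accepted.
ok_value, ok_reelect = base_station_check(20.0, 20.5, threshold=0.1)
# One head reports a very different value: rejected, re-election needed.
bad_value, bad_reelect = base_station_check(20.0, 35.0, threshold=0.1)
```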
Snyder, Frank; Flay, Brian; Vuchinich, Samuel; Acock, Alan; Washburn, Isaac; Beets, Michael; Li, Kin-Kit
2010-01-01
This paper reports the effects of a comprehensive elementary school-based social-emotional and character education program on school-level achievement, absenteeism, and disciplinary outcomes utilizing a matched-pair, cluster randomized, controlled design. The Positive Action Hawai'i trial included 20 racially/ethnically diverse schools (mean enrollment = 544) and was conducted from the 2002-03 through the 2005-06 academic years. Using school-level archival data, analyses comparing change from baseline (2002) to one-year post trial (2007) revealed that intervention schools scored 9.8% better on the TerraNova (2nd ed.) test for reading and 8.8% better on math; 20.7% better in Hawai'i Content and Performance Standards scores for reading and 51.4% better in math; and that intervention schools reported 15.2% lower absenteeism and fewer suspensions (72.6%) and retentions (72.7%). Overall, effect sizes were moderate to large (range 0.5-1.1) for all of the examined outcomes. Sensitivity analyses using permutation models and random-intercept growth curve models substantiated results. The results provide evidence that a comprehensive school-based program, specifically developed to target student behavior and character, can positively influence school-level achievement, attendance, and disciplinary outcomes concurrently.
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.
Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban
2017-05-01
Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
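The core of a cluster trend analysis, averaging sensitivities within a subset of locations and then fitting a rate of change over time, can be sketched as below. The cluster names, location indices, and data are hypothetical, and the P-value and specificity calibration used in the study are not reproduced.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx

def cluster_slopes(times, fields, clusters):
    """Rate of change (dB/y) of the mean sensitivity within each cluster.

    fields[i][loc] is the sensitivity in dB at visit i and location loc;
    clusters maps a cluster name to a list of location indices.
    """
    slopes = {}
    for name, locs in clusters.items():
        means = [sum(field[loc] for loc in locs) / len(locs) for field in fields]
        slopes[name] = ols_slope(times, means)
    return slopes

# Toy series: locations 0-1 lose 1 dB per year, locations 2-3 are stable.
times = [0, 1, 2, 3]
fields = [[30, 30, 28, 28], [29, 29, 28, 28], [28, 28, 28, 28], [27, 27, 28, 28]]
slopes = cluster_slopes(times, fields, {"A": [0, 1], "B": [2, 3]})
```

A global index averaged over all four locations would show only half the rate seen in cluster A, which is the motivation for looking at clusters.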
Dumuid, Dorothea; Olds, T; Lewis, L K; Martin-Fernández, J A; Barreira, T; Broyles, S; Chaput, J-P; Fogelholm, M; Hu, G; Kuriyan, R; Kurpad, A; Lambert, E V; Maia, J; Matsudo, V; Onywera, V O; Sarmiento, O L; Standage, M; Tremblay, M S; Tudor-Locke, C; Zhao, P; Katzmarzyk, P; Gillison, F; Maher, C
2018-02-01
The relationship between children's adiposity and lifestyle behaviour patterns is an area of growing interest. The objectives of this study are to identify clusters of children based on lifestyle behaviours and compare children's adiposity among clusters. Cross-sectional data from the International Study of Childhood Obesity, Lifestyle and the Environment were used. The participants were children (9-11 years) from 12 nations (n = 5710). 24-h accelerometry and self-reported diet and screen time were clustering input variables. Objectively measured adiposity indicators were waist-to-height ratio, percent body fat and body mass index z-scores. Sex-stratified analyses were performed on the global sample and repeated on a site-wise basis. Cluster analysis (using isometric log ratios for compositional data) was used to identify common lifestyle behaviour patterns. Site representation and adiposity were compared across clusters using linear models. Four clusters emerged: (1) Junk Food Screenies, (2) Actives, (3) Sitters and (4) All-Rounders. Countries were represented differently among clusters. Chinese children were over-represented in Sitters and Colombian children in Actives. Adiposity varied across clusters, being highest in Sitters and lowest in Actives. Children from different sites clustered into groups of similar lifestyle behaviours. Cluster membership was linked with differing adiposity. Findings support the implementation of activity interventions in all countries, targeting both physical activity and sedentary time. © 2016 World Obesity Federation.
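Because time-use data sum to a fixed total (24 h), they are usually transformed before clustering. A minimal sketch of one common isometric log-ratio (ilr) transform, using pivot coordinates, is given below. Which ilr basis the study used is not stated in the abstract, so the basis and the example values are assumptions.

```python
import math

def ilr(parts):
    """Isometric log-ratio transform using pivot coordinates.

    `parts` are strictly positive components of a composition (e.g. minutes
    per day in each behaviour). Returns len(parts) - 1 real coordinates.
    """
    total = sum(parts)
    x = [p / total for p in parts]  # closure: rescale so the parts sum to 1
    coords = []
    for i in range(len(x) - 1):
        rest = x[i + 1:]
        r = len(rest)
        # Geometric mean of the remaining parts.
        gmean = math.exp(sum(math.log(v) for v in rest) / r)
        coords.append(math.sqrt(r / (r + 1)) * math.log(x[i] / gmean))
    return coords

# Hypothetical day in minutes: sleep, sedentary, light activity, MVPA.
coords = ilr([540, 480, 360, 60])
```

The resulting coordinates are unconstrained real numbers, so standard clustering methods based on Euclidean distance can be applied to them.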
NASA Astrophysics Data System (ADS)
Fezzani, Ridha; Berger, Laurent
2018-06-01
An automated signal-based method was developed to analyse the seafloor backscatter data logged by a calibrated multibeam echosounder. The processing first clusters each survey sub-area into a small number of homogeneous sediment types, based on the average backscatter level at one or several incidence angles. Second, it uses their local average angular response to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France and Celtic Sea, Ireland. It was applied for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for the study and monitoring of demersal and benthic fauna. Qualitative analyses and classified clusters using extracted parameters show a good discriminatory potential, indicating the robustness of this approach.
Sun, Peng; Guo, Jiong; Baumbach, Jan
2012-07-17
The explosion of biological data has largely influenced the focus of today’s biology research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge for biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. Such data are modelled as bi-partite graphs where nodes correspond to the different data points, mutations and diseases for instance, and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. Here we contribute an exact algorithm that is based on fixed-parameter tractability. We evaluated its performance on artificial graphs first. Afterwards, as an example, we applied our Java implementation to data from genome-wide association studies (GWAS), aiming to discover new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet lab investigations. Generally our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge it is the fastest exact method for the weighted bi-cluster editing problem.
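The objective of weighted bi-cluster editing can be made concrete with a small cost function. This sketch only scores a given candidate partition; it is not the authors' fixed-parameter exact algorithm. The sign convention (positive weight means an existing edge, non-positive means a missing one) and all names are assumptions.

```python
def editing_cost(weights, left_cluster, right_cluster):
    """Cost of turning a weighted bipartite graph into disjoint bi-cliques.

    weights[(u, v)] > 0  : edge exists; the weight is its deletion cost.
    weights[(u, v)] <= 0 : edge is absent; -weight is its insertion cost.
    left_cluster / right_cluster map each node to a cluster id.
    """
    cost = 0.0
    for (u, v), w in weights.items():
        same = left_cluster[u] == right_cluster[v]
        if w > 0 and not same:
            cost += w       # existing edge crosses clusters: delete it
        elif w <= 0 and same:
            cost += -w      # missing edge inside a cluster: insert it
    return cost

# Hypothetical mutation-disease graph with two mutations and two diseases.
weights = {("s1", "d1"): 2.0, ("s1", "d2"): -1.0,
           ("s2", "d1"): 0.5, ("s2", "d2"): 3.0}
two_clusters = editing_cost(weights, {"s1": 0, "s2": 1}, {"d1": 0, "d2": 1})
one_cluster = editing_cost(weights, {"s1": 0, "s2": 0}, {"d1": 0, "d2": 0})
```

An exact solver searches for the partition that minimises this cost; here the two-cluster partition is cheaper than merging everything into one bi-clique.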
Application of a clustering-remote sensing method in analyzing security patterns
NASA Astrophysics Data System (ADS)
López-Caloca, Alejandra; Martínez-Viveros, Elvia; Chapela-Castañares, José Ignacio
2009-04-01
In Mexican academic and government circles, research on criminal spatial behavior has been neglected. Only recently has there been interest in geo-referencing criminal data. However, more sophisticated spatial analysis models are needed to disclose spatial patterns of crime and pinpoint their changes over time. The main use of these models lies in supporting policy making and strategic intelligence. In this paper we present a model for finding patterns associated with crime. It is based on a fuzzy logic algorithm which finds the best fit in terms of the number of clusters and the shapes of the groupings. We describe the methodology for building the model and its validation. The model was applied to annual data for types of felonies from 2005 to 2006 in the Mexican city of Hermosillo. The results are visualized as a standard deviational ellipse computed for the points identified to be a "cluster". These areas indicate a high to low demand for public security, and they were cross-related to urban structure analyzed by SPOT images and statistical data such as population, poverty levels, urbanization, and available services. Fusing the model results with other geospatial data makes it possible to detect obstacles and opportunities for crime commission in specific high-risk zones and to guide police activities and criminal investigations.
Comparison of organs' shapes with geometric and Zernike 3D moments.
Broggio, D; Moignier, A; Ben Brahim, K; Gardumi, A; Grandgirard, N; Pierrat, N; Chea, M; Derreumaux, S; Desbrée, A; Boisserie, G; Aubert, B; Mazeron, J-J; Franck, D
2013-09-01
The morphological similarity of organs is studied with feature vectors based on geometric and Zernike 3D moments. In particular, it is investigated whether outliers and average models can be identified. For this purpose, the relative proximity to the mean feature vector is defined, and principal coordinate and clustering analyses are also performed. To study the consistency and usefulness of this approach, 17 liver and 76 heart voxel models from several sources are considered. In the liver case, models with similar morphological features are identified. For the limited number of cases studied, the liver of the ICRP male voxel model is identified as a better surrogate than the female one. For hearts, the clustering analysis shows that three heart shapes represent about 80% of the morphological variations. The relative proximity and clustering analyses identify outliers and average models rather consistently. For both cases, identification of outliers and of surrogates for average models is rather robust. However, deeper classification of morphological features should be treated with caution and can only be performed after cross-analysis of at least two kinds of feature vectors. Finally, the Zernike moments contain all the information needed to reconstruct the studied objects and thus appear to be a promising tool for deriving statistical organ shapes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Jacob, Benjamin G; Griffith, Daniel A; Muturi, Ephantus J; Caamano, Erick X; Githure, John I; Novak, Robert J
2009-01-01
Background. Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking of error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitat clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extend a normal regression analysis previously considered in the literature. Methods. Field and remote-sampled data were collected during July 2006 to December 2007 in Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e. negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means was defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analysis was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results. By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion. An autocorrelation error covariance matrix and a spatial filter analysis can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity. PMID:19772590
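For readers unfamiliar with the autocorrelation index mentioned in the Methods, a minimal sketch of the global Moran's I statistic follows. It uses the textbook formula, not the SAS/GIS workflow of the study; the four-site chain and its binary neighbour weights are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n sites.

    weights[i][j] is the spatial weight between sites i and j (0 on the
    diagonal). I = (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i is the deviation from the mean and W is the sum of weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den

# Four habitats in a row; each is a neighbour of the adjacent ones.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
trend = morans_i([1, 2, 3, 4], chain)        # similar neighbours: positive
alternating = morans_i([1, 4, 1, 4], chain)  # dissimilar neighbours: negative
```

Positive values indicate that neighbouring sites have similar values (spatial clustering), which is the pattern reported for the residuals in the Results.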
Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick
2017-03-09
Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
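The recommended layout (three steps with at least two clusters randomised at each step) can be written out as a treatment-indicator matrix. This sketch assumes the standard stepped wedge layout with an all-control baseline period; the function name is hypothetical, and the random allocation of clusters to steps is not shown.

```python
def stepped_wedge_design(n_steps, clusters_per_step):
    """Treatment indicators for a standard stepped wedge design.

    Rows are clusters, columns are periods (n_steps + 1 of them, the first
    being an all-control baseline). Entry 1 means the cluster is under the
    intervention in that period, 0 means control.
    """
    n_periods = n_steps + 1
    design = []
    for step in range(1, n_steps + 1):
        for _ in range(clusters_per_step):
            design.append([1 if period >= step else 0
                           for period in range(n_periods)])
    return design

# Three steps, two clusters crossing over at each step: 6 clusters, 4 periods.
design = stepped_wedge_design(n_steps=3, clusters_per_step=2)
for row in design:
    print(row)
```

Because every cluster moves from control to intervention as calendar time advances, treatment is correlated with period, which is why the analysis has to adjust for a time trend.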
Inference from clustering with application to gene-expression microarrays.
Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M
2002-01-01
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. 
A large amount of generated output is available over the web.
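The clustering error described above, the number of points placed in the wrong cluster relative to the generating processes, can be sketched as follows. Cluster labels are arbitrary, so this sketch matches them to processes by trying every permutation; that matching rule is an assumption, not a detail taken from the toolbox.

```python
from itertools import permutations

def clustering_error(true_labels, cluster_labels, k):
    """Smallest number of misassigned points over all label matchings.

    true_labels[i] is the index (0..k-1) of the random process that
    generated point i; cluster_labels[i] is the cluster (0..k-1) that the
    algorithm assigned it to. Brute force over k! matchings, so small k only.
    """
    best = len(true_labels)
    for perm in permutations(range(k)):
        errors = sum(1 for t, c in zip(true_labels, cluster_labels)
                     if perm[c] != t)
        best = min(best, errors)
    return best

# Six points from two processes; the algorithm swapped the label names
# and misplaced one point.
error = clustering_error([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0], k=2)
```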
He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei
2015-01-01
The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
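The instability of greedy de novo clustering can be reproduced on a toy scale. The sketch below is a simplified centroid-based greedy clusterer, not any of the tools evaluated in the study; the four-letter sequences, the 75% identity threshold, and the function names are assumptions for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs_by_abundance, threshold):
    """Assign each sequence to the first centroid it matches, else seed a new OTU.

    Sequences are processed in the given order (most abundant first), so the
    result depends on which sequences become centroids first.
    """
    centroids = []
    assignment = {}
    for seq in seqs_by_abundance:
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                assignment[seq] = centroid
                break
        else:
            centroids.append(seq)
            assignment[seq] = seq
    return assignment

# Original data set: AAAA is the most abundant sequence.
run1 = greedy_otus(["AAAA", "AATT", "AAAT"], threshold=0.75)
# After adding samples, AATT becomes the most abundant and is processed first.
run2 = greedy_otus(["AATT", "AAAA", "AAAT"], threshold=0.75)
```

In `run1` the sequence AAAT joins the OTU seeded by AAAA, while in `run2` it joins the OTU seeded by AATT, even though the sequence itself has not changed.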
Finding gene clusters for a replicated time course study
2014-01-01
Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials
Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.
2012-01-01
Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450
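The quantity being estimated, the incremental net benefit, follows the standard definition INB = λ × ΔE − ΔC, where λ is the willingness-to-pay threshold, ΔE the incremental effect and ΔC the incremental cost. The sketch below only evaluates this formula; the numbers are hypothetical, and none of the clustering-aware estimation methods compared in the article are implemented.

```python
def incremental_net_benefit(delta_effect, delta_cost, willingness_to_pay):
    """INB = willingness_to_pay * delta_effect - delta_cost.

    delta_effect: incremental effect of treatment vs control (e.g. QALYs).
    delta_cost: incremental cost of treatment vs control (currency units).
    willingness_to_pay: threshold value of one unit of effect.
    """
    return willingness_to_pay * delta_effect - delta_cost

# Hypothetical trial: 0.05 extra QALYs at 500 extra cost, threshold 20000.
inb = incremental_net_benefit(delta_effect=0.05, delta_cost=500.0,
                              willingness_to_pay=20000.0)
```

A positive INB favours the intervention at the chosen threshold; the methods in the article differ in how they estimate the uncertainty around ΔE and ΔC when patients are clustered.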
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Network-based spatial clustering technique for exploring features in regional industry
NASA Astrophysics Data System (ADS)
Chou, Tien-Yin; Huang, Pi-Hui; Yang, Lung-Shih; Lin, Wen-Tzu
2008-10-01
Past research on industrial clusters has focused mainly on a single or particular industry and less on spatial industrial structure and mutual relations. Industrial clusters can generate three kinds of spillover effects: knowledge, labor market pooling, and input sharing. In addition, industrial clusters benefit industry development. A full understanding of the status and characteristics of district industrial clusters can help improve the competitive advantage of district industry. Research on industrial spatial clusters is therefore of great significance for setting industrial policies and promoting district economic development. In this study, an improved model, GeoSOM, that combines DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and SOM (Self-Organizing Map) was developed for analyzing industrial clusters. Unlike earlier distance-based algorithms for industrial clusters, the proposed GeoSOM model calculates spatial characteristics between firms using the DBSCAN algorithm and evaluates the similarity between firms using SOM clustering analysis. A demonstration data set, the manufacturers around Taichung County in Taiwan, was analyzed to verify the practicability of the proposed model. The results indicate that GeoSOM is suitable for evaluating spatial industrial clusters.
Luo, Yi; Zhang, Tao; Li, Xiao-song
2016-05-01
To explore the application of a fuzzy time series model based on fuzzy c-means clustering in forecasting the monthly incidence of Hepatitis E in mainland China. A predictive model (fuzzy time series method based on fuzzy c-means clustering) was developed using Hepatitis E incidence data in mainland China between January 2004 and July 2014. The incidence data from August 2014 to November 2014 were used to test the fitness of the predictive model. The forecasting results were compared with those from traditional fuzzy time series models. The fuzzy time series model based on fuzzy c-means clustering had a fitting mean squared error (MSE) of 0.0011 and a forecasting MSE of 6.9775 × 10⁻⁴, compared with 0.0017 and 0.0014 from the traditional forecasting model. The results indicate that the fuzzy time series model based on fuzzy c-means clustering performs better in forecasting the incidence of Hepatitis E.
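Two ingredients named in the abstract above, fuzzy c-means memberships and the mean squared error, can be sketched as follows. This is a minimal illustration using the standard fuzzy c-means membership formula with fuzzifier m; it is not the authors' forecasting model, and the function names and numbers are hypothetical.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of scalar x in each cluster center."""
    dists = [abs(x - c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical incidence value and two hypothetical cluster centers.
u = fcm_memberships(0.2, centers=[0.0, 1.0])
err = mse([1.0, 2.0], [1.0, 4.0])
print(u, err)
```

Unlike hard clustering, each observation receives a graded membership in every cluster, and the memberships sum to one. A fuzzy time series model can then build its fuzzy sets from these memberships.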
Welcome to pandoraviruses at the ‘Fourth TRUC’ club
Sharma, Vikas; Colson, Philippe; Chabrol, Olivier; Scheid, Patrick; Pontarotti, Pierre; Raoult, Didier
2015-01-01
Nucleocytoplasmic large DNA viruses, or representatives of the proposed order Megavirales, belong to families of giant viruses that infect a broad range of eukaryotic hosts. Megaviruses have been previously described to comprise a fourth monophylogenetic TRUC (things resisting uncompleted classification) together with cellular domains in the universal tree of life. Recently described pandoraviruses have large (1.9–2.5 MB) and highly divergent genomes. In the present study, we updated the classification of pandoraviruses and other reported giant viruses. Phylogenetic trees were constructed based on six informational genes. Hierarchical clustering was performed based on a set of informational genes from Megavirales members and cellular organisms. Homologous sequences were selected from cellular organisms using TimeTree software, comprising comprehensive, and representative sets of members from Bacteria, Archaea, and Eukarya. Phylogenetic analyses based on three conserved core genes clustered pandoraviruses with phycodnaviruses, exhibiting their close relatedness. Additionally, hierarchical clustering analyses based on informational genes grouped pandoraviruses with Megavirales members as a super group distinct from cellular organisms. Thus, the analyses based on core conserved genes revealed that pandoraviruses are new genuine members of the ‘Fourth TRUC’ club, encompassing distinct life forms compared with cellular organisms. PMID:26042093
Parameters of oscillation generation regions in open star cluster models
NASA Astrophysics Data System (ADS)
Danilov, V. M.; Putkov, S. I.
2017-07-01
We determine the masses and radii of central regions of open star cluster (OCL) models with small or zero entropy production and estimate the masses of oscillation generation regions in cluster models based on the data of the phase-space coordinates of stars. The radii of such regions are close to the core radii of the OCL models. We develop a new method for estimating the total OCL masses based on the cluster core mass, the cluster and cluster core radii, and radial distribution of stars. This method yields estimates of dynamical masses of Pleiades, Praesepe, and M67, which agree well with the estimates of the total masses of the corresponding clusters based on proper motions and spectroscopic data for cluster stars. We construct the spectra and dispersion curves of the oscillations of the field of azimuthal velocities v_φ in OCL models. Weak, low-amplitude unstable oscillations of v_φ develop in cluster models near the cluster core boundary, and weak damped oscillations of v_φ often develop at frequencies close to the frequencies of more powerful oscillations, which may reduce the non-stationarity degree in OCL models. We determine the number and parameters of such oscillations near the core boundaries of cluster models. Such oscillations point to the possible role that gradient instability near the core of cluster models plays in the decrease of the mass of the oscillation generation regions and production of entropy in the cores of OCL models with massive extended cores.
Clustering of change patterns using Fourier coefficients.
Kim, Jaehee; Kim, Haseong
2008-01-15
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.
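The idea of describing a change pattern by the Fourier coefficients of the derivative can be sketched as below. The abstract does not give the authors' exact estimator, so this assumes a plain trigonometric series on equally spaced time points t_j = j/n in [0, 1). Differentiating a_k·cos(2πkt) + b_k·sin(2πkt) term by term gives derivative coefficients (2πk·b_k, −2πk·a_k). All function names and the example series are hypothetical.

```python
import math

def fourier_coefficients(y, k_max):
    """Sample coefficients (a_k, b_k), k = 1..k_max, for y observed at t_j = j/n."""
    n = len(y)
    coeffs = []
    for k in range(1, k_max + 1):
        a_k = 2.0 / n * sum(v * math.cos(2 * math.pi * k * j / n) for j, v in enumerate(y))
        b_k = 2.0 / n * sum(v * math.sin(2 * math.pi * k * j / n) for j, v in enumerate(y))
        coeffs.append((a_k, b_k))
    return coeffs

def derivative_coefficients(coeffs):
    """Coefficients of the derivative series obtained by term-by-term differentiation."""
    return [
        (2 * math.pi * k * b_k, -2 * math.pi * k * a_k)
        for k, (a_k, b_k) in enumerate(coeffs, start=1)
    ]

# Hypothetical expression profile: one sine cycle sampled at 8 time points.
n = 8
profile = [math.sin(2 * math.pi * j / n) for j in range(n)]
coeffs = fourier_coefficients(profile, k_max=2)
d_coeffs = derivative_coefficients(coeffs)
print(d_coeffs)
```

Each gene is then represented by a short vector of derivative coefficients rather than its full time course. That vector is the kind of low-dimensional input that a model-based (for example Gaussian mixture) clustering step could consume.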
Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range
NASA Astrophysics Data System (ADS)
Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.
2014-01-01
Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10¹⁴ M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties. Aims: We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures show the same general behaviour as regular cluster galaxies; however, in substructures, there is a deficiency of both late type and old stellar population galaxies. Late type galaxies with recent bursts of star formation seem to be missing in the substructures close to the bottom of the host cluster potential well. However, our sample would need to be increased to allow a more robust analysis. Tables 1, 2, 4 and Appendices A-C are available in electronic form at http://www.aanda.org
Spatial memory in foraging games.
Kerster, Bryan E; Rhodes, Theo; Kello, Christopher T
2016-03-01
Foraging and foraging-like processes are found in spatial navigation, memory, visual search, and many other search functions in human cognition and behavior. Foraging is commonly theorized using either random or correlated movements based on Lévy walks, or a series of decisions to remain or leave proximal areas known as "patches". Neither class of model makes use of spatial memory, but search performance may be enhanced when information about searched and unsearched locations is encoded. A video game was developed to test the role of human spatial memory in a canonical foraging task. Analyses of search trajectories from over 2000 human players yielded evidence that foraging movements were inherently clustered, and that clustering was facilitated by spatial memory cues and influenced by memory for spatial locations of targets found. A simple foraging model is presented in which spatial memory is used to integrate aspects of Lévy-based and patch-based foraging theories to perform a kind of area-restricted search, and thereby enhance performance as search unfolds. Using only two free parameters, the model accounts for a variety of findings that individually support competing theories, but together they argue for the integration of spatial memory into theories of foraging. Copyright © 2015 Elsevier B.V. All rights reserved.
Electron attachment to molecules in a cluster environment: suppression and enhancement effects
NASA Astrophysics Data System (ADS)
Fabrikant, Ilya I.
2018-05-01
Cluster environments can strongly influence dissociative electron attachment (DEA) processes. These effects are important in many applications, particularly for surface chemistry, radiation damage, and atmospheric physics. We review several mechanisms for DEA suppression and enhancement due to cluster environments, particularly due to microhydration. Long-range electron-molecule and electron-cluster interactions often play a significant role in these effects and can be analysed by using theoretical models. Nevertheless, many observations remain unexplained due to the complexity of the physics and chemistry of the interaction of DEA fragments with the cluster environment.
Cluster-based analysis of multi-model climate ensembles
NASA Astrophysics Data System (ADS)
Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.
2018-06-01
Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ~20% in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ~62% of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9% of all locations and increases at 29%, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and useful framework in which to assess and visualise model spread, offering insight into geographical areas of agreement among models and a measure of diversity across an ensemble. Finally, we discuss caveats of the clustering techniques and note that while we have focused on tropospheric ozone, the principles underlying the cluster-based MMMs are applicable to other prognostic variables from climate models.
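The cluster-based multi-model mean described above can be sketched for a single location as follows. The abstract's DDC algorithm is not reproduced here. As a stand-in, the sketch uses a simple one-dimensional gap rule (a new cluster starts where neighbouring sorted values differ by more than `max_gap`). That rule, the function names, and the ozone values are assumptions made for illustration only.

```python
from statistics import mean

def gap_clusters(values, max_gap):
    """Group sorted values; a new cluster starts when the gap to the previous value exceeds max_gap."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def cluster_mmm(values, max_gap):
    """Mean over the members of the most populous cluster at one location."""
    return mean(max(gap_clusters(values, max_gap), key=len))

# Hypothetical tropospheric column ozone (Dobson units) from five models at one grid cell.
ozone = [30.0, 31.0, 32.0, 29.0, 45.0]
all_model_mmm = mean(ozone)
subsampled_mmm = cluster_mmm(ozone, max_gap=2.0)
print(all_model_mmm, subsampled_mmm)
```

In this toy case the outlying model is excluded, so the cluster-based mean sits below the all-model mean. As the abstract notes, whether that moves the estimate closer to observations depends on the location.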
A hierarchical model for clustering m(6)A methylation peaks in MeRIP-seq data.
Cui, Xiaodong; Meng, Jia; Zhang, Shaowu; Rao, Manjeet K; Chen, Yidong; Huang, Yufei
2016-08-22
The recent advent of the state-of-the-art high-throughput sequencing technology known as Methylated RNA Immunoprecipitation combined with RNA sequencing (MeRIP-seq) has revolutionized the area of mRNA epigenetics and enables biologists and biomedical researchers to have a global view of N(6)-methyladenosine (m(6)A) on the transcriptome. Yet there is a significant need for new computational tools for processing and analysing MeRIP-seq data to gain further insight into the function and m(6)A mRNA methylation. We developed a novel algorithm and an open source R package (http://compgenomics.utsa.edu/metcluster) for uncovering the potential types of m(6)A methylation by clustering the degree of m(6)A methylation peaks in MeRIP-seq data. This algorithm utilizes a hierarchical graphical model to model the read count variance and the underlying clusters of the methylation peaks. Rigorous statistical inference is performed to estimate the model parameters and detect the number of clusters. MeTCluster is evaluated on both simulated and real MeRIP-seq datasets and the results demonstrate its high accuracy in characterizing the clusters of methylation peaks. Our algorithm was applied to two different sets of real MeRIP-seq datasets and revealed a novel pattern: methylation peaks with less peak enrichment tend to cluster at the 5' end of both mRNAs and lncRNAs, whereas those with higher peak enrichment are more likely to be distributed in the CDS and towards the 3' end of mRNAs and lncRNAs. This result might suggest that m(6)A's functions could be location specific. In this paper, a novel hierarchical graphical model based algorithm was developed for clustering the enrichment of methylation peaks in MeRIP-seq data. MeTCluster is written in R and is publicly available.
A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis
NASA Astrophysics Data System (ADS)
Huang, W.; Li, S.; Xu, S.
2016-06-01
How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only can the location and time at which people stay be collected, but also what people "say" in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that approximately 55% of the spatiotemporal clusters distributed in different locations can eventually be grouped as the same type of cluster when the semantic aspect is considered.
NASA Astrophysics Data System (ADS)
Shen, Fei; Chen, Chao; Yan, Ruqiang
2017-05-01
Classical bearing fault diagnosis methods, being designed according to one specific task, always pay attention to the effectiveness of extracted features and the final diagnostic performance. However, most of these approaches suffer from inefficiency when multiple tasks exist, especially in a real-time diagnostic scenario. A fault diagnosis method based on Non-negative Matrix Factorization (NMF) and a Co-clustering strategy is proposed to overcome this limitation. Firstly, some high-dimensional matrices are constructed using the Short-Time Fourier Transform (STFT) features, where the dimension of each matrix equals the number of target tasks. Then, the NMF algorithm is carried out to obtain different components in each dimension direction through optimized matching, such as Euclidean distance and divergence distance. Finally, a Co-clustering technique based on information entropy is utilized to realize classification of each component. To verify the effectiveness of the proposed approach, a series of bearing data sets were analysed in this research. The tests indicated that although the diagnostic performance on a single task is comparable to traditional clustering methods such as the K-means algorithm and the Gaussian Mixture Model, the accuracy and computational efficiency in multi-task fault diagnosis are improved.
A Network-Based Algorithm for Clustering Multivariate Repeated Measures Data
NASA Technical Reports Server (NTRS)
Koslovsky, Matthew; Arellano, John; Schaefer, Caroline; Feiveson, Alan; Young, Millennia; Lee, Stuart
2017-01-01
The National Aeronautics and Space Administration (NASA) Astronaut Corps is a unique occupational cohort for which vast amounts of measures data have been collected repeatedly in research or operational studies pre-, in-, and post-flight, as well as during multiple clinical care visits. In exploratory analyses aimed at generating hypotheses regarding physiological changes associated with spaceflight exposure, such as impaired vision, it is of interest to identify anomalies and trends across these expansive datasets. Multivariate clustering algorithms for repeated measures data may help parse the data to identify homogeneous groups of astronauts that have higher risks for a particular physiological change. However, available clustering methods may not be able to accommodate the complex data structures found in NASA data, since the methods often rely on strict model assumptions, require equally-spaced and balanced assessment times, cannot accommodate missing data or differing time scales across variables, and cannot process continuous and discrete data simultaneously. To fill this gap, we propose a network-based, multivariate clustering algorithm for repeated measures data that can be tailored to fit various research settings. Using simulated data, we demonstrate how our method can be used to identify patterns in complex data structures found in practice.
Hiemstra, Marieke; Engels, Rutger C M E; van Schayck, Onno C P; Otten, Roy
2016-01-01
The home-based smoking prevention programme 'Smoke-free Kids' did not have an effect on primary outcome smoking initiation. A possible explanation may be that the programme has a delayed effect. The aim of this study was to evaluate the effects on the development of important precursors of smoking: smoking-related cognitions. We used a cluster randomised controlled trial in 9- to 11-year-old children and their mothers. The intervention condition received five activity modules, including a communication sheet for mothers, by mail at four-week intervals. The control condition received a fact-based programme. Secondary outcomes were attitudes, self-efficacy and social norms. Latent growth curves analyses were used to calculate the development of cognitions over time. Subsequently, path modelling was used to estimate the programme effects on the initial level and growth of each cognition. Analyses were performed on 1398 never-smoking children at baseline. Results showed that for children in the intervention condition, perceived maternal norms increased less strongly as compared to the control condition (β = -.10, p = .03). No effects were found for the other cognitions. Based on the limited effects, we do not assume that the programme will have a delayed effect on smoking behaviour later during adolescence.
Comparison of the linear bias models in the light of the Dark Energy Survey
NASA Astrophysics Data System (ADS)
Papageorgiou, A.; Basilakos, S.; Plionis, M.
2018-05-01
The evolution of the linear and scale independent bias, based on the most popular dark matter bias models within the Λ cold dark matter (ΛCDM) cosmology, is compared with that of the Dark Energy Survey (DES) luminous red galaxies (LRGs). Applying a χ² minimization procedure between models and data, we find that all the considered linear bias models reproduce well the LRG bias data. The differences among the bias models are absorbed in the predicted mass of the dark-matter halo in which LRGs live and which ranges between ~6 × 10¹² and 1.4 × 10¹³ h⁻¹ M⊙, for the different bias models. Similar results, reaching however a maximum value of ~2 × 10¹³ h⁻¹ M⊙, are found by confronting the SDSS (2SLAQ) Large Red Galaxies clustering with theoretical clustering models, which also include the evolution of bias. This latter analysis also provides a value of Ωm = 0.30 ± 0.01, which is in excellent agreement with recent joint analyses of different cosmological probes and the reanalysis of the Planck data.
Density-based cluster algorithms for the identification of core sets
NASA Astrophysics Data System (ADS)
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
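To make the density-based idea concrete, here is a minimal pure-Python DBSCAN sketch, DBSCAN being one of the three algorithms the abstract compares. It is a generic textbook version, not the authors' implementation. The toy 2D points and the `eps` and `min_pts` values are hypothetical; in a molecular-dynamics setting the distance would be a conformational metric rather than Euclidean distance.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: cluster index (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so the cluster grows through it
    return labels

# Two dense toy regions and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 10.0)]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)
```

Points labelled -1 fall outside every dense region. In a core-set construction such points would be left unassigned rather than forced into a state, which is what distinguishes this discretization from a full partition of conformational space.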
NASA Astrophysics Data System (ADS)
Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Burigana, C.; Butler, R. C.; Calabrese, E.; Catalano, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Churazov, E.; Clements, D. L.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Dickinson, C.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Finelli, F.; Flores-Cacho, I.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Galeotta, S.; Galli, S.; Ganga, K.; Génova-Santos, R. T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Harrison, D. L.; Helou, G.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lamarre, J.-M.; Langer, M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Levrier, F.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maffei, B.; Maggio, G.; Maino, D.; Mak, D. S. Y.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; Melchiorri, A.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Munshi, D.; Murphy, J. 
A.; Nati, F.; Natoli, P.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Pratt, G. W.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Welikala, N.; Yvon, D.; Zacchei, A.; Zonca, A.
2016-09-01
We use Planck data to detect the cross-correlation between the thermal Sunyaev-Zeldovich (tSZ) effect and the infrared emission from the galaxies that make up the cosmic infrared background (CIB). We first perform a stacking analysis towards Planck-confirmed galaxy clusters. We detect infrared emission produced by dusty galaxies inside these clusters and demonstrate that the infrared emission is about 50% more extended than the tSZ effect. Modelling the emission with a Navarro-Frenk-White profile, we find that the radial profile concentration parameter is c500 = 1.00^{+0.18}_{-0.15}. This indicates that infrared galaxies in the outskirts of clusters have higher infrared flux than cluster-core galaxies. We also study the cross-correlation between tSZ and CIB anisotropies, following three alternative approaches based on power spectrum analyses: (I) using a catalogue of confirmed clusters detected in Planck data; (II) using an all-sky tSZ map built from Planck frequency maps; and (III) using cross-spectra between Planck frequency maps. With the three different methods, we detect the tSZ-CIB cross-power spectrum at significance levels of (I) 6σ; (II) 3σ; and (III) 4σ. We model the tSZ-CIB cross-correlation signature and compare predictions with the measurements. The amplitude of the cross-correlation relative to the fiducial model is A_{tSZ-CIB} = 1.2 ± 0.3. This result is consistent with predictions for the tSZ-CIB cross-correlation assuming the best-fit cosmological model from Planck 2015 results along with the tSZ and CIB scaling relations.
Rudi, Knut; Kleiberg, Gro H; Heiberg, Ragnhild; Rosnes, Jan T
2007-08-01
The aim of this work was to evaluate restriction fragment melting curve analyses (RFMCA) as a novel approach for rapid classification of bacteria during food production. RFMCA was evaluated for bacteria isolated from sous vide food products, and raw materials used for sous vide production. We identified four major bacterial groups in the material analysed (cluster I-Streptococcus, cluster II-Carnobacterium/Bacillus, cluster III-Staphylococcus and cluster IV-Actinomycetales). The accuracy of RFMCA was evaluated by comparison with 16S rDNA sequencing. The strains satisfying the RFMCA quality filtering criteria (73%, n=57), with both 16S rDNA sequence information and RFMCA data (n=45) gave identical group assignments with the two methods. RFMCA enabled rapid and accurate classification of bacteria that is database compatible. Potential application of RFMCA in the food or pharmaceutical industry will include development of classification models for the bacteria expected in a given product, and then to build an RFMCA database as a part of the product quality control.
Dewey, Daniel; Schuldberg, David; Madathil, Renee
2014-08-01
This study investigated whether specific peritraumatic emotions differentially predict PTSD symptom clusters in individuals who have experienced stressful life events. Hypotheses were developed based on the SPAARS model of PTSD. It was predicted that the peritraumatic emotions of anger, disgust, guilt, and fear would significantly predict re-experiencing and avoidance symptoms, while only fear would predict hyperarousal. Undergraduate students (N = 144) participated in this study by completing a packet of self-report questionnaires. Multiple regression analyses were conducted with PCL-S symptom cluster scores as dependent variables and peritraumatic fear, guilt, anger, shame, and disgust as predictor variables. As hypothesized, peritraumatic anger, guilt, and fear all significantly predicted re-experiencing. However, only fear predicted avoidance, and anger significantly predicted hyperarousal. Results are discussed in relation to the theoretical role of emotions in the etiology of PTSD following the experience of a stressful life event.
A versatile software package for inter-subject correlation based analyses of fMRI
Kauppi, Jukka-Pekka; Pajula, Juha; Tohka, Jussi
2014-01-01
In the inter-subject correlation (ISC) based analysis of functional magnetic resonance imaging (fMRI) data, the extent of shared processing across subjects during the experiment is determined by calculating correlation coefficients between the fMRI time series of the subjects in the corresponding brain locations. This implies that ISC can be used to analyze fMRI data without explicitly modeling the stimulus, and thus ISC is a potential method to analyze fMRI data acquired under complex naturalistic stimuli. Despite the suitability of the ISC based approach for analyzing complex fMRI data, no generic software tools have been made available for this purpose, limiting widespread use of ISC based analysis techniques among the neuroimaging community. In this paper, we present a graphical user interface (GUI) based software package, ISC Toolbox, implemented in Matlab for computing various ISC based analyses. Many advanced computations such as comparison of ISCs between different stimuli, time window ISC, and inter-subject phase synchronization are supported by the toolbox. The analyses are coupled with re-sampling based statistical inference. The ISC based analyses are data and computation intensive, and the ISC Toolbox is equipped with mechanisms to execute the parallel computations in a cluster environment automatically, with automatic detection of the cluster environment in use. Currently, SGE-based (Oracle Grid Engine, Son of a Grid Engine, or Open Grid Scheduler) and Slurm environments are supported. In this paper, we present a detailed account of the methods behind the ISC Toolbox and the implementation of the toolbox, and demonstrate the possible use of the toolbox by summarizing selected example applications. We also report computation time experiments using both a single desktop computer and two grid environments, demonstrating that parallelization effectively reduces the computing time. The ISC Toolbox is available at https://code.google.com/p/isc-toolbox/ PMID:24550818
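The core ISC computation described above, correlating the time series of different subjects at the same brain location, can be sketched in a few lines of plain Python. This is only an illustration and not code from the ISC Toolbox (which is implemented in Matlab); the function names, the toy time series, and the choice to summarize a location by the mean over all subject pairs are assumptions made here.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def isc(series_by_subject):
    """Mean pairwise correlation across subjects for one brain location."""
    pairs = list(combinations(series_by_subject, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical time series of three subjects at one location.
# They differ in scale and offset but share the same temporal shape.
voxel = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 2.0, 3.0],
]
isc_value = isc(voxel)
print(round(isc_value, 6))
```

Because no stimulus model enters the calculation, the same function applies unchanged to data recorded under naturalistic stimuli; repeating it over all locations would give an ISC map.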
NASA Astrophysics Data System (ADS)
Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long
2017-11-01
Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify structural characteristics of airways in each cluster and use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a greater decrease in diameter with increasing generation than the other clusters. Cluster 4 showed more rapid decreases of airway diameters in the upper lobes, while cluster 2 showed them in the lower lobes. We then used these regression models to estimate airway diameters in CT unresolved regions to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can be used to provide the patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.
Pettengill, James B; Moeller, David A
2012-09-01
The origins of hybrid zones between parapatric taxa have been of particular interest for understanding the evolution of reproductive isolation and the geographic context of species divergence. One challenge has been to distinguish between allopatric divergence (followed by secondary contact) versus primary intergradation (parapatric speciation) as alternative divergence histories. Here, we use complementary phylogeographic and population genetic analyses to investigate the recent divergence of two subspecies of Clarkia xantiana and the formation of a hybrid zone within the narrow region of sympatry. We tested alternative phylogeographic models of divergence using approximate Bayesian computation (ABC) and found strong support for a secondary contact model and little support for a model allowing for gene flow throughout the divergence process (i.e. primary intergradation). Two independent methods for inferring the ancestral geography of each subspecies, one based on probabilistic character state reconstructions and the other on palaeo-distribution modelling, also support a model of divergence in allopatry and range expansion leading to secondary contact. The membership of individuals to genetic clusters suggests geographic substructure within each taxon where allopatric and sympatric samples are primarily found in separate clusters. We also observed coincidence and concordance of genetic clines across three types of molecular markers, which suggests that there is a strong barrier to gene flow. Taken together, our results provide evidence for allopatric divergence followed by range expansion leading to secondary contact. The location of refugial populations and the directionality of range expansion are consistent with expectations based on climate change since the last glacial maximum. 
Our approach also illustrates the utility of combining phylogeographic hypothesis testing with species distribution modelling and fine-scale population genetic analyses for inferring the geography of the divergence process. © 2012 Blackwell Publishing Ltd.
Yiu, Sean; Farewell, Vernon T; Tom, Brian D M
2018-02-01
In psoriatic arthritis, it is important to understand the joint activity (represented by swelling and pain) and damage processes because both are related to severe physical disability. The paper aims to provide a comprehensive investigation into both processes occurring over time, in particular their relationship, by specifying a joint multistate model at the individual hand joint level, which also accounts for many of their important features. As there are multiple hand joints, such an analysis will be based on the use of clustered multistate models. Here we consider an observation level random-effects structure with dynamic covariates and allow for the possibility that a subpopulation of patients is at minimal risk of damage. Such an analysis is found to provide further understanding of the activity-damage relationship beyond that provided by previous analyses. Consideration is also given to the modelling of mean sojourn times and jump probabilities. In particular, a novel model parameterization which allows easily interpretable covariate effects to act on these quantities is proposed.
Study of clusters and hypernuclei production within PHSD+FRIGA model
NASA Astrophysics Data System (ADS)
Kireyeu, Viktar; Le Fèvre, Arnaud; Bratkovskaya, Elena
2017-03-01
We report results of the dynamical modelling of cluster formation with the new combined PHSD+FRIGA model at Nuclotron and NICA energies. The FRIGA clusterization algorithm, which can be applied to transport models, is based on the simulated annealing technique to obtain the most bound configuration of fragments and nucleons. The PHSD+FRIGA model is able to predict isotope yields as well as hypernucleus production. Based on present predictions of the combined model, we study the possibility of detecting such clusters and hypernuclei in the BM@N and MPD/NICA detectors.
Modelling volatility recurrence intervals in the Chinese commodity futures market
NASA Astrophysics Data System (ADS)
Zhou, Weijie; Wang, Zhengxin; Guo, Haiming
2016-09-01
The law of extreme event occurrence attracts much research. The volatility recurrence intervals of Chinese commodity futures market prices are studied: the results show that the probability distributions of the scaled volatility recurrence intervals follow a uniform scaling curve for different thresholds q, so the probability distribution of extreme events can be deduced from that of normal events. The tail of the scaling curve is well fitted by a Weibull form, with the fit significance-tested by KS measures. Both short-term and long-term memories are present in the recurrence intervals with different thresholds q, which indicates that the recurrence intervals can be predicted. In addition, similar to volatility, volatility recurrence intervals also have clustering features. Through Monte Carlo simulation, we artificially synthesise ARMA and GARCH-class sequences similar to the original data, and identify the reason behind the clustering. The larger the parameter d of the FIGARCH model, the stronger the clustering effect is. Finally, we use the Fractionally Integrated Autoregressive Conditional Duration model (FIACD) to analyse the recurrence interval characteristics. The results indicate that the FIACD model may provide a method to analyse volatility recurrence intervals.
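A recurrence interval here is the waiting time between successive volatility values that exceed a threshold q. The following plain-Python sketch shows one way to compute the intervals and to rescale them; the volatility values and the threshold are made up for illustration, and dividing by the mean interval is an assumed (though common) choice of scaling, since the abstract does not state how the intervals were scaled.

```python
def recurrence_intervals(volatility, q):
    """Waiting times (in samples) between successive values exceeding q."""
    hits = [t for t, v in enumerate(volatility) if v > q]
    return [later - earlier for earlier, later in zip(hits, hits[1:])]

def scaled_intervals(intervals):
    """Intervals divided by their mean (assumed scaling convention)."""
    mean = sum(intervals) / len(intervals)
    return [tau / mean for tau in intervals]

# Hypothetical volatility series and threshold.
volatility = [0.1, 0.9, 0.2, 0.3, 1.2, 0.1, 0.1, 0.1, 1.5, 0.8]
q = 0.7

intervals = recurrence_intervals(volatility, q)
scaled = scaled_intervals(intervals)
print(intervals)
print(scaled)
```

Raising q makes exceedances rarer and the raw intervals longer; comparing the distributions of the scaled intervals across thresholds is what would reveal a common scaling curve.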
DOE Office of Scientific and Technical Information (OSTI.GOV)
FINSTERLE, STEFAN; JUNG, YOOJIN; KOWALSKY, MICHAEL
2016-09-15
iTOUGH2 (inverse TOUGH2) provides inverse modeling capabilities for TOUGH2, a simulator for multi-dimensional, multi-phase, multi-component, non-isothermal flow and transport in fractured porous media. iTOUGH2 performs sensitivity analyses, data-worth analyses, parameter estimation, and uncertainty propagation analyses in geosciences and reservoir engineering and other application areas. iTOUGH2 supports a number of different combinations of fluids and components (equation-of-state (EOS) modules). In addition, the optimization routines implemented in iTOUGH2 can also be used for sensitivity analysis, automatic model calibration, and uncertainty quantification of any external code that uses text-based input and output files using the PEST protocol. iTOUGH2 solves the inverse problem by minimizing a non-linear objective function of the weighted differences between model output and the corresponding observations. Multiple minimization algorithms (derivative-free, gradient-based, and second-order; local and global) are available. iTOUGH2 also performs Latin Hypercube Monte Carlo simulations for uncertainty propagation analyses. A detailed residual and error analysis is provided. This upgrade includes (a) global sensitivity analysis methods, (b) dynamic memory allocation, (c) additional input features and output analyses, (d) increased forward simulation capabilities, (e) parallel execution on multicore PCs and Linux clusters, and (f) bug fixes. More details can be found at http://esd.lbl.gov/iTOUGH2.
Canonical PSO Based K-Means Clustering Approach for Real Datasets
Dey, Lopamudra; Chakraborty, Sanjay
2014-01-01
The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why the proper evaluation of the results and the measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as a cluster validity measure. Different types of indexes are used to solve different types of problems, and index selection depends on the kind of available data. This paper first proposes a Canonical PSO based K-means clustering algorithm and analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing properly compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indexes and presents the evaluation results in mathematical and graphical forms. PMID:27355083
On the question of fractal packing structure in metallic glasses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, Jun; Asta, Mark; Ritchie, Robert O.
2017-07-25
This work addresses the long-standing debate over fractal models of packing structure in metallic glasses (MGs). Through detailed fractal and percolation analyses of MG structures, derived from simulations spanning a range of compositions and quenching rates, we conclude that there is no fractal atomic-level structure associated with the packing of all atoms or solute-centered clusters. The results are in contradiction with conclusions derived from previous studies based on analyses of shifts in radial distribution function and structure factor peaks associated with volume changes induced by pressure and compositional variations. Here in this paper, the interpretation of such shifts is shown to be challenged by the heterogeneous nature of MG structure and deformation at the atomic scale. Moreover, our analysis in the present work illustrates clearly the percolation theory applied to MGs, for example, the percolation threshold and characteristics of percolation clusters formed by subsets of atoms, which can have important consequences for structure–property relationships in these amorphous materials.
Tholken, Sophia; Schrabback, Tim; Reiprich, Thomas H.; ...
2018-03-05
Here, observations of relaxed, massive, and distant clusters can provide important tests of standard cosmological models, for example by using the gas mass fraction. To perform this test, the dynamical state of the cluster and its gas properties have to be investigated. X-ray analyses provide one of the best opportunities to access this information and to determine important properties such as temperature profiles, gas mass, and the total X-ray hydrostatic mass. For the last of these, weak gravitational lensing analyses are complementary independent probes that are essential in order to test whether X-ray masses could be biased.
Multilocus microsatellite typing shows three different genetic clusters of Leishmania major in Iran.
Mahnaz, Tashakori; Al-Jawabreh, Amer; Kuhls, Katrin; Schönian, Gabriele
2011-10-01
Ten polymorphic microsatellite markers were used to analyse 25 strains of Leishmania major collected from cutaneous leishmaniasis cases in different endemic areas in Iran. Nine of the markers were polymorphic, revealing 21 different genotypes. The data displayed significant microsatellite polymorphism with rare allelic heterozygosity. Bayesian statistic and distance based analyses identified three genetic clusters among the 25 strains analysed. Cluster I represented mainly strains isolated in the west and south-west of Iran, with the exception of four strains originating from central Iran. Cluster II comprised strains from the central part of Iran, and cluster III included only strains from north Iran. The geographical distribution of L. major in Iran was supported by comparing the microsatellite profiles of the 25 Iranian strains to those of 105 strains collected in 19 Asian and African countries. The Iranian clusters I and II were separated from three previously described populations comprising strains from Africa, the Middle East and Central Asia whereas cluster III grouped together with the Central Asian population. The considerable genetic variability of L. major might be related to the existence of different populations of Phlebotomus papatasi and/or to differences in reservoir host abundance in different parts of Iran. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
The XMM Cluster Survey: the halo occupation number of BOSS galaxies in X-ray clusters
NASA Astrophysics Data System (ADS)
Mehrtens, Nicola; Romer, A. Kathy; Nichol, Robert C.; Collins, Chris A.; Sahlén, Martin; Rooney, Philip J.; Mayers, Julian A.; Bermeo-Hernandez, A.; Bristow, Martyn; Capozzi, Diego; Christodoulou, L.; Comparat, Johan; Hilton, Matt; Hoyle, Ben; Kay, Scott T.; Liddle, Andrew R.; Mann, Robert G.; Masters, Karen; Miller, Christopher J.; Parejko, John K.; Prada, Francisco; Ross, Ashley J.; Schneider, Donald P.; Stott, John P.; Streblyanska, Alina; Viana, Pedro T. P.; White, Martin; Wilcox, Harry; Zehavi, Idit
2016-12-01
We present a direct measurement of the mean halo occupation distribution (HOD) of galaxies taken from the eleventh data release (DR11) of the Sloan Digital Sky Survey-III Baryon Oscillation Spectroscopic Survey (BOSS). The HOD of BOSS low-redshift (LOWZ: 0.2 < z < 0.4) and Constant-Mass (CMASS: 0.43 < z < 0.7) galaxies is inferred via their association with the dark matter haloes of 174 X-ray-selected galaxy clusters drawn from the XMM Cluster Survey (XCS). Halo masses are determined for each galaxy cluster based on X-ray temperature measurements, and range between log10(M180/M⊙) = 13 and 15. Our directly measured HODs are consistent with the HOD-model fits inferred via the galaxy-clustering analyses of Parejko et al. for the BOSS LOWZ sample and White et al. for the BOSS CMASS sample. Under the simplifying assumption that the other parameters that describe the HOD hold the values measured by these authors, we have determined a best-fitting alpha-index of 0.91 ± 0.08 and 1.27^{+0.03}_{-0.04} for the CMASS and LOWZ HOD, respectively. These alpha-index values are consistent with those measured by White et al. and Parejko et al. In summary, our study provides independent support for the HOD models assumed during the development of the BOSS mock-galaxy catalogues that have subsequently been used to derive BOSS cosmological constraints.
Liu, Yuanchao; Liu, Ming; Wang, Xin
2015-01-01
The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172
Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2008-01-01
Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…
Garcia, Danilo; MacDonald, Shane; Archer, Trevor
2015-01-01
Background. The notion of the affective system as being composed of two dimensions led Archer and colleagues to the development of the affective profiles model. The model consists of four different profiles based on combinations of individuals' experience of high/low positive and negative affect: self-fulfilling, low affective, high affective, and self-destructive. During the past 10 years, an increasing number of studies have used this person-centered model as the backdrop for the investigation of between and within individual differences in ill-being and well-being. The most common approach to this profiling is by dividing individuals' scores of self-reported affect using the median of the population as reference for high/low splits. However, scores just-above and just-below the median might become high and low by arbitrariness, not by reality. Thus, it is plausible to criticize the validity of this variable-oriented approach. Our aim was to compare the median splits approach with a person-oriented approach, namely, cluster analysis. Method. The participants (N = 2,225) were recruited through Amazon's Mechanical Turk and asked to self-report affect using the Positive Affect Negative Affect Schedule. We compared the profiles' homogeneity and Silhouette coefficients to discern differences in homogeneity and heterogeneity between approaches. We also conducted exact cell-wise analyses matching the profiles from both approaches and matching profiles and gender to investigate profiling agreement with respect to affectivity levels and affectivity and gender. All analyses were conducted using the ROPstat software. Results. The cluster approach (weighted average of cluster homogeneity coefficients = 0.62, Silhouette coefficients = 0.68) generated profiles with greater homogeneity and more distinctive from each other compared to the median splits approach (weighted average of cluster homogeneity coefficients = 0.75, Silhouette coefficients = 0.59). Most of the participants (n = 1,736, 78.0%) were allocated to the same profile (Rand Index = .83); however, 489 (21.98%) were allocated to different profiles depending on the approach. Both approaches allocated females and males similarly in three of the four profiles. Only the cluster analysis approach classified men significantly more often than chance to a self-fulfilling profile (type) and females less often than chance to this very same profile (antitype). Conclusions. Although the question whether one approach is more appropriate than the other is still without answer, the cluster method allocated individuals to profiles that are more in accordance with the conceptual basis of the model and also to expected gender differences. More importantly, regardless of the approach, our findings suggest that the model mirrors a complex and dynamic adaptive system.
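The median splits approach that the study compares against cluster analysis can be sketched as follows. This is an illustrative plain-Python example, not the ROPstat procedure used by the authors. The scores are invented, the mapping of high/low combinations to profile names follows the usual description of the affective profiles model rather than anything spelled out in the abstract, and treating a score equal to the median as "low" is an arbitrary assumption.

```python
from statistics import median

# (high positive affect?, high negative affect?) -> profile name
PROFILES = {
    (True, False): "self-fulfilling",
    (False, False): "low affective",
    (True, True): "high affective",
    (False, True): "self-destructive",
}

def median_split_profiles(pa, na):
    """Assign each person a profile by splitting both scales at the median."""
    pa_cut, na_cut = median(pa), median(na)
    # Scores exactly at the median are counted as "low" (assumption).
    return [PROFILES[(p > pa_cut, n > na_cut)] for p, n in zip(pa, na)]

# Hypothetical positive-affect and negative-affect scores for four people.
pa_scores = [10, 40, 20, 30]
na_scores = [10, 20, 40, 30]
profiles = median_split_profiles(pa_scores, na_scores)
print(profiles)
```

With cut-offs of 25 on both scales, a person scoring 24 and one scoring 26 would land in different profiles despite being nearly identical, which is the arbitrariness the abstract criticizes and that a distance-based cluster analysis avoids.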
Clustering of financial time series
NASA Astrophysics Data System (ADS)
D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo
2013-05-01
This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models, both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on the partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp version.
MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...
Meguid, Robert A; Bronsert, Michael R; Juarez-Colunga, Elizabeth; Hammermeister, Karl E; Henderson, William G
2016-07-01
To develop parsimonious prediction models for postoperative mortality, overall morbidity, and 6 complication clusters applicable to a broad range of surgical operations in adult patients. Quantitative risk assessment tools are not routinely used for preoperative patient assessment, shared decision making, informed consent, and preoperative patient optimization, likely due in part to the burden of data collection and the complexity of incorporation into routine surgical practice. Multivariable forward selection stepwise logistic regression analyses were used to develop predictive models for 30-day mortality, overall morbidity, and 6 postoperative complication clusters, using 40 preoperative variables from 2,275,240 surgical cases in the American College of Surgeons National Surgical Quality Improvement Program data set, 2005 to 2012. For the mortality and overall morbidity outcomes, prediction models were compared with and without preoperative laboratory variables, and generic models (based on all of the data from 9 surgical specialties) were compared with specialty-specific models. In each model, the cumulative c-index was used to examine the contribution of each added predictor variable. C-indexes, Hosmer-Lemeshow analyses, and Brier scores were used to compare discrimination and calibration between models. For the mortality and overall morbidity outcomes, the prediction models without the preoperative laboratory variables performed as well as the models with the laboratory variables, and the generic models performed as well as the specialty-specific models. The c-indexes were 0.938 for mortality, 0.810 for overall morbidity, and for the 6 complication clusters ranged from 0.757 for infectious to 0.897 for pulmonary complications. Across the 8 prediction models, the first 7 to 11 variables entered accounted for at least 99% of the c-index of the full model (using up to 28 nonlaboratory predictor variables). 
Our results suggest that it will be possible to develop parsimonious models to predict 8 important postoperative outcomes for a broad surgical population, without the need for surgeon specialty-specific models or inclusion of laboratory variables.
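The two kinds of metric named in this abstract, the c-index for discrimination and the Brier score for calibration-related accuracy, can be sketched for a binary outcome as follows. This is a minimal illustration under stated assumptions: the function names and the toy outcomes and probabilities are hypothetical, and the pairwise loop is only suitable for small samples, not for a data set of millions of cases.

```python
def c_index(y_true, y_prob):
    """Probability that a random event case is scored above a random non-event case.

    Ties between an event and a non-event count as half-concordant.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = complication occurred) and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
discrimination = c_index(outcomes, risks)
accuracy = brier_score(outcomes, risks)
```

A c-index of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, while a lower Brier score indicates predicted probabilities closer to the observed outcomes.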
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.
Laetsch, Dominik R; Blaxter, Mark L
2017-10-05
The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.
Schachner, Maja K; He, Jia; Heizmann, Boris; Van de Vijver, Fons J R
2017-01-01
School adjustment determines long-term adjustment in society. Yet, immigrant youth do better in some countries than in others. Drawing on acculturation research (Berry, 1997; Ward, 2001) and self-determination theory (Ryan and Deci, 2000), we investigated indirect effects of adolescent immigrants' acculturation orientations on school adjustment (school-related attitudes, truancy, and mathematics achievement) through school belonging. Analyses were based on data from the Programme for International Student Assessment from six European countries, which were combined into three clusters based on their migrant integration and multicultural policies: Those with the most supportive policies (Belgium and Finland), those with moderately supportive policies (Italy and Portugal), and those with the most unsupportive policies (Denmark and Slovenia). In a multigroup path model, we confirmed most associations. As expected, mainstream orientation predicted higher belonging and better outcomes in all clusters, whereas the added value of students' ethnic orientation was only observed in some clusters. Results are discussed in terms of differences in acculturative climate and policies between countries of settlement.
Ji, N Y; Capone, G T; Kaufmann, W E
2011-11-01
The diagnostic validity of autism spectrum disorder (ASD) based on Diagnostic and Statistical Manual of Mental Disorders (DSM) has been challenged in Down syndrome (DS), because of the high prevalence of cognitive impairments in this population. Therefore, we attempted to validate DSM-based diagnoses via an unbiased categorisation of participants with a DSM-independent behavioural instrument. Based on scores on the Aberrant Behaviour Checklist - Community, we performed sequential factor (four DS-relevant factors: Autism-Like Behaviour, Disruptive Behaviour, Hyperactivity, Self-Injury) and cluster analyses on a 293-participant paediatric DS clinic cohort. The four resulting clusters were compared with DSM-delineated groups: DS + ASD, DS + None (no DSM diagnosis), DS + DBD (disruptive behaviour disorder) and DS + SMD (stereotypic movement disorder), the latter two as comparison groups. Two clusters were identified with DS + ASD: Cluster 1 (35.1%) with higher disruptive behaviour and Cluster 4 (48.2%) with more severe autistic behaviour and higher percentage of late onset ASD. The majority of participants in DS + None (71.9%) and DS + DBD (87.5%) were classified into Cluster 2 and 3, respectively, while participants in DS + SMD were relatively evenly distributed throughout the four clusters. Our unbiased, DSM-independent analyses, using a rating scale specifically designed for individuals with severe intellectual disability, demonstrated that DSM-based criteria of ASD are applicable to DS individuals despite their cognitive impairments. Two DS + ASD clusters were identified and supported the existence of at least two subtypes of ASD in DS, which deserve further characterisation. Despite the prominence of stereotypic behaviour in DS, the SMD diagnosis was not identified by cluster analysis, suggesting that high-level stereotypy is distributed throughout DS. 
Further supporting DSM diagnoses, typically behaving DS participants were easily distinguished as a group from those with maladaptive behaviours. © 2011 The Authors. Journal of Intellectual Disability Research © 2011 Blackwell Publishing Ltd.
NASA Astrophysics Data System (ADS)
Hassan, Kazi; Allen, Deonie; Haynes, Heather
2016-04-01
This paper considers 1D hydraulic model data on the effect of high flow clusters and sequencing on sediment transport. Using observed flow gauge data from the River Caldew, England, a novel stochastic modelling approach was developed in order to create alternative 50 year flow sequences. Whilst the observed probability density of gauge data was preserved in all sequences, the order in which those flows occurred was varied using the output from a Hidden Markov Model (HMM) with generalised Pareto distribution (GP). In total, one hundred 50 year synthetic flow series were generated and used as the inflow boundary conditions for individual flow series model runs using the 1D sediment transport model HEC-RAS. The model routed graded sediment through the case study river reach to define the long-term morphological changes. Comparison of individual simulations provided a detailed understanding of the sensitivity of channel capacity to flow sequence. Specifically, each 50 year synthetic flow sequence was analysed using a 3-month, 6-month or 12-month rolling window approach and classified for clusters in peak discharge. As a cluster is described as a temporal grouping of flow events above a specified threshold, the threshold condition used herein is considered as a morphologically active channel forming discharge event. Thus, clusters were identified for peak discharges in excess of 10%, 20%, 50%, 100% and 150% of the 1 year Return Period (RP) event. The window of above-peak flows also required cluster definition and was tested for timeframes 1, 2, 10 and 30 days. Subsequently, clusters could be described in terms of the number of events, maximum peak flow discharge, cumulative flow discharge and skewness (i.e. a description of the flow sequence). The model output for each cluster was analysed for the cumulative flow volume and cumulative sediment transport (mass). This was then compared to the total sediment transport of a single flow event of equivalent flow volume. 
Results illustrate that clustered flood events generated sediment loads up to an order of magnitude greater than that of individual events of the same flood volume. Correlations were significant for sediment volume compared to both maximum flow discharge (R2<0.8) and number of events (R2 -0.5 to -0.7) within the cluster. The strongest correlations occurred for clusters with a greater number of flow events only slightly above-threshold. This illustrates that the numerical model can capture a degree of the non-linear morphological response to flow magnitude. Analysis of the relationship between morphological change and the skewness of flow events within each cluster was also determined, illustrating only minor sensitivity to cluster peak distribution skewness. This is surprising and discussion is presented on model limitations, including the capability of sediment transport formulae to effectively account for temporal processes of antecedent flow, hysteresis, local supply etc.
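The cluster definition used in this abstract, a temporal grouping of flow events above a specified threshold, can be sketched as below. This is an assumption-laden illustration, not the authors' procedure: each above-threshold day is treated as one exceedance, `max_gap_days` is a hypothetical stand-in for the 1, 2, 10 and 30 day timeframes mentioned, and the flow values are invented.

```python
def find_flow_clusters(daily_flow, threshold, max_gap_days):
    """Group above-threshold days into clusters.

    Two successive exceedance days belong to the same cluster when they are
    at most max_gap_days apart. Returns a list of lists of day indices.
    """
    clusters = []
    current = []
    last_day = None
    for day, flow in enumerate(daily_flow):
        if flow <= threshold:
            continue
        if last_day is not None and day - last_day > max_gap_days:
            clusters.append(current)
            current = []
        current.append(day)
        last_day = day
    if current:
        clusters.append(current)
    return clusters


def summarise_cluster(daily_flow, days):
    """Describe a cluster by event count, peak flow and cumulative flow."""
    flows = [daily_flow[d] for d in days]
    return {"n_events": len(flows), "peak": max(flows), "cumulative": sum(flows)}


# Hypothetical daily discharge series and threshold (arbitrary units).
flow = [5, 12, 3, 11, 2, 2, 2, 2, 15, 4]
clusters = find_flow_clusters(flow, threshold=10, max_gap_days=2)
summaries = [summarise_cluster(flow, c) for c in clusters]
```

The per-cluster summaries mirror three of the descriptors listed in the abstract (number of events, maximum peak discharge, cumulative discharge); skewness is omitted to keep the sketch short.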
NASA Astrophysics Data System (ADS)
Elangasinghe, M. A.; Singhal, N.; Dirks, K. N.; Salmond, J. A.; Samarasinghe, S.
2014-09-01
This paper uses artificial neural networks (ANN), combined with k-means clustering, to understand the complex time series of PM10 and PM2.5 concentrations at a coastal location of New Zealand based on data from a single site. Out of available meteorological parameters from the network (wind speed, wind direction, solar radiation, temperature, relative humidity), key factors governing the pattern of the time series concentrations were identified through input sensitivity analysis performed on the trained neural network model. The transport pathways of particulate matter under these key meteorological parameters were further analysed through bivariate concentration polar plots and k-means clustering techniques. The analysis shows that the external sources such as marine aerosols and local sources such as traffic and biomass burning contribute equally to the particulate matter concentrations at the study site. These results are in agreement with the results of receptor modelling by the Auckland Council based on Positive Matrix Factorization (PMF). Our findings also show that contrasting concentration-wind speed relationships exist between marine aerosols and local traffic sources resulting in very noisy and seemingly large random PM10 concentrations. The inclusion of cluster rankings as an input parameter to the ANN model showed a statistically significant (p < 0.005) improvement in the performance of the ANN time series model and also showed better performance in picking up high concentrations. For the presented case study, the correlation coefficient between observed and predicted concentrations improved from 0.77 to 0.79 for PM2.5 and from 0.63 to 0.69 for PM10 and reduced the root mean squared error (RMSE) from 5.00 to 4.74 for PM2.5 and from 6.77 to 6.34 for PM10. 
The techniques presented here enable the user to obtain an understanding of potential sources and their transport characteristics prior to the implementation of costly chemical analysis techniques or advanced air dispersion models.
Short template switch events explain mutation clusters in the human genome.
Löytynoja, Ari; Goldman, Nick
2017-06-01
Resequencing efforts are uncovering the extent of genetic variation in humans and provide data to study the evolutionary processes shaping our genome. One recurring puzzle in both intra- and inter-species studies is the high frequency of complex mutations comprising multiple nearby base substitutions or insertion-deletions. We devised a generalized mutation model of template switching during replication that extends existing models of genome rearrangement and used this to study the role of template switch events in the origin of short mutation clusters. Applied to the human genome, our model detects thousands of template switch events during the evolution of human and chimp from their common ancestor and hundreds of events between two independently sequenced human genomes. Although many of these are consistent with a template switch mechanism previously proposed for bacteria, our model also identifies new types of mutations that create short inversions, some flanked by paired inverted repeats. The local template switch process can create numerous complex mutation patterns, including hairpin loop structures, and explains multinucleotide mutations and compensatory substitutions without invoking positive selection, speculative mechanisms, or implausible coincidence. Clustered sequence differences are challenging for current mapping and variant calling methods, and we show that many erroneous variant annotations exist in human reference data. Local template switch events may have been neglected as an explanation for complex mutations because of biases in commonly used analyses. Incorporation of our model into reference-based analysis pipelines and comparisons of de novo assembled genomes will lead to improved understanding of genome variation and evolution. © 2017 Löytynoja and Goldman; Published by Cold Spring Harbor Laboratory Press.
Geographical Analysis of the Distribution and Spread of Human Rabies in China from 2005 to 2011
Yin, Wenwu; Yu, Hongjie; Si, Yali; Li, Jianhui; Zhou, Yuanchun; Zhou, Xiaoyan; Magalhães, Ricardo J. Soares.
2013-01-01
Background Rabies is a significant public health problem in China in that it records the second highest case incidence globally. Surveillance data on canine rabies in China is lacking and human rabies notifications can be a useful indicator of areas where animal and human rabies control could be integrated. Previous spatial epidemiological studies lacked adequate spatial resolution to inform targeted rabies control decisions. We aimed to describe the spatiotemporal distribution of human rabies and model its geographical spread to provide an evidence base to inform future integrated rabies control strategies in China. Methods We geo-referenced a total of 17,760 human rabies cases in China from 2005 to 2011. In our spatial analyses we used Gaussian kernel density analysis, average nearest neighbor (ANN) distance, and Spatial Temporal Density-Based Spatial Clustering of Applications with Noise (ST-DBSCAN), and developed a model of rabies spatiotemporal spread. Findings Human rabies cases increased from 2005 to 2007 and decreased during 2008 to 2011, accompanied by changes in the spatial distribution. The ANN distance among human rabies cases increased between 2005 and 2011, and the degree of clustering of human rabies cases decreased during that period. A total of 480 clusters were detected by ST-DBSCAN; 89.4% of clusters were initiated before 2007. Most clusters were found in southern China. The number and duration of clusters decreased significantly after 2008. Areas with the highest density of human rabies cases varied spatially each year, and some areas retained a high outbreak density for several years. Though a few places have recovered from human rabies, most affected places are still suffering from the disease. Conclusion Human rabies in mainland China is geographically clustered and its spatial extent changed during 2005 to 2011. The results provide a scientific basis for public health authorities in China to improve human rabies control and prevention programs. PMID:23991098
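One of the measures named in this abstract, the average nearest neighbor distance, can be sketched in a few lines. The sketch assumes case locations are already given as planar (projected) x, y coordinates; the function name and the sample points are hypothetical, and real latitude/longitude data would first need projecting or a great-circle distance.

```python
from math import dist


def average_nearest_neighbor_distance(points):
    """Mean distance from each point to its closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        total += min(dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


# Hypothetical case locations in projected units (for example, kilometres).
cases = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
ann = average_nearest_neighbor_distance(cases)
```

A rising value of this statistic over successive years, as reported in the findings, indicates that cases are on average farther from their nearest neighbouring case, which is consistent with a decreasing degree of clustering.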
Nilsson, Daniel; Lindman, Magdalena; Victor, Trent; Dozza, Marco
2018-04-01
Single-vehicle run-off-road crashes are a major traffic safety concern, as they are associated with a high proportion of fatal outcomes. In addressing run-off-road crashes, the development and evaluation of advanced driver assistance systems requires test scenarios that are representative of the variability found in real-world crashes. We apply hierarchical agglomerative cluster analysis to define similarities in a set of crash data variables; these clusters can then be used as the basis for test scenario development. Out of 13 clusters, nine test scenarios are derived, corresponding to crashes characterised by: drivers drifting off the road in daytime and night-time, high speed departures, high-angle departures on narrow roads, highways, snowy roads, loss-of-control on wet roadways, sharp curves, and high speeds on roads with severe road surface conditions. In addition, each cluster was analysed with respect to crash variables related to the crash cause and reason for the unintended lane departure. The study shows that cluster analysis of representative data provides a statistically based method to identify relevant properties for run-off-road test scenarios. This was done to support development of vehicle-based run-off-road countermeasures and driver behaviour models used in virtual testing. Future studies should use driver behaviour from naturalistic driving data to further define how test-scenarios and behavioural causation mechanisms should be included. Copyright © 2018 Elsevier Ltd. All rights reserved.
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia
NASA Astrophysics Data System (ADS)
Novak, Vibor; Renema, Willem
2018-01-01
To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis, which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples: clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 are dominated by BW samples, and cluster 5 shows a mixed composition with both MCS and BW samples. The results of cluster analysis were afterwards subjected to indicator species analysis, which yielded three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
Investigating the internal structure of galaxies and clusters through strong gravitational lensing
NASA Astrophysics Data System (ADS)
Jigish Gandhi, Pratik; Grillo, Claudio; Bonamigo, Mario
2018-01-01
Gravitational lensing studies have radically improved our understanding of the internal structure of galaxies and cluster-scale systems. In particular, the combination of strong lensing and stellar dynamics or stellar population synthesis models has made it possible to characterize numerous fundamental properties of the galaxies as well as dark matter halos and subhalos with unprecedented robustness and accuracy. Here we demonstrate the usefulness and accuracy of strong lensing as a probe for characterising the properties of cluster members as well as dark matter halos, to show that such characterisation carried out via lensing analyses alone is as viable as those carried out through a combination of spectroscopy and lensing analyses. Our study focuses on the early-type galaxy cluster MACS J1149.5+2223 at redshift 0.54 in the Hubble Frontier Fields (HFF) program, where the first magnified and spatially resolved multiple images of supernova (SN) “Refsdal” and its late-type host galaxy at redshift 1.489 were detected. The Refsdal system is unique in being the first ever multiply-imaged supernova, with its first four images appearing in an Einstein Cross configuration around one of the cluster members in 2015. In our lensing analyses we use HST data of the multiply-imaged SN Refsdal to constrain the dynamical masses, velocity dispersions, and virial radii of individual galaxies and dark matter halos in the MACS J1149.5+2223 cluster. For our lensing models we select a sample of 300 cluster members within approximately 500 kpc from the BCG, and a set of reliable multiple images associated with 18 distinct knots in the SN host spiral galaxy, as well as multiple images of the supernova itself. 
Our results provide accurate measurements of the masses, velocity dispersions, and radii of the cluster’s dark matter halo as well as three chosen member galaxies, in strong agreement with those obtained by Grillo et al. (2015), demonstrating the usefulness of strong lensing in characterising the properties of cluster-scale systems.
Sloan, Chantel D.; Nordsborg, Rikke B.; Jacquez, Geoffrey M.; Raaschou-Nielsen, Ole; Meliker, Jaymie R.
2015-01-01
Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N = 3297) were diagnosed between 1991 and 2003, and two sets of controls (N = 3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N = 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at its center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population. 
PMID:25756204
CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.
Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G
2016-02-01
Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation: Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences), and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. Contact: sgo24@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Markov Chain Model-Based Optimal Cluster Heads Selection for Wireless Sensor Networks
Ahmed, Gulnaz; Zou, Jianhua; Zhao, Xi; Sadiq Fareed, Mian Muhammad
2017-01-01
Extending the network lifetime of Wireless Sensor Networks (WSNs) is a goal directly related to energy consumption. This energy consumption issue becomes more challenging when the energy load is not properly distributed in the sensing area. A hierarchical clustering architecture is the best choice for this kind of issue. In this paper, we introduce a novel clustering protocol called Markov chain model-based optimal cluster heads (MOCHs) selection for WSNs. In our proposed model, we introduce a simple strategy for selecting the optimal number of cluster heads to overcome the problem of uneven energy distribution in the network. The attractiveness of our model is that the base station (BS) controls the number of cluster heads while the cluster heads control the cluster members in each cluster in such a restricted manner that a uniform and even load is ensured in each cluster. We perform an extensive range of simulations using five quality measures to analyze the proposed model, namely: the lifetime of the network, the stable and unstable regions in the lifetime of the network, the throughput of the network, the number of cluster heads in the network, and the transmission time of the network. We compare MOCHs against Sleep-awake Energy Efficient Distributed (SEED) clustering, Artificial Bee Colony (ABC), Zone Based Routing (ZBR), and Centralized Energy Efficient Clustering (CEEC) using the above-discussed quality metrics and find that the lifetime of the proposed model is almost 1095, 2630, 3599, and 2045 rounds (time steps) greater than SEED, ABC, ZBR, and CEEC, respectively. The obtained results demonstrate that MOCHs is better than SEED, ABC, ZBR, and CEEC in terms of energy efficiency and network throughput. PMID:28241492
DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.
Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei
2018-01-01
Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~wec47/singlecell.html. Contact: wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi
2018-03-13
Synchronous fluorescence spectra combined with multivariate analysis were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining clustering and partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and a full-spectra PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoid prediction using synchronous fluorescence spectra.
Irradiation-induced microchemical changes in highly irradiated 316 stainless steel
NASA Astrophysics Data System (ADS)
Fujii, K.; Fukuya, K.
2016-02-01
Cold-worked 316 stainless steel specimens irradiated to 74 dpa in a pressurized water reactor (PWR) were analyzed by atom probe tomography (APT) to extend knowledge of solute clusters and segregation at higher doses. The analyses confirmed that those clusters mainly enriched in Ni-Si or Ni-Si-Mn were formed at high number density. The clusters were divided into three types based on their size and Mn content: small Ni-Si clusters (3-4 nm in diameter), and large Ni-Si and Ni-Si-Mn clusters (8-10 nm in diameter). The total cluster number density was $7.7 \times 10^{23}$ m$^{-3}$. The fraction of large clusters was almost 1/10 of the total density. The average composition (in at%) for small clusters was: Fe, 54; Cr, 12; Mn, 1; Ni, 22; Si, 11; Mo, 1, and for large clusters it was: Fe, 44; Cr, 9; Mn, 2; Ni, 29; Si, 14; Mo, 1. It was likely that some of the Ni-Si clusters correspond to γ′ phase precipitates while the Ni-Si-Mn clusters were precursors of G phase precipitates. The APT analyses at grain boundaries confirmed enrichment of Ni, Si, P and Cu and depletion of Fe, Cr, Mo and Mn. The segregation behavior was consistent with previous knowledge of radiation induced segregation.
Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...
2016-09-20
In this paper, we use Planck data to detect the cross-correlation between the thermal Sunyaev-Zeldovich (tSZ) effect and the infrared emission from the galaxies that make up the cosmic infrared background (CIB). We first perform a stacking analysis towards Planck-confirmed galaxy clusters. We detect infrared emission produced by dusty galaxies inside these clusters and demonstrate that the infrared emission is about 50% more extended than the tSZ effect. Modelling the emission with a Navarro-Frenk-White profile, we find that the radial profile concentration parameter is $c_{500} = 1.00^{+0.18}_{-0.15}$. This indicates that infrared galaxies in the outskirts of clusters have higher infrared flux than cluster-core galaxies. We also study the cross-correlation between tSZ and CIB anisotropies, following three alternative approaches based on power spectrum analyses: (i) using a catalogue of confirmed clusters detected in Planck data; (ii) using an all-sky tSZ map built from Planck frequency maps; and (iii) using cross-spectra between Planck frequency maps. With the three different methods, we detect the tSZ-CIB cross-power spectrum at significance levels of (i) 6σ; (ii) 3σ; and (iii) 4σ. We model the tSZ-CIB cross-correlation signature and compare predictions with the measurements. The amplitude of the cross-correlation relative to the fiducial model is $A_{\rm tSZ-CIB} = 1.2 \pm 0.3$. Finally, this result is consistent with predictions for the tSZ-CIB cross-correlation assuming the best-fit cosmological model from Planck 2015 results along with the tSZ and CIB scaling relations.
Network Analysis to Risk Stratify Patients With Exercise Intolerance.
Oldham, William M; Oliveira, Rudolf K F; Wang, Rui-Sheng; Opotowsky, Alexander R; Rubins, David M; Hainer, Jon; Wertheim, Bradley M; Alba, George A; Choudhary, Gaurav; Tornyos, Adrienn; MacRae, Calum A; Loscalzo, Joseph; Leopold, Jane A; Waxman, Aaron B; Olschewski, Horst; Kovacs, Gabor; Systrom, David M; Maron, Bradley A
2018-03-16
Current methods assessing clinical risk because of exercise intolerance in patients with cardiopulmonary disease rely on a small subset of traditional variables. Alternative strategies incorporating the spectrum of factors underlying prognosis in at-risk patients may be useful clinically, but are lacking. We aimed to use unbiased analyses to identify variables that correspond to clinical risk in patients with exercise intolerance. Data from 738 consecutive patients referred for invasive cardiopulmonary exercise testing at a single center (2011-2015) were analyzed retrospectively (derivation cohort). A correlation network of invasive cardiopulmonary exercise testing parameters was assembled using |r| > 0.5. From an exercise network of 39 variables (ie, nodes) and 98 correlations (ie, edges) corresponding to P < 9.5e-46 for each correlation, we focused on a subnetwork containing peak volume of oxygen consumption (pVo2) and 9 linked nodes. K-means clustering based on these 10 variables identified 4 novel patient clusters characterized by significant differences in 44 of 45 exercise measurements (P < 0.01). Compared with a probabilistic model, including 23 independent predictors of pVo2 and pVo2 itself, the network model was less redundant and identified clusters that were more distinct. Cluster assignment from the network model was predictive of subsequent clinical events. For example, a 4.3-fold (P < 0.0001; 95% CI, 2.2-8.1) and 2.8-fold (P = 0.0018; 95% CI, 1.5-5.2) increase in hazard for age- and pVo2-adjusted all-cause 3-year hospitalization, respectively, were observed between the highest versus lowest risk clusters. Using these data, we developed the first risk-stratification calculator for patients with exercise intolerance. When applying the risk calculator to patients in 2 independent invasive cardiopulmonary exercise testing cohorts (Boston and Graz, Austria), we observed a clinical risk profile that paralleled the derivation cohort. Network analyses were used to identify novel exercise groups and develop a point-of-care risk calculator. These data expand the range of useful clinical variables beyond pVo2 that predict hospitalization in patients with exercise intolerance. © 2018 American Heart Association, Inc.
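The network-assembly step in the abstract above (link two variables when |r| > 0.5, then take the neighbours of pVo2) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the variable names other than pVo2 and all the numbers are hypothetical toy values, and the helper functions `pearson`, `correlation_network` and `neighbours` are invented here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(data, threshold=0.5):
    """Return edges (u, v, r) for every variable pair with |r| > threshold."""
    edges = []
    for u, v in combinations(sorted(data), 2):
        r = pearson(data[u], data[v])
        if abs(r) > threshold:
            edges.append((u, v, r))
    return edges


def neighbours(edges, node):
    """Return the set of variables directly linked to `node`."""
    linked = set()
    for u, v, _ in edges:
        if u == node:
            linked.add(v)
        elif v == node:
            linked.add(u)
    return linked


# Hypothetical toy measurements for six patients (not data from the study).
toy = {
    "pVo2": [1, 2, 3, 4, 5, 6],
    "peak_cardiac_output": [2, 4, 5, 8, 10, 12],
    "ventilatory_inefficiency": [6, 5, 4, 3, 2, 1],
    "unrelated_marker": [3, 1, 4, 1, 5, 2],
}

edges = correlation_network(toy, threshold=0.5)
subnetwork = neighbours(edges, "pVo2")
print(sorted(subnetwork))
```

In the study, the variables in such a subnetwork were then passed to k-means clustering; that step is omitted here to keep the sketch dependency-free.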
Regression analysis on the variation in efficiency frontiers for prevention stage of HIV/AIDS.
Kamae, Maki S; Kamae, Isao; Cohen, Joshua T; Neumann, Peter J
2011-01-01
To investigate how the cost effectiveness of preventing HIV/AIDS varies across possible efficiency frontiers (EFs) by taking into account potentially relevant external factors, such as prevention stage, and how the EFs can be characterized using regression analysis given uncertainty of the QALY-cost estimates. We reviewed cost-effectiveness estimates for the prevention and treatment of HIV/AIDS published from 2002-2007 and catalogued in the Tufts Medical Center Cost-Effectiveness Analysis (CEA) Registry. We constructed efficiency frontier (EF) curves by plotting QALYs against costs, using methods used by the Institute for Quality and Efficiency in Health Care (IQWiG) in Germany. We stratified the QALY-cost ratios by prevention stage, country of study, and payer perspective, and estimated EF equations using log and square-root models. A total of 53 QALY-cost ratios were identified for HIV/AIDS in the Tufts CEA Registry. Plotted ratios stratified by prevention stage were visually grouped into a cluster consisting of primary/secondary prevention measures and a cluster consisting of tertiary measures. Correlation coefficients for each cluster were statistically significant. For each cluster, we derived two EF equations - one based on the log model, and one based on the square-root model. Our findings indicate that stratification of HIV/AIDS interventions by prevention stage can yield distinct EFs, and that the correlation and regression analyses are useful for parametrically characterizing EF equations. Our study has certain limitations, such as the small number of included articles and the potential for study populations to be non-representative of countries of interest. Nonetheless, our approach could help develop a deeper appreciation of cost effectiveness beyond the deterministic approach developed by IQWiG.
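As a rough illustration of fitting efficiency-frontier equations with log and square-root models, the sketch below fits QALY = a + b·f(cost) by ordinary least squares for f = ln and f = sqrt. The abstract does not give the exact model specification, so these functional forms, the helper names (`fit_line`, `fit_frontier`, `sse`) and the toy cost and QALY values are all assumptions made for illustration.

```python
from math import log, sqrt


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_frontier(costs, qalys, transform):
    """Fit QALY = a + b * transform(cost); returns (a, b)."""
    return fit_line([transform(c) for c in costs], qalys)


def sse(costs, qalys, transform, a, b):
    """Sum of squared residuals of a fitted frontier equation."""
    return sum((q - (a + b * transform(c))) ** 2 for c, q in zip(costs, qalys))


# Hypothetical cost/QALY pairs generated from a log-shaped frontier.
costs = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0]
qalys = [0.5 + 0.3 * log(c) for c in costs]

a_log, b_log = fit_frontier(costs, qalys, log)
a_sqrt, b_sqrt = fit_frontier(costs, qalys, sqrt)

print("log model:  a=%.3f b=%.3f" % (a_log, b_log))
print("sqrt model: a=%.3f b=%.5f" % (a_sqrt, b_sqrt))
```

Comparing the residual sums of squares of the two fits within each prevention-stage cluster is one simple way to judge which functional form describes that cluster's frontier better.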
NASA Astrophysics Data System (ADS)
Chirivì, G.; Suyu, S. H.; Grillo, C.; Halkola, A.; Balestra, I.; Caminha, G. B.; Mercurio, A.; Rosati, P.
2018-06-01
Exploiting the powerful tool of strong gravitational lensing by galaxy clusters to study the highest-redshift Universe and cluster mass distributions relies on precise lens mass modelling. In this work, we aim to present the first attempt at modelling line-of-sight (LOS) mass distribution in addition to that of the cluster, extending previous modelling techniques that assume mass distributions to be on a single lens plane. We have focussed on the Hubble Frontier Field cluster MACS J0416.1-2403, and our multi-plane model reproduces the observed image positions with an rms offset of 0.″53. Starting from this best-fitting model, we simulated a mock cluster that resembles MACS J0416.1-2403 in order to explore the effects of LOS structures on cluster mass modelling. By systematically analysing the mock cluster under different model assumptions, we find that neglecting the lensing environment has a significant impact on the reconstruction of image positions (rms 0.″3); accounting for LOS galaxies as if they were at the cluster redshift can partially reduce this offset. Moreover, foreground galaxies are more important to include in the model than background ones. While the magnification factors of the lensed multiple images are recovered within 10% for 95% of them, the 5% that lie near critical curves can be significantly affected by the exclusion of the lensing environment in the models. In addition, LOS galaxies cannot explain the apparent discrepancy in the properties of massive sub-halos between MACS J0416.1-2403 and N-body simulated clusters. Since our model of MACS J0416.1-2403 with LOS galaxies only modestly reduced the rms offset in the image positions, we conclude that additional complexities would be needed in future models of MACS J0416.1-2403.
Gender differences in psychiatric disorders and clusters of self-esteem among detained adolescents.
Van Damme, Lore; Colins, Olivier F; Vanderplasschen, Wouter
2014-12-30
Detained minors display substantial mental health needs. This study focused on two features (psychopathology and self-esteem) that have received considerable attention in the literature and clinical work, but have rarely been studied simultaneously in detained youths. The aims of this study were to examine gender differences in psychiatric disorders and clusters of self-esteem, and to test the hypothesis that the cluster of adolescents with lower (versus higher) levels of self-esteem have higher rates of psychiatric disorders. The prevalence of psychiatric disorders was assessed in 440 Belgian, detained adolescents using the Diagnostic Interview Schedule for Children-IV. Self-esteem was assessed using the Self-perception Profile for Adolescents. Model-based cluster analyses were performed to identify youths with lower and/or higher levels of self-esteem across several domains. Girls have higher rates for most psychiatric disorders and lower levels of self-esteem than boys. A higher number of clusters was identified in boys (four) than girls (three). Generally, the cluster of adolescents with lower (versus higher) levels of self-esteem had a higher prevalence of psychiatric disorders. These results suggest that the detection of low levels of self-esteem in adolescents, especially girls, might help clinicians to identify a subgroup of detained adolescents with the highest prevalence of psychopathology.
Butyrate production in phylogenetically diverse Firmicutes isolated from the chicken caecum
Eeckhaut, Venessa; Van Immerseel, Filip; Croubels, Siska; De Baere, Siegrid; Haesebrouck, Freddy; Ducatelle, Richard; Louis, Petra; Vandamme, Peter
2011-01-01
Summary: Sixteen butyrate-producing bacteria were isolated from the caecal content of chickens and analysed phylogenetically. They did not represent a coherent phylogenetic group, but were allied to four different lineages in the Firmicutes phylum. Fourteen strains appeared to represent novel species, based on a level of ≤ 98.5% 16S rRNA gene sequence similarity towards their nearest validly named neighbours. The highest butyrate concentrations were produced by the strains belonging to clostridial clusters IV and XIVa, clusters which are predominant in the chicken caecal microbiota. In only one of the 16 strains tested could the butyrate kinase operon be amplified, while the butyryl-CoA:acetate CoA-transferase gene was detected in eight strains belonging to clostridial clusters IV, XIVa and XIVb. None of the clostridial cluster XVI isolates carried this gene based on degenerate PCR analyses. However, another CoA-transferase gene more similar to propionate CoA-transferase was detected in the majority of the clostridial cluster XVI isolates. Since this gene is located directly downstream of the remaining butyrate pathway genes in several human cluster XVI bacteria, it may be involved in butyrate formation in these bacteria. The present study indicates that butyrate producers related to cluster XVI may play a more important role in the chicken gut than in the human gut. PMID:21375722
Dark matter searches with Cherenkov telescopes: nearby dwarf galaxies or local galaxy clusters?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sánchez-Conde, Miguel A.; Cannoni, Mirco; Gómez, Mario E.
2011-12-01
In this paper, we compare dwarf galaxies and galaxy clusters in order to elucidate which object class is the best target for gamma-ray DM searches with imaging atmospheric Cherenkov telescopes (IACTs). We have built a mixed dwarfs+clusters sample containing some of the most promising nearby dwarf galaxies (Draco, Ursa Minor, Willman 1 and Segue 1) and local galaxy clusters (Perseus, Coma, Ophiuchus, Virgo, Fornax, NGC 5813 and NGC 5846), and then compute their DM annihilation flux profiles by making use of the latest modeling of their DM density profiles. We also include in our calculations the effect of DM substructure. Willman 1 appears as the best candidate in the sample. However, its mass modeling is still rather uncertain, so other candidates with smaller uncertainties and quite similar fluxes, namely Ursa Minor and Segue 1, might be better options. As for galaxy clusters, Virgo represents the one with the highest flux. However, its large spatial extension can be a serious handicap for IACT observations and subsequent data analysis. Yet, other local galaxy cluster candidates with more moderate emission regions, such as Perseus, may represent good alternatives. After comparing dwarfs and clusters, we found that the former exhibit annihilation flux profiles that, at the center, are roughly one order of magnitude higher than those of clusters, although galaxy clusters can yield similar, or even higher, integrated fluxes for the whole object once substructure is taken into account. Even though none of these objects is strictly point-like according to the properties of their annihilation signals, we conclude that dwarf galaxies are best suited for observational strategies based on the search of point-like sources, while galaxy clusters represent the best targets for analyses that can deal with rather extended emissions. Finally, we study the detection prospects for present and future IACTs in the framework of the constrained minimal supersymmetric standard model. We find that the level of the annihilation flux from these targets is below the sensitivities of current IACTs and the future CTA.
Dark matter searches with Cherenkov telescopes: nearby dwarf galaxies or local galaxy clusters?
NASA Astrophysics Data System (ADS)
Sánchez-Conde, Miguel A.; Cannoni, Mirco; Zandanel, Fabio; Gómez, Mario E.; Prada, Francisco
2011-12-01
In this paper, we compare dwarf galaxies and galaxy clusters in order to elucidate which object class is the best target for gamma-ray DM searches with imaging atmospheric Cherenkov telescopes (IACTs). We have built a mixed dwarfs+clusters sample containing some of the most promising nearby dwarf galaxies (Draco, Ursa Minor, Willman 1 and Segue 1) and local galaxy clusters (Perseus, Coma, Ophiuchus, Virgo, Fornax, NGC 5813 and NGC 5846), and then compute their DM annihilation flux profiles by making use of the latest modeling of their DM density profiles. We also include in our calculations the effect of DM substructure. Willman 1 appears as the best candidate in the sample. However, its mass modeling is still rather uncertain, so other candidates with smaller uncertainties and quite similar fluxes, namely Ursa Minor and Segue 1, might be better options. As for galaxy clusters, Virgo represents the one with the highest flux. However, its large spatial extension can be a serious handicap for IACT observations and subsequent data analysis. Yet, other local galaxy cluster candidates with more moderate emission regions, such as Perseus, may represent good alternatives. After comparing dwarfs and clusters, we found that the former exhibit annihilation flux profiles that, at the center, are roughly one order of magnitude higher than those of clusters, although galaxy clusters can yield similar, or even higher, integrated fluxes for the whole object once substructure is taken into account. Even though none of these objects is strictly point-like according to the properties of their annihilation signals, we conclude that dwarf galaxies are best suited for observational strategies based on the search of point-like sources, while galaxy clusters represent the best targets for analyses that can deal with rather extended emissions. Finally, we study the detection prospects for present and future IACTs in the framework of the constrained minimal supersymmetric standard model. We find that the level of the annihilation flux from these targets is below the sensitivities of current IACTs and the future CTA.
Atmospheric effects on cluster analyses. [for remote sensing application]
NASA Technical Reports Server (NTRS)
Kiang, R. K.
1979-01-01
Ground-reflected radiance, from which information is extracted through cluster analysis techniques for remote sensing applications, is altered by the atmosphere before it reaches the satellite. It is therefore essential to understand the effects of the atmosphere on Landsat measurements, cluster characteristics and analysis accuracy. A doubling model is employed to compute the effective reflectivity, observed from the satellite, as a function of ground reflectivity, solar zenith angle and aerosol optical thickness for a standard atmosphere. The relation between the effective reflectivity and ground reflectivity is approximately linear. It is shown that for a horizontally homogeneous atmosphere, the classification statistics from a maximum likelihood classifier remain unchanged under these transforms. If inhomogeneity is present, the divergence between clusters is reduced, and correlation between spectral bands increases. Radiance reflected by the background area surrounding the target may also reach the satellite. The influence of background reflectivity on effective reflectivity is discussed.
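The invariance claim above can be illustrated with a one-band toy case. If the atmosphere maps ground reflectivity x to an effective reflectivity gain·x + offset with gain > 0, and the class statistics are transformed the same way, every Gaussian log-likelihood shifts by the same constant (−ln gain), so the maximum likelihood label does not change. The sketch below is a minimal single-band illustration; the class names, means, standard deviations, gain and offset are hypothetical values, not figures from the report.

```python
from math import log


def ml_label(x, classes):
    """Return the class whose 1-D Gaussian log-likelihood at x is largest.

    `classes` maps a class name to a (mean, std) pair.
    """
    def loglik(mean, std):
        return -log(std) - (x - mean) ** 2 / (2 * std ** 2)

    return max(classes, key=lambda name: loglik(*classes[name]))


def apply_atmosphere(x, gain, offset):
    """Assumed linear relation between ground and effective reflectivity."""
    return gain * x + offset


def transform_classes(classes, gain, offset):
    """Class statistics as they would be estimated from transformed data."""
    return {name: (gain * m + offset, gain * s) for name, (m, s) in classes.items()}


# Hypothetical single-band class statistics (mean, std) at ground level.
ground = {"water": (0.05, 0.02), "vegetation": (0.20, 0.05), "soil": (0.35, 0.08)}
gain, offset = 0.8, 0.06  # hypothetical atmospheric transform
pixels = [0.02, 0.08, 0.15, 0.22, 0.30, 0.45]

before = [ml_label(p, ground) for p in pixels]
observed = transform_classes(ground, gain, offset)
after = [ml_label(apply_atmosphere(p, gain, offset), observed) for p in pixels]

print(before)
print(after)
```

The same argument does not hold when gain and offset vary across the scene, which matches the abstract's remark that inhomogeneity reduces the divergence between clusters.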
Wang, Jong-Yi; Liang, Yia-Wen; Yeh, Chun-Chen; Liu, Chiu-Shong; Wang, Chen-Yu
2018-02-21
Spousal clustering of cancer warrants attention. Whether the common environment or high-age vulnerability determines cancer clustering is unclear. The risk of clustering in couples versus non-couples is undetermined. The time to cancer clustering after the first cancer diagnosis is yet to be reported. This study investigated cancer clustering over time among couples by using nationwide data. A cohort of 5643 married couples in the 2002-2013 Taiwan National Health Insurance Research Database was identified and randomly matched with 5643 non-couple pairs through dual propensity score matching. Factors associated with clustering (both spouses with tumours) were analysed by using the Cox proportional hazard model. Propensity-matched analysis revealed that the risk of clustering of all tumours among couples (13.70%) was significantly higher than that among non-couples (11.84%) (OR=1.182, 95% CI 1.058 to 1.321, P=0.0031). The median time to clustering of all tumours and of malignant tumours was 2.92 and 2.32 years, respectively. Risk characteristics associated with clustering included high age and comorbidity. Shared environmental factors among spouses might be linked to a high incidence of cancer clustering. Cancer incidence in one spouse may signal cancer vulnerability in the other spouse. Promoting family-oriented cancer care in vulnerable families and preventing shared lifestyle risk factors for cancer are suggested. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
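The reported odds ratio can be checked approximately from the two proportions in the abstract. The sketch below rebuilds a 2×2 table from 13.70% and 11.84% of 5643 pairs and computes an unmatched odds ratio with a Wald 95% confidence interval. The rounded counts and the choice of an unmatched Wald interval are assumptions made here; the study itself used a propensity-matched analysis, so this is a plausibility check rather than a reproduction of the authors' method.

```python
from math import exp, log, sqrt


def odds_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Odds ratio of group A versus group B with a Wald confidence interval."""
    a, b = events_a, n_a - events_a  # group A: events, non-events
    c, d = events_b, n_b - events_b  # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


n_pairs = 5643
# Counts reconstructed by rounding the reported percentages (an assumption).
couples_clustered = round(0.1370 * n_pairs)
non_couples_clustered = round(0.1184 * n_pairs)

or_value, ci_low, ci_high = odds_ratio_ci(
    couples_clustered, n_pairs, non_couples_clustered, n_pairs
)
print("OR = %.3f (95%% CI %.3f to %.3f)" % (or_value, ci_low, ci_high))
```

Under these assumptions the result should land close to the reported OR of 1.182 (95% CI 1.058 to 1.321).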
NASA Astrophysics Data System (ADS)
Benson, Bryant Joseph
Context: Galaxy clusters are the most massive gravitationally bound structures in the universe and are formed through the process of hierarchical clustering, in which smaller systems undergo a series of mergers to form ever larger clusters. Because of the masses involved, mergers between these giants provide a unique laboratory for observing many interesting astrophysical processes. These merging systems also act as large dark matter colliders, because the dark matter halos of the clusters involved pass through each other during the merger. This offers us a means to observe whether dark matter-dark matter collisions result in momentum exchange beyond what occurs from gravity alone. Such observations can help us to unravel some of the mysteries behind dark matter, such as whether it interacts with itself through mechanisms beyond gravity, and how strong those interactions are. Answers to questions like these are what will eventually allow us to discover what dark matter really is. However, the extremely long time scales for these mergers (~several billion years) make each observation a single snapshot in the long merger history, and we must infer many of the details necessary for understanding the full merger process. Furthermore, current weak lensing analyses lack the precision required to detect a signal from self-interacting dark matter. Uncertain weak lensing mass and position estimates also yield large uncertainties in the dynamical reconstruction of the merger scenarios. Need: In order to better model the dynamics of merging galaxy cluster systems, and to potentially measure any signal from self-interacting dark matter, we need to obtain more precise measurements of the masses and positions of the dark matter halos involved. Gravitational lensing offers a robust method for mapping the mass in these clusters because it directly measures the gravitational field, and does not depend on the dynamical state of the system that has been disturbed in the merger process. 
Of the lensing methods, weak gravitational lensing is the only way that we can probe a wide field and measure the total mass of the cluster. However, the precision of conventional weak lensing techniques is currently limited by shape noise (uncertainty in the shear due to the dispersion in the intrinsic shapes and orientations of unlensed galaxies). A possible avenue forward is to eliminate shape noise as a source of uncertainty in shear measurements via a technique to be described below. This would eliminate the largest source of uncertainty in weak lensing analyses, and enable us to obtain mass and position estimates of dark matter halos with a much higher level of precision. Task: In this dissertation we perform statistical clustering, conventional weak lensing analyses, and dynamical reconstruction on the merging galaxy cluster system ZwCl 2341.1+0000 in order to test the capabilities of the dynamical modeling on a complex, multiple merger. We use targeted optical spectroscopy to identify cluster member galaxies, which we then use to model the galaxy substructures. We also obtain a dynamical mass estimate using the galaxy velocity dispersions, and perform weak lensing analyses in the forms of aperture densitometry to place an upper bound on the total cluster mass, and multiple NFW profile halo fitting to approximate the masses and positions of the individual dark matter halos present in the merger. The masses, positions, and line of sight velocities of those clusters are then used to constrain the parameters describing the best fit merger scenario, with radio relic positions and polarization used to further tighten those constraints. We also develop a new method for obtaining weak lensing data from individual source galaxies in the form of shear measurements that are independent of shape noise, and direct measurements of the convergence. 
We accomplish this by simultaneously modeling the pre-lensing velocity and intensity profiles of a lensed, rotating disk galaxy, and the lensing transform required to distort those into the lensed profiles we observe. We test this method with a host of idealized simulations to characterize its capabilities in a best-case scenario and forecast the possible improvements it can bring to the precision of weak lensing analyses on galaxy clusters. (Abstract shortened by ProQuest.)
Model-based Clustering of Categorical Time Series with Multinomial Logit Classification
NASA Astrophysics Data System (ADS)
Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea
2010-09-01
A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule by using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.
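The classification step described in this abstract can be illustrated with a minimal sketch. It assumes the cluster-specific transition matrices and logit coefficients are already known and fixed, whereas the paper estimates them by MCMC. All function names, the two-state example, and the parameter values are hypothetical, and every transition probability is assumed to be strictly positive.

```python
import math

def multinomial_logit_weights(covariates, coefs):
    """Prior group probabilities from a multinomial logit (softmax of linear scores).

    coefs holds one coefficient vector per group; a baseline group would
    typically have all-zero coefficients (an assumption of this sketch).
    """
    scores = [sum(b * x for b, x in zip(beta, covariates)) for beta in coefs]
    top = max(scores)
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]

def transition_counts(series, n_states):
    """Count first-order transitions j -> k in one categorical time series."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, curr in zip(series, series[1:]):
        counts[prev][curr] += 1
    return counts

def log_likelihood(counts, trans):
    """Log-likelihood of the observed transitions under one transition matrix,
    conditioning on the first state."""
    return sum(
        n * math.log(trans[j][k])
        for j, row in enumerate(counts)
        for k, n in enumerate(row)
        if n > 0
    )

def membership_probabilities(series, trans_by_cluster, weights, n_states):
    """Posterior group probabilities via Bayes' rule for fixed parameters."""
    counts = transition_counts(series, n_states)
    log_post = [
        math.log(w) + log_likelihood(counts, trans)
        for w, trans in zip(weights, trans_by_cluster)
    ]
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical example: a "sticky" and a "switching" two-state chain.
sticky = [[0.9, 0.1], [0.1, 0.9]]
switching = [[0.1, 0.9], [0.9, 0.1]]
weights = multinomial_logit_weights([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
probs = membership_probabilities([0, 0, 0, 0, 1, 1, 1, 1], [sticky, switching], weights, 2)
```

In a full Gibbs-type sampler, a step like `membership_probabilities` would be used to draw the latent group indicator of each series, alternating with draws of the transition matrices and logit coefficients.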
Żuk, Magdalena; Pezowicz, Celina
2015-01-01
Objective. The purpose of the present work was to assess the validity of a six-degrees-of-freedom gait analysis model based on the ISB recommendation on definitions of joint coordinate systems (ISB 6DOF) through a quantitative comparison with the Helen Hayes model (HH) and repeatability assessment. Methods. Four healthy subjects were analysed with both marker sets: an HH marker set and four marker clusters in ISB 6DOF. A navigated pointer was used to indicate the anatomical landmark position in the cluster reference system according to the ISB recommendation. Three gait cycles were selected from the data collected simultaneously for the two marker sets. Results. Both protocols showed good intertrial repeatability, which apart from pelvic rotation did not exceed 2°. The greatest differences between protocols were observed in the transverse plane as well as for knee angles. Knee internal/external rotation revealed the lowest subject-to-subject and interprotocol repeatability and inconsistent patterns for both protocols. Knee range of movement in the transverse plane was overestimated for the HH set (the mean is 34°), which could indicate the cross-talk effect. Conclusions. The ISB 6DOF anatomically based protocol enabled full 3D kinematic description of joints according to the current standard with clinically acceptable intertrial repeatability and minimal equipment requirements.
Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.
Nixon, R M; Thompson, S G
2003-09-15
Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.
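One way to write the kind of model this abstract describes is sketched below. The notation is assumed for illustration and is not taken from the paper: $p_{ij}$ is the outcome probability for individual $i$ in cluster $j$ at follow-up, $T_j$ is the treatment indicator of cluster $j$, $x_j$ is a cluster-level baseline covariate (for example, a transform of the baseline outcome proportion), and $u_j$ is the cluster-level random effect.

```latex
\operatorname{logit}(p_{ij}) = \alpha + \beta\, T_j + \gamma\, x_j + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under this notation, the unadjusted model drops the term $\gamma\, x_j$, and the comparison in the abstract is between the estimates of $\beta$ and their precision under the two specifications.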
The structure of DSM-IV-TR personality disorder diagnoses in NESARC: a reanalysis.
Trull, Timothy J; Vergés, Alvaro; Wood, Phillip K; Sher, Kenneth J
2013-12-01
Cox, Clara, Worobec, and Grant (2012) recently presented results from a series of analyses aimed at identifying the factor structure underlying the DSM-IV-TR (APA, 2000) personality diagnoses assessed in the large NESARC study. Cox et al. (2012) concluded that the best fitting model was one that modeled three lower-order factors (the three clusters of PDs as outlined by DSM-IV-TR), which in turn loaded on a single PD higher-order factor. Our reanalyses of the NESARC Wave 1 and Wave 2 data for personality disorder diagnoses revealed that the best fitting model was that of a general PD factor that spans each of the ten DSM-IV PD diagnoses, and our reanalyses do not support the three-cluster hierarchical structure outlined by Cox et al. (2012) and DSM-IV-TR. Finally, we note the importance of modeling the Wave 2 assessment method factor in analyses of NESARC PD data.
Pilot testing model to uncover industrial symbiosis in Brazilian industrial clusters.
Saraceni, Adriana Valélia; Resende, Luis Mauricio; de Andrade Júnior, Pedro Paulo; Pontes, Joseane
2017-04-01
The main objective of this study was to create a pilot model to uncover industrial symbiosis practices in Brazilian industrial clusters. For this purpose, a systematic review was conducted of journals selected from two categories of the ISI Web of Knowledge: Engineering, Environmental and Engineering, Industrial. After an in-depth review of the literature, the results allowed the creation of an analysis structure. A methodology based on fuzzy logic was applied to assign the weights of the industrial symbiosis variables. It was thus possible to extract the intensity indicators of the interrelations required to analyse the development level of each correlation between the variables. Determination of the variables and their weights initially resulted in a framework for the theory of industrial symbiosis assessments. The research results allowed the creation of a pilot model that could precisely identify the loopholes or development levels in each sphere. Ontology charts for data analysis were also generated. This study contributes to science by presenting the foundations for building an instrument that enables application and compilation of the pilot model, in order to identify opportunities for symbiotic development, which derive from "uncovering" existing symbioses.
A two step Bayesian approach for genomic prediction of breeding values.
Shariati, Mohammad M; Sørensen, Peter; Janss, Luc
2012-05-21
In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only a little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. The influence of each marker group of size p on the posterior distribution of the marker variances will then be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency greater than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each consecutive set of 150 markers was assigned to one group with a common variance. In further analyses, subsets of the 1500 and 450 markers with the largest effects in step 2 were kept in the prediction model. Grouping markers outperformed the SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than those of Bayesian methods with marker-specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. Prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.
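The ranking-and-grouping idea in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the squared estimated effect is used as a stand-in for a marker's estimated variance, the pooled value is a plain mean rather than a posterior quantity, and the function names and toy effects are hypothetical.

```python
def group_markers_by_rank(effects, group_size=150):
    """Rank markers by squared estimated effect (a stand-in for estimated
    variance) and cut the ranking into consecutive groups of group_size."""
    order = sorted(range(len(effects)), key=lambda i: effects[i] ** 2, reverse=True)
    return [order[s:s + group_size] for s in range(0, len(order), group_size)]

def pooled_squared_effect(effects, group):
    """Mean squared effect within one group: a crude illustration of how p
    markers jointly inform a single shared variance."""
    return sum(effects[i] ** 2 for i in group) / len(group)

# Toy example with five markers and groups of two (the abstract uses 150).
step1_effects = [0.1, -0.5, 0.3, 0.0, 0.2]
groups = group_markers_by_rank(step1_effects, group_size=2)
top_group_value = pooled_squared_effect(step1_effects, groups[0])
```

With the toy effects above, the two largest markers (indices 1 and 2) form the first group, and the last group holds the single remaining marker.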
Cognitive Clusters in Specific Learning Disorder.
Poletti, Michele; Carretta, Elisa; Bonvicini, Laura; Giorgi-Rossi, Paolo
The heterogeneity among children with learning disabilities still represents a barrier and a challenge in their conceptualization. Although a dimensional approach has been gaining support, the categorical approach is still the most adopted, as in the recent fifth edition of the Diagnostic and Statistical Manual of Mental Disorders. The introduction of the single overarching diagnostic category of specific learning disorder (SLD) could underemphasize interindividual clinical differences regarding intracategory cognitive functioning and learning proficiency, according to current models of multiple cognitive deficits at the basis of neurodevelopmental disorders. The characterization of specific cognitive profiles associated with an already manifest SLD could help identify possible early cognitive markers of SLD risk and distinct trajectories of atypical cognitive development leading to SLD. In this perspective, we applied a cluster analysis to identify groups of children with a Diagnostic and Statistical Manual-based diagnosis of SLD with similar cognitive profiles and to describe the association between clusters and SLD subtypes. A sample of 205 children with a diagnosis of SLD were enrolled. Cluster analyses (agglomerative hierarchical and nonhierarchical iterative clustering technique) were used successively on 10 core subtests of the Wechsler Intelligence Scale for Children-Fourth Edition. The 4-cluster solution was adopted, and external validation found differences in terms of SLD subtype frequencies and learning proficiency among clusters. Clinical implications of these findings are discussed, tracing directions for further studies.
NASA Astrophysics Data System (ADS)
Fensch, J.; Mieske, S.; Müller-Seidlitz, J.; Hilker, M.
2014-07-01
Aims: We investigate the colour-magnitude relation of metal-poor globular clusters, the so-called blue tilt, in the Hydra and Centaurus galaxy clusters and constrain the primordial conditions for star cluster self-enrichment. Methods: We analyse U,I photometry for about 2500 globular clusters in the central regions of Hydra and Centaurus, based on VLT/FORS1 data. We measure the relation between mean colour and luminosity for the blue and red subpopulation of the globular cluster samples. We convert these relations into mass-metallicity space and compare the obtained GC mass-metallicity relation with predictions from the star cluster self-enrichment model by Bailin & Harris (2009, ApJ, 695, 1082). For this we include effects of dynamical and stellar evolution and a physically well motivated primordial mass-radius scaling. Results: We obtain a mass-metallicity scaling of Z ∝ M^(0.27 ± 0.05) for Centaurus GCs and Z ∝ M^(0.40 ± 0.06) for Hydra GCs, consistent with the range of observed relations in other environments. We find that the GC mass-metallicity relation already sets in at present-day masses of a few and is well established in the luminosity range of massive MW clusters like ω Centauri. The inclusion of a primordial mass-radius scaling of star clusters significantly improves the fit of the self-enrichment model to the data. The self-enrichment model accurately reproduces the observed relations for average primordial half-light radii rh ~ 1-1.5 pc, star formation efficiencies f⋆ ~ 0.3-0.4, and pre-enrichment levels of [Fe/H] - 1.7 dex. The slightly steeper blue tilt for Hydra can be explained either by a ~30% smaller average rh at fixed f⋆ ~ 0.3, or analogously by a ~20% smaller f⋆ at fixed rh ~ 1.5 pc. Within the self-enrichment scenario, the observed blue tilt implies a correlation between GC mass and width of the stellar metallicity distribution. 
We find that this implied correlation matches the trend of width with GC mass measured in Galactic GCs, including extreme cases like ω Centauri and M 54. Conclusions: First, we found that a primordial star cluster mass-radius relation provides a significant improvement to the self-enrichment model fits. Second, we show that broadened metallicity distributions as found in some massive MW globular clusters may have arisen naturally from self-enrichment processes, without the need for a dwarf galaxy progenitor.
NASA Astrophysics Data System (ADS)
Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi
2018-04-01
Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to divide the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and a full-spectra PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoid prediction using synchronous fluorescence spectra.
NASA Astrophysics Data System (ADS)
Kumar, Rohit; Puri, Rajeev K.
2018-03-01
Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulated annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are restricted to the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study, which spans different system masses and system-mass asymmetries of colliding partners, shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.
Local-world and cluster-growing weighted networks with controllable clustering
NASA Astrophysics Data System (ADS)
Yang, Chun-Xia; Tang, Min-Xuan; Tang, Hai-Qiang; Deng, Qiang-Qiang
2014-12-01
We constructed an improved weighted network model by introducing a local-world selection mechanism and a triangle coupling mechanism into the traditional BBV model. The model gives power-law distributions of degree, strength and edge weight, and presents a linear relationship both between the degree and strength and between the degree and the clustering coefficient. In particular, the model is able to accelerate the growth of strength beyond that of degree. In addition, the model is sounder and more efficient in tuning the clustering coefficient than the original BBV model. Finally, based on our improved model, we analyze the virus spread process and find that reducing the size of the local world strongly inhibits virus spread.
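Two ingredients mentioned in this abstract, local-world selection and the clustering coefficient, can be sketched generically. The abstract does not give the model's rules, so the code below is an assumption-laden illustration rather than the authors' model: a local world is sampled uniformly, a target is chosen within it in proportion to node strength, and the clustering coefficient is the standard unweighted local one. All names are hypothetical, and strengths are assumed to be positive.

```python
import random

def choose_target(strengths, local_world_size, rng):
    """Sample a local world uniformly at random, then pick one node inside it
    with probability proportional to its strength."""
    nodes = sorted(strengths)
    local_world = rng.sample(nodes, min(local_world_size, len(nodes)))
    weights = [strengths[n] for n in local_world]
    return rng.choices(local_world, weights=weights, k=1)[0]

def local_clustering(adjacency, node):
    """Unweighted local clustering coefficient: the fraction of pairs of
    neighbours of node that are themselves linked."""
    nbrs = list(adjacency[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adjacency[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

# Toy graphs: a triangle (fully clustered) and a three-node star (no triangles).
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2}, 1: {0}, 2: {0}}
picked = choose_target({"a": 1.0, "b": 3.0, "c": 2.0}, 2, random.Random(0))
```

A smaller `local_world_size` restricts each new node's choice of targets, which is the quantity the abstract reports as affecting virus spread.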
Contextual Approach with Guided Discovery Learning and Brain Based Learning in Geometry Learning
NASA Astrophysics Data System (ADS)
Kartikaningtyas, V.; Kusmayadi, T. A.; Riyadi
2017-09-01
The aim of this study was to combine the contextual approach with Guided Discovery Learning (GDL) and Brain Based Learning (BBL) in junior high school geometry learning. Furthermore, this study analysed the effect of the contextual approach with GDL and BBL on geometry learning. The GDL-contextual and BBL-contextual models were built from the steps of GDL and BBL combined with the principles of the contextual approach. To validate the models, a quasi-experiment with two experimental groups was used. The sample, chosen by stratified cluster random sampling, comprised 150 grade 8 junior high school students. The data were collected through a student mathematics achievement test given after the treatment of each group. The data were analysed using one-way ANOVA with unequal cell sizes. The results show that the effect of GDL-contextual on mathematics achievement in geometry learning does not differ from that of BBL-contextual. This means that both models could be used in mathematics learning as innovative approaches to geometry learning.
Genome-Wide Analysis of Type VI System Clusters and Effectors in Burkholderia Species.
Nguyen, Thao Thi; Lee, Hyun-Hee; Park, Inmyoung; Seo, Young-Su
2018-02-01
The type VI secretion system (T6SS) has been discovered in a variety of gram-negative bacteria as a versatile weapon to stimulate the killing of eukaryotic cells or prokaryotic competitors. Type VI secretion effectors (T6SEs) are well known as key virulence factors for important pathogenic bacteria. In many Burkholderia species, T6SS has evolved as the most complicated secretion pathway with distinguished types to translocate diverse T6SEs, suggesting their essential roles in this genus. Here we attempted to detect and characterize T6SSs and potential T6SEs in target genomes of plant-associated and environmental Burkholderia species based on computational analyses. In total, 66 potential functional T6SS clusters were found in 30 target Burkholderia bacterial genomes, of which 33% possess three or four clusters. The core proteins in each cluster were specified and phylogenetic trees of three components (i.e., TssC, TssD, TssL) were constructed to elucidate the relationship among the identified T6SS clusters. Next, we identified 322 potential T6SEs in the target genomes based on homology searches and explored the important domains conserved in effector candidates. In addition, using a screening approach based on the profile hidden Markov model (pHMM) of T6SEs that possess markers for type VI effectors (MIX motif) (MIX T6SEs), 57 proteins that were not included in the training datasets were recognized as novel MIX T6SE candidates from the Burkholderia species. This approach could be useful to identify potential T6SEs from other bacterial genomes.
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
Katz, R
1992-11-01
Cluster management is a management model that fosters decentralization of management, develops leadership potential of staff, and creates ownership of unit-based goals. Unlike shared governance models, there is no formal structure created by committees and it is less threatening for managers. There are two parts to the cluster management model. One is the formation of cluster groups, consisting of all staff and facilitated by a cluster leader. The cluster groups function for communication and problem-solving. The second part of the cluster management model is the creation of task forces. These task forces are designed to work on short-term goals, usually in response to solving one of the unit's goals. Sometimes the task forces are used for quality improvement or system problems. Clusters are groups of not more than five or six staff members, facilitated by a cluster leader. A cluster is made up of individuals who work the same shift. For example, people with job titles who work days would be in a cluster. There would be registered nurses, licensed practical nurses, nursing assistants, and unit clerks in the cluster. The cluster leader is chosen by the manager based on certain criteria and is trained for this specialized role. The concept of cluster management, criteria for choosing leaders, training for leaders, using cluster groups to solve quality improvement issues, and the learning process necessary for manager support are described.
Studt, Lena; Niehaus, Eva-Maria; Espino, Jose J.; Huß, Kathleen; Michielse, Caroline B.; Albermann, Sabine; Wagner, Dominik; Bergner, Sonja V.; Connolly, Lanelle R.; Fischer, Andreas; Reuter, Gunter; Kleigrewe, Karin; Bald, Till; Wingfield, Brenda D.; Ophir, Ron; Freeman, Stanley; Hippler, Michael; Smith, Kristina M.; Brown, Daren W.; Proctor, Robert H.; Münsterkötter, Martin; Freitag, Michael; Humpf, Hans-Ulrich; Güldener, Ulrich; Tudzynski, Bettina
2013-01-01
The fungus Fusarium fujikuroi causes “bakanae” disease of rice due to its ability to produce gibberellins (GAs), but it is also known for producing harmful mycotoxins. However, the genetic capacity for the whole arsenal of natural compounds and their role in the fungus' interaction with rice remained unknown. Here, we present a high-quality genome sequence of F. fujikuroi that was assembled into 12 scaffolds corresponding to the 12 chromosomes described for the fungus. We used the genome sequence along with ChIP-seq, transcriptome, proteome, and HPLC-FTMS-based metabolome analyses to identify the potential secondary metabolite biosynthetic gene clusters and to examine their regulation in response to nitrogen availability and plant signals. The results indicate that expression of most but not all gene clusters correlates with proteome and ChIP-seq data. Comparison of the F. fujikuroi genome to those of six other fusaria revealed that only a small number of gene clusters are conserved among these species, thus providing new insights into the divergence of secondary metabolism in the genus Fusarium. Notably, GA biosynthetic genes are present in some related species, but GA biosynthesis is limited to F. fujikuroi, suggesting that this provides a selective advantage during infection of the preferred host plant rice. Among the genome sequences analyzed, one cluster that includes a polyketide synthase gene (PKS19) and another that includes a non-ribosomal peptide synthetase gene (NRPS31) are unique to F. fujikuroi. The metabolites derived from these clusters were identified by HPLC-FTMS-based analyses of engineered F. fujikuroi strains overexpressing cluster genes. In planta expression studies suggest a specific role for the PKS19-derived product during rice infection. Thus, our results indicate that combined comparative genomics and genome-wide experimental analyses identified novel genes and secondary metabolites that contribute to the evolutionary success of F. 
fujikuroi as a rice pathogen. PMID:23825955
Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf
2017-09-01
Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (i.e., from 1 to 6 clusters). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of the data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% more accuracy than classification without clustering, while the percentage split achieves 2%-5% more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
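The weighted average accuracy used to compare the forms can be sketched as follows. The abstract does not state the weights, so this sketch assumes each cluster's accuracy is weighted by the number of accidents in that cluster. The function name and the figures in the example are hypothetical.

```python
def weighted_average_accuracy(cluster_results):
    """Combine per-cluster accuracies into one figure for a clustering form.

    cluster_results is a list of (n_accidents, accuracy) pairs, one per
    cluster; weighting by cluster size is an assumption of this sketch.
    """
    total = sum(n for n, _ in cluster_results)
    if total == 0:
        raise ValueError("no accidents in any cluster")
    return sum(n * acc for n, acc in cluster_results) / total

# Hypothetical form with two clusters of 100 and 300 accidents.
form_accuracy = weighted_average_accuracy([(100, 0.8), (300, 0.6)])
```

Weighting by size keeps a small, easily classified cluster from dominating the comparison between forms with different numbers of clusters.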
Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model
NASA Astrophysics Data System (ADS)
Li, X. L.; Zhao, Q. H.; Li, Y.
2017-09-01
Most stochastic-based fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model is proposed in this paper. First, a number of generating points are initialized randomly on the image, and the image domain is divided into many sub-regions using the Voronoi tessellation technique. Each sub-region is regarded as a homogeneous area in which the pixels share the same cluster label. Then, the probability of each pixel is assumed to follow a Gamma mixture model whose parameters correspond to the cluster to which the pixel belongs. The negative logarithm of the probability represents the dissimilarity measure between the pixel and the cluster. The regional dissimilarity measure of one sub-region is defined as the sum of the measures of the pixels in the region. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to Voronoi sub-regions, and the regional objective function is then established under the framework of fuzzy clustering. The optimal segmentation results are obtained by solving for the model parameters and generating points. Finally, the effectiveness of the proposed algorithm is demonstrated by qualitative and quantitative analysis of the segmentation results for simulated and real SAR images.
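The regional dissimilarity described above can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: each cluster is represented by a single Gamma component rather than a mixture, the MRF term and fuzzy memberships are omitted, and the function names and parameter values are hypothetical.

```python
import math

def nearest_generator(pixel, generators):
    """Index of the generating point closest to the pixel, i.e. the Voronoi
    sub-region that the pixel falls into."""
    return min(
        range(len(generators)),
        key=lambda g: (pixel[0] - generators[g][0]) ** 2
        + (pixel[1] - generators[g][1]) ** 2,
    )

def gamma_neg_log_pdf(x, shape, scale):
    """Negative log of the Gamma density with the given shape and scale,
    evaluated at an intensity x > 0."""
    return (
        math.lgamma(shape)
        + shape * math.log(scale)
        - (shape - 1.0) * math.log(x)
        + x / scale
    )

def regional_dissimilarity(intensities, shape, scale):
    """Dissimilarity between one sub-region and one cluster: the sum of the
    pixel-level negative log densities over the pixels in the sub-region."""
    return sum(gamma_neg_log_pdf(v, shape, scale) for v in intensities)

# Toy example: two generating points and a three-pixel sub-region.
region_index = nearest_generator((1, 1), [(0, 0), (5, 5)])
cost = regional_dissimilarity([1.0, 2.0, 3.0], shape=1.0, scale=1.0)
```

With shape 1 and scale 1 the Gamma density reduces to the unit exponential, so the pixel-level measure equals the intensity itself, which makes the toy example easy to check by hand.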
A Single-Cell Roadmap of Lineage Bifurcation in Human ESC Models of Embryonic Brain Development.
Yao, Zizhen; Mich, John K; Ku, Sherman; Menon, Vilas; Krostag, Anne-Rachel; Martinez, Refugio A; Furchtgott, Leon; Mulholland, Heather; Bort, Susan; Fuqua, Margaret A; Gregor, Ben W; Hodge, Rebecca D; Jayabalu, Anu; May, Ryan C; Melton, Samuel; Nelson, Angelique M; Ngo, N Kiet; Shapovalova, Nadiya V; Shehata, Soraya I; Smith, Michael W; Tait, Leah J; Thompson, Carol L; Thomsen, Elliot R; Ye, Chaoyang; Glass, Ian A; Kaykas, Ajamete; Yao, Shuyuan; Phillips, John W; Grimley, Joshua S; Levi, Boaz P; Wang, Yanling; Ramanathan, Sharad
2017-01-05
During human brain development, multiple signaling pathways generate diverse cell types with varied regional identities. Here, we integrate single-cell RNA sequencing and clonal analyses to reveal lineage trees and molecular signals underlying early forebrain and mid/hindbrain cell differentiation from human embryonic stem cells (hESCs). Clustering single-cell transcriptomic data identified 41 distinct populations of progenitor, neuronal, and non-neural cells across our differentiation time course. Comparisons with primary mouse and human gene expression data demonstrated rostral and caudal progenitor and neuronal identities from early brain development. Bayesian analyses inferred a unified cell-type lineage tree that bifurcates between cortical and mid/hindbrain cell types. Two methods of clonal analyses confirmed these findings and further revealed the importance of Wnt/β-catenin signaling in controlling this lineage decision. Together, these findings provide a rich transcriptome-based lineage map for studying human brain development and modeling developmental disorders. Copyright © 2017 Elsevier Inc. All rights reserved.
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.
Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc
2018-01-01
In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
Physical model of protein cluster positioning in growing bacteria
NASA Astrophysics Data System (ADS)
Wasnik, Vaibhav; Wang, Hui; Wingreen, Ned S.; Mukhopadhyay, Ranjan
2017-10-01
Chemotactic receptors in bacteria form clusters at cell poles and also laterally, and this clustering plays an important role in signal transduction. These clusters were found to be periodically arranged on the surface of the bacterium Escherichia coli, independent of any known positioning mechanism. In this work we extend a model based on diffusion and aggregation to more realistic geometries and present a means based on ‘bursty’ protein production to distinguish spontaneous positioning from an independently existing positioning mechanism. We also consider the case of isotropic cellular growth and characterize the degree of order arising spontaneously. Our model could also be relevant for other examples of periodically positioned protein clusters in bacteria.
NASA Astrophysics Data System (ADS)
Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.
2017-11-01
To select effective samples from the large amount of data from years of PV power generation and improve the accuracy of PV power generation forecasting models, this paper studies the application of clustering analysis in this field and establishes a forecasting model based on neural networks. Using three weather types (sunny, cloudy and rainy days), the research screens samples of historical data by cluster analysis. After screening, it establishes BP neural network prediction models using the screened data as training data. The six photovoltaic power generation prediction models, before and after data screening, are then compared. Results show that a prediction model combining clustering analysis with BP neural networks is an effective way to improve the precision of photovoltaic power generation forecasting.
The galaxy clustering crisis in abundance matching
NASA Astrophysics Data System (ADS)
Campbell, Duncan; van den Bosch, Frank C.; Padmanabhan, Nikhil; Mao, Yao-Yuan; Zentner, Andrew R.; Lange, Johannes U.; Jiang, Fangzhou; Villarreal, Antonio
2018-06-01
Galaxy clustering on small scales is significantly underpredicted by sub-halo abundance matching (SHAM) models that populate (sub-)haloes with galaxies based on peak halo mass, Mpeak. SHAM models based on the peak maximum circular velocity, Vpeak, have had much better success. The primary reason Mpeak-based models fail is the relatively low abundance of satellite galaxies produced in these models compared to those based on Vpeak. Despite success in predicting clustering, a simple Vpeak-based SHAM model results in predictions for galaxy growth that are at odds with observations. We evaluate three possible remedies that could 'save' mass-based SHAM: (1) SHAM models require a significant population of 'orphan' galaxies as a result of artificial disruption/merging of sub-haloes in modern high-resolution dark matter simulations; (2) satellites must grow significantly after their accretion; and (3) stellar mass is significantly affected by halo assembly history. No solution is entirely satisfactory. However, regardless of the particulars, we show that popular SHAM models based on Mpeak cannot be complete physical models as presented. Either Vpeak truly is a better predictor of stellar mass at z ~ 0 and it remains to be seen how the correlation between stellar mass and Vpeak comes about, or SHAM models are missing vital component(s) that significantly affect galaxy clustering.
Factors influencing the quality of life of haemodialysis patients according to symptom cluster.
Shim, Hye Yeung; Cho, Mi-Kyoung
2018-05-01
To identify the characteristics in each symptom cluster and factors influencing the quality of life of haemodialysis patients in Korea according to cluster. Despite developments in renal replacement therapy, haemodialysis still restricts the activities of daily living due to pain and impairs physical functioning induced by the disease and its complications. Descriptive survey. Two hundred and thirty dialysis patients aged >18 years. They completed self-administered questionnaires of Dialysis Symptom Index and Kidney Disease Quality of Life instrument-Short Form 1.3. To determine the optimal number of clusters, the collected data were analysed using polytomous variable latent class analysis in R software (poLCA) to estimate the latent class models and the latent class regression models for polytomous outcome variables. Differences in characteristics, symptoms and QOL according to the symptom cluster of haemodialysis patients were analysed using the independent t test and chi-square test. The factors influencing the QOL according to symptom cluster were identified using hierarchical multiple regression analysis. Physical and emotional symptoms were significantly more severe, and the QOL was significantly worse in Cluster 1 than in Cluster 2. The factors influencing the QOL were spouse, job, insurance type and physical and emotional symptoms in Cluster 1, with these variables having an explanatory power of 60.9%. Physical and emotional symptoms were the only influencing factors in Cluster 2, and they had an explanatory power of 37.4%. Mitigating the symptoms experienced by haemodialysis patients and improving their QOL require educational and therapeutic symptom management interventions that are tailored according to the characteristics and symptoms in each cluster. 
The findings of this study are expected to lead to practical guidelines for addressing the symptoms experienced by haemodialysis patients, and they provide basic information for developing nursing interventions to manage these symptoms and improve the QOL of these patients. © 2017 John Wiley & Sons Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Li; Tunega, Daniel; Xu, Lai
2013-08-29
In a previous study (J. Phys. Chem. C 2011, 115, 12403) cluster models for the TiO2 rutile (110) surface and MP2 calculations were used to develop an analytic potential energy function for dimethyl methylphosphonate (DMMP) interacting with this surface. In the work presented here, this analytic potential and MP2 cluster models are compared with DFT "slab" calculations for DMMP interacting with the TiO2 (110) surface and with DFT cluster models for the TiO2 (110) surface. The DFT slab calculations were performed with the PW91 and PBE functionals. The analytic potential gives DMMP/TiO2 (110) potential energy curves in excellent agreement with those obtained from the slab calculations. The cluster models for the TiO2 (110) surface, used for the MP2 calculations, were extended to DFT calculations with the B3LYP, PW91, and PBE functionals. These DFT calculations do not give DMMP/TiO2 (110) interaction energies which agree with those from the DFT slab calculations. Analyses of the wave functions for these cluster models show that they do not accurately represent the HOMO and LUMO for the surface, which should be 2p and 3d orbitals, respectively, and the models also do not give an accurate band gap. The MP2 cluster models do not accurately represent the LUMO, and the fact that they give accurate DMMP/TiO2 (110) interaction energies is apparently fortuitous, arising from their highly inaccurate band gaps. Accurate cluster models, consisting of 7, 10, and 15 Ti-atoms and which have the correct HOMO and LUMO properties, are proposed. The work presented here illustrates the care that must be taken in "constructing" cluster models which accurately model surfaces.
Armour, Cherie; Contractor, Ateka; Shea, Tracie; Elhai, Jon D; Pietrzak, Robert H
2016-02-01
Scarce data are available regarding the dimensional structure of Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) posttraumatic stress disorder (PTSD) symptoms and how factors relate to external constructs. We evaluated six competing models of DSM-5 PTSD symptoms, including Anhedonia, Externalizing Behaviors, and Hybrid models, using confirmatory factor analyses in a sample of 412 trauma-exposed college students. We then examined whether PTSD symptom clusters were differentially related to measures of anger and impulsivity using Wald chi-square tests. The seven-factor Hybrid model was deemed optimal compared with the alternatives. All symptom clusters were associated with anger; the strongest association was between externalizing behaviors and anger (r = 0.54). All symptom clusters, except re-experiencing and avoidance, were associated with impulsivity, with the strongest association between externalizing behaviors and impulsivity (r = 0.49). A seven-factor Hybrid model provides superior fit to DSM-5 PTSD symptom data, with the externalizing behaviors factor being most strongly related to anger and impulsivity.
Alpha-cluster preformation factor within cluster-formation model for odd-A and odd-odd heavy nuclei
NASA Astrophysics Data System (ADS)
Saleh Ahmed, Saad M.
2017-06-01
The alpha-cluster probability that represents the preformation of the alpha particle in alpha-decay nuclei was determined for high-intensity alpha-decay mode odd-A and odd-odd heavy nuclei, 82 < Z < 114, 111 < N < 174. This probability was calculated using the energy-dependent formula derived from the formulation of clusterisation states representation (CSR) and the hypothesised cluster-formation model (CFM) as in our previous work. Our previous successful determination of phenomenological values of alpha-cluster preformation factors for even-even nuclei motivated us to expand the work to cover other types of nuclei. The formation energy of the interior alpha cluster needed to be derived for the different nuclear systems, taking the unpaired-nucleon effect into account. The results showed the phenomenological value of the alpha preformation probability and reflected the unpaired-nucleon effect and the magic and sub-magic effects in nuclei. These results and their analyses are very useful for future work concerning the calculation of alpha decay constants and the progress of the underlying theory.
NASA Astrophysics Data System (ADS)
Kuncarayakti, H.; Galbany, L.; Anderson, J. P.; Krühler, T.; Hamuy, M.
2016-09-01
Context. Stellar populations are the building blocks of galaxies, including the Milky Way. The majority, if not all, extragalactic studies are entangled with the use of stellar population models given the unresolved nature of their observation. Extragalactic systems contain multiple stellar populations with complex star formation histories. However, studies of these systems are mainly based upon the principles of simple stellar populations (SSP). Hence, it is critical to examine the validity of SSP models. Aims: This work aims to empirically test the validity of SSP models. This is done by comparing SSP models against observations of spatially resolved young stellar population in the determination of its physical properties, that is, age and metallicity. Methods: Integral field spectroscopy of a young stellar cluster in the Milky Way, NGC 3603, was used to study the properties of the cluster as both a resolved and unresolved stellar population. The unresolved stellar population was analysed using the Hα equivalent width as an age indicator and the ratio of strong emission lines to infer metallicity. In addition, spectral energy distribution (SED) fitting using STARLIGHT was used to infer these properties from the integrated spectrum. Independently, the resolved stellar population was analysed using the colour-magnitude diagram (CMD) to determine age and metallicity. As the SSP model represents the unresolved stellar population, the derived age and metallicity were tested to determine whether they agree with those derived from resolved stars. Results: The age and metallicity estimate of NGC 3603 derived from integrated spectroscopy are confirmed to be within the range of those derived from the CMD of the resolved stellar population, including other estimates found in the literature. The result from this pilot study supports the reliability of SSP models for studying unresolved young stellar populations. 
Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere under ESO programme 60.A-9344.
Hierarchical modeling of cluster size in wildlife surveys
Royle, J. Andrew
2008-01-01
Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
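The cluster-size bias described in this abstract can be illustrated with a small simulation. This is only a sketch under assumed conditions: the uniform size distribution, the per-individual detection probability of 0.15 and the function name simulate_survey are hypothetical and are not taken from the paper, and the hierarchical model itself is not reproduced.

```python
import random

def simulate_survey(n_clusters=20000, seed=1):
    """Return (population mean cluster size, mean size of detected clusters)."""
    rng = random.Random(seed)
    # Hypothetical population: cluster sizes 1..10, all equally likely
    sizes = [rng.randint(1, 10) for _ in range(n_clusters)]
    # Hypothetical detection model: a cluster is seen if at least one of its
    # members is seen, each independently with probability 0.15
    detected = [s for s in sizes if rng.random() < 1.0 - 0.85 ** s]
    pop_mean = sum(sizes) / len(sizes)
    obs_mean = sum(detected) / len(detected)
    return pop_mean, obs_mean

pop_mean, obs_mean = simulate_survey()
```

Because larger clusters are more likely to be detected under this assumed model, the mean size among detected clusters exceeds the population mean, which is the bias the abstract refers to.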
Murray, Nicholas P; Hunfalvay, Melissa
2017-02-01
Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants.
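The nonhierarchical (k means) step mentioned in this abstract can be illustrated with a minimal one-dimensional sketch. The fixation durations, the starting centres and the function name kmeans_1d are hypothetical and are not the study's data or code; the hierarchical Ward step is not reproduced.

```python
def nearest(value, centers):
    """Index of the centre closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def kmeans_1d(values, centers, n_iter=20):
    """Plain k-means on scalar values, starting from the given centres."""
    centers = list(centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # Keep the old centre if a group happens to be empty
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    labels = [nearest(v, centers) for v in values]
    return centers, labels

# Hypothetical mean fixation durations in milliseconds for nine players
durations = [180, 190, 200, 310, 320, 300, 450, 460, 440]
centers, labels = kmeans_1d(durations, centers=[150, 300, 500])
```

With these made-up values the procedure settles on three groups of three players each; in practice the number of clusters and the starting centres would have to be justified, for example from the hierarchical solution.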
Structural and Functional Analyses of the Proteins Involved in the Iron-Sulfur Cluster Biosynthesis
NASA Astrophysics Data System (ADS)
Wada, Kei
The iron-sulfur (Fe-S) clusters are ubiquitous prosthetic groups that are required to maintain such fundamental life processes as respiratory chain, photosynthesis and the regulation of gene expression. Assembly of intracellular Fe-S cluster requires the sophisticated biosynthetic systems called ISC and SUF machineries. To shed light on the molecular mechanism of Fe-S cluster assembly mediated by SUF machinery, several structures of the SUF components and their sub-complex were determined. The structural findings together with biochemical characterization of the core-complex (SufB-SufC-SufD complex) have led me to propose a working model for the cluster biosynthesis in the SUF machinery.
Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2010-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138
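For reference, the Bayesian Information Criterion used for this kind of model comparison is commonly written as below. This is the standard textbook form, stated here as an assumption about the convention: some software reports the negative of this quantity, so whether smaller or larger values are preferred depends on the implementation. The symbols are introduced here for illustration: the maximised likelihood, the number of free parameters k, the sample size n, the number of mixture components G and the number of variables d.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n ,
\qquad
k = (G - 1) + G\,d + G\,\frac{d(d+1)}{2}
```

The second expression counts the free parameters of a G-component multivariate normal mixture with unrestricted covariance matrices: G - 1 mixing proportions, G mean vectors of length d, and G symmetric d-by-d covariance matrices. Constrained covariance structures have fewer parameters.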
Thermodynamically accessible titanium clusters Ti_N, N = 2-32.
Lazauskas, Tomas; Sokol, Alexey A; Buckeridge, John; Catlow, C Richard A; Escher, Susanne G E T; Farrow, Matthew R; Mora-Fonz, David; Blum, Volker W; Phaahla, Tshegofatso M; Chauke, Hasani R; Ngoepe, Phuti E; Woodley, Scott M
2018-05-10
We have performed a genetic algorithm search on the tight-binding interatomic potential energy surface (PES) for small Ti_N (N = 2-32) clusters. The low energy candidate clusters were further refined using density functional theory (DFT) calculations with the PBEsol exchange-correlation functional and evaluated with the PBEsol0 hybrid functional. The resulting clusters were analysed in terms of their structural features, growth mechanism and surface area. The results suggest a growth mechanism that is based on forming coordination centres by interpenetrating icosahedra, icositetrahedra and Frank-Kasper polyhedra. We identify centres of coordination, which act as centres of bulk nucleation in medium sized clusters and determine the morphological features of the cluster.
Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.
2017-01-01
Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge-based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach to cluster ranking based not on a single molecular descriptor (e.g., an energy function) but on a large number of descriptors integrated in a machine learning model, in which an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol first locally enriches clusters with additional poses; the clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc. PMID:27935158
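The conventional baseline mentioned in this abstract, ranking each cluster by its lowest energy member, can be sketched as follows, together with simple per-cluster distribution features of a single descriptor. The cluster labels, energies and function names are hypothetical; this is not the authors' protocol, and the classifier and pairwise comparison model are not reproduced.

```python
import statistics

def rank_clusters_by_lowest_energy(clusters):
    """Order cluster ids by the energy of their lowest-energy pose."""
    best = {cid: min(energies) for cid, energies in clusters.items()}
    return sorted(best, key=best.get)

def descriptor_features(values):
    """Summary features of one descriptor's distribution within a cluster."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "size": len(values),
    }

# Hypothetical energies (arbitrary units) of the docked poses in each cluster
clusters = {
    "A": [-12.1, -9.8, -11.0],
    "B": [-15.3, -2.0],
    "C": [-10.5, -10.4, -10.2],
}

ranking = rank_clusters_by_lowest_energy(clusters)
features = {cid: descriptor_features(e) for cid, e in clusters.items()}
```

In this made-up example cluster B ranks first on its single best pose even though its poses are few and widely spread, which is the kind of information that distribution features add beyond the minimum alone.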
NASA Astrophysics Data System (ADS)
Sawangwit, U.; Shanks, T.; Abdalla, F. B.; Cannon, R. D.; Croom, S. M.; Edge, A. C.; Ross, Nicholas P.; Wake, D. A.
2011-10-01
We present the angular correlation function measured from photometric samples comprising 1 562 800 luminous red galaxies (LRGs). Three LRG samples were extracted from the Sloan Digital Sky Survey (SDSS) imaging data, based on colour-cut selections at redshifts, z ≈ 0.35, 0.55 and 0.7 as calibrated by the spectroscopic surveys, SDSS-LRG, 2dF-SDSS LRG and QSO (quasi-stellar object) (2SLAQ) and the AAΩ-LRG survey. The galaxy samples cover ≈7600 deg^2 of sky, probing a total cosmic volume of ≈5.5 h^-3 Gpc^3. The small- and intermediate-scale correlation functions generally show significant deviations from a single power-law fit with a well-detected break at ≈1 h^-1 Mpc, consistent with the transition scale between the one- and two-halo terms in halo occupation models. For galaxy separations 1-20 h^-1 Mpc and at fixed luminosity, we see virtually no evolution of the clustering with redshift and the data are consistent with a simple high peaks biasing model where the comoving LRG space density is constant with z. At fixed z, the LRG clustering amplitude increases with luminosity in accordance with the simple high peaks model, with a typical LRG dark matter halo mass 10^13-10^14 h^-1 M⊙. For r < 1 h^-1 Mpc, the evolution is slightly faster and the clustering decreases towards high redshift consistent with a virialized clustering model. However, assuming the halo occupation distribution (HOD) and Λ cold dark matter (ΛCDM) halo merger frameworks, ~2-3 per cent/Gyr of the LRGs are required to merge in order to explain the small scales clustering evolution, consistent with previous results. At large scales, our result shows good agreement with the SDSS-LRG result of Eisenstein et al. but we find an apparent excess clustering signal beyond the baryon acoustic oscillations (BAO) scale. Angular power spectrum analyses of similar LRG samples also detect a similar apparent large-scale clustering excess but more data are required to check for this feature in independent galaxy data sets.
Certainly, if the ΛCDM model were correct then we would have to conclude that this excess was caused by systematics at the level of Δw≈ 0.001-0.0015 in the photometric AAΩ-LRG sample.
Study of Clusters and Hypernuclei production within PHSD+FRIGA model
NASA Astrophysics Data System (ADS)
Kireyeu, V.; Le Fèvre, A.; Bratkovskaya, E.
2017-01-01
We report results of the dynamical modelling of cluster formation with the new combined PHSD+FRIGA model at Nuclotron and NICA energies. The FRIGA clusterisation algorithm, which can be applied to transport models, is based on the simulated annealing technique to obtain the most bound configuration of fragments and nucleons. The PHSD+FRIGA model is able to predict isotope yields as well as hypernucleus production. Based on present predictions of the combined model we study the possibility of detecting such clusters and hypernuclei in the BM@N and MPD/NICA detectors.
NASA Astrophysics Data System (ADS)
Liu, Jianjun; Kan, Jianquan
2018-04-01
In this paper, a new method for identifying genetically modified material from terahertz spectra is proposed, using a support vector machine (SVM) based on affinity propagation clustering. The algorithm uses affinity propagation clustering to cluster and label unlabeled training samples, and the existing SVM training data are continuously updated during the iterative process. Because the training samples do not need to be labeled manually when the identification model is established, errors caused by human labeling are reduced and the identification accuracy of the model is greatly improved.
Does the cognitive dispute of psychotic symptoms do harm to the therapeutic alliance?
Wittorf, Andreas; Jakobi, Ute E; Bannert, Kerstin K; Bechdolf, Andreas; Müller, Bernhard W; Sartory, Gudrun; Wagner, Michael; Wiedemann, Georg; Wölwer, Wolfgang; Herrlich, Jutta; Buchkremer, Gerhard; Klingberg, Stefan
2010-07-01
We examined whether the cognitive dispute of psychotic symptoms has a negative impact on the course of the therapeutic alliance. Sixty-seven patients with persistent psychotic symptoms received either cognitive behavioral therapy (CBT) or supportive therapy. Questionnaire-based alliance ratings were repeatedly obtained throughout the course of therapy. Patient and therapist alliance ratings were examined separately. Data analyses comprised repeated measurement analyses of variance and cluster analytic procedures. Neither patient nor therapist alliance ratings showed a differential course throughout the treatments. This was despite the implementation of disputing strategies in later stages of CBT. Irrespective of the treatment condition a cluster with a positive alliance rating and a cluster with a poorer rating were found for therapist and patient ratings, respectively. Baseline symptoms and insight differentiated between the types of clusters. In conclusion, CBT-specific interventions that challenge psychotic symptoms do not necessarily negatively influence the course of the alliance.
Echodu, Richard; Opiyo, Elizabeth A.; Dion, Kirstin; Halyard, Alexis; Dunn, Augustine W.; Aksoy, Serap; Caccone, Adalgisa
2017-01-01
Uganda is the only country where the chronic and acute forms of human African Trypanosomiasis (HAT) or sleeping sickness both occur and are separated by < 100 km in areas north of Lake Kyoga. In Uganda, Glossina fuscipes fuscipes is the main vector of the Trypanosoma parasites responsible for these diseases as well as for animal African Trypanosomiasis (AAT), or Nagana. We used highly polymorphic microsatellite loci and a mitochondrial DNA (mtDNA) marker to provide fine scale spatial resolution of genetic structure of G. f. fuscipes from 42 sampling sites from the northern region of Uganda where a merger of the two disease belts is feared. Based on microsatellite analyses, we found that G. f. fuscipes in northern Uganda are structured into three distinct genetic clusters with varying degrees of interconnectivity among them. Based on genetic assignment and spatial location, we grouped the sampling sites into four genetic units corresponding to northwestern Uganda in the Albert Nile drainage, northeastern Uganda in the Lake Kyoga drainage, western Uganda in the Victoria Nile drainage, and a transition zone between the two northern genetic clusters characterized by a high level of genetic admixture. An analysis using HYBRIDLAB supported a hybrid swarm model as most consistent with tsetse genotypes in these admixed samples. Results of mtDNA analyses revealed the presence of 30 haplotypes representing three main haplogroups, whose location broadly overlaps with the microsatellite defined clusters. Migration analyses based on microsatellites point to moderate migration among the northern units located in the Albert Nile, Achwa River, Okole River, and Lake Kyoga drainages, but not between the northern units and the Victoria Nile drainage in the west. Effective population size estimates were variable with low to moderate sizes in most populations and with evidence of recent population bottlenecks, especially in the northeast unit of the Lake Kyoga drainage.
Our microsatellite and mtDNA based analyses indicate that G. f. fuscipes movement along the Achwa and Okole rivers may facilitate northwest expansion of the Rhodesiense disease belt in Uganda. We identified tsetse migration corridors and recommend a rolling carpet approach from south of Lake Kyoga northward to minimize disease dispersal and prevent vector re-colonization. Additionally, our findings highlight the need for continuing tsetse monitoring efforts during and after control. PMID:28453513
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
Wernisch, Lorenz
2017-01-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.
Gabasova, Evelina; Reid, John; Wernisch, Lorenz
2017-10-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and that most of the data samples follow this structure. However, in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
Clustering change patterns using Fourier transformation with time-course gene expression data.
Kim, Jaehee
2011-01-01
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. The results showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials
Diaz-Ordaz, Karla; Bartlett, Jonathan W
2016-01-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong
2015-08-07
Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.
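The PCA compression step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the `variance_to_keep` parameter, and the choice of a retained-variance threshold in place of the paper's error bound are all assumptions made for illustration, and the clustering and cluster-head selection steps are omitted.

```python
import numpy as np

def pca_compress(readings, variance_to_keep=0.95):
    """Compress readings of shape (n_samples, n_sensors) with PCA.

    Hypothetical helper: keeps the smallest number of principal
    components whose cumulative variance reaches variance_to_keep.
    """
    mean = readings.mean(axis=0)
    centered = readings - mean
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(explained, variance_to_keep)) + 1, len(s))
    components = vt[:k]
    scores = centered @ components.T  # the values that would be transmitted
    return scores, components, mean

def pca_reconstruct(scores, components, mean):
    """Approximate the original readings from the compressed scores."""
    return scores @ components + mean
```

With this thresholding rule the mean squared reconstruction error is at most (1 - variance_to_keep) times the mean squared value of the centered data, which is one simple way to tie the compression level to an error limit.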
Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong
2013-01-01
As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for binary outcomes was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test to assess significance. Both methods are applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
Zaag, Rim; Tamby, Jean Philippe; Guichard, Cécile; Tariq, Zakia; Rigaill, Guillem; Delannoy, Etienne; Renou, Jean-Pierre; Balzergue, Sandrine; Mary-Huard, Tristan; Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique
2015-01-01
CATdb (http://urgv.evry.inra.fr/CATdb) is a database providing public access to a large collection of transcriptomic data, mainly for Arabidopsis but also for other plants. This resource has the rare advantage of containing several thousand microarray experiments obtained with the same technical protocol and analyzed by the same statistical pipelines. In this paper, we present GEM2Net, a new module of CATdb that takes advantage of this homogeneous dataset to mine co-expression units and decipher Arabidopsis gene functions. GEM2Net explores 387 stress conditions organized into 18 biotic and abiotic stress categories. For each one, a model-based clustering is applied to expression differences to identify clusters of co-expressed genes. To characterize functions associated with these clusters, various resources are analyzed and integrated: Gene Ontology, subcellular localization of proteins, Hormone Families, Transcription Factor Families and a refined stress-related gene list associated with publications. Exploiting protein-protein interactions and transcription factor-target interactions makes it possible to display gene networks. GEM2Net presents the analysis of the 18 stress categories, in which 17,264 genes are involved and organized within 681 co-expression clusters. The meta-data analyses were stored and organized to compose a dynamic Web resource. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS
McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.
2014-01-01
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026
Medium resolution spectroscopy and chemical composition of Galactic globular clusters
NASA Astrophysics Data System (ADS)
Khamidullina, D. A.; Sharina, M. E.; Shimansky, V. V.; Davoust, E.
We used integrated-light medium-resolution spectra of six Galactic globular clusters and model stellar atmospheres to carry out population synthesis and to derive the chemical composition and age of the clusters. We used medium-resolution spectra of globular clusters published by Schiavon et al. (2005), as well as our long-slit observations with the 1.93 m telescope of the Haute Provence Observatory. The observed spectra were fitted to the theoretical ones interactively. As an initial approach, we used masses, radii and log g of stars in the clusters corresponding to the best fitting isochrones in the observed color-magnitude diagrams. The computed synthetic blanketed spectra of stars were summed according to the Chabrier mass function. To improve the determination of age and helium content, the shape and depth of the Balmer absorption lines were analysed. The abundances of Mg, Ca, C and several other elements were derived. A reasonable agreement with the literature data both in chemical composition and in age of the clusters is found. Our method might be useful for the development of stellar population models and for a better understanding of extragalactic star clusters.
Modulated Modularity Clustering as an Exploratory Tool for Functional Genomic Inference
Stone, Eric A.; Ayroles, Julien F.
2009-01-01
In recent years, the advent of high-throughput assays, coupled with their diminishing cost, has facilitated a systems approach to biology. As a consequence, massive amounts of data are currently being generated, requiring efficient methodology aimed at the reduction of scale. Whole-genome transcriptional profiling is a standard component of systems-level analyses, and clustering genes is a common way to reduce scale and improve inference. Since clustering is often the first step toward generating hypotheses, cluster quality is critical. Conversely, because the validation of cluster-driven hypotheses is indirect, it is critical that quality clusters not be obtained by subjective means. In this paper, we present a new objective-based clustering method and demonstrate that it yields high-quality results. Our method, modulated modularity clustering (MMC), seeks community structure in graphical data. MMC modulates the connection strengths of edges in a weighted graph to maximize an objective function (called modularity) that quantifies community structure. The result of this maximization is a clustering through which tightly-connected groups of vertices emerge. Our application is to systems genetics, and we quantitatively compare MMC both to the hierarchical clustering method most commonly employed and to three popular spectral clustering approaches. We further validate MMC through analyses of human and Drosophila melanogaster expression data, demonstrating that the clusters we obtain are biologically meaningful. We show MMC to be effective and suitable to applications of large scale. In light of these features, we advocate MMC as a standard tool for exploration and hypothesis generation. PMID:19424432
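The modularity objective mentioned in this abstract can be sketched as follows, assuming the standard Newman definition for a weighted undirected graph, Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j). The function name is hypothetical, and the sketch only scores a given partition; it does not perform the edge-weight modulation or the maximization that MMC adds on top.

```python
import numpy as np

def modularity(adjacency, labels):
    """Newman modularity of a partition of a weighted undirected graph.

    adjacency: symmetric (n, n) array of edge weights.
    labels:    length-n array of community labels.
    """
    a = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    two_m = a.sum()                       # each edge is counted twice
    degree = a.sum(axis=1)                # weighted degree k_i
    same = labels[:, None] == labels[None, :]
    expected = np.outer(degree, degree) / two_m
    return float(((a - expected) * same).sum() / two_m)
```

For two disconnected triangles, labelling each triangle as its own community gives Q = 0.5, while placing all six vertices in one community gives Q = 0.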
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P
2015-04-14
Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
While bacterial genes responsible for choline fermentation (the cut gene cluster) have been recently identified, there has been no characterization of these genes in human gut isolates and microbial communities. In this work, we use multiple approaches to demonstrate that the pathway encoded by the cut genes is present and functional in a diverse range of human gut bacteria and is also widespread in stool metagenomes. We also developed a PCR-based strategy to detect a key functional gene (cutC) involved in this pathway and applied it to characterize newly isolated choline-utilizing strains. Both our analyses of the cut gene cluster and this molecular tool will aid efforts to further understand the role of choline metabolism in the human gut microbiota and its link to disease. Copyright © 2015 Martínez-del Campo et al.
Distributive Education Competency-Based Curriculum Models by Occupational Clusters. Final Report.
ERIC Educational Resources Information Center
Davis, Rodney E.; Husted, Stewart W.
To meet the needs of distributive education teachers and students, a project was initiated to develop competency-based curriculum models for marketing and distributive education clusters. The models which were developed incorporate competencies, materials and resources, teaching methodologies/learning activities, and evaluative criteria for the…
A Symmetric Time-Varying Cluster Rate of Descent Model
NASA Technical Reports Server (NTRS)
Ray, Eric S.
2015-01-01
A model of the time-varying rate of descent of the Orion vehicle was developed based on the observed correlation between canopy projected area and drag coefficient. This initial version of the model assumes cluster symmetry and only varies the vertical component of velocity. The cluster fly-out angle is modeled as a series of sine waves based on flight test data. The projected area of each canopy is synchronized with the primary fly-out angle mode. The sudden loss of projected area during canopy collisions is modeled at minimum fly-out angles, leading to brief increases in rate of descent. The cluster geometry is converted to drag coefficient using empirically derived constants. A more complete model is under development, which computes the aerodynamic response of each canopy to its local incidence angle.
Döpp, Carola M E; Graff, Maud J L; Teerenstra, Steven; Nijhuis-van der Sanden, Maria W G; Olde Rikkert, Marcel G M; Vernooij-Dassen, Myrra J F J
2013-05-30
To evaluate the effectiveness of a multifaceted implementation strategy on physicians' referral rate to and knowledge of the community occupational therapy in dementia program (COTiD program). A cluster randomized controlled trial with 28 experimental and 17 control clusters was conducted. Each cluster included a minimum of one physician, one manager, and two occupational therapists. In the control group physicians and managers received no interventions and occupational therapists received a postgraduate course. In the experimental group physicians and managers had access to a website, received newsletters, and were approached by telephone. In addition, physicians were offered one outreach visit. In the experimental group occupational therapists received the postgraduate course, training days, outreach visits, regional meetings, and access to a reporting system. The main outcome measure was the number of COTiD referrals received by each cluster, which was assessed at 6 and 12 months after the start of the intervention. Referrals were included from both participating physicians (enrolled in the study and received either the control or experimental intervention) and non-participating physicians (not enrolled but of whom referrals were received by participating occupational therapists). Mixed model analyses were used to analyze the data. All analyses were based on the principle of intention-to-treat. At 12 months experimental clusters received significantly more referrals, with an average of 5.24 referrals (SD 5.75) to the COTiD program compared to 2.07 referrals in the control group (SD 5.14). The effect size at 12 months was 0.58. Although no difference in referral rate was found for the physicians participating in the study, the number of referrals from non-participating physicians (t = -2.55, df = 43, p = 0.02) differed significantly at 12 months. Passive dissemination strategies are less likely to result in changes in professional behavior. 
The number of physicians exposed to active strategies was limited. In spite of this, we found a significant difference in the number of referrals, which was accounted for by more referrals from non-participating physicians in the experimental clusters. We hypothesize that the increase in referrals was caused by an increase in occupational therapists' efforts to promote their services within their network. NCT01117285.
Geographic Variation of Amyotrophic Lateral Sclerosis Incidence in New Jersey, 2009–2011
Henry, Kevin A.; Fagliano, Jerald; Jordan, Heather M.; Rechtman, Lindsay; Kaye, Wendy E.
2015-01-01
Few analyses in the United States have examined geographic variation and socioeconomic disparities in amyotrophic lateral sclerosis (ALS) incidence, because of lack of population-based incidence data. In this analysis, we used population-based ALS data to identify whether ALS incidence clusters geographically and to determine whether ALS risk varies by area-based socioeconomic status (SES). This study included 493 incident ALS cases diagnosed (via El Escorial criteria) in New Jersey between 2009 and 2011. Geographic variation and clustering of ALS incidence was assessed using a spatial scan statistic and Bayesian geoadditive models. Poisson regression was used to estimate the associations between ALS risk and SES based on census-tract median income while controlling for age, sex, and race. ALS incidence varied across and within counties, but there were no statistically significant geographic clusters. SES was associated with ALS incidence. After adjustment for age, sex, and race, the relative risk of ALS was significantly higher (relative risk (RR) = 1.37, 95% confidence interval (CI): 1.02, 1.82) in the highest income quartile than in the lowest. The relative risk of ALS was significantly lower among blacks (RR = 0.57, 95% CI: 0.39, 0.83) and Asians (RR = 0.63, 95% CI: 0.41, 0.97) than among whites. Our findings suggest that ALS incidence in New Jersey appears to be associated with SES and race. PMID:26041711
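The relative risks and confidence intervals reported above follow the usual Poisson-regression convention, in which a coefficient beta and its standard error give RR = exp(beta) and a Wald interval exp(beta ± z·SE). The sketch below illustrates that relationship only; the function names are hypothetical, and the standard error is back-calculated from the published interval rather than taken from the study.

```python
import math

def rr_with_ci(beta, se, z=1.96):
    """Relative risk and Wald confidence interval from a log-scale coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Approximate log-scale standard error implied by a reported interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported for the highest vs lowest income quartile: RR = 1.37 (1.02, 1.82).
se = se_from_ci(1.02, 1.82)
rr, lo, hi = rr_with_ci(math.log(1.37), se)
```

Because the interval is symmetric on the log scale, the round trip reproduces the published bounds to within rounding of the reported values.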
Su, Chun-Kuei; Chiang, Chia-Hsun; Lee, Chia-Ming; Fan, Yu-Pei; Ho, Chiu-Ming; Shyu, Liang-Yu
2013-01-01
Sympathetic nerves conveying central commands to regulate visceral functions often display activities in synchronous bursts. To understand how individual fibers fire synchronously, we establish “oligofiber recording techniques” to record “several” nerve fiber activities simultaneously, using in vitro splanchnic sympathetic nerve–thoracic spinal cord preparations of neonatal rats as experimental models. While distinct spike potentials were easily recorded from collagenase-dissociated sympathetic fibers, a problem arising from synchronous nerve discharges is a higher incidence of complex waveforms resulting from spike overlapping. Because commercial software does not provide an explicit solution for spike overlapping, a series of custom-made LabVIEW programs incorporating MATLAB scripts was written for spike sorting. Spikes were represented as data points after waveform feature extraction and automatically grouped by k-means clustering followed by principal component analysis (PCA) to verify their waveform homogeneity. For dissimilar waveforms with excessive Hotelling's T2 distances from the cluster centroids, a unique data-based subtraction algorithm (SA) was used to determine whether they were complex waveforms resulting from the superimposition of a spike pattern close to the cluster centroid on other signals that could be observed in the original recordings. In comparisons with commercial software, higher accuracy was achieved by analyses using our algorithms for the synthetic data that contained synchronous spiking and complex waveforms. Moreover, both T2-selected and SA-retrieved spikes were combined as unit activities. Quantitative analyses were performed to evaluate whether unit activities truly originated from single fibers. We conclude that applications of our programs can help to resolve synchronous sympathetic nerve discharges (SND). PMID:24198782
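The step of flagging waveforms whose Hotelling's T2 distance from the cluster centroid is too large can be sketched as below. This is an illustration under stated assumptions, not the authors' LabVIEW/MATLAB code: cluster labels are taken as given (for example from k-means), the function name is hypothetical, and the default threshold of 9.21 is the 99% chi-square quantile for two features, chosen here only as an example.

```python
import numpy as np

def hotelling_t2_flags(features, labels, threshold=9.21):
    """Per-spike T2 distance to its own cluster centroid, plus outlier flags.

    features: (n_spikes, n_features) array of extracted waveform features.
    labels:   length-n_spikes cluster assignment; each cluster needs at
              least two spikes so that a covariance can be estimated.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    t2 = np.zeros(len(features))
    for lab in np.unique(labels):
        idx = np.where(labels == lab)[0]
        pts = features[idx]
        diff = pts - pts.mean(axis=0)
        cov = np.atleast_2d(np.cov(pts, rowvar=False))
        inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariances
        # Quadratic form diff_i^T inv diff_i for every spike in the cluster.
        t2[idx] = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return t2, t2 > threshold
```

Spikes flagged in this way would be the candidates passed on to a separate overlap-resolution step such as the subtraction algorithm the abstract describes.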
Vanbinst, Kiran; Ceulemans, Eva; Peters, Lien; Ghesquière, Pol; De Smedt, Bert
2018-02-01
Although symbolic numerical magnitude processing skills are key for learning arithmetic, their developmental trajectories remain unknown. Therefore, we delineated during the first 3 years of primary education (5-8 years of age) groups with distinguishable developmental trajectories of symbolic numerical magnitude processing skills using a model-based clustering approach. Three clusters were identified and were labeled as inaccurate, accurate but slow, and accurate and fast. The clusters did not differ in age, sex, socioeconomic status, or IQ. We also tested whether these clusters differed in domain-specific (nonsymbolic magnitude processing and digit identification) and domain-general (visuospatial short-term memory, verbal working memory, and processing speed) cognitive competencies that might contribute to children's ability to (efficiently) process the numerical meaning of Arabic numerical symbols. We observed minor differences between clusters in these cognitive competencies except for verbal working memory for which no differences were observed. Follow-up analyses further revealed that the above-mentioned cognitive competencies did not merely account for the cluster differences in children's development of symbolic numerical magnitude processing skills, suggesting that other factors account for these individual differences. On the other hand, the three trajectories of symbolic numerical magnitude processing revealed remarkable and stable differences in children's arithmetic fact retrieval, which stresses the importance of symbolic numerical magnitude processing for learning arithmetic. Copyright © 2017 Elsevier Inc. All rights reserved.
Spatial dynamics of invasion: the geometry of introduced species.
Korniss, Gyorgy; Caraco, Thomas
2005-03-07
Many exotic species combine low probability of establishment at each introduction with rapid population growth once introduction does succeed. To analyse this phenomenon, we note that invaders often cluster spatially when rare, and consequently an introduced exotic's population dynamics should depend on locally structured interactions. Ecological theory for spatially structured invasion relies on deterministic approximations, and determinism does not address the observed uncertainty of the exotic-introduction process. We take a new approach to the population dynamics of invasion and, by extension, to the general question of invasibility in any spatial ecology. We apply the physical theory for nucleation of spatial systems to a lattice-based model of competition between plant species, a resident and an invader, and the analysis reaches conclusions that differ qualitatively from the standard ecological theories. Nucleation theory distinguishes between dynamics of single- and multi-cluster invasion. Low introduction rates and small system size produce single-cluster dynamics, where success or failure of introduction is inherently stochastic. Single-cluster invasion occurs only if the cluster reaches a critical size, typically preceded by a number of failed attempts. For this case, we identify the functional form of the probability distribution of time elapsing until invasion succeeds. Although multi-cluster invasion for sufficiently large systems exhibits spatial averaging and almost-deterministic dynamics of the global densities, an analytical approximation from nucleation theory, known as Avrami's law, describes our simulation results far better than standard ecological approximations.
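Avrami's law, referred to at the end of this abstract, is commonly written as X(t) = 1 - exp(-K t^n), where X is the fraction of the system already converted. The sketch below assumes that general form; the function name and parameter names are hypothetical, and the default exponent n = 3 reflects the classical result for two spatial dimensions with a constant nucleation rate and constant radial growth, which may differ from the exact parametrization used by the authors.

```python
import math

def avrami_fraction(t, rate_constant, exponent=3.0):
    """Fraction of the lattice occupied by the invader at time t (Avrami form)."""
    return 1.0 - math.exp(-rate_constant * t ** exponent)

def avrami_half_time(rate_constant, exponent=3.0):
    """Time at which half of the lattice has been converted."""
    return (math.log(2.0) / rate_constant) ** (1.0 / exponent)
```

The curve is sigmoidal: conversion is slow while clusters are few and small, accelerates as they grow, and saturates once clusters begin to overlap.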
A null model for microbial diversification
Straub, Timothy J.
2017-01-01
Whether prokaryotes (Bacteria and Archaea) are naturally organized into phenotypically and genetically cohesive units comparable to animal or plant species remains contested, frustrating attempts to estimate how many such units there might be, or to identify the ecological roles they play. Analyses of gene sequences in various closely related prokaryotic groups reveal that sequence diversity is typically organized into distinct clusters, and processes such as periodic selection and extensive recombination are understood to be drivers of cluster formation (“speciation”). However, observed patterns are rarely compared with those obtainable with simple null models of diversification under stochastic lineage birth and death and random genetic drift. Via a combination of simulations and analyses of core and phylogenetic marker genes, we show that patterns of diversity for the genera Escherichia, Neisseria, and Borrelia are generally indistinguishable from patterns arising under a null model. We suggest that caution should thus be taken in interpreting observed clustering as a result of selective evolutionary forces. Unknown forces do, however, appear to play a role in Helicobacter pylori, and some individual genes in all groups fail to conform to the null model. Taken together, we recommend the presented birth−death model as a null hypothesis in prokaryotic speciation studies. It is only when the real data are statistically different from the expectations under the null model that some speciation process should be invoked. PMID:28630293
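To make the idea of a stochastic lineage birth-death null model concrete, a minimal sketch follows. It is a generic Gillespie-style simulation of lineage counts, not the authors' model: the function name, the per-lineage rates, and the stopping rule are assumptions chosen for illustration, and genetic drift within lineages is not represented.

```python
import random


def simulate_birth_death(n0, birth_rate, death_rate, t_max, seed=0):
    """Simulate the number of lineages under constant per-lineage rates.

    Returns a list of (time, lineage_count) pairs. Assumes
    birth_rate + death_rate > 0; all parameter names are hypothetical.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(n * (birth_rate + death_rate))
        if t > t_max:
            break
        # The event is a birth with probability b / (b + d), else a death.
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


if __name__ == "__main__":
    print(simulate_birth_death(10, 1.0, 1.0, 2.0)[-1])
```

Repeating such runs many times gives a reference distribution against which observed clustering could, in principle, be compared.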
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Ontology-based topic clustering for online discussion data
NASA Astrophysics Data System (ADS)
Wang, Yongheng; Cao, Kening; Zhang, Xiaoming
2013-03-01
With the rapid development of online communities, mining and extracting quality knowledge from online discussions has become very important for the industrial and marketing sectors, as well as for e-commerce applications and government. Most existing techniques model a discussion as a social network of users represented by a user-based graph, without considering the content of the discussion. In this paper we propose a new multilayered model to analyze online discussions. User-based and message-based representations are combined in this model. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into topic space. A domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large-scale online discussion data.
Price Formation Based on Particle-Cluster Aggregation
NASA Astrophysics Data System (ADS)
Wang, Shijun; Zhang, Changshui
In the present work, we propose a microscopic model of financial markets based on particle-cluster aggregation on a two-dimensional small-world information network in order to simulate the dynamics of the stock markets. "Stylized facts" of the financial market time series, such as fat-tail distribution of returns, volatility clustering and multifractality, are observed in the model. The results of the model agree with empirical data taken from historical records of the daily closures of the NYSE composite index.
Chang, Hsien-Yen; Weiner, Jonathan P
2010-01-18
Diagnosis-based risk adjustment is becoming an important issue globally as a result of its implications for payment, high-risk predictive modelling and provider performance assessment. The Taiwanese National Health Insurance (NHI) programme provides universal coverage and maintains a single national computerized claims database, which enables the application of diagnosis-based risk adjustment. However, research regarding risk adjustment is limited. This study aims to examine the performance of the Adjusted Clinical Group (ACG) case-mix system using claims-based diagnosis information from the Taiwanese NHI programme. A random sample of NHI enrollees was selected. Those continuously enrolled in 2002 were included for concurrent analyses (n = 173,234), while those in both 2002 and 2003 were included for prospective analyses (n = 164,562). Health status measures derived from 2002 diagnoses were used to explain the 2002 and 2003 health expenditure. A multivariate linear regression model was adopted after comparing the performance of seven different statistical models. Split-validation was performed in order to avoid overfitting. The performance measures were adjusted R2 and mean absolute prediction error of five types of expenditure at individual level, and predictive ratio of total expenditure at group level. The more comprehensive models performed better when used for explaining resource utilization. Adjusted R2 of total expenditure in concurrent/prospective analyses were 4.2%/4.4% in the demographic model, 15%/10% in the ACGs or ADGs (Aggregated Diagnosis Group) model, and 40%/22% in the models containing EDCs (Expanded Diagnosis Cluster). When predicting expenditure for groups based on expenditure quintiles, all models underpredicted the highest expenditure group and overpredicted the four other groups. For groups based on morbidity burden, the ACGs model had the best performance overall. 
Given the widespread availability of claims data and the superior explanatory power of claims-based risk adjustment models over demographics-only models, Taiwan's government should consider using claims-based models for policy-relevant applications. The performance of the ACG case-mix system in Taiwan was comparable to that found in other countries. This suggested that the ACG system could be applied to Taiwan's NHI even though it was originally developed in the USA. Many of the findings in this paper are likely to be relevant to other diagnosis-based risk adjustment methodologies.
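The two simplest performance measures named in this abstract can be sketched as follows. This is a minimal illustration under the usual textbook definitions (mean absolute prediction error at the individual level, and predictive ratio as total predicted over total actual expenditure for a group); the function names and the toy figures are assumptions, not data from the study.

```python
def mean_absolute_prediction_error(actual, predicted):
    """Average absolute gap between actual and predicted individual expenditure."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def predictive_ratio(actual, predicted):
    """Total predicted expenditure divided by total actual expenditure for a group.

    Values below 1 indicate underprediction for the group, values above 1
    indicate overprediction.
    """
    return sum(predicted) / sum(actual)


if __name__ == "__main__":
    actual = [100.0, 200.0, 700.0]      # hypothetical expenditures
    predicted = [150.0, 250.0, 500.0]   # hypothetical model output
    print(mean_absolute_prediction_error(actual, predicted))
    print(predictive_ratio(actual, predicted))
```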
A roadmap of clustering algorithms: finding a match for a biomedical application.
Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael
2009-05-01
Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110
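The difference between hard K-means assignment and Fuzzy C-means (FCM) can be illustrated with the standard FCM membership update for fixed cluster centers. The sketch below is a generic version using Euclidean distance and fuzzifier m = 2; microstate analyses typically use polarity-invariant spatial correlation instead, so the distance measure, function name, and toy points here are assumptions for illustration only.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return one row of soft memberships per point (each row sums to 1).

    Uses the standard update u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point coincides with a center: share membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        )
    return memberships


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    ctrs = [(0.0, 0.0), (4.0, 0.0)]
    for row in fcm_memberships(pts, ctrs):
        print(row)
```

Taking the largest entry of each row recovers a hard, K-means-like label, while the remaining entries retain the assignment uncertainty that the abstract argues should be kept.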
NASA Astrophysics Data System (ADS)
Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin
2014-06-01
This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced landslide susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to find the best classification fit for each conditioning factor and then to combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best-fitting function that assesses the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data with 14 landslide conditioning factors, such as topographically derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model's reliability and prediction capability with the training and validation landslide locations, respectively. This study proved the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provided a valuable scientific basis for spatial decision making in planning and urban management studies.
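For readers unfamiliar with the nearest neighbor index (NNI) used here to describe the clustering of landslide locations, a minimal sketch follows. It assumes the usual definition, the observed mean nearest-neighbor distance divided by the value expected under complete spatial randomness, 0.5 / sqrt(n / A), and ignores edge corrections; the function name and the example coordinates are hypothetical and unrelated to the Kuala Lumpur data.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbor distance.

    Values below 1 suggest clustering, values near 1 a random pattern,
    and values above 1 a dispersed pattern. No edge correction is applied.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


if __name__ == "__main__":
    tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    print(nearest_neighbor_index(tight, area=4.0))
```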
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
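The Chinese restaurant metaphor mentioned in this abstract can be illustrated with a draw from the basic, single-level Chinese restaurant process. This is only the prior over partitions, not the hierarchical (franchise) construction or the Gibbs sampler developed in the paper; the function name, the concentration parameter value, and the seed are assumptions for illustration.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a random partition of n_customers items.

    Each new customer joins existing table k with probability proportional
    to its current size, or opens a new table with probability proportional
    to alpha (assumed > 0). Returns (assignments, table_sizes).
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(n_customers):
        weights = table_sizes + [alpha]  # last entry stands for "new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes


if __name__ == "__main__":
    labels, sizes = chinese_restaurant_process(20, alpha=1.0)
    print(sizes)
```

The number of tables is not fixed in advance, which is the property that lets Dirichlet-process clustering infer the number of clusters from the data.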
Prediction of Fracture Behavior in Rock and Rock-like Materials Using Discrete Element Models
NASA Astrophysics Data System (ADS)
Katsaga, T.; Young, P.
2009-05-01
The study of fracture initiation and propagation in heterogeneous materials such as rock and rock-like materials is of principal interest in the field of rock mechanics and rock engineering. It is crucial to study and investigate failure prediction and safety measures in civil and mining structures. Our work offers a practical approach to predict fracture behaviour using discrete element models. In this approach, the microstructures of materials are presented through the combination of clusters of bonded particles with different inter-cluster particle and bond properties, and intra-cluster bond properties. The geometry of clusters is transferred from information available from thin sections, computed tomography (CT) images and other visual presentation of the modeled material using customized AutoCAD built-in dialog-based Visual Basic Application. Exact microstructures of the tested sample, including fractures, faults, inclusions and void spaces can be duplicated in the discrete element models. Although the microstructural fabrics of rocks and rock-like structures may have different scale, fracture formation and propagation through these materials are alike and will follow similar mechanics. Synthetic material provides an excellent condition for validating the modelling approaches, as fracture behaviours are known with the well-defined composite's properties. Calibration of the macro-properties of the matrix material and inclusions (aggregates) was followed by calibration of the overall mechanical material response, achieved by adjusting the interfacial properties. The discrete element model predicted similar fracture propagation features and path as that of the real sample material. The path of the fractures and matrix-inclusion interaction was compared using computed tomography images. Initiation and fracture formation in the model and real material were compared using Acoustic Emission data. 
Analysing the temporal and spatial evolution of AE events, collected during the sample testing, in relation to the CT images allows the precise reconstruction of the failure sequence. Our proposed modelling approach illustrates realistic fracture formation and growth predictions at different loading conditions.
Cluster Analysis of Atmospheric Dynamics and Pollution Transport in a Coastal Area
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Dmitriev, Egor; Maksimovich, Elena; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmentin, Marc; Locoge, Nadine
2016-11-01
Summertime atmospheric dynamics in the coastal zone of the industrialized Dunkerque agglomeration in northern France was characterized by a cluster analysis of back trajectories in the context of pollution transport. The MESO-NH atmospheric model was used to simulate the local dynamics at multiple scales with horizontal resolution down to 500 m, and for the online calculation of the Lagrangian backward trajectories with 30-min temporal resolution. Airmass transport was performed along six principal pathways obtained by the weighted k-means clustering technique. Four of these centroids corresponded to a range of wind speeds over the English Channel: two for wind directions from the north-east and two from the south-west. Another pathway corresponded to a south-westerly continental transport. The backward trajectories of the largest and most dispersed sixth cluster contained low wind speeds, including sea-breeze circulations. Based on analyses of meteorological data and pollution measurements, the principal atmospheric pathways were related to local air-contamination events. Continuous air quality and meteorological data were collected during the Benzene-Toluene-Ethylbenzene-Xylene 2006 campaign. The sites of the pollution measurements served as the endpoints for the backward trajectories. Pollutant transport pathways corresponding to the highest air contamination were defined.
Liu, Xin
2015-10-30
In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists optimal sensing time and clustering number to maximize the spectrum access probability.
Reconstruction of a digital core containing clay minerals based on a clustering algorithm.
He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling
2017-10-01
It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.
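The Hoshen-Kopelman labeling step described in this abstract can be sketched on a small two-dimensional grid. The version below uses a raster scan with union-find and 4-connectivity, then records cluster sizes; the grid, the connectivity choice, and the function name are assumptions for illustration, and a digital core would normally be three-dimensional.

```python
def hoshen_kopelman(grid):
    """Label 4-connected clusters of occupied (truthy) cells in a 2D grid.

    Returns (labels, sizes): a grid of cluster labels (0 = empty) and a
    dict mapping each label to the number of cells in that cluster.
    """
    parent = [0]  # index 0 is a dummy entry for empty cells

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # start a new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root  # merge the two clusters
                labels[r][c] = root
            else:
                labels[r][c] = find(up or left)
    # Second pass: map provisional labels to compact final labels and count sizes.
    final, sizes = {}, {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                lab = final.setdefault(find(labels[r][c]), len(final) + 1)
                labels[r][c] = lab
                sizes[lab] = sizes.get(lab, 0) + 1
    return labels, sizes


if __name__ == "__main__":
    demo = [[1, 0, 1], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
    print(hoshen_kopelman(demo)[1])
```

The recorded sizes are the kind of per-cluster statistic that a later step, such as the K-means division of large connected clusters, could take as input.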
Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo
2017-12-01
To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
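A trend analysis of this kind rests on regressing a summary value against time. The sketch below fits an ordinary least-squares slope to a series of mean total deviation (mTD) values and to per-cluster series; it is a generic illustration only, it omits the significance testing used to declare progression, and it makes no claim about how EyeSuite computes its results. The function names and the example values are hypothetical.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def cluster_slopes(times, series_by_cluster):
    """Apply the same slope fit to each cluster's series (dict of lists)."""
    return {name: ols_slope(times, vals) for name, vals in series_by_cluster.items()}


if __name__ == "__main__":
    years = [0.0, 1.0, 2.0, 3.0]
    mtd = [-2.0, -2.5, -3.0, -3.5]  # hypothetical mTD values in dB
    print(ols_slope(years, mtd))
```

A stable whole-field slope can coexist with a steep negative slope in one cluster, which is the situation the abstract describes as local progression missed by whole-field analysis.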
Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo
2015-02-01
Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
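One simple form of an unweighted cluster summary estimator for a two-period cluster crossover trial is sketched below: each cluster contributes the difference between its event proportion in the intervention period and in the control period, and these differences are averaged with equal weight. This is an illustrative assumption about the estimator's form rather than the authors' exact specification, and it relies on equal numbers of clusters in each sequence so that period effects cancel. The field names and counts are hypothetical.

```python
def unweighted_cluster_summary(clusters):
    """Mean within-cluster difference in event proportions (risk difference).

    Each cluster is a dict with event counts and sizes for its intervention
    and control periods. Clusters are weighted equally regardless of size.
    """
    diffs = []
    for c in clusters:
        p_int = c["events_int"] / c["n_int"]
        p_ctl = c["events_ctl"] / c["n_ctl"]
        diffs.append(p_int - p_ctl)
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    data = [
        {"events_int": 10, "n_int": 100, "events_ctl": 12, "n_ctl": 100},
        {"events_int": 5, "n_int": 50, "events_ctl": 8, "n_ctl": 40},
    ]
    print(unweighted_cluster_summary(data))
```

Because every cluster counts equally, the estimate is insensitive to unbalanced cluster sizes, at some cost in efficiency relative to inverse-variance weighting.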
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
Dai, Di; Shang, Hong; Han, Xiao-Xu; Zhao, Bin; Liu, Jing; Ding, Hai-Bo; Xu, Jun-Jie; Chu, Zhen-Xing
2015-04-01
To investigate the molecular subtypes of prevalent HIV-1 strains and characterize the genetics of dominant strains among men who have sex with men. Molecular epidemiology surveys in this study concentrated on the prevalent HIV-1 strains in Liaoning province by year. 229 adult patients infected with HIV-1 and part of a high-risk group of men who have sex with men were recruited. Reverse transcription and nested PCR amplification were performed. Sequencing reactions were conducted and edited, followed by codon-based alignment. NJ phylogenetic tree analyses detected two distinct CRF01_AE phylogenetic clusters, designated clusters 1 and 2. Clusters 1 and 2 accounted for 12.8% and 84.2% of sequences in the pol gene and 17.6% and 73.1% of sequences in the env gene, respectively. Another six samples were distributed on other phylogenetic clusters. Cluster 1 increased significantly from 5.6% to 20.0%, but cluster 2 decreased from 87.5% to 80.0%. Genetic distance analysis indicated that CRF01_AE cluster 1 in Liaoning was homologous to epidemic CRF01_AE strains, but CRF01_AE cluster 2 was different from other scattered strains. Additionally, significant differences were found in tetra-peptide motifs at the tip of V3 loop between cluster 1 and 2; however, differences in coreceptor usage were not detected. This study shows that subtype CRF01_AE strain may be the most prevalent epidemic strain in the men who have sex with men. Genetic characteristics of the subtype CRF01_AE cluster strain in Liaoning showed homology to the prevalent strains of men who have sex with men in other parts of China. © 2015 Wiley Periodicals, Inc.
Resaland, Geir K; Aadland, Eivind; Moe, Vegard Fusche; Aadland, Katrine N; Skrede, Turid; Stavnsbo, Mette; Suominen, Laura; Steene-Johannessen, Jostein; Glosvik, Øyvind; Andersen, John R; Kvalheim, Olav M; Engelsrud, Gunn; Andersen, Lars B; Holme, Ingar M; Ommundsen, Yngvar; Kriemler, Susi; van Mechelen, Willem; McKay, Heather A; Ekelund, Ulf; Anderssen, Sigmund A
2016-10-01
To investigate the effect of a seven-month, school-based cluster-randomized controlled trial on academic performance in 10-year-old children. In total, 1129 fifth-grade children from 57 elementary schools in Sogn og Fjordane County, Norway, were cluster-randomized by school either to the intervention group or to the control group. The children in the 28 intervention schools participated in a physical activity intervention between November 2014 and June 2015 consisting of three components: 1) 90min/week of physically active educational lessons mainly carried out in the school playground; 2) 5min/day of physical activity breaks during classroom lessons; 3) 10min/day physical activity homework. Academic performance in numeracy, reading and English was measured using standardized Norwegian national tests. Physical activity was measured objectively by accelerometry. We found no effect of the intervention on academic performance in primary analyses (standardized difference 0.01-0.06, p>0.358). Subgroup analyses, however, revealed a favorable intervention effect for those who performed the poorest at baseline (lowest tertile) for numeracy (p=0.005 for the subgroup∗group interaction), compared to controls (standardized difference 0.62, 95% CI 0.19-1.07). This large, rigorously conducted cluster RCT in 10-year-old children supports the notion that there is still inadequate evidence to conclude that increased physical activity in school enhances academic achievement in all children. Still, combining physical activity and learning seems a viable model to stimulate learning in those academically weakest schoolchildren. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
2008-05-12
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
NASA Astrophysics Data System (ADS)
Jee, Myungkook James
2006-06-01
Clusters of galaxies, the largest gravitationally bound objects in the Universe, are useful tracers of cosmic evolution, and particularly detailed studies of still-forming clusters at high redshifts can considerably enhance our understanding of structure formation. We use two powerful methods that have recently become available for the study of these distant clusters: space-based gravitational weak-lensing and high-resolution X-ray observations. Detailed analyses of five high-redshift (0.8 < z < 1.3) clusters are presented based on the deep Advanced Camera for Surveys (ACS) and Chandra X-ray images. We show that, when the instrumental characteristics are properly understood, the newly installed ACS on the Hubble Space Telescope (HST) can detect subtle shape distortions of background galaxies down to the limiting magnitudes of the observations, which enables the mapping of the cluster dark matter in unprecedented high-resolution. The cluster masses derived from this HST/ACS weak-lensing study have been compared with those from the re-analyses of the archival Chandra X-ray data. We find that there are interesting offsets between the cluster galaxy, intracluster medium (ICM), and dark matter centroids, and possible scenarios are discussed. If the offset is confirmed to be ubiquitous in other clusters, the explanation may necessitate major refinements in our current understanding of the nature of dark matter, as well as the cluster galaxy dynamics. CL0848+4452, the highest-redshift (z = 1.27) cluster yet detected in weak-lensing, has a significant discrepancy between the weak-lensing and X-ray masses.
If this trend is found to be severe and common also for other X-ray weak clusters at redshifts beyond unity, the conventional X-ray determination of cluster mass functions, often inferred from their immediate X-ray properties such as the X-ray luminosity and temperature via the so-called mass-luminosity (M-L) and mass-temperature (M-T) relations, will become highly unstable in this redshift regime. Therefore, the relatively unbiased weak-lensing measurements of the cluster mass properties can be used to adequately calibrate the scaling relations in future high-redshift cluster investigations.
Evaluation of Low-Voltage Distribution Network Index Based on Improved Principal Component Analysis
NASA Astrophysics Data System (ADS)
Fan, Hanlu; Gao, Suzhou; Fan, Wenjie; Zhong, Yinfeng; Zhu, Lei
2018-01-01
In order to evaluate the development level of the low-voltage distribution network objectively and scientifically, a chromatography analysis method is used to construct an evaluation index model of the low-voltage distribution network. Based on principal component analysis and the logarithmic distribution of the index data, a logarithmic centralization method is adopted to improve the principal component analysis algorithm. The algorithm can decorrelate and reduce the dimensions of the evaluation model, and the resulting comprehensive score has a better degree of dispersion. A clustering method is adopted to analyse the comprehensive score because the comprehensive scores of the courts are concentrated. A stratified evaluation of the courts is then realized. An example is given to verify the objectivity and scientific soundness of the evaluation method.
Automatic pole-like object modeling via 3D part-based analysis of point cloud
NASA Astrophysics Data System (ADS)
He, Liu; Yang, Haoxiang; Huang, Yuchun
2016-10-01
Pole-like objects, including trees, lampposts and traffic signs, are an indispensable part of urban infrastructure. With the advance of vehicle-based laser scanning (VLS), massive point clouds of roadside urban areas are increasingly applied in 3D digital city modeling. Based on the property that different pole-like objects have various canopy parts and similar trunk parts, this paper proposes a 3D part-based shape analysis to robustly extract, identify and model pole-like objects. The proposed method includes 3D clustering and recognition of trunks, voxel growing and part-based 3D modeling. After preprocessing, the trunk center is identified as the point that has a local density peak and the largest minimum inter-cluster distance. Starting from the trunk centers, the remaining points are iteratively clustered to the same center as their nearest point with higher density. To eliminate noisy points, the cluster border is refined by trimming boundary outliers. Then, candidate trunks are extracted based on the clustering results in three orthogonal planes by shape analysis. Voxel growing obtains the completed pole-like objects regardless of overlaying. Finally, the entire trunk, branch and crown parts are analyzed to obtain seven feature parameters. These parameters are used to model the three parts respectively and obtain a single part-assembled 3D model. The proposed method is tested using the VLS-based point cloud of Wuhan University, China. The point cloud includes many kinds of trees, lampposts and other pole-like posters under different occlusions and overlaying. Experimental results show that the proposed method can extract the exact attributes and model the roadside pole-like objects efficiently.
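The trunk-center step in the abstract above resembles density-peak clustering. The following is a minimal sketch under stated assumptions, not the authors' implementation: density is taken as a neighbor count within a hypothetical `cutoff` radius, centers are picked as the points with the largest product of density and distance to the nearest denser point, and the function name and parameters are illustrative only.

```python
import math

def density_peak_clusters(points, cutoff, n_centers):
    """Assumed density-peak sketch: returns (center indices, cluster labels)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff radius.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest point ranked as denser
    parent = [-1] * n    # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Assumption: centers are the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_centers]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels
```

Because points are visited in decreasing density order, every point's nearest denser neighbor is already labeled when the point itself is assigned, so a single pass suffices.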
Cross-scale analysis of cluster correspondence using different operational neighborhoods
NASA Astrophysics Data System (ADS)
Lu, Yongmei; Thill, Jean-Claude
2008-09-01
Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.
Torheim, Turid; Groendahl, Aurora R; Andersen, Erlend K F; Lyng, Heidi; Malinen, Eirik; Kvaal, Knut; Futsaether, Cecilia M
2016-11-01
Solid tumors are known to be spatially heterogeneous. Detection of treatment-resistant tumor regions can improve clinical outcome, by enabling implementation of strategies targeting such regions. In this study, K-means clustering was used to group voxels in dynamic contrast enhanced magnetic resonance images (DCE-MRI) of cervical cancers. The aim was to identify clusters reflecting treatment resistance that could be used for targeted radiotherapy with a dose-painting approach. Eighty-one patients with locally advanced cervical cancer underwent DCE-MRI prior to chemoradiotherapy. The resulting image time series were fitted to two pharmacokinetic models, the Tofts model (yielding parameters K^trans and ν_e) and the Brix model (A_Brix, k_ep and k_el). K-means clustering was used to group similar voxels based on either the pharmacokinetic parameter maps or the relative signal increase (RSI) time series. The associations between voxel clusters and treatment outcome (measured as locoregional control) were evaluated using the volume fraction or the spatial distribution of each cluster. One voxel cluster based on the RSI time series was significantly related to locoregional control (adjusted p-value 0.048). This cluster consisted of low-enhancing voxels. We found that tumors with poor prognosis had this RSI-based cluster gathered into few patches, making this cluster a potential candidate for targeted radiotherapy. None of the voxel clusters based on Tofts or Brix parameter maps were significantly related to treatment outcome. We identified one group of tumor voxels significantly associated with locoregional relapse that could potentially be used for dose painting. This tumor voxel cluster was identified using the raw MRI time series rather than the pharmacokinetic maps.
Patanasatienkul, Thitiwan; Sanchez, Javier; Rees, Erin E; Pfeiffer, Dirk; Revie, Crawford W
2015-06-15
Sea lice infestation levels on wild chum and pink salmon in the Broughton Archipelago region are known to vary spatially and temporally; however, the locations of areas associated with a high infestation level had not been investigated yet. In the present study, the multivariate spatial scan statistic based on a Poisson model was used to assess spatial clustering of elevated sea lice (Caligus clemensi and Lepeophtheirus salmonis) infestation levels on wild chum and pink salmon sampled between March and July of 2004 to 2012 in the Broughton Archipelago and Knight Inlet regions of British Columbia, Canada. Three covariates, seine type (beach and purse seining), fish size, and year effect, were used to provide adjustment within the analyses. The analyses were carried out across the five months/datasets and between two fish species to assess the consistency of the identified clusters. Sea lice stages were explored separately for the early life stages (non-motile) and the late life stages of sea lice (motile). Spatial patterns in fish migration were also explored using monthly plots showing the average number of each fish species captured per sampling site. The results revealed three clusters for non-motile C. clemensi, two clusters for non-motile L. salmonis, and one cluster for the motile stage in each of the sea lice species. In general, the location and timing of clusters detected for both fish species were similar. Early in the season, the clusters of elevated sea lice infestation levels on wild fish are detected in areas closer to the rivers, with decreasing relative risks as the season progresses. Clusters were detected further from the estuaries later in the season, accompanied by increasing relative risks. 
In addition, the plots for fish migration exhibit similar patterns for both fish species in that, as expected, the juveniles move from the rivers toward the open ocean as the season progresses. The identification of space-time clustering of infestation on wild fish from this study can help in targeting investigations of factors associated with these infestations and thereby support the development of more effective sea lice control measures. Copyright © 2015 Elsevier B.V. All rights reserved.
A Linear Algebra Measure of Cluster Quality.
ERIC Educational Resources Information Center
Mather, Laura A.
2000-01-01
Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)
Alvarado-Sizzo, Hernán; Parra, Fabiola; Arreola-Nava, Hilda Julieta; Terrazas, Teresa; Sánchez, Cristian
2018-01-01
The Stenocereus griseus species complex (SGSC) has long been considered taxonomically challenging because the number of taxa belonging to the complex and their geographical boundaries remain poorly understood. Bayesian clustering and genetic distance-based methods were used based on nine microsatellite loci in 377 individuals of three main putative species of the complex. The resulting genetic clusters were assessed for ecological niche divergence and areolar morphology, particularly spination patterns. We based our species boundaries on concordance between genetic, ecological, and morphological data, and were able to resolve four species, three of them corresponding to S. pruinosus from central Mexico, S. laevigatus from southern Mexico, and S. griseus from northern South America. A fourth species, previously considered to be S. griseus and commonly misidentified as S. pruinosus in northern Mexico showed significant genetic, ecological, and morphological differentiation suggesting that it should be considered a new species, S. huastecorum, which we describe here. We show that population genetic analyses, ecological niche modeling, and morphological studies are complementary approaches for delimiting species in taxonomically challenging plant groups such as the SGSC. PMID:29342184
Alvarado-Sizzo, Hernán; Casas, Alejandro; Parra, Fabiola; Arreola-Nava, Hilda Julieta; Terrazas, Teresa; Sánchez, Cristian
2018-01-01
The Stenocereus griseus species complex (SGSC) has long been considered taxonomically challenging because the number of taxa belonging to the complex and their geographical boundaries remain poorly understood. Bayesian clustering and genetic distance-based methods were used based on nine microsatellite loci in 377 individuals of three main putative species of the complex. The resulting genetic clusters were assessed for ecological niche divergence and areolar morphology, particularly spination patterns. We based our species boundaries on concordance between genetic, ecological, and morphological data, and were able to resolve four species, three of them corresponding to S. pruinosus from central Mexico, S. laevigatus from southern Mexico, and S. griseus from northern South America. A fourth species, previously considered to be S. griseus and commonly misidentified as S. pruinosus in northern Mexico showed significant genetic, ecological, and morphological differentiation suggesting that it should be considered a new species, S. huastecorum, which we describe here. We show that population genetic analyses, ecological niche modeling, and morphological studies are complementary approaches for delimiting species in taxonomically challenging plant groups such as the SGSC.
Network structure of subway passenger flows
NASA Astrophysics Data System (ADS)
Xu, Q.; Mao, B. H.; Bai, Y.
2016-03-01
The results of transportation infrastructure network analyses have been used to analyze complex networks in a topological context. However, most modeling approaches, including those based on complex network theory, do not fully account for real-life traffic patterns and may provide an incomplete view of network functions. This study utilizes trip data obtained from the Beijing Subway System to characterize individual passenger movement patterns. A directed weighted passenger flow network was constructed from the subway infrastructure network topology by incorporating trip data. The passenger flow networks exhibit several properties that can be characterized by power-law distributions based on flow size, and log-logistic distributions based on the fraction of boarding and departing passengers. The study also characterizes the temporal patterns of in-transit and waiting passengers and provides a hierarchical clustering structure for passenger flows. This hierarchical flow organization varies in the spatial domain. Ten cluster groups were identified, indicating a hierarchical urban polycentric structure composed of large concentrated flows at urban activity centers. These empirical findings provide insights regarding urban human mobility patterns within a large subway network.
Algorithmic localisation of noise sources in the tip region of a low-speed axial flow fan
NASA Astrophysics Data System (ADS)
Tóth, Bence; Vad, János
2017-04-01
An objective and algorithmised methodology is proposed to analyse beamform data obtained for axial fans. Its application is demonstrated in a case study regarding the tip region of a low-speed cooling fan. First, beamforming is carried out in a co-rotating frame of reference. Then, a distribution of source strength is extracted along the circumference of the rotor at the blade tip radius in each analysed third-octave band. The circumferential distributions are expanded into Fourier series, which allows for filtering out the effects of perturbations, on the basis of an objective criterion. The remaining Fourier components are then considered as base sources to determine the blade-passage-periodic flow mechanisms responsible for the broadband noise. Based on their frequency and angular location, the base sources are grouped together. This is done using the fuzzy c-means clustering method to allow the overlap of the source mechanisms. The number of clusters is determined in a validity analysis. Finally, the obtained clusters are assigned to source mechanisms based on the literature. Thus, turbulent boundary layer - trailing edge interaction noise, tip leakage flow noise, and double leakage flow noise are identified.
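The overlap between source mechanisms mentioned above comes from the soft memberships of fuzzy c-means. As a minimal sketch, assuming the standard membership update with fuzzifier `m` (the abstract does not state the settings used), the membership of one base source in each cluster could be computed as follows; the function name and the feature representation are hypothetical.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means memberships of one point in each cluster."""
    d = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(x == 0.0 for x in d):
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_j) ** power for d_j in d) for d_i in d]
```

A source lying midway between two cluster centers receives equal membership in both, which is what allows a single base source to contribute to more than one mechanism.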
Avoiding Boundary Estimates in Hierarchical Linear Models through Weakly Informative Priors
ERIC Educational Resources Information Center
Chung, Yeojin; Rabe-Hesketh, Sophia; Gelman, Andrew; Dorie, Vincent; Liu, Jinchen
2012-01-01
Hierarchical or multilevel linear models are widely used for longitudinal or cross-sectional data on students nested in classes and schools, and are particularly important for estimating treatment effects in cluster-randomized trials, multi-site trials, and meta-analyses. The models can allow for variation in treatment effects, as well as…
Espinosa, Manuel O; Polop, Francisco; Rotela, Camilo H; Abril, Marcelo; Scavuzzo, Carlos M
2016-11-21
The main objective of this study was to obtain and analyse the space-time dynamics of Aedes aegypti breeding sites in Clorinda City, Formosa Province, Argentina coupled with landscape analysis using the maximum entropy approach in order to generate a dengue vector niche model. In urban areas, without vector control activities, 12 entomologic (larval) samplings were performed during three years (October 2011 to October 2014). The entomologic surveillance area represented 16,511 houses. Predictive models for Aedes distribution were developed using vector breeding abundance data, density analysis, clustering and geoprocessing techniques coupled with Earth observation satellite data. The spatial analysis showed a vector spatial distribution pattern with clusters of high density in the central region of Clorinda with a well-defined high-risk area in the western part of the city. It also showed a differential temporal behaviour among different areas, which could have implications for risk models and control strategies at the urban scale. The niche model obtained for Ae. aegypti, based on only one year of field data, showed that 85.8% of the distribution of breeding sites is explained by the percentage of water supply (48.2%), urban distribution (33.2%), and the percentage of urban coverage (4.4%). The consequences for the development of control strategies are discussed with reference to the results obtained using distribution maps based on environmental variables.
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jianbao; Ma, Zhongjun, E-mail: mzj1234402@163.com; Chen, Guanrong
All edges in the classical Watts and Strogatz's small-world network model are unweighted and cooperative (positive). By introducing competitive (negative) inter-cluster edges and assigning edge weights to mimic more realistic networks, this paper develops a modified model which possesses co-competitive weighted couplings and cluster structures while maintaining the common small-world network properties of small average shortest path lengths and large clustering coefficients. Based on theoretical analysis, it is proved that the new model with inter-cluster co-competition balance has an important dynamical property of robust cluster synchronous pattern formation. More precisely, clusters will neither merge nor split regardless of adding or deleting nodes and edges, under the condition of inter-cluster co-competition balance. Numerical simulations demonstrate the robustness of the model against the increase of the coupling strength and several topological variations.
NASA Astrophysics Data System (ADS)
Zhang, Jianbao; Ma, Zhongjun; Chen, Guanrong
2014-06-01
All edges in the classical Watts and Strogatz's small-world network model are unweighted and cooperative (positive). By introducing competitive (negative) inter-cluster edges and assigning edge weights to mimic more realistic networks, this paper develops a modified model which possesses co-competitive weighted couplings and cluster structures while maintaining the common small-world network properties of small average shortest path lengths and large clustering coefficients. Based on theoretical analysis, it is proved that the new model with inter-cluster co-competition balance has an important dynamical property of robust cluster synchronous pattern formation. More precisely, clusters will neither merge nor split regardless of adding or deleting nodes and edges, under the condition of inter-cluster co-competition balance. Numerical simulations demonstrate the robustness of the model against the increase of the coupling strength and several topological variations.
Friesen, Melissa C; Shortreed, Susan M; Wheeler, David C; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S; Baris, Dalsu; Karagas, Margaret R; Schwenn, Molly; Johnson, Alison; Armenti, Karla R; Silverman, Debra T; Yu, Kai
2015-05-01
Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m⁻³ respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways.
First, the clusters' homogeneity (defined as >75% with the same estimate) was examined compared to a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job's estimate and the mean estimate for all jobs within the cluster. Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster-level assignment and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.
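The homogeneity criterion described above (a cluster counts as homogeneous when more than 75% of its jobs share the same estimate) can be sketched as follows. This is an illustration only; the function names and the toy input format are hypothetical, and the study's actual data handling is not described in the abstract.

```python
from collections import Counter

def is_homogeneous(estimates, threshold=0.75):
    """True if more than `threshold` of the jobs in a cluster share one estimate."""
    if not estimates:
        raise ValueError("cluster is empty")
    most_common_count = Counter(estimates).most_common(1)[0][1]
    return most_common_count / len(estimates) > threshold

def homogeneous_fraction(clusters, threshold=0.75):
    """Fraction of clusters that meet the homogeneity criterion."""
    flags = [is_homogeneous(c, threshold) for c in clusters]
    return sum(flags) / len(flags)
```

Note the strict inequality: a cluster in which exactly 75% of jobs agree is not counted as homogeneous under this reading of ">75%".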
ERIC Educational Resources Information Center
Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum and Resource Center.
Developed in Virginia, this publication contains task analysis guides to support selected tech prep programs that prepare students for careers in the health and human services cluster. Occupations profiled are physical therapist aide and physical therapist assistant. Each guide contains the following elements: (1) an occupational task list derived…
Selim, Alfredo; Rogers, William; Qian, Shirley; Rothendler, James A; Kent, Erin E; Kazis, Lewis E
2018-04-19
To develop bridging algorithms to score the Veterans Rand-12 (VR-12) scales for comparability to those of the SF-36® for facilitating multi-cohort studies using data from the National Cancer Institute Surveillance, Epidemiology, and End Results Program (SEER) linked to Medicare Health Outcomes Survey (MHOS), and to provide a model for minimizing non-statistical error in pooled analyses stemming from changes to survey instruments over time. Observational study of MHOS cohorts 1-12 (1998-2011). We modeled 2-year follow-up SF-36 scale scores from cohorts 1-6 based on baseline SF-36 scores, age, and gender, yielding 100 clusters using Classification and Regression Trees. Within each cluster, we averaged follow-up SF-36 scores. Using the same cluster specifications, expected follow-up SF-36 scores, based on cohorts 1-6, were computed for cohorts 7-8 (where the VR-12 was the follow-up survey). We created a new criterion validity measure, termed "extensibility," calculated from the square root of the mean square difference between expected SF-36 scale averages and observed VR-12 item score from cohorts 7-8, weighted by cluster size. VR-12 items were rescored to minimize this quantity. Extensibility of rescored VR-12 items and scales was considerably improved from the "simple" scoring method for comparability to the SF-36 scales. The algorithms are appropriate across a wide range of potential subsamples within the MHOS and provide robust application for future studies that span the SF-36 and VR-12 eras. It is possible that these surveys in a different setting outside the MHOS, especially in younger age groups, could produce somewhat different results.
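The "extensibility" measure above can be read as a cluster-size-weighted root mean square difference between expected and observed scores. The sketch below assumes a weighted mean of squared differences normalized by the total cluster size; the abstract does not give the exact formula, and the function and argument names are hypothetical.

```python
import math

def extensibility(expected, observed, cluster_sizes):
    """Cluster-size-weighted RMS difference between expected and observed scores."""
    if not (len(expected) == len(observed) == len(cluster_sizes)):
        raise ValueError("inputs must have the same length")
    total = sum(cluster_sizes)
    if total <= 0:
        raise ValueError("cluster sizes must sum to a positive number")
    # Squared difference per cluster, weighted by the number of members.
    weighted_sq = sum(n * (e - o) ** 2
                      for e, o, n in zip(expected, observed, cluster_sizes))
    return math.sqrt(weighted_sq / total)
```

Lower values indicate that the rescored items track the expected scale averages more closely, which is the quantity the rescoring is described as minimizing.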
An algebraic cluster model based on the harmonic oscillator basis
NASA Technical Reports Server (NTRS)
Levai, Geza; Cseh, J.
1995-01-01
We discuss the semimicroscopic algebraic cluster model introduced recently, in which the internal structure of the nuclear clusters is described by the harmonic oscillator shell model, while their relative motion is accounted for by the Vibron model. The algebraic formulation of the model makes extensive use of techniques associated with harmonic oscillators and their symmetry group, SU(3). The model is applied to some cluster systems and is found to reproduce important characteristics of nuclei in the sd-shell region. An approximate SU(3) dynamical symmetry is also found to hold for the C-12 + C-12 system.
Multi-mode clustering model for hierarchical wireless sensor networks
NASA Astrophysics Data System (ADS)
Hu, Xiangdong; Li, Yongfu; Xu, Huifen
2017-03-01
The topology management, i.e., cluster maintenance, of wireless sensor networks (WSNs) is still a challenge due to their numerous nodes, diverse application scenarios and limited resources as well as complex dynamics. To address this issue, a multi-mode clustering model (M2 CM) is proposed to maintain the clusters for hierarchical WSNs in this study. In particular, unlike the traditional time-triggered model, which operates periodically over the whole network, the M2 CM is based on local, event-triggered operations. In addition, an adaptive local maintenance algorithm is designed for the broken clusters in the WSNs using the spatial-temporal demand changes accordingly. Numerical experiments are performed using the NS2 network simulation platform. Results validate the effectiveness of the proposed model with respect to the network maintenance costs, node energy consumption and transmitted data as well as the network lifetime.
Non-proportional odds multivariate logistic regression of ordinal family data.
Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C
2015-03-01
Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C
2017-01-30
In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is usually correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster only had correct Type I errors when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% for all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.
Dynamic Fuzzy Model Development for a Drum-type Boiler-turbine Plant Through GK Clustering
NASA Astrophysics Data System (ADS)
Habbi, Ahcène; Zelmat, Mimoun
2008-10-01
This paper discusses a TS fuzzy model identification method for an industrial drum-type boiler plant using the GK fuzzy clustering approach. The fuzzy model is constructed from a set of input-output data that covers a wide operating range of the physical plant. The reference data is generated using a complex first-principle-based mathematical model that describes the key dynamical properties of the boiler-turbine plant. The proposed fuzzy model is derived by means of a fuzzy clustering method, with particular attention to structure flexibility and model interpretability issues. This may provide a basis for a new way to design model-based control and diagnosis mechanisms for the complex nonlinear plant.
Combining Mixture Components for Clustering*
Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël
2010-01-01
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302
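The hierarchical combining step in the abstract above can be illustrated with a small sketch. It assumes the posterior membership probabilities of a fitted mixture are already available as a matrix `tau` (one row per observation, one column per component), and at each step it merges the pair of components whose union gives the largest drop in classification entropy. The function names and the toy matrix are hypothetical; this is one reading of the entropy criterion, not the authors' code.

```python
import numpy as np

def xlogx(p):
    """Elementwise p * log(p), with the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    mask = p > 0
    out[mask] = p[mask] * np.log(p[mask])
    return out

def entropy(tau):
    """Classification entropy of a soft clustering."""
    return -xlogx(tau).sum()

def merge_once(tau):
    """Merge the two columns whose union reduces the entropy the most."""
    n, k = tau.shape
    best_pair, best_gain = None, -np.inf
    for a in range(k):
        for b in range(a + 1, k):
            # Entropy before the merge minus entropy after it, for this pair.
            gain = (xlogx(tau[:, a] + tau[:, b])
                    - xlogx(tau[:, a]) - xlogx(tau[:, b])).sum()
            if gain > best_gain:
                best_pair, best_gain = (a, b), gain
    a, b = best_pair
    keep = [c for c in range(k) if c not in (a, b)]
    merged = np.column_stack([tau[:, keep], tau[:, a] + tau[:, b]])
    return merged, best_pair

# Toy posteriors: components 0 and 1 overlap heavily, component 2 is separate.
tau = np.array([
    [0.5, 0.5, 0.0],
    [0.6, 0.4, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
])
merged, pair = merge_once(tau)
print(pair, merged.shape)
```

Applying `merge_once` repeatedly yields one soft clustering for every number of clusters from K down to 1, which is the sequence the abstract proposes comparing.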
Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian
2017-01-01
The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previous research investigating nodule segmentation algorithms cannot entirely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphological optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. The adaptive weight coefficient is then constructed to calculate the distance required between superpixels to achieve precise lung nodules positioning and to obtain the subsequent clustering starting block. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, which is optimized by the strategy of only clustering the lung nodules and adaptive threshold, is then used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
Hu, Valerie W.; Steinberg, Mara E.
2009-01-01
Heterogeneity in phenotypic presentation of ASD has been cited as one explanation for the difficulty in pinpointing specific genes involved in autism. Recent studies have attempted to reduce the “noise” in genetic and other biological data by reducing the phenotypic heterogeneity of the sample population. The current study employs multiple clustering algorithms on 123 item scores from the Autism Diagnostic Interview-Revised (ADI-R) diagnostic instrument of nearly 2000 autistic individuals to identify subgroups of autistic probands with clinically relevant behavioral phenotypes in order to isolate more homogeneous groups of subjects for gene expression analyses. Our combined cluster analyses suggest optimal division of the autistic probands into 4 phenotypic clusters based on similarity of symptom severity across the 123 selected item scores. One cluster is characterized by severe language deficits, while another exhibits milder symptoms across the domains. A third group possesses a higher frequency of savant skills while the fourth group exhibited intermediate severity across all domains. Grouping autistic individuals by multivariate cluster analysis of ADI-R scores reveals meaningful phenotypes of subgroups within the autistic spectrum which we show, in a related (accompanying) study, to be associated with distinct gene expression profiles. PMID:19455643
Chen, Ling; Feng, Yanqin; Sun, Jianguo
2017-10-01
This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.
Friesen, Melissa C.; Shortreed, Susan M.; Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Armenti, Karla R.; Silverman, Debra T.; Yu, Kai
2015-01-01
Objectives: Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Methods: Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m−3 respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. 
First, the clusters’ homogeneity (defined as >75% with the same estimate) was examined compared to a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job’s estimate and the mean estimate for all jobs within the cluster. Results: Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster-level assignment and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. Conclusions: This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. PMID:25477475
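The homogeneity check used in the abstract above (a cluster counts as homogeneous when more than 75% of its jobs share the same estimate) is simple to sketch. The function names and the toy labels below are hypothetical and only illustrate the definition.

```python
from collections import Counter

def is_homogeneous(estimates, threshold=0.75):
    """True if more than `threshold` of the estimates share the modal value."""
    if not estimates:
        raise ValueError("cluster must contain at least one job")
    modal_count = Counter(estimates).most_common(1)[0][1]
    return modal_count / len(estimates) > threshold

def share_homogeneous(clusters, threshold=0.75):
    """Fraction of clusters that pass the homogeneity check."""
    flags = [is_homogeneous(c, threshold) for c in clusters]
    return sum(flags) / len(flags)

# Hypothetical dichotomized probability estimates for three clusters of jobs.
clusters = [
    ["low", "low", "low", "low"],    # 100% agree: homogeneous
    ["low", "low", "low", "high"],   # exactly 75%: not strictly above the cutoff
    ["low", "high"],                 # 50%: not homogeneous
]
share = share_homogeneous(clusters)
print(share)
```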
Hogerwerf, Lenny; Holstege, Manon M C; Benincà, Elisa; Dijkstra, Frederika; van der Hoek, Wim
2017-07-26
Human psittacosis is a highly underdiagnosed zoonotic disease, commonly linked to psittacine birds. Psittacosis in birds, also known as avian chlamydiosis, is endemic in poultry, but the risk for people living close to poultry farms is unknown. Therefore, our study aimed to explore the temporal and spatial patterns of human psittacosis infections and identify possible associations with poultry farming in the Netherlands. We analysed data on 700 human cases of psittacosis notified between 01-01-2000 and 01-09-2015. First, we studied the temporal behaviour of psittacosis notifications by applying wavelet analysis. Then, to identify possible spatial patterns, we applied spatial cluster analysis. Finally, we investigated the possible spatial association between psittacosis notifications and data on the Dutch poultry sector at municipality level using a multivariable model. We found a large spatial cluster that covered a highly poultry-dense area but additional clusters were found in areas that had a low poultry density. There were marked geographical differences in the awareness of psittacosis and the amount and the type of laboratory diagnostics used for psittacosis, making it difficult to draw conclusions about the correlation between the large cluster and poultry density. The multivariable model showed that the presence of chicken processing plants and slaughter duck farms in a municipality was associated with a higher rate of human psittacosis notifications. The significance of the associations was influenced by the inclusion or exclusion of farm density in the model. Our temporal and spatial analyses showed weak associations between poultry-related variables and psittacosis notifications. Because of the low number of psittacosis notifications available for analysis, the power of our analysis was relatively low. 
Because of the exploratory nature of this research, the associations found cannot be interpreted as evidence for airborne transmission of psittacosis from poultry to the general population. Further research is needed to determine the prevalence of C. psittaci in Dutch poultry. Also, efforts to promote PCR-based testing for C. psittaci and genotyping for source tracing are important to reduce the diagnostic deficit, and to provide better estimates of the human psittacosis burden, and the possible role of poultry.
Parcellation of left parietal tool representations by functional connectivity
Garcea, Frank E.; Mahon, Bradford Z.
2014-01-01
Manipulating a tool according to its function requires the integration of visual, conceptual, and motor information, a process subserved in part by left parietal cortex. How these different types of information are integrated and how their integration is reflected in neural responses in the parietal lobule remains an open question. Here, participants viewed images of tools and animals during functional magnetic resonance imaging (fMRI). K-means clustering over time series data was used to parcellate left parietal cortex into subregions based on functional connectivity to a whole brain network of regions involved in tool processing. One cluster, in the inferior parietal cortex, expressed privileged functional connectivity to the left ventral premotor cortex. A second cluster, in the vicinity of the anterior intraparietal sulcus, expressed privileged functional connectivity with the left medial fusiform gyrus. A third cluster in the superior parietal lobe expressed privileged functional connectivity with dorsal occipital cortex. Control analyses using Monte Carlo style permutation tests demonstrated that the clustering solutions were outside the range of what would be observed based on chance ‘lumpiness’ in random data, or mere anatomical proximity. Finally, hierarchical clustering analyses were used to formally relate the resulting parcellation scheme of left parietal tool representations to previous work that has parcellated the left parietal lobule on purely anatomical grounds. These findings demonstrate significant heterogeneity in the functional organization of manipulable object representations in left parietal cortex, and outline a framework that generates novel predictions about the causes of some forms of upper limb apraxia. PMID:24892224
Shimoyama, Hiromitsu
2018-05-07
Calmodulin (CaM) is a multifunctional calcium-binding protein, which regulates various biochemical processes. CaM acts via structural changes and complex formation with its target enzymes. CaM has two globular domains (N-lobe and C-lobe) connected by a long linker region. Upon calcium binding, the N-lobe and C-lobe undergo local conformational changes; after that, the entire CaM wraps around the target enzyme through a large conformational change. However, the regulation mechanism, such as allosteric interactions regulating the conformational changes, is still unclear. In order to clarify the allosteric interactions, in this study, experimentally obtained 'real' structures are compared to 'model' structures lacking the allosteric interactions. As the allosteric interactions would be absent in calcium-free CaM (apo-CaM), allostery-eliminated calcium-bound CaM (holo-CaM) models were constructed by combining the apo-CaM's linker and the holo-CaM's N- and C-lobe. Before the comparison, the 'real' and 'model' structures were clustered and the cluster-cluster relationship was determined by a principal component analysis. The structures were then compared based on this relationship, and a distance map and a contact probability analysis clarified that the inter-domain motion is regulated by several groups of inter-domain contacting residue pairs. The analyses suggested that these residues cause inter-domain translation and rotation, and as a consequence, the motion encourages structural diversity. The resultant diversity would contribute to the functional versatility of CaM.
Sánchez, Ariel G.; Grieb, Jan Niklas; Salazar-Albornoz, Salvador; ...
2016-09-30
The cosmological information contained in anisotropic galaxy clustering measurements can often be compressed into a small number of parameters whose posterior distribution is well described by a Gaussian. Here, we present a general methodology to combine these estimates into a single set of consensus constraints that encode the total information of the individual measurements, taking into account the full covariance between the different methods. We also illustrate this technique by applying it to combine the results obtained from different clustering analyses, including measurements of the signature of baryon acoustic oscillations and redshift-space distortions, based on a set of mock catalogues of the final SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS). Our results show that the region of the parameter space allowed by the consensus constraints is smaller than that of the individual methods, highlighting the importance of performing multiple analyses on galaxy surveys even when the measurements are highly correlated. Our paper is part of a set that analyses the final galaxy clustering data set from BOSS. The methodology presented here is used in Alam et al. to produce the final cosmological constraints from BOSS.
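One standard way to combine several correlated Gaussian estimates of the same parameters is the generalised least squares (best linear unbiased) estimator. The sketch below uses that estimator as an illustration of the idea in the abstract above; it is an assumption that this matches the paper's formulation, and the function name `consensus` and all numbers are hypothetical.

```python
import numpy as np

def consensus(estimates, total_cov):
    """Combine m correlated estimates of the same p parameters.

    estimates: (m, p) array, one row per method
    total_cov: (m*p, m*p) covariance of the stacked estimates,
               including the cross-covariance between methods
    Returns the consensus mean (p,) and covariance (p, p).
    """
    estimates = np.asarray(estimates, dtype=float)
    m, p = estimates.shape
    x = estimates.reshape(m * p)             # method 0 first, then method 1, ...
    design = np.tile(np.eye(p), (m, 1))      # every method measures the same p parameters
    precision = np.linalg.inv(total_cov)
    cons_cov = np.linalg.inv(design.T @ precision @ design)
    cons_mean = cons_cov @ (design.T @ precision @ x)
    return cons_mean, cons_cov

# Hypothetical example: two methods, two parameters, correlated errors.
c11 = np.array([[0.04, 0.01], [0.01, 0.09]])   # covariance of method 1
c22 = np.array([[0.05, 0.00], [0.00, 0.08]])   # covariance of method 2
c12 = np.array([[0.02, 0.00], [0.00, 0.04]])   # cross-covariance between methods
total_cov = np.block([[c11, c12], [c12.T, c22]])
cons_mean, cons_cov = consensus([[1.00, 2.10], [1.10, 1.90]], total_cov)
print(cons_mean)
print(np.diag(cons_cov))
```

With this estimator the consensus variances cannot exceed those of any single method, which is consistent with the abstract's statement that the consensus region is smaller than the individual ones.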
Possible world based consistency learning model for clustering and classifying uncertain data.
Liu, Han; Zhang, Xianchao; Zhang, Xiaotong
2018-06-01
Possible world has been shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms have been proposed based on possible world. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require an extra post-processing procedure to obtain the final result, so their effectiveness relies heavily on the post-processing method and their efficiency is also poor. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended both for clustering and classifying uncertain data. This model utilizes the consistency principle to learn a consensus affinity matrix for uncertain data, which can make full use of the information across different possible worlds and then improve the clustering and classification performance. Meanwhile, this model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, thereby ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be directly obtained without any post-processing procedure. Furthermore, we derive efficient optimization methods for the clustering and classification tasks, respectively, to solve the proposed model. Experimental results on real benchmark datasets and real world uncertain datasets show that the proposed model outperforms the state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.
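The rank constraint mentioned in the abstract above rests on a standard graph-theory fact: for a symmetric, non-negative affinity matrix, the number of zero eigenvalues of its graph Laplacian equals the number of connected components. The sketch below only checks that fact on toy matrices; it does not implement the proposed learning model, and the helper names are hypothetical.

```python
import numpy as np

def count_components(affinity, tol=1e-9):
    """Count connected components as the number of (near-)zero Laplacian eigenvalues."""
    w = np.asarray(affinity, dtype=float)
    w = (w + w.T) / 2.0                       # enforce symmetry
    laplacian = np.diag(w.sum(axis=1)) - w    # unnormalised graph Laplacian L = D - W
    eigvals = np.linalg.eigvalsh(laplacian)
    return int(np.sum(eigvals < tol))

def affinity_from_edges(n, edges):
    """Build a 0/1 affinity matrix from an undirected edge list."""
    w = np.zeros((n, n))
    for i, j in edges:
        w[i, j] = w[j, i] = 1.0
    return w

# Two separate groups: {0, 1, 2} and {3, 4}.
two_groups = affinity_from_edges(5, [(0, 1), (1, 2), (3, 4)])
# Adding one link between the groups leaves a single component.
linked = affinity_from_edges(5, [(0, 1), (1, 2), (3, 4), (2, 3)])
print(count_components(two_groups), count_components(linked))
```

Constraining the Laplacian of an n-by-n affinity matrix to have rank n - c is therefore equivalent to requiring exactly c connected components, so cluster labels can be read directly from the components.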
Bansal, Ravi; Peterson, Bradley S
2018-06-01
Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. 
Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construction, rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
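The preference for the cosine distance over the Euclidean distance in the abstract above can be illustrated with generic vectors: cosine distance depends only on direction, so rescaling a vector leaves it unchanged, while the Euclidean distance grows with the difference in length. The vectors below are hypothetical stand-ins, not GMM mean supervectors from the study.

```python
import numpy as np

def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def euclidean_distance(u, v):
    """Ordinary straight-line distance between u and v."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))

a = np.array([1.0, 2.0, 3.0])
b = 3.0 * a                      # same direction as a, different length
c = np.array([3.0, 2.0, 1.0])    # different direction

print(cosine_distance(a, b), euclidean_distance(a, b))
print(cosine_distance(a, c), euclidean_distance(a, c))
```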
2013-01-01
Background There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, such as are the case for most existing population-based cancer registries. Therefore this simulation study aims to evaluate different cluster detection methods, implemented in the open source environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0, in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed within small spatial (census tracts, N = 1983) and within aggregated large spatial scales (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results With moderate risk increase (RR = 2.0), local cluster tests showed better DR (for both spatial aggregation scales > 0.90) and lower FP rates (both < 0.05) than the Bayesian smoothing methods. When the cluster RR was raised four-fold, the local cluster tests showed better DR with lower FPs only for the small spatial scale. At a large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation. 
Conclusion High resolution spatial scales seem more appropriate as a database for cancer cluster testing and monitoring than the commonly used aggregated scales. We suggest the development of a two-stage approach that combines methods with high detection rates as a first-line screening with methods of higher predictive ability at the second stage. PMID:24314148
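The three accuracy measures used in this abstract (detection rate, false-positive rate, positive predictive value) can be sketched as follows. The abstract does not give its exact definitions, so the ones below are the conventional ones and should be read as an assumption; the area identifiers are hypothetical.

```python
def detection_metrics(all_units, true_cluster, flagged):
    """Conventional accuracy measures for a cluster detection test.

    all_units:    every spatial unit analysed (e.g. census tracts)
    true_cluster: units inside the simulated high-risk cluster
    flagged:      units the detection method reported as a cluster
    """
    all_units, true_cluster, flagged = set(all_units), set(true_cluster), set(flagged)
    tp = len(true_cluster & flagged)   # correctly flagged units
    fp = len(flagged - true_cluster)   # flagged units outside the cluster
    negatives = len(all_units - true_cluster)
    return {
        "DR": tp / len(true_cluster),                # detection rate (sensitivity)
        "FP_rate": fp / negatives,                   # false-positive rate
        "PPV": tp / len(flagged) if flagged else 0,  # positive predictive value
    }

# Hypothetical example: 10 units, 4 of them in the true cluster.
metrics = detection_metrics(range(10), {0, 1, 2, 3}, {1, 2, 3, 4})
print(metrics)
```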
Detection of protein complex from protein-protein interaction network using Markov clustering
NASA Astrophysics Data System (ADS)
Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.
2017-05-01
Detection of complexes, or groups of functionally related proteins, is an important challenge when analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on the Markov clustering (MCL) algorithm to identify protein complexes within highly interconnected protein-protein interaction (PPI) networks. A protein-protein interaction network was first constructed to develop a geometrical network; the network was then partitioned using Markov clustering to detect protein complexes. The value of the proposed method was illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed, and topological properties of the resultant network were analysed for detection of the protein complexes. The results indicated that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparison analysis revealed that MCL outperformed the AP, MCODE and SCPS algorithms, with high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrated that MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
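The flow simulation behind Markov clustering alternates two steps on a column-stochastic matrix: expansion (matrix squaring) and inflation (element-wise powering followed by renormalisation). The sketch below is a minimal textbook version, not the authors' implementation; the inflation value, iteration count, threshold and toy graph are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(adjacency, inflation=2.0, iterations=50):
    n = len(adjacency)
    # Self-loops are a common MCL convention that stabilises the flow.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = normalize_columns([[x ** inflation for x in row]
                               for row in m])             # inflation
    # Rows that retain flow are attractors; the columns they attract
    # form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy network: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0

clusters = markov_cluster(adjacency)
print(clusters)
```

Higher inflation values generally produce finer clusters, so the parameter would need tuning on a real PPI network.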
Galaxy clusters in the cosmic web
NASA Astrophysics Data System (ADS)
Acebrón, A.; Durret, F.; Martinet, N.; Adami, C.; Guennou, L.
2014-12-01
Simulations of large scale structure formation in the universe predict that matter is essentially distributed along filaments at the intersection of which lie galaxy clusters. We have analysed 9 clusters in the redshift range 0.4
Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition.
Bianne-Bernard, Anne-Laure; Menasri, Farès; Al-Hajj Mohamad, Rami; Mokbel, Chafic; Kermorvant, Christopher; Likforman-Sulem, Laurence
2011-10-01
This study aims at building an efficient word recognition system resulting from the combination of three handwriting recognizers. The main component of this combined system is an HMM-based recognizer which considers dynamic and contextual information for a better modeling of writing units. For modeling the contextual units, a state-tying process based on decision tree clustering is introduced. Decision trees are built according to a set of expert-based questions on how characters are written. Questions are divided into global questions, yielding larger clusters, and precise questions, yielding smaller ones. Such clustering enables us to reduce the total number of models and Gaussian densities by 10. We then apply this modeling to the recognition of handwritten words. Experiments are conducted on three publicly available databases based on Latin or Arabic languages: Rimes, IAM, and OpenHart. The results obtained show that contextual information embedded with dynamic modeling significantly improves recognition.
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara
2000-01-01
We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.
An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network
Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian
2015-01-01
Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming to enhance the detection rate and reduce the false rate. The AdaBoost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Back Propagation optimized by the Cultural Algorithm and the Artificial Fish Swarm Algorithm is applied to misuse detection at the Sink node. Extensive simulations demonstrate that this integrated model has strong intrusion detection performance. PMID:26447696
A cluster expansion model for predicting activation barrier of atomic processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rehman, Tafizur; Jaipal, M.; Chatterjee, Abhijit, E-mail: achatter@iitk.ac.in
2013-06-15
We introduce a procedure based on cluster expansion models for predicting the activation barrier of atomic processes encountered while studying the dynamics of a material system using the kinetic Monte Carlo (KMC) method. Starting with an interatomic potential description, a mathematical derivation is presented to show that the local environment dependence of the activation barrier can be captured using cluster interaction models. Next, we develop a systematic procedure for training the cluster interaction model on-the-fly, which involves: (i) obtaining activation barriers for a handful of local environments using nudged elastic band (NEB) calculations, (ii) identifying the local environment by analyzing the NEB results, and (iii) estimating the cluster interaction model parameters from the activation barrier data. Once a cluster expansion model has been trained, it is used to predict activation barriers without requiring any additional NEB calculations. Numerical studies are performed to validate the cluster expansion model by studying hop processes in Ag/Ag(100). We show that the use of a cluster expansion model with KMC enables efficient generation of an accurate process rate catalog.
NASA Astrophysics Data System (ADS)
Roediger, Joel C.; Courteau, Stéphane; Graves, Genevieve; Schiavon, Ricardo P.
2014-01-01
We present an extensive literature compilation of age, metallicity, and chemical abundance pattern information for the 41 Galactic globular clusters (GGCs) studied by Schiavon et al. Our compilation constitutes a notable improvement over previous similar work, particularly in terms of chemical abundances. Its primary purpose is to enable detailed evaluations of and refinements to stellar population synthesis models designed to recover the above information for unresolved stellar systems based on their integrated spectra. However, since the Schiavon sample spans a wide range of the known GGC parameter space, our compilation may also benefit investigations related to a variety of astrophysical endeavors, such as the early formation of the Milky Way, the chemical evolution of GGCs, and stellar evolution and nucleosynthesis. For instance, we confirm with our compiled data that the GGC system has a bimodal metallicity distribution and is uniformly enhanced in the α elements. When paired with the ages of our clusters, we find evidence that supports a scenario whereby the Milky Way obtained its globular clusters through two channels: in situ formation and accretion of satellite galaxies. The distributions of C, N, O, and Na abundances and the dispersions thereof per cluster corroborate the known fact that all GGCs studied so far with respect to multiple stellar populations have been found to harbor them. Finally, using data on individual stars, we verify that stellar atmospheres become progressively polluted by CN(O)-processed material after they leave the main sequence. We also uncover evidence which suggests that the α elements Mg and Ca may originate from more than one nucleosynthetic production site. We estimate that our compilation incorporates all relevant analyses from the literature up to mid-2012. 
As an aid to investigators in the fields named above, we provide detailed electronic tables of the data upon which our work is based at http://www.astro.queensu.ca/people/Stephane_Courteau/roediger2013/index.html.
NASA Astrophysics Data System (ADS)
Okolelova, Ella; Shibaeva, Marina; Shalnev, Oleg
2018-03-01
The article analyses risks in high-rise construction in terms of investment value, taking into account the maximum probable loss in case of a risk event. The authors scrutinized the risks of high-rise construction in regions with various geographic, climatic and socio-economic conditions that may influence the project environment. Risk classification is presented in general terms and includes aggregated characteristics of risks common to many regions. Cluster analysis tools, which allow generalized groups of risk to be considered depending on their qualitative and quantitative features, were used to model the influence of the risk factors on the implementation of an investment project. For convenience of further calculations, each type of risk is assigned a separate code with the number of the cluster and the subtype of risk. This approach and the coding of risk factors make it possible to build a risk matrix, which greatly facilitates the task of determining the degree of impact of risks. The authors clarified and expanded the concept of price risk, which is defined as the expected value of the event; this extends the capabilities of the model and allows estimating an interval of the probability of occurrence as well as using other probabilistic methods of calculation.
CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence
Nepal, Madhav P; Benson, Benjamin V
2015-01-01
Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. Recent availability of soybean genome sequences makes it possible to examine the diversity of gene families including disease-resistant genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification and model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence were assessed. We identified 188 soybean CNL genes nested into four clades consistent with their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implications for soybean crop improvement in the future. PMID:25922568
Hurwitz, Bonnie L; Westveld, Anton H; Brum, Jennifer R; Sullivan, Matthew B
2014-07-22
Long-standing questions in marine viral ecology are centered on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging due to incomplete representation of naturally occurring viral diversity in single gene- or morphology-based studies and an inability to identify up to 90% of reads in viral metagenomes (viromes). Although protein clustering techniques provide a significant advance by helping organize this unknown metagenomic sequence space, they typically use only ∼75% of the data and rely on assembly methods not yet tuned for naturally occurring sequence variation. Here, we introduce an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer and social network analyses (regression modeling). This robust statistical framework enables visualization of complex sample networks and determination of ecological factors driving community structure. Application to 32 viromes from the Pacific Ocean Virome dataset identified clusters of samples broadly delineated by photic zone and revealed that geographic region, depth, and proximity to shore were significant predictors of community structure. Within subsets of this dataset, depth, season, and oxygen concentration were significant drivers of viral community structure at a single open ocean station, whereas variability along onshore-offshore transects was driven by oxygen concentration in an area with an oxygen minimum zone and not depth or proximity to shore, as might be expected. Together these results demonstrate that this highly scalable approach using complete metagenomic network-based comparisons can both test and generate hypotheses for ecological investigation of viral and microbial communities in nature.
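The shared k-mer idea can be sketched in a few lines: each sample is reduced to the set of k-mers found in its reads, and two samples are compared by how many k-mers they have in common. The abstract does not specify the similarity measure or the value of k, so the Jaccard index and k = 4 below are assumptions for illustration only, and the reads are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def sample_kmers(reads, k):
    """Union of the k-mer sets of every read in a sample."""
    result = set()
    for read in reads:
        result |= kmers(read, k)
    return result

def shared_kmer_similarity(reads_a, reads_b, k):
    """Jaccard index of two samples' k-mer sets (an assumed measure)."""
    a, b = sample_kmers(reads_a, k), sample_kmers(reads_b, k)
    return len(a & b) / len(a | b)

# Hypothetical toy reads; real viromes contain millions of reads.
sample_a = ["ACGTACGT"]
sample_b = ["ACGTACGA"]
similarity = shared_kmer_similarity(sample_a, sample_b, k=4)
print(similarity)
```

A matrix of such pairwise similarities is the kind of sample network that a regression-based network analysis could then take as input; no annotation or assembly step is involved.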
Hurwitz, Bonnie L.; Westveld, Anton H.; Brum, Jennifer R.; Sullivan, Matthew B.
2014-01-01
Long-standing questions in marine viral ecology are centered on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging due to incomplete representation of naturally occurring viral diversity in single gene- or morphology-based studies and an inability to identify up to 90% of reads in viral metagenomes (viromes). Although protein clustering techniques provide a significant advance by helping organize this unknown metagenomic sequence space, they typically use only ∼75% of the data and rely on assembly methods not yet tuned for naturally occurring sequence variation. Here, we introduce an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer and social network analyses (regression modeling). This robust statistical framework enables visualization of complex sample networks and determination of ecological factors driving community structure. Application to 32 viromes from the Pacific Ocean Virome dataset identified clusters of samples broadly delineated by photic zone and revealed that geographic region, depth, and proximity to shore were significant predictors of community structure. Within subsets of this dataset, depth, season, and oxygen concentration were significant drivers of viral community structure at a single open ocean station, whereas variability along onshore–offshore transects was driven by oxygen concentration in an area with an oxygen minimum zone and not depth or proximity to shore, as might be expected. Together these results demonstrate that this highly scalable approach using complete metagenomic network-based comparisons can both test and generate hypotheses for ecological investigation of viral and microbial communities in nature. PMID:25002514
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.
Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V
2017-09-30
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species
Andersen, Ethan J.; Neupane, Surendra; Benson, Benjamin V.
2017-01-01
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence. PMID:28973974
NASA Astrophysics Data System (ADS)
Adamaki, A.; Roberts, R.
2016-12-01
For many years an important aim in seismological studies has been forecasting the occurrence of large earthquakes. Despite some well-established statistical behavior of earthquake sequences, expressed by e.g. the Omori law for aftershock sequences and the Gutenberg-Richter distribution of event magnitudes, purely statistical approaches to short-term earthquake prediction have in general not been successful. It seems that better understanding of the processes leading to critical stress build-up prior to larger events is necessary to identify useful precursory activity, if this exists, and statistical analyses are an important tool in this context. There has been considerable debate on the usefulness or otherwise of foreshock studies for short-term earthquake prediction. We investigate generic patterns of foreshock activity using aggregated data and by studying not only strong but also moderate magnitude events. Aggregating empirical local seismicity time series prior to larger events observed in and around Greece reveals a statistically significant increasing rate of seismicity over 20 days prior to M>3.5 earthquakes. This increase cannot be explained by tempo-spatial clustering models such as ETAS, implying genuine changes in the mechanical situation just prior to larger events and thus the possible existence of useful precursory information. Because of tempo-spatial clustering, including aftershocks to foreshocks, even if such generic behavior exists it does not necessarily follow that foreshocks have the potential to provide useful precursory information for individual larger events. Using synthetic catalogs produced based on different clustering models and different presumed system sensitivities we are now investigating to what extent the apparently established generic foreshock rate acceleration may or may not imply that the foreshocks have potential in the context of routine forecasting of larger events.
Preliminary results suggest that this is the case, but that it is likely that physically-based models of foreshock clustering will be a necessary, but not necessarily sufficient, basis for successful forecasting.
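For reference, the statistical laws named in this abstract are usually written as below. These are the standard textbook forms rather than expressions quoted from the abstract, and the symbols (K, c, p, a, b, μ, α, M_0) follow common convention and are assumptions here.

```latex
% Modified Omori (Omori-Utsu) law: aftershock rate at time t after a mainshock
n(t) = \frac{K}{(t + c)^{p}}

% Gutenberg-Richter law: number of events with magnitude at least M
\log_{10} N(\geq M) = a - b\,M

% ETAS conditional intensity (temporal form): background rate plus
% Omori-type contributions from every earlier event i with magnitude M_i
\lambda(t) = \mu + \sum_{i:\, t_i < t} \frac{K\, e^{\alpha (M_i - M_0)}}{(t - t_i + c)^{p}}
```

Under the ETAS form, any rise in seismicity before a larger event must come from triggering by earlier events, which is why an acceleration that ETAS cannot reproduce is read in the abstract as a sign of a genuine change in the system.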
Perez, Manolo F; Carstens, Bryan C; Rodrigues, Gustavo L; Moraes, Evandro M
2016-02-01
The Pilosocereus aurisetus complex consists of eight cactus species with a fragmented distribution associated with xeric enclaves within the Cerrado biome in eastern South America. The phylogeny of these species is incompletely resolved, and this instability complicates evolutionary analyses. Previous analyses based on both plastid and microsatellite markers suggested that this complex contained species with inherent phylogeographic structure, which was attributed to recent diversification and recurring range shifts. However, limitations of the molecular markers used in these analyses prevented some questions from being properly addressed. In order to better understand the relationship among these species and make a preliminary assessment of the genetic structure within them, we developed anonymous nuclear loci from pyrosequencing data of 40 individuals from four species in the P. aurisetus complex. The data obtained from these loci were used to identify genetic clusters within species, and to investigate the phylogenetic relationship among these inferred clusters using a species tree methodology. Coupled with palaeodistributional modelling, our results reveal a deep phylogenetic and climatic disjunction between two geographic lineages. Our results highlight the importance of sampling more regions from the genome to gain better insights into the evolution of species with an intricate evolutionary history. The methodology used here provides a feasible approach to develop numerous genealogical molecular markers throughout the genome for non-model species. These data provide a more robust hypothesis for the relationship among the lineages of the P. aurisetus complex. Copyright © 2015 Elsevier Inc. All rights reserved.
Azeredo, Catarina Machado; Levy, Renata Bertazzi; Peres, Maria Fernanda Tourinho; Menezes, Paulo Rossi; Araya, Ricardo
2016-01-01
Objectives The aim of this study was to analyse the clustering of multiple health-related behaviours among adolescents and describe which socio-demographic characteristics are associated with these patterns. Design Cross-sectional study. Setting Brazilian schools assessed by the National Survey of School Health (PeNSE, 2012). Participants 104 109 Brazilian ninth-grade students from public and private schools (response rate=82.7%). Methods Exploratory and confirmatory factor analyses were performed to identify behaviour clustering and linear regression models were used to identify socio-demographic characteristics associated with each one of these behaviour patterns. Results We identified a good fit model with three behaviour patterns. The first was labelled ‘problem-behaviour’ and included aggressive behaviour, alcohol consumption, smoking, drug use and unsafe sex; the second was labelled ‘health-compromising diet and sedentary behaviours’ and included unhealthy food indicators and sedentary behaviour; and the third was labelled ‘health-promoting diet and physical activity’ and included healthy food indicators and physical activity. No differences in behaviour patterns were found between genders. The problem-behaviour pattern was associated with male gender, older age, more developed region (socially and economically) and public schools (compared with private). The ‘health-compromising diet and sedentary behaviours’ pattern was associated with female gender, older age, mothers with higher education level and more developed region. The ‘health-promoting diet and physical activity’ pattern was associated with male gender and mothers with higher education level. Conclusions Three health-related behaviour patterns were found among Brazilian adolescents. Interventions to decrease those negative patterns should take into account how these behaviours cluster together and the individuals most at risk. PMID:28186927
Testing prediction methods: Earthquake clustering versus the Poisson model
Michael, A.J.
1997-01-01
Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
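The simulation-based test described here can be sketched as a Monte Carlo significance estimate: count how many real earthquakes fall inside the method's alarm windows, then see how often simulated catalogs do at least as well. The sketch below uses only the unclustered Poisson model, which the abstract warns can give overly optimistic results; the alarm windows, event times and function names are hypothetical.

```python
import random

def count_hits(event_times, alarm_windows):
    """Number of events that fall inside any alarm window [start, end)."""
    return sum(
        any(start <= t < end for start, end in alarm_windows)
        for t in event_times
    )

def poisson_catalog(n_events, duration, rng):
    """Simulated catalog with no clustering.

    Conditional on the event count, the times of a homogeneous Poisson
    process are independent and uniform over the observation period.
    """
    return [rng.uniform(0.0, duration) for _ in range(n_events)]

def significance(observed_events, alarm_windows, duration, n_sim=2000, seed=0):
    """Fraction of simulated catalogs predicted at least as well as the real one."""
    rng = random.Random(seed)
    observed = count_hits(observed_events, alarm_windows)
    at_least = sum(
        count_hits(poisson_catalog(len(observed_events), duration, rng),
                   alarm_windows) >= observed
        for _ in range(n_sim)
    )
    return (at_least + 1) / (n_sim + 1)  # add-one correction avoids p = 0

# Hypothetical data: alarms cover 15% of a 100-day period.
alarms = [(10.0, 15.0), (40.0, 45.0), (70.0, 75.0)]
events = [12.0, 42.0, 72.0, 90.0]
p_value = significance(events, alarms, duration=100.0)
print(count_hits(events, alarms), p_value)
```

Replacing `poisson_catalog` with a generator that reproduces the observed clustering would, according to the abstract, raise the estimated p-value and give a more trustworthy test.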
An integrated approach to reconstructing genome-scale transcriptional regulatory networks
Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.; ...
2015-02-27
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis.
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
Text-mining analysis of mHealth research.
Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun
2017-01-01
In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research has also surged parallel to these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health, and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps".
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies.
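The document-term-matrix and SVD steps named in the abstract above can be illustrated with a minimal sketch. It is not the JMP Pro Text Explorer workflow the authors used: the whitespace tokenizer, the four toy documents, and the function names `document_term_matrix` and `svd_embed` are assumptions made only for illustration.

```python
import numpy as np

def document_term_matrix(docs):
    """Build a raw-count document-term matrix from whitespace-tokenized text."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    dtm = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for tok in d.lower().split():
            dtm[i, index[tok]] += 1
    return dtm, vocab

def svd_embed(dtm, k):
    """Project each document onto the top-k singular directions of the DTM."""
    u, s, _ = np.linalg.svd(dtm, full_matrices=False)
    return u[:, :k] * s[:k]

# Toy corpus (hypothetical): two "messaging" abstracts and two "sensor" abstracts.
docs = [
    "mobile phone text message reminder",
    "mobile phone text message adherence",
    "wearable sensor accelerometer algorithm",
    "wearable sensor accelerometer measurement",
]
dtm, vocab = document_term_matrix(docs)
emb = svd_embed(dtm, k=2)

# Documents on the same theme end up closer in the reduced space.
same_theme = np.linalg.norm(emb[0] - emb[1])
cross_theme = np.linalg.norm(emb[0] - emb[2])
```

The reduced coordinates in `emb` are what a hierarchical clustering step would then operate on.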
Text-mining analysis of mHealth research
Zengul, Ferhat; Oner, Nurettin; Delen, Dursun
2017-01-01
In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research has also surged parallel to these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”.
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies. PMID:29430456
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis
NASA Astrophysics Data System (ADS)
Fu, Pei-hua; Yin, Hong-bo
In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades, and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We introduce the system evaluation module and the cluster analysis module in detail and describe how we implemented these two modules. Finally, we give the results of the system.
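The abstract above does not say which fuzzy clustering variant was used. As one common possibility, the sketch below implements standard fuzzy c-means, where each enterprise receives a graded membership in every cluster. The two-column score matrix and all names are hypothetical.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means: returns memberships (n, c) and centers (c, d)."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(n_iter):
        w = u ** m                              # fuzzified weights
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centers

# Hypothetical normalized scores on two indices (e.g. management, transport capacity).
scores = np.array([[0.90, 0.80], [0.85, 0.90], [0.20, 0.10], [0.15, 0.25]])
u, centers = fuzzy_c_means(scores, c=2)
labels = u.argmax(axis=1)                       # hard labels, for inspection only
```

The fuzzifier `m` controls how soft the memberships are; `m = 2` is a conventional default, not a value taken from the source.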
Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian
2015-01-01
In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.
Everage, Nicholas J.; Linkletter, Crystal D.; Gjelsvik, Annie; McGarvey, Stephen T.; Loucks, Eric B.
2014-01-01
Background. Social and behavioral risk markers (e.g., physical activity, diet, smoking, and socioeconomic position) cluster; however, little is known whether clustering is associated with coronary heart disease (CHD) risk. Objectives were to determine if sociobehavioral clustering is associated with biological CHD risk factors (total cholesterol, HDL cholesterol, systolic blood pressure, body mass index, waist circumference, and diabetes) and whether associations are independent of individual clustering components. Methods. Participants included 4,305 males and 4,673 females aged ≥20 years from NHANES 2001–2004. Sociobehavioral Risk Marker Index (SRI) included a summary score of physical activity, fruit/vegetable consumption, smoking, and educational attainment. Regression analyses evaluated associations of SRI with aforementioned biological CHD risk factors. Receiver operator curve analyses assessed independent predictive ability of SRI. Results. Healthful clustering (SRI = 0) was associated with improved biological CHD risk factor levels in 5 of 6 risk factors in females and 2 of 6 risk factors in males. Adding SRI to models containing age, race, and individual SRI components did not improve C-statistics. Conclusions. Findings suggest that healthful sociobehavioral risk marker clustering is associated with favorable CHD risk factor levels, particularly in females. These findings should inform social ecological interventions that consider health impacts of addressing social and behavioral risk factors. PMID:24719858
Methods in Computational Cosmology
NASA Astrophysics Data System (ADS)
Vakili, Mohammadjavad
The state of the inhomogeneous universe and its geometry throughout cosmic history can be studied by measuring the clustering of galaxies and the gravitational lensing of distant faint galaxies. Lensing and clustering measurements from large datasets provided by modern galaxy surveys will forever shape our understanding of how the universe expands and how the structures grow. Interpretation of these rich datasets requires careful characterization of uncertainties at different stages of data analysis: estimation of the signal, estimation of the signal uncertainties, model predictions, and connecting the model to the signal through probabilistic means. In this thesis, we attempt to address some aspects of these challenges. The first step in cosmological weak lensing analyses is accurate estimation of the distortion of the light profiles of galaxies by large scale structure. These small distortions, known as the cosmic shear signal, are dominated by extra distortions due to telescope optics and atmosphere (in the case of ground-based imaging). This effect is captured by a kernel known as the Point Spread Function (PSF) that needs to be fully estimated and corrected for. We address two challenges ahead of accurate PSF modeling for weak lensing studies. The first challenge is finding the centers of point sources that are used for empirical estimation of the PSF. We show that the approximate methods for centroiding stars in wide surveys are able to optimally saturate the information content that is retrievable from astronomical images in the presence of noise. The first step in weak lensing studies is estimating the shear signal by accurately measuring the shapes of galaxies. Galaxy shape measurement involves modeling the light profile of galaxies convolved with the light profile of the PSF. Detectors of many space-based telescopes such as the Hubble Space Telescope (HST) sample the PSF with low resolution.
Reliable weak lensing analysis of galaxies observed by the HST camera requires knowledge of the PSF at a resolution higher than the pixel resolution of HST. This PSF is called the super-resolution PSF. In particular, we present a forward model of the point sources imaged through filters of the HST WFC3 IR channel. We show that this forward model can accurately estimate the super-resolution PSF. We also introduce a noise model that permits us to robustly analyze the HST WFC3 IR observations of the crowded fields. Then we try to address one of the theoretical uncertainties in modeling of galaxy clustering on small scales. Study of small scale clustering requires assuming a halo model. Clustering of halos has been shown to depend on halo properties beyond mass such as halo concentration, a phenomenon referred to as assembly bias. Standard large-scale structure studies with halo occupation distribution (HOD) assume that halo mass alone is sufficient to characterize the connection between galaxies and halos. However, assembly bias could cause the modeling of galaxy clustering to face systematic effects if the expected number of galaxies in halos is correlated with other halo properties. Using high resolution N-body simulations and the clustering measurements of Sloan Digital Sky Survey (SDSS) DR7 main galaxy sample, we show that modeling of galaxy clustering can slightly improve if we allow the HOD model to depend on halo properties beyond mass. One of the key ingredients in precise parameter inference using galaxy clustering is accurate estimation of the error covariance matrix of clustering measurements. This requires generation of many independent galaxy mock catalogs that accurately describe the statistical distribution of galaxies in a wide range of physical scales. We present a fast and accurate method based on low-resolution N-body simulations and an empirical bias model for generating mock catalogs. 
We use fast particle mesh gravity solvers for generation of the dark matter density field and we use Markov Chain Monte Carlo (MCMC) to estimate the bias model that connects dark matter to galaxies. We show that this approach enables the fast generation of mock catalogs that recover clustering at a percent-level accuracy down to quasi-nonlinear scales. Cosmological datasets are interpreted by specifying likelihood functions that are often assumed to be multivariate Gaussian. Likelihood-free approaches such as Approximate Bayesian Computation (ABC) can bypass this assumption by introducing a generative forward model of the data and a distance metric for quantifying the closeness of the data and the model. We present the first application of ABC in large scale structure for constraining the connections between galaxies and dark matter halos. We present an implementation of ABC equipped with Population Monte Carlo and a generative forward model of the data that incorporates sample variance and systematic uncertainties. (Abstract shortened by ProQuest.).
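The idea of ABC described above (a forward model plus a distance metric in place of an explicit likelihood) can be sketched with the simplest variant, rejection ABC. This is not the Population Monte Carlo scheme of the thesis. The Gaussian toy forward model, the sample-mean summary statistic, the uniform prior and the tolerance `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def abc_rejection(observed, prior_low, prior_high, n_draws, eps, seed=0):
    """Rejection ABC for the mean of a unit-variance Gaussian (toy forward model)."""
    rng = np.random.default_rng(seed)
    obs_summary = observed.mean()
    # 1. Draw candidate parameters from the prior.
    theta = rng.uniform(prior_low, prior_high, size=n_draws)
    # 2. Run the forward model once per candidate.
    sims = rng.normal(loc=theta[:, None], scale=1.0, size=(n_draws, observed.size))
    # 3. Keep candidates whose simulated summary lies within eps of the data.
    distance = np.abs(sims.mean(axis=1) - obs_summary)
    return theta[distance < eps]

# Synthetic "observations" generated with a true mean of 2.0.
observed = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=100)
posterior = abc_rejection(observed, -5.0, 5.0, n_draws=5000, eps=0.1)
```

Shrinking `eps` tightens the approximation to the true posterior at the cost of fewer accepted draws, which is the inefficiency that sequential schemes such as Population Monte Carlo are meant to reduce.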
Rumor Diffusion in an Interests-Based Dynamic Social Network
Mao, Xinjun; Guessoum, Zahia; Zhou, Huiping
2013-01-01
To research rumor diffusion in social friend network, based on interests, a dynamic friend network is proposed, which has the characteristics of clustering and community, and a diffusion model is also proposed. With this friend network and rumor diffusion model, based on the zombie-city model, some simulation experiments to analyze the characteristics of rumor diffusion in social friend networks have been conducted. The results show some interesting observations: (1) positive information may evolve to become a rumor through the diffusion process that people may modify the information by word of mouth; (2) with the same average degree, a random social network has a smaller clustering coefficient and is more beneficial for rumor diffusion than the dynamic friend network; (3) a rumor is spread more widely in a social network with a smaller global clustering coefficient than in a social network with a larger global clustering coefficient; and (4) a network with a smaller clustering coefficient has a larger efficiency. PMID:24453911
Rumor diffusion in an interests-based dynamic social network.
Tang, Mingsheng; Mao, Xinjun; Guessoum, Zahia; Zhou, Huiping
2013-01-01
To research rumor diffusion in social friend network, based on interests, a dynamic friend network is proposed, which has the characteristics of clustering and community, and a diffusion model is also proposed. With this friend network and rumor diffusion model, based on the zombie-city model, some simulation experiments to analyze the characteristics of rumor diffusion in social friend networks have been conducted. The results show some interesting observations: (1) positive information may evolve to become a rumor through the diffusion process that people may modify the information by word of mouth; (2) with the same average degree, a random social network has a smaller clustering coefficient and is more beneficial for rumor diffusion than the dynamic friend network; (3) a rumor is spread more widely in a social network with a smaller global clustering coefficient than in a social network with a larger global clustering coefficient; and (4) a network with a smaller clustering coefficient has a larger efficiency.
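The global clustering coefficient referred to in observations (2) to (4) is commonly defined as three times the number of triangles divided by the number of connected triples. A minimal sketch of that definition for an undirected, unweighted adjacency matrix follows; the small example graph is hypothetical.

```python
import numpy as np

def global_clustering(adj):
    """Global clustering coefficient (transitivity) of an undirected simple graph."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    # Ordered connected triples centred on each node: k * (k - 1).
    triples = (degree * (degree - 1)).sum()
    if triples == 0:
        return 0.0
    # trace(A^3) counts each triangle six times (3 start nodes x 2 directions).
    return float(np.trace(adj @ adj @ adj) / triples)

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
triangle_with_tail = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
# Path 0-1-2: no triangles at all.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

c_tail = global_clustering(triangle_with_tail)
c_path = global_clustering(path)
```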
NASA Astrophysics Data System (ADS)
Belloni, Diogo; Zorotovic, Mónica; Schreiber, Matthias R.; Leigh, Nathan W. C.; Giersz, Mirek; Askar, Abbas
2017-06-01
In this third of a series of papers related to cataclysmic variables (CVs) and related objects, we analyse the population of CVs in a set of 12 globular cluster models evolved with the MOCCA Monte Carlo code, for two initial binary populations (IBPs), two choices of common-envelope phase (CEP) parameters, and three different models for the evolution of CVs and the treatment of angular momentum loss. When more realistic models and parameters are considered, we find that present-day cluster CV duty cycles are extremely low (≲0.1 per cent), which makes their detection during outbursts rather difficult. Additionally, the IBP plays a significant role in shaping the CV population properties, and models that follow the Kroupa IBP are less affected by enhanced angular momentum loss. We also predict from our simulations that CVs formed dynamically in the past few Gyr (massive CVs) correspond to bright CVs (as expected) and that faint CVs formed several Gyr ago (dynamically or not) represent the overwhelming majority. Regarding the CV formation rate, we rule out the notion that it is similar irrespective of the cluster properties. Finally, we discuss the differences in the present-day CV properties related to the IBPs, the initial cluster conditions, the CEP parameters, formation channels, the CV evolution models and the angular momentum loss treatments.
Probabilistic Analysis of Hierarchical Cluster Protocols for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Kaj, Ingemar
Wireless sensor networks are designed to extract data from the deployment environment and combine sensing, data processing and wireless communication to provide useful information for the network users. Hundreds or thousands of small embedded units, which operate under low-energy supply and with limited access to central network control, rely on interconnecting protocols to coordinate data aggregation and transmission. Energy efficiency is crucial and it has been proposed that cluster based and distributed architectures such as LEACH are particularly suitable. We analyse the random cluster hierarchy in this protocol and provide a solution for low-energy and limited-loss optimization. Moreover, we extend these results to a multi-level version of LEACH, where clusters of nodes again self-organize to form clusters of clusters, and so on.
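The random cluster hierarchy in LEACH arises from a per-round self-election rule. The abstract does not reproduce it, so the sketch below assumes the commonly cited form, in which a node that has not yet served in the current epoch becomes cluster head with probability T = p / (1 - p (r mod 1/p)). The node count, the value of `p` and the function names are hypothetical.

```python
import random

def leach_threshold(p, r):
    """Election threshold for round r, given desired cluster-head fraction p."""
    period = round(1 / p)                  # rounds per epoch
    return p / (1 - p * (r % period))

def run_epoch(n_nodes, p, seed=0):
    """Simulate one epoch; nodes that served as head are no longer eligible."""
    rng = random.Random(seed)
    eligible = set(range(n_nodes))
    heads_per_round = []
    for r in range(round(1 / p)):
        t = leach_threshold(p, r)
        heads = {n for n in sorted(eligible) if rng.random() < t}
        eligible -= heads
        heads_per_round.append(heads)
    return heads_per_round

# p = 0.25 gives a 4-round epoch; the threshold reaches 1 in the last round.
rounds = run_epoch(n_nodes=20, p=0.25)
served = set().union(*rounds)
```

Because the threshold rises to 1 by the final round of the epoch, every node serves as cluster head exactly once per epoch under this rule, which spreads the energy cost of aggregation across the network.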
Hoadley, Katherine A; Yau, Christina; Hinoue, Toshinori; Wolf, Denise M; Lazar, Alexander J; Drill, Esther; Shen, Ronglai; Taylor, Alison M; Cherniack, Andrew D; Thorsson, Vésteinn; Akbani, Rehan; Bowlby, Reanne; Wong, Christopher K; Wiznerowicz, Maciej; Sanchez-Vega, Francisco; Robertson, A Gordon; Schneider, Barbara G; Lawrence, Michael S; Noushmehr, Houtan; Malta, Tathiane M; Stuart, Joshua M; Benz, Christopher C; Laird, Peter W
2018-04-05
We conducted comprehensive integrative molecular analyses of the complete set of tumors in The Cancer Genome Atlas (TCGA), consisting of approximately 10,000 specimens and representing 33 types of cancer. We performed molecular clustering using data on chromosome-arm-level aneuploidy, DNA hypermethylation, mRNA, and miRNA expression levels and reverse-phase protein arrays, of which all, except for aneuploidy, revealed clustering primarily organized by histology, tissue type, or anatomic origin. The influence of cell type was evident in DNA-methylation-based clustering, even after excluding sites with known preexisting tissue-type-specific methylation. Integrative clustering further emphasized the dominant role of cell-of-origin patterns. Molecular similarities among histologically or anatomically related cancer types provide a basis for focused pan-cancer analyses, such as pan-gastrointestinal, pan-gynecological, pan-kidney, and pan-squamous cancers, and those related by stemness features, which in turn may inform strategies for future therapeutic development. Copyright © 2018 Elsevier Inc. All rights reserved.
Opara, Umezuruike Linus; Jacobson, Dan; Al-Saady, Nadiya Abubakar
2010-01-01
Banana is an important crop grown in Oman and there is a dearth of information on its genetic diversity to assist in crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results obtained show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and those clusters created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented. PMID:20443211
Bias correction of satellite-based rainfall data
NASA Astrophysics Data System (ADS)
Bhattacharya, Biswa; Solomatine, Dimitri
2015-04-01
Limitation in hydro-meteorological data availability in many catchments limits the possibility of reliable hydrological analyses especially for near-real-time predictions. However, the variety of satellite based and meteorological model products for rainfall provides new opportunities. Often times the accuracy of these rainfall products, when compared to rain gauge measurements, is not impressive. The systematic differences of these rainfall products from gauge observations can be partially compensated by adopting a bias (error) correction. Many of such methods correct the satellite based rainfall data by comparing their mean value to the mean value of rain gauge data. Refined approaches may also first find out a suitable time scale at which different data products are better comparable and then employ a bias correction at that time scale. More elegant methods use quantile-to-quantile bias correction, which however, assumes that the available (often limited) sample size can be useful in comparing probabilities of different rainfall products. Analysis of rainfall data and understanding of the process of its generation reveals that the bias in different rainfall data varies in space and time. The time aspect is sometimes taken into account by considering the seasonality. In this research we have adopted a bias correction approach that takes into account the variation of rainfall in space and time. A clustering based approach is employed in which every new data point (e.g. of Tropical Rainfall Measuring Mission (TRMM)) is first assigned to a specific cluster of that data product and then, by identifying the corresponding cluster of gauge data, the bias correction specific to that cluster is adopted. The presented approach considers the space-time variation of rainfall and as a result the corrected data is more realistic. Keywords: bias correction, rainfall, TRMM, satellite rainfall
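The cluster-specific correction described above can be sketched as follows. The abstract does not state how clusters are formed or what form the correction takes, so this example assumes one-dimensional clusters defined by fixed intensity centroids and a multiplicative factor (mean gauge over mean satellite) per cluster. All numbers and names are hypothetical.

```python
import numpy as np

def assign(values, centroids):
    """Assign each value to the nearest centroid (1-D nearest-neighbour rule)."""
    return np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)

def fit_cluster_bias(sat, gauge, centroids):
    """Per-cluster multiplicative bias factor: mean(gauge) / mean(satellite)."""
    labels = assign(sat, centroids)
    factors = np.ones(len(centroids))
    for k in range(len(centroids)):
        mask = labels == k
        if mask.any() and sat[mask].mean() > 0:
            factors[k] = gauge[mask].mean() / sat[mask].mean()
    return factors

def correct(sat_new, centroids, factors):
    """Assign each new satellite value to a cluster, then apply that cluster's factor."""
    return sat_new * factors[assign(sat_new, centroids)]

# Hypothetical paired samples (mm/day): satellite underestimates light rain
# and overestimates heavy rain.
sat = np.array([1.0, 2.0, 3.0, 20.0, 22.0, 24.0])
gauge = np.array([2.0, 4.0, 6.0, 10.0, 11.0, 12.0])
centroids = np.array([2.0, 22.0])

factors = fit_cluster_bias(sat, gauge, centroids)
corrected = correct(np.array([2.0, 20.0]), centroids, factors)
```

A single global factor would average the two opposite biases away; fitting one factor per cluster is what lets the correction vary with the rainfall regime.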
The persistent clustering of adult body mass index by school attended in adolescence.
Evans, Clare Rosenfeld; Lippert, Adam M; Subramanian, S V
2016-03-01
It is well known that adolescent body mass index (BMI) shows school-level clustering. We explore whether school-level clustering of BMI persists into adulthood. Multilevel models nesting young adults in schools they attended as adolescents are fit for 3 outcomes: adolescent BMI, self-report adult BMI and measured adult BMI. Sex-stratified and race/ethnicity-stratified (black, Hispanic, white, other) analyses were also conducted. School-level clustering (wave 1 intraclass correlation coefficient (ICC)=1.3%) persists over time (wave 4 ICC=2%), and results are comparable across stratified analyses of both sexes and all racial/ethnic groups (except for Hispanics when measured BMIs are used). Controlling for BMI in adolescence partially attenuates this effect. School-level clustering of BMI persists into young adulthood. Possible explanations include the salience of school environments in establishing behaviours and trajectories, the selection of adult social networks that resemble adolescent networks and reinforce previous behaviours, and characteristics of school catchment areas associated with BMI. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
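The intraclass correlation coefficient reported above is the share of total outcome variance that lies between schools. The authors estimated it from multilevel models. As a simpler stand-in, the sketch below uses the one-way ANOVA estimator for balanced groups, ICC = (MSB - MSW) / (MSB + (k - 1) MSW), on hypothetical data.

```python
import numpy as np

def icc_oneway(groups):
    """One-way ANOVA ICC for balanced data of shape (n_groups, group_size)."""
    groups = np.asarray(groups, dtype=float)
    g, k = groups.shape
    group_means = groups.mean(axis=1)
    # Between-group and within-group mean squares.
    msb = k * ((group_means - groups.mean()) ** 2).sum() / (g - 1)
    msw = ((groups - group_means[:, None]) ** 2).sum() / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Two hypothetical "schools" with two students each.
icc_mixed = icc_oneway([[1.0, 2.0], [3.0, 4.0]])
icc_perfect = icc_oneway([[1.0, 1.0], [3.0, 3.0]])   # all variance is between groups
```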
NASA Astrophysics Data System (ADS)
Acebron, Ana; Jullo, Eric; Limousin, Marceau; Tilquin, André; Giocoli, Carlo; Jauzac, Mathilde; Mahler, Guillaume; Richard, Johan
2017-09-01
Strong gravitational lensing by galaxy clusters is a fundamental tool to study dark matter and constrain the geometry of the Universe. Recently, the Hubble Space Telescope Frontier Fields programme has allowed a significant improvement of mass and magnification measurements but lensing models still have a residual root mean square between 0.2 arcsec and few arcseconds, not yet completely understood. Systematic errors have to be better understood and treated in order to use strong lensing clusters as reliable cosmological probes. We have analysed two simulated Hubble-Frontier-Fields-like clusters from the Hubble Frontier Fields Comparison Challenge, Ares and Hera. We use several estimators (relative bias on magnification, density profiles, ellipticity and orientation) to quantify the goodness of our reconstructions by comparing our multiple models, optimized with the parametric software lenstool, with the input models. We have quantified the impact of systematic errors arising, first, from the choice of different density profiles and configurations and, secondly, from the availability of constraints (spectroscopic or photometric redshifts, redshift ranges of the background sources) in the parametric modelling of strong lensing galaxy clusters and therefore on the retrieval of cosmological parameters. We find that substructures in the outskirts have a significant impact on the position of the multiple images, yielding tighter cosmological contours. The need for wide-field imaging around massive clusters is thus reinforced. We show that competitive cosmological constraints can be obtained also with complex multimodal clusters and that photometric redshifts improve the constraints on cosmological parameters when considering a narrow range of (spectroscopic) redshifts for the sources.
Pressure of the hot gas in simulations of galaxy clusters
NASA Astrophysics Data System (ADS)
Planelles, S.; Fabjan, D.; Borgani, S.; Murante, G.; Rasia, E.; Biffi, V.; Truong, N.; Ragone-Figueroa, C.; Granato, G. L.; Dolag, K.; Pierpaoli, E.; Beck, A. M.; Steinborn, Lisa K.; Gaspari, M.
2017-06-01
We analyse the radial pressure profiles, the intracluster medium (ICM) clumping factor and the Sunyaev-Zel'dovich (SZ) scaling relations of a sample of simulated galaxy clusters and groups identified in a set of hydrodynamical simulations based on an updated version of the treepm-SPH GADGET-3 code. Three different sets of simulations are performed: the first assumes non-radiative physics, the others include, among other processes, active galactic nucleus (AGN) and/or stellar feedback. Our results are analysed as a function of redshift, ICM physics, cluster mass and cluster cool-coreness or dynamical state. In general, the mean pressure profiles obtained for our sample of groups and clusters show a good agreement with X-ray and SZ observations. Simulated cool-core (CC) and non-cool-core (NCC) clusters also show a good match with real data. We obtain in all cases a small (if any) redshift evolution of the pressure profiles of massive clusters, at least back to z = 1. We find that the clumpiness of gas density and pressure increases with the distance from the cluster centre and with the dynamical activity. The inclusion of AGN feedback in our simulations generates values for the gas clumping ($\sqrt{C_{\rho}} \sim 1.2$ at $R_{200}$) in good agreement with recent observational estimates. The simulated $Y_{\rm SZ}$-$M$ scaling relations are in good accordance with several observed samples, especially for massive clusters. As for the scatter of these relations, we obtain a clear dependence on the cluster dynamical state, whereas this distinction is not so evident when looking at the subsamples of CC and NCC clusters.
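The gas clumping factor quoted above is usually defined within a radial shell as the ratio of the mean squared density to the squared mean density. A minimal sketch of that definition follows, assuming equal-volume gas elements so that plain averages suffice. The density values are hypothetical and the abstract does not specify the weighting used in the simulations.

```python
import numpy as np

def clumping_factor(rho):
    """C = <rho^2> / <rho>^2 for equal-volume gas elements in one shell."""
    rho = np.asarray(rho, dtype=float)
    return float(np.mean(rho ** 2) / np.mean(rho) ** 2)

c_uniform = clumping_factor([2.0, 2.0, 2.0])    # perfectly smooth gas
c_clumpy = clumping_factor([1.0, 3.0])          # two elements of unequal density
sqrt_c_clumpy = c_clumpy ** 0.5
```

A smooth medium gives C = 1, and any inhomogeneity pushes C above 1, which is why clumping biases density estimates derived from X-ray emission (proportional to density squared) upwards.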
Use of machine learning methods to reduce predictive error of groundwater models.
Xu, Tianfang; Valocchi, Albert J; Choi, Jaesik; Amir, Eyal
2014-01-01
Quantitative analyses of groundwater flow and transport typically rely on a physically-based model, which is inherently subject to error. Errors in model structure, parameter and data lead to both random and systematic error even in the output of a calibrated model. We develop complementary data-driven models (DDMs) to reduce the predictive error of physically-based groundwater models. Two machine learning techniques, the instance-based weighting and support vector regression, are used to build the DDMs. This approach is illustrated using two real-world case studies of the Republican River Compact Administration model and the Spokane Valley-Rathdrum Prairie model. The two groundwater models have different hydrogeologic settings, parameterization, and calibration methods. In the first case study, cluster analysis is introduced for data preprocessing to make the DDMs more robust and computationally efficient. The DDMs reduce the root-mean-square error (RMSE) of the temporal, spatial, and spatiotemporal prediction of piezometric head of the groundwater model by 82%, 60%, and 48%, respectively. In the second case study, the DDMs reduce the RMSE of the temporal prediction of piezometric head of the groundwater model by 77%. It is further demonstrated that the effectiveness of the DDMs depends on the existence and extent of the structure in the error of the physically-based model. © 2013, National GroundWater Association.
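The complementary-model idea above (learn the structured part of the physical model's error, then add it back to the prediction) can be sketched with a simple instance-based learner. This example uses inverse-distance weighting over the nearest training points, which may differ from the authors' instance-based weighting scheme. The synthetic "head" data, the deliberately incomplete physical model and all names are assumptions.

```python
import numpy as np

def idw_residual(x_train, resid_train, x_query, k=3):
    """Predict residuals by inverse-distance weighting of the k nearest instances."""
    d = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(d, axis=1)[:, :k]
    w = 1.0 / (np.take_along_axis(d, nearest, axis=1) + 1e-9)
    return (w * resid_train[nearest]).sum(axis=1) / w.sum(axis=1)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic example: the "physical" model misses a sinusoidal component.
x = np.linspace(0.0, 10.0, 50)
observed = 2.0 * x + np.sin(x)
physical = 2.0 * x

train, test = np.arange(0, 50, 2), np.arange(1, 50, 2)
resid_hat = idw_residual(x[train], (observed - physical)[train], x[test])

rmse_raw = rmse(physical[test], observed[test])
rmse_corrected = rmse(physical[test] + resid_hat, observed[test])
```

The correction helps here only because the residual has structure (a smooth sinusoid); on purely random error the data-driven model would have nothing to learn, which matches the dependence on error structure noted in the abstract.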
Patterns of Physical and Relational Aggression in a School-Based Sample of Boys and Girls
ERIC Educational Resources Information Center
Crapanzano, Ann Marie; Frick, Paul J.; Terranova, Andrew M.
2010-01-01
The current study investigated the patterns of aggressive behavior displayed in a sample of 282 students in the 4th through 7th grades (M age = 11.28; SD = 1.82). Using cluster analyses, two distinct patterns of physical aggression emerged for both boys and girls with one aggressive cluster showing mild levels of reactive aggression and one group…
Pennings, Stephanie M; Finn, Joseph; Houtsma, Claire; Green, Bradley A; Anestis, Michael D
2017-10-01
Prior studies examining posttraumatic stress disorder (PTSD) symptom clusters and the components of the interpersonal theory of suicide (ITS) have yielded mixed results, likely stemming in part from the use of divergent samples and measurement techniques. This study aimed to expand on these findings by utilizing a large military sample, gold standard ITS measures, and multiple PTSD factor structures. Utilizing a sample of 935 military personnel, hierarchical multiple regression analyses were used to test the association between PTSD symptom clusters and the ITS variables. Additionally, we tested for indirect effects of PTSD symptom clusters on suicidal ideation through thwarted belongingness, conditional on levels of perceived burdensomeness. Results indicated that numbing symptoms are positively associated with both perceived burdensomeness and thwarted belongingness and hyperarousal symptoms (dysphoric arousal in the 5-factor model) are positively associated with thwarted belongingness. Results also indicated that hyperarousal symptoms (anxious arousal in the 5-factor model) were positively associated with fearlessness about death. The positive association between PTSD symptom clusters and suicidal ideation was inconsistent and modest, with mixed support for the ITS model. Overall, these results provide further clarity regarding the association between specific PTSD symptom clusters and suicide risk factors. © 2016 The American Association of Suicidology.
Arnrup, Kristina; Broberg, Anders G; Berggren, Ulf; Bodin, Lennart
2007-11-01
Current treatment of children with dental behaviour management problems (DBMP) is based on the presupposition that their difficulties are caused by dental fear, but is this always the case? The aim of this study was to study temperamental reactivity, negative emotionality, and other personal characteristics in relation to DBMP in 8- to 12-year-old children. Forty-six children referred because of DBMP (study group) and 110 children in ordinary dental care (reference group) participated. The EASI temperamental survey assessed temperamental reactivity and negative emotionality, the Child Behaviour Questionnaire internalizing and externalizing behaviour problems, and the Children's Fear Survey Schedule general and dental fears. Cluster analyses and tree-based modelling were used for data analysis. Among the five clusters identified, one could be characterized as 'balanced temperament'. Thirty-five per cent of the reference group compared to only 7% of the study group belonged to this cluster. Negative emotionality was the most important sorting variable. Children referred because of DBMP differed from children in ordinary dental care, not only in dental fear level, but also in personal characteristics. Few of the referred children were characterized by a balanced temperament profile. It is important to consider the dual impact of emotion dysregulation and emotional reactivity in the development of DBMP.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
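The fuzzy c-means step can be illustrated with the standard membership formula, in which a sample's membership in cluster i is proportional to its distance to centre i raised to the power -2/(m-1). This is a minimal sketch only: the abstract does not describe the three features or the fuzzifier used, so the function name `fcm_memberships`, the default `m = 2.0`, and the example distances are assumptions for illustration.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one sample, given its distance to each centre.

    `m` is the fuzzifier (m > 1); m = 2.0 is a common default, assumed here.
    """
    if any(d == 0 for d in distances):
        # A sample lying exactly on a centre belongs fully to it
        # (several coincident centres share the membership equally).
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical distances of one mode cluster to a "physical" and a "noise" centre.
degrees = fcm_memberships([1.0, 3.0])
```

With two centres, the first entry of `degrees` can be read as a graded score rather than a hard label, which is the sense in which a fuzzy procedure can assign a degree to each cluster.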
Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein
2014-11-01
Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
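The stability figures reported above can be reproduced from the raw counts. The sketch below shows how such a summary might be computed from baseline and follow-up cluster labels. The function names and the rank coding (0 = unhealthy, 1 = moderately healthy, 2 = healthy) are assumptions for illustration; only the three counts are taken from the abstract.

```python
from collections import Counter


def summarise_transitions(before, after):
    """Count participants who stayed, or moved to a healthier or unhealthier cluster.

    Labels are ranks (assumed coding): 0 = unhealthy, 1 = moderate, 2 = healthy.
    """
    moves = Counter()
    for b, a in zip(before, after):
        if a == b:
            moves["same"] += 1
        elif a > b:
            moves["healthier"] += 1
        else:
            moves["unhealthier"] += 1
    return moves


def shares(counts):
    """Convert counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Counts reported in the abstract (follow-up N = 379).
reported = shares({"same": 251, "unhealthier": 45, "healthier": 83})
```

The three counts sum to the follow-up sample of 379, and the rounded shares match the percentages quoted in the abstract.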
Mechanism for Collective Cell Alignment in Myxococcus xanthus Bacteria
Balagam, Rajesh; Igoshin, Oleg A.
2015-01-01
Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type and non-reversing M. xanthus mutants in recent experiments. Our results are robust to variation in model parameters, match the experimentally observed trends and can be applied to understand surface motility patterns of other bacterial species. PMID:26308508
Exemplar-Based Clustering via Simulated Annealing
ERIC Educational Resources Information Center
Brusco, Michael J.; Kohn, Hans-Friedrich
2009-01-01
Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of "exemplars" as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed…
The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
ERIC Educational Resources Information Center
Steyvers, Mark; Tenenbaum, Joshua B.
2005-01-01
We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of…
Elastic K-means using posterior probability.
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability with soft capability, where each data point can belong to multiple clusters fractionally, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, besides vector attribute information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.
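The difference between hard and fractional assignment can be shown with a small example. This is not the EKM formulation from the paper, which the abstract does not spell out. It is a generic soft k-means sketch in which the posterior-like weight of each cluster is a softmax of negative squared distances, and the stiffness parameter `beta` and all function names are assumptions.

```python
import math


def soft_assign(point, centers, beta=1.0):
    """Return fractional memberships of `point` over `centers` (they sum to 1)."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, ctr)) for ctr in centers]
    smallest = min(d2)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (d - smallest)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def soft_kmeans(points, centers, beta=1.0, iters=50):
    """Alternate soft assignment and weighted-mean centre updates."""
    dim = len(points[0])
    resp = []
    for _ in range(iters):
        resp = [soft_assign(p, centers, beta) for p in points]
        updated = []
        for k, old in enumerate(centers):
            weight = sum(r[k] for r in resp)
            if weight == 0.0:
                updated.append(old)  # keep a centre that attracts no weight
                continue
            updated.append(tuple(
                sum(r[k] * p[j] for r, p in zip(resp, points)) / weight
                for j in range(dim)))
        centers = updated
    return centers, resp


points = [(0.0,), (0.2,), (4.0,), (4.2,)]
centers, resp = soft_kmeans(points, [(0.0,), (4.0,)])
halfway = soft_assign((2.0,), [(0.0,), (4.0,)])
```

A point equidistant from two centres receives equal weight from both, whereas hard K-means would have to assign it entirely to one of them.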
Convergence tests on tax burden and economic growth among China, Taiwan and the OECD countries
NASA Astrophysics Data System (ADS)
Wang, David Han-Min
2007-07-01
Unfolding globalization has a profound impact on a wide range of national policies, including tax and economic policies. This study adopts time series and cluster analyses to examine the convergence property of tax burden and per capita gross domestic product among Taiwan, China and the OECD countries. The empirical results show that there is no significant relationship between the integration process and fiscal convergence among countries. However, the cluster analyses identify that the group of China, Taiwan, and Korea was stably moving toward one model during the 1970s, 1980s and 1990s. Convergence of tax burden is found within this group, but no pairwise convergence exists.
Naz, Gul Jabeen; Dong, Dandan; Geng, Yaoxiang; Wang, Yingmin; Dong, Chuang
2017-08-22
It is known that bulk metallic glasses follow simple composition formulas [cluster](glue atom)x, with x = 1 or 3, with 24 valence electrons within the framework of the cluster-plus-glue-atom model. Though the relevant nearest-neighbor cluster can be readily identified from a devitrification phase, the glue atoms remain poorly defined. The present work is devoted to understanding the composition rule of Fe-(B,P,C) based multi-component bulk metallic glasses, by introducing a cluster-based eutectic liquid model. This model regards a eutectic liquid to be composed of two stable liquids formulated respectively by cluster formulas for ideal metallic glasses from the two eutectic phases. The dual cluster formulas are first established for binary Fe-(B,C,P) eutectics: [Fe-Fe14]B2Fe + [B-B2Fe8]Fe ≈ Fe83.3B16.7 for eutectic Fe83B17, [P-Fe14]P + [P-Fe9]P2Fe ≈ Fe82.8P17.2 for Fe83P17, and [C-Fe6]Fe3 + [C-Fe9]C2Fe ≈ Fe82.6C17.4 for Fe82.7C17.3. The second formulas in these dual-cluster formulas, being respectively relevant to devitrification phases Fe2B, Fe3P, and Fe3C, well explain the compositions of existing Fe-based transition metals-metalloid bulk metallic glasses. These formulas also satisfy the 24-electron rule. The proposition of the composition formulas for good glass formers, directly from known eutectic points, constitutes a new route towards understanding and eventually designing metallic glasses of high glass forming abilities.
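The atomic percentages quoted for the three dual-cluster formulas follow from counting atoms in each formula unit. A minimal check of that arithmetic is sketched below. The helper name `atomic_percent` is hypothetical, and the element counts are read off the formulas by adding the centre atom, the shell atoms, and the glue atoms of each cluster.

```python
from collections import Counter


def atomic_percent(*units):
    """Sum element counts over formula units and return atomic percentages."""
    total = Counter()
    for unit in units:
        total.update(unit)  # adds the counts element by element
    n_atoms = sum(total.values())
    return {el: round(100.0 * cnt / n_atoms, 1) for el, cnt in total.items()}


# [Fe-Fe14]B2Fe = Fe16B2 and [B-B2Fe8]Fe = Fe9B3
fe_b = atomic_percent({"Fe": 16, "B": 2}, {"Fe": 9, "B": 3})
# [P-Fe14]P = Fe14P2 and [P-Fe9]P2Fe = Fe10P3
fe_p = atomic_percent({"Fe": 14, "P": 2}, {"Fe": 10, "P": 3})
# [C-Fe6]Fe3 = Fe9C and [C-Fe9]C2Fe = Fe10C3
fe_c = atomic_percent({"Fe": 9, "C": 1}, {"Fe": 10, "C": 3})
```

The sums are Fe25B5, Fe24P5, and Fe19C4, which give the rounded compositions stated in the abstract and lie close to the experimental eutectic points.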
NASA Astrophysics Data System (ADS)
Kost, Christoph; Friebertshäuser, Chris; Hartmann, Niklas; Fluri, Thomas; Nitz, Peter
2017-06-01
This paper analyses the role of solar technologies (CSP and PV) and their interaction in the South African electricity system by using a fundamental electricity system model (ENTIGRIS-SouthAfrica). The model is used to analyse the South African long-term electricity generation portfolio mix, optimized site selection and required transmission capacities until the year 2050. In particular, the location and grid integration of solar technologies (PV and CSP) and wind power plants are analysed. This analysis is carried out by using a detailed resource assessment of both technologies. A cluster approach is presented to reduce complexity by integrating the data in an optimization model.
Smith, Jordan J; Morgan, Philip J; Plotnikoff, Ronald C; Stodden, David F; Lubans, David R
2016-01-01
The purpose of this study was to examine the mediating effect of resistance training skill competency on percentage of body fat, muscular fitness and physical activity among a sample of adolescent boys participating in a school-based obesity prevention intervention. Participants were 361 adolescent boys taking part in the Active Teen Leaders Avoiding Screen-time (ATLAS) cluster randomised controlled trial: a school-based program targeting the health behaviours of economically disadvantaged adolescent males considered "at-risk" of obesity. Body fat percentage (bioelectrical impedance), muscular fitness (hand grip dynamometry and push-ups), physical activity (accelerometry) and resistance training skill competency were assessed at baseline and post-intervention (i.e., 8 months). Three separate multi-level mediation models were analysed to investigate the potential mediating effects of resistance training skill competency on each of the study outcomes using a product-of-coefficients test. Analyses followed the intention-to-treat principle. The intervention had a significant impact on the resistance training skill competency of the boys, and improvements in skill competency significantly mediated the effect of the intervention on percentage of body fat and the combined muscular fitness score. No significant mediated effects were found for physical activity. Improving resistance training skill competency may be an effective strategy for achieving improvements in body composition and muscular fitness in adolescent boys.
The anterior hypothalamus in cluster headache.
Arkink, Enrico B; Schmitz, Nicole; Schoonman, Guus G; van Vliet, Jorine A; Haan, Joost; van Buchem, Mark A; Ferrari, Michel D; Kruit, Mark C
2017-10-01
Objective To evaluate the presence, localization, and specificity of structural hypothalamic and whole brain changes in cluster headache and chronic paroxysmal hemicrania (CPH). Methods We compared T1-weighted magnetic resonance images of subjects with cluster headache (episodic n = 24; chronic n = 23; probable n = 14), CPH ( n = 9), migraine (with aura n = 14; without aura n = 19), and no headache ( n = 48). We applied whole brain voxel-based morphometry (VBM) using two complementary methods to analyze structural changes in the hypothalamus: region-of-interest analyses in whole brain VBM, and manual segmentation of the hypothalamus to calculate volumes. We used both conservative VBM thresholds, correcting for multiple comparisons, and less conservative thresholds for exploratory purposes. Results Using region-of-interest VBM analyses mirrored to the headache side, we found enlargement ( p < 0.05, small volume correction) in the anterior hypothalamic gray matter in subjects with chronic cluster headache compared to controls, and in all participants with episodic or chronic cluster headache taken together compared to migraineurs. After manual segmentation, hypothalamic volume (mean±SD) was larger ( p < 0.05) both in subjects with episodic (1.89 ± 0.18 ml) and chronic (1.87 ± 0.21 ml) cluster headache compared to controls (1.72 ± 0.15 ml) and migraineurs (1.68 ± 0.19 ml). Similar but non-significant trends were observed for participants with probable cluster headache (1.82 ± 0.19 ml; p = 0.07) and CPH (1.79 ± 0.20 ml; p = 0.15). Increased hypothalamic volume was primarily explained by bilateral enlargement of the anterior hypothalamus. Exploratory whole brain VBM analyses showed widespread changes in pain-modulating areas in all subjects with headache. Interpretation The anterior hypothalamus is enlarged in episodic and chronic cluster headache and possibly also in probable cluster headache or CPH, but not in migraine.
NASA Astrophysics Data System (ADS)
Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.
2018-04-01
Clustering is a data mining technique used to analyse large volumes of varied data. Clustering is the process of grouping data into clusters so that each cluster contains data that are as similar as possible to each other and different from the objects in other clusters. SMEs in Indonesia have a variety of customers, but they do not have a mapping of these customers, so they do not know which customers are loyal and which are not. Customer mapping is a grouping of customer profiles to facilitate the analysis and policy of SMEs in the production of goods, especially batik sales. The researchers use a combination of the K-Means method with the elbow method to make k-means more efficient and effective when processing large amounts of data. K-Means clustering is a local optimization method that is sensitive to the choice of the initial cluster centroids, so a poor choice of initial centroids leads to high errors and poor cluster results. The K-means algorithm also has difficulty determining the best number of clusters, so the elbow method is used to find the best number of clusters for K-means. The results show that the elbow method can produce the same number of clusters K for different amounts of data. The number of clusters determined by the elbow method becomes the default for the characterization process in the case study. Evaluating k-means by SSE values on 500 clusters of batik visitors yielded the best clusters. The result shows that the sharpest decrease occurs at K = 3, so K = 3 is taken as the cut-off point for the best number of clusters.
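The elbow procedure described above can be sketched as follows: run K-means for a range of K, record the sum of squared errors (SSE), and pick the K at which the SSE curve bends most sharply. This is an illustrative sketch, not the authors' implementation. The toy data, the deterministic farthest-point initialisation, and the use of the largest second difference as the "elbow" criterion are all assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point initialisation (assumed, to avoid bad random starts)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_sse(points, k, iters=100):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: sq_dist(p, centers[i]))].append(p)
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow(sse):
    """Return the K with the largest second difference of the SSE curve."""
    ks = sorted(sse)
    return max(ks[1:-1], key=lambda k: sse[k - 1] - 2 * sse[k] + sse[k + 1])


# Toy data: three well-separated groups of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 10), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]
sse = {k: kmeans_sse(points, k) for k in range(1, 7)}
best_k = elbow(sse)
```

On this toy data the SSE drops steeply up to K = 3 and flattens afterwards, so the criterion selects three clusters. On real customer data the curve is usually less clear-cut and the choice should be checked visually.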
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peacock, Mark B.; Zepf, Stephen E.; Maccarone, Thomas J.
2011-08-10
Accurate stellar population synthesis models are vital in understanding the properties and formation histories of galaxies. In order to calibrate and test the reliability of these models, they are often compared with observations of star clusters. However, relatively little work has compared these models in the ugriz filters, despite the recent widespread use of this filter set. In this paper, we compare the integrated colors of globular clusters in the Sloan Digital Sky Survey (SDSS) with those predicted from commonly used simple stellar population (SSP) models. The colors are based on SDSS observations of M31's clusters and provide the largest population of star clusters with accurate photometry available from the survey. As such, it is a unique sample with which to compare SSP models with SDSS observations. From this work, we identify a significant offset between the SSP models and the clusters' g - r colors, with the models predicting colors which are too red by g - r ≈ 0.1. This finding is consistent with previous observations of luminous red galaxies in the SDSS, which show a similar discrepancy. The identification of this offset in globular clusters suggests that it is very unlikely to be due to a minority population of young stars. The recently updated SSP model of Maraston and Stroembaeck better represents the observed g - r colors. This model is based on the empirical MILES stellar library, rather than theoretical libraries, suggesting an explanation for the g - r discrepancy.
Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.
Meghani, Salimah H; Knafl, George J
2017-02-10
To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-month prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For a subset, the main underlying concern was type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%) and type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis, to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only the belief that pain medications can mask changes in health or keep you from knowing what is going on in your body was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)].
Most patients appear to be driven by a single salient concern in using analgesia for cancer pain. Addressing these concerns, perhaps through real time clinical assessments, may improve patients' analgesic adherence patterns and cancer pain outcomes.
Lee, JongHyup; Pak, Dohyun
2016-01-01
For practical deployment of wireless sensor networks (WSN), WSNs construct clusters, where a sensor node communicates with other nodes in its cluster, and a cluster head supports connectivity between the sensor nodes and a sink node. In hybrid WSNs, cluster heads have cellular network interfaces for global connectivity. However, when WSNs are active and the load of cellular networks is high, the optimal assignment of cluster heads to base stations becomes critical. Therefore, in this paper, we propose a game theoretic model to find the optimal assignment of base stations for hybrid WSNs. Since the communication and energy costs differ according to the cellular system, we devise two game models for TDMA/FDMA and CDMA systems employing power prices to adapt to the varying efficiency of recent wireless technologies. The proposed model is defined on the assumptions of the ideal sensing field, but our evaluation shows that the proposed model is more adaptive and energy efficient than local selections. PMID:27589743
Support vector machine learning-based fMRI data group analysis.
Wang, Ze; Childress, Anna R; Wang, Jiongjiong; Detre, John A
2007-07-15
To explore the multivariate nature of fMRI data and to consider the inter-subject brain response discrepancies, a multivariate and brain response model-free method is fundamentally required. Two such methods are presented in this paper by integrating a machine learning algorithm, the support vector machine (SVM), and the random effect model. Without any brain response modeling, SVM was used to extract a whole brain spatial discriminance map (SDM), representing the brain response difference between the contrasted experimental conditions. Population inference was then obtained through the random effect analysis (RFX) or permutation testing (PMU) on the individual subjects' SDMs. Applied to arterial spin labeling (ASL) perfusion fMRI data, SDM RFX yielded lower false-positive rates in the null hypothesis test and higher detection sensitivity for synthetic activations with varying cluster size and activation strengths, compared to the univariate general linear model (GLM)-based RFX. For a sensory-motor ASL fMRI study, both SDM RFX and SDM PMU yielded similar activation patterns to GLM RFX and GLM PMU, respectively, but with higher t values and cluster extensions at the same significance level. Capitalizing on the absence of temporal noise correlation in ASL data, this study also incorporated PMU in the individual-level GLM and SVM analyses accompanied by group-level analysis through RFX or group-level PMU. Providing inferences on the probability of being activated or deactivated at each voxel, these individual-level PMU-based group analysis methods can be used to threshold the analysis results of GLM RFX, SDM RFX or SDM PMU.
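Group-level permutation testing of per-subject maps can be illustrated, for a single voxel, with a sign-flip test. This is a minimal sketch under stated assumptions: the abstract does not specify the permutation scheme, so the sign-flipping design (valid when subject values are symmetric about zero under the null hypothesis), the function name, and the example values are all hypothetical.

```python
from itertools import product


def sign_flip_p_value(values):
    """Exact one-sided p-value for mean(values) > 0 under random sign flips.

    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    observed = sum(values)
    at_least_as_large = 0
    n_patterns = 0
    for signs in product((1, -1), repeat=len(values)):
        n_patterns += 1
        if sum(s * v for s, v in zip(signs, values)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_patterns


# Hypothetical per-subject discriminance values at one voxel.
subject_effects = [1.0, 2.0, 3.0, 4.0]
p_value = sign_flip_p_value(subject_effects)
```

With four subjects whose values are all positive, only the unflipped pattern reaches the observed sum, so the smallest attainable p-value is 1/16. In practice the same computation is repeated per voxel, usually with random rather than exhaustive sign patterns.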
Semi-supervised Machine Learning for Analysis of Hydrogeochemical Data and Models
NASA Astrophysics Data System (ADS)
Vesselinov, Velimir; O'Malley, Daniel; Alexandrov, Boian; Moore, Bryan
2017-04-01
Data- and model-based analyses such as uncertainty quantification, sensitivity analysis, and decision support using complex physics models with numerous model parameters typically require a huge number of model evaluations (on the order of 10^6). Furthermore, model simulations of complex physics may require substantial computational time. For example, accounting for simultaneously occurring physical processes such as fluid flow and biogeochemical reactions in heterogeneous porous medium may require several hours of wall-clock computational time. To address these issues, we have developed a novel methodology for semi-supervised machine learning based on Non-negative Matrix Factorization (NMF) coupled with customized k-means clustering. The algorithm allows for automated, robust Blind Source Separation (BSS) of groundwater types (contamination sources) based on model-free analyses of observed hydrogeochemical data. We have also developed reduced order modeling tools, which couple support vector regression (SVR), genetic algorithms (GA), and artificial and convolutional neural networks (ANN/CNN). SVR is applied to predict the model behavior within prior uncertainty ranges associated with the model parameters. ANN and CNN procedures are applied to upscale heterogeneity of the porous medium. In the upscaling process, fine-scale high-resolution models of heterogeneity are applied to inform coarse-resolution models which have improved computational efficiency while capturing the impact of fine-scale effects at the coarse scale of interest. These techniques are tested independently on a series of synthetic problems. We also present a decision analysis related to contaminant remediation where the developed reduced order models are applied to reproduce groundwater flow and contaminant transport in a synthetic heterogeneous aquifer. The tools are coded in Julia and are a part of the MADS high-performance computational framework (https://github.com/madsjulia/Mads.jl).
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri
2007-01-01
Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825
Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri
2007-12-30
Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis.
Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means
NASA Astrophysics Data System (ADS)
Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.
2018-04-01
This research was initially driven by the lack of clustering algorithms that specifically focus on binary data. To fill this gap, a promising technique for analysing this type of data, namely Genetic Algorithms (GA), became the main subject of this research. For the purpose of this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams. In GAIKM, the objective function was based on a few sufficient statistics that may be easily and quickly calculated on binary numbers. The implementation of IKM gives an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to other clustering algorithms and to IKM itself. In conclusion, GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering and paves the way for future research involving missing data and outliers.
Novel layered clustering-based approach for generating ensemble of classifiers.
Rahman, Ashfaqur; Verma, Brijesh
2011-05-01
This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clusterings of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results, as evidenced by the experimental results.
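The prediction step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the layer structure, the nearest-centroid cluster assignment and the constant stand-in classifiers are all assumptions, and the names `nearest` and `predict` are hypothetical.

```python
from collections import Counter


def nearest(centroids, x):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(centroids[k], x)),
    )


def predict(layers, x):
    """Classify x by majority vote over layers.

    layers is a list of (centroids, classifiers) pairs, where classifiers[k]
    is a callable standing in for the base classifier trained on cluster k.
    """
    votes = [clfs[nearest(cents, x)](x) for cents, clfs in layers]
    return Counter(votes).most_common(1)[0][0]


# Toy 1-D example with three layers; constant functions stand in for
# trained base classifiers (an assumption made to keep the sketch short).
layers = [
    ([(0.0,), (10.0,)], [lambda x: "A", lambda x: "B"]),
    ([(2.0,), (8.0,)], [lambda x: "A", lambda x: "B"]),
    ([(4.5,), (20.0,)], [lambda x: "B", lambda x: "B"]),
]

print(predict(layers, (4.0,)), predict(layers, (9.0,)))
```

Using an odd number of layers avoids ties in a two-class problem; with an even number, a tie-breaking rule would have to be chosen explicitly.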
Roca, Josep; Vargas, Claudia; Cano, Isaac; Selivanov, Vitaly; Barreiro, Esther; Maier, Dieter; Falciani, Francesco; Wagner, Peter; Cascante, Marta; Garcia-Aymerich, Judith; Kalko, Susana; De Mas, Igor; Tegnér, Jesper; Escarrabill, Joan; Agustí, Alvar; Gomez-Cabrero, David
2014-11-28
Heterogeneity in clinical manifestations and disease progression in Chronic Obstructive Pulmonary Disease (COPD) leads to consequences for patient health risk assessment, stratification and management. Implicit in the classical "spill over" hypothesis is that COPD heterogeneity is driven by the pulmonary events of the disease. Alternatively, we hypothesized that COPD heterogeneities result from the interplay of mechanisms governing three conceptually different phenomena: 1) pulmonary disease, 2) systemic effects of COPD and 3) co-morbidity clustering, each of them with their own dynamics. The aim was to explore the potential of a systems analysis of COPD heterogeneity, focused on skeletal muscle dysfunction and on co-morbidity clustering, to generate predictive modeling with impact on patient management. To this end, strategies combining deterministic modeling and network medicine analyses of the Biobridge dataset were used to investigate the mechanisms of skeletal muscle dysfunction. An independent data-driven analysis of co-morbidity clustering examining associated genes and pathways was performed using a large dataset (ICD9-CM data from Medicare, 13 million people). Finally, a targeted network analysis using the outcomes of the two approaches (skeletal muscle dysfunction and co-morbidity clustering) explored shared pathways between these phenomena. (1) Evidence of abnormal regulation of skeletal muscle bioenergetics and skeletal muscle remodeling showing a significant association with nitroso-redox disequilibrium was observed in COPD; (2) COPD patients presented a higher risk of co-morbidity clustering than non-COPD patients, increasing with ageing; and (3) the on-going targeted network analyses suggest shared pathways between skeletal muscle dysfunction and co-morbidity clustering. The results indicate the high potential of a systems approach to address COPD heterogeneity. 
Significant knowledge gaps were identified that are relevant to shape strategies aiming at fostering 4P Medicine for patients with COPD.
NASA Astrophysics Data System (ADS)
Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana
2018-01-01
This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations in the Krishna Basin, India. A climatic dataset from NCEP is used to train the proposed models (Jan.'69 to Dec.'94), which are then applied to the corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data is obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better than the traditional Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA help reduce the dimensionality of the input climatic variables, while capturing more variability compared to stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods, whereas the models with clustering without MWE over-estimate the rainfall during the dry season.
A cluster randomized theory-guided oral hygiene trial in adolescents-A latent growth model.
Aleksejūnienė, J; Brukienė, V
2018-05-01
(i) To test whether theory-guided interventions are more effective than conventional dental instruction (CDI) for changing oral hygiene in adolescents and (ii) to examine whether such interventions equally benefit both genders and different socio-economic (SES) groups. A total of 244 adolescents were recruited from three schools, and cluster randomization allocated adolescents to one of the three types of interventions: two were theory-based interventions (Precaution Adoption Process Model or Authoritative Parenting Model) and CDI served as an active control. Oral hygiene levels % (OH) were assessed at baseline, after 3 months and after 12 months. A complete data set was available for 166 adolescents (the total follow-up rate: 69%). There were no significant differences in baseline OH between those who participated throughout the study and those who dropped out. Bivariate and multivariate analyses showed that theory-guided interventions produced significant improvements in oral hygiene and that there were no significant gender or socio-economic differences. Theory-guided interventions produced more positive changes in OH than CDI, and these changes did not differ between gender and SES groups. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
A novel and reliable computational intelligence system for breast cancer detection.
Zadeh Shirazi, Amin; Seyyed Mahdavi Chabok, Seyyed Javad; Mohammadi, Zahra
2018-05-01
Cancer is the second most important cause of morbidity and mortality among women, and breast cancer is the most frequent type. This paper suggests a hybrid computational intelligence model based on unsupervised and supervised learning techniques, i.e., self-organizing map (SOM) and complex-valued neural network (CVNN), for reliable detection of breast cancer. The dataset used in this paper consists of 822 patients with five features (patient's breast mass shape, margin, density, patient's age, and Breast Imaging Reporting and Data System assessment). The proposed model is used here for the first time and consists of two stages. In the first stage, considering the input features, the SOM technique was used to cluster the most similar patients together. Then, in the second stage, for each cluster, the patients' features were fed to a complex-valued neural network to classify breast cancer severity (benign or malignant). The results obtained for each patient were compared to the medical diagnosis results using receiver operating characteristic analyses and a confusion matrix. In the testing phase, health and disease detection ratios were 94 and 95%, respectively. Accordingly, the proposed model performed well and can be used for reliable and robust detection of breast cancer.
Population Structure With Localized Haplotype Clusters
Browning, Sharon R.; Weir, Bruce S.
2010-01-01
We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
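The adjusted Rand index used above to score clusterings against known classes can be computed from the contingency table of the two labelings. The sketch below is a plain-Python illustration of the standard pair-counting formula, not the code used in the study; the function name and the handling of degenerate inputs are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("labelings must have the same length")
    if n < 2:
        return 1.0  # assumption: treat trivial inputs as perfect agreement

    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)


known = [0, 0, 1, 2]       # hypothetical known classes
clustered = [0, 0, 1, 1]   # hypothetical clustering result
print(round(adjusted_rand_index(known, clustered), 4))
```

The index is 1 for identical partitions regardless of how clusters are numbered, is close to 0 for random labelings, and can be negative when agreement is worse than chance.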
Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich
2009-02-10
Multivariate analysis of interval censored event data based on classical likelihood methods is notoriously cumbersome. Likelihood inference for models which additionally include random effects is not available at all. Existing algorithms pose problems for practical users, such as matrix inversion, slow convergence and no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available at CRAN. The proposed software supports the solution of complex analyses in many fields of clinical epidemiology as well as health services research.
Fast simulation of electromagnetic and hadronic showers in SpaCal calorimeter at the H1 experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raičević, Nataša, E-mail: raicevic@mail.desy.de; Glazov, Alexandre
2016-03-25
The fast simulation of showers induced by electrons (positrons) in the H1 lead/scintillating-fiber calorimeter, SpaCal, based on the shower library technique has been presented previously. In this paper we show results on the linearity and uniformity of the reconstructed electron/positron cluster energy in the electromagnetic section of SpaCal for simulations based on the shower library and on the GFLASH shower parametrisation. The shapes of the clusters originating from photon and hadron candidates in SpaCal are analysed, and the experimental distributions are compared with the two simulations.
Lesion identification using unified segmentation-normalisation models and fuzzy clustering
Seghier, Mohamed L.; Ramlackhansingh, Anil; Crinion, Jenny; Leff, Alexander P.; Price, Cathy J.
2008-01-01
In this paper, we propose a new automated procedure for lesion identification from single images based on the detection of outlier voxels. We demonstrate the utility of this procedure using artificial and real lesions. The scheme rests on two innovations: First, we augment the generative model used for combined segmentation and normalization of images, with an empirical prior for an atypical tissue class, which can be optimised iteratively. Second, we adopt a fuzzy clustering procedure to identify outlier voxels in normalised gray and white matter segments. These two advances suppress misclassification of voxels and restrict lesion identification to gray/white matter lesions respectively. Our analyses show a high sensitivity for detecting and delineating brain lesions with different sizes, locations, and textures. Our approach has important implications for the generation of lesion overlap maps of a given population and the assessment of lesion-deficit mappings. From a clinical perspective, our method should help to compute the total volume of lesion or to trace precisely lesion boundaries that might be pertinent for surgical or diagnostic purposes. PMID:18482850
Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno
2016-01-22
Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for preselecting the covariates to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
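The c-statistic at the centre of this tool can be sketched as a pairwise comparison of propensity scores between arms. The example assumes the scores have already been estimated (for instance by a logistic regression of arm on baseline covariates, which is not shown); the function name and the toy values are hypothetical, and the decision thresholds used by the authors are not reproduced here.

```python
def c_statistic(scores, arm):
    """Probability that a randomly chosen unit from arm 1 has a higher
    propensity score than a randomly chosen unit from arm 0 (ties count 0.5)."""
    treated = [s for s, a in zip(scores, arm) if a == 1]
    control = [s for s, a in zip(scores, arm) if a == 0]
    if not treated or not control:
        raise ValueError("both arms must be non-empty")
    concordant = 0.0
    for t in treated:
        for c in control:
            if t > c:
                concordant += 1.0
            elif t == c:
                concordant += 0.5
    return concordant / (len(treated) * len(control))


# Hypothetical fitted propensity scores for eight units, four per arm.
arm = [0, 0, 0, 0, 1, 1, 1, 1]
balanced = [0.48, 0.52, 0.50, 0.50, 0.50, 0.50, 0.52, 0.48]
imbalanced = [0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70, 0.80]

print(c_statistic(balanced, arm), c_statistic(imbalanced, arm))
```

A value near 0.5 means the covariates do not help predict the arm, which is what good balance looks like; values approaching 1 indicate that the arms differ systematically at baseline.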
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
NASA Astrophysics Data System (ADS)
Beerenwinkel, Anne; von Arx, Matthias
2017-04-01
Over the last three decades, moderate constructivism has become an increasingly prominent perspective in science education. Researchers have defined characteristics of constructivist-oriented science classrooms, but the implementation of such science teaching in daily classroom practice seems difficult. Against this background, we conducted a sub-study within the tri-national research project Quality of Instruction in Physics (QuIP) analysing 60 videotaped physics classes involving a large sample of students (N = 1192) from Finland, Germany and Switzerland in order to investigate the kinds of constructivist components and teaching patterns that can be found in regular classrooms without any intervention. We applied a newly developed coding scheme to capture constructivist facets of science teaching and conducted principal component and cluster analyses to explore which components and patterns were most prominent in the classes observed. Two underlying components were found, resulting in two scales, Structured Knowledge Acquisition and Fostering Autonomy, which describe key aspects of constructivist teaching. Only the first scale was rather well established in the lessons investigated. Classes were clustered based on these scales. The analysis of the different clusters suggested that teaching physics in a structured way combined with fostering students' autonomy contributes to students' motivation. However, our regression models indicated that content knowledge is a more important predictor for students' motivation, and there was no homogeneous pattern for all gender- and country-specific subgroups investigated. The results are discussed in light of recent discussions on the feasibility of constructivism in practice.
Mumtaz, Shahzad; Nabney, Ian T; Flower, Darren R
2017-10-01
Peptide-binding MHC proteins are thought to be the most variable proteins across the human population; the extreme MHC polymorphism observed is functionally important and results from constrained divergent evolution. MHCs have vital functions in immunology and homeostasis: cell surface MHC class I molecules report cell status to CD8+ T cells, NKT cells and NK cells, thus playing key roles in pathogen defence, as well as mediating smell recognition, mate choice, Adverse Drug Reactions, and transplantation rejection. MHC peptide specificity falls into several supertypes exhibiting commonality of binding. It seems likely that other supertypes exist relevant to other functions. Since comprehensive experimental characterization is intractable, structure-based bioinformatics is the only viable solution. We modelled functional MHC proteins by homology and used calculated Poisson-Boltzmann electrostatics projected from the top surface of the MHC as multi-dimensional descriptors, analysing them using state-of-the-art dimensionality reduction techniques and clustering algorithms. We were able to recover the 3 MHC loci as separate clusters and identify clear sub-groups within them, vindicating unequivocally our choice of both data representation and clustering strategy. We expect this approach to make a profound contribution to the study of MHC polymorphism and its functional consequences, and, by extension, other burgeoning structural systems, such as GPCRs. Copyright © 2017 Elsevier Inc. All rights reserved.
Astrostatistical Analysis in Solar and Stellar Physics
NASA Astrophysics Data System (ADS)
Stenning, David Craig
This dissertation focuses on developing statistical models and methods to address data-analytic challenges in astrostatistics, a growing interdisciplinary field fostering collaborations between statisticians and astrophysicists. The astrostatistics projects we tackle can be divided into two main categories: modeling solar activity and Bayesian analysis of stellar evolution. These categories form Part I and Part II of this dissertation, respectively. The first line of research we pursue involves classification and modeling of evolving solar features. Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. To analyze massive streams of solar image data, we develop a science-driven dimension reduction methodology to extract scientifically meaningful features from images. This methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in solar "active regions" that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a more informative manner than existing schemes (i.e., manual classification schemes), and (iii) is amenable to sophisticated statistical analyses. In a related line of research, we perform a Bayesian analysis of the solar cycle using multiple proxy variables, such as sunspot numbers. We take advantage of patterns and correlations among the proxy variables to model solar activity using data from proxies that have become available more recently, while also taking advantage of the long history of observations of sunspot numbers. This model is an extension of the Yu et al. (2012) Bayesian hierarchical model for the solar cycle that used the sunspot numbers alone. Since proxies have different temporal coverage, we devise a multiple imputation scheme to account for missing data. 
We find that incorporating multiple proxies reveals important features of the solar cycle that are missed when the model is fit using only the sunspot numbers. In Part II of this dissertation we focus on two related lines of research involving Bayesian analysis of stellar evolution. We first focus on modeling multiple stellar populations in star clusters. It has long been assumed that all star clusters are comprised of single stellar populations, that is, stars that formed at roughly the same time from a common molecular cloud. However, recent studies have produced evidence that some clusters host multiple populations, which has far-reaching scientific implications. We develop a Bayesian hierarchical model for multiple-population star clusters, extending earlier statistical models of stellar evolution (e.g., van Dyk et al. 2009, Stein et al. 2013). We also devise an adaptive Markov chain Monte Carlo algorithm to explore the complex posterior distribution. We use numerical studies to demonstrate that our method can recover parameters of multiple-population clusters, and also show how model misspecification can be diagnosed. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We also explore statistical properties of the estimators and determine that the influence of the prior distribution does not diminish with larger sample sizes, leading to non-standard asymptotics. In a final line of research, we present the first-ever attempt to estimate the carbon fraction of white dwarfs. This quantity has important implications for both astrophysics and fundamental nuclear physics, but is currently unknown. We use a numerical study to demonstrate that assuming an incorrect value for the carbon fraction leads to incorrect white-dwarf ages of star clusters. Finally, we present our attempt to estimate the carbon fraction of the white dwarfs in the well-studied star cluster 47 Tucanae.
Anholt, R M; Berezowski, J; Robertson, C; Stephen, C
2015-09-01
There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.
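The space-time permutation scan statistic compares observed case counts with what would be expected if place and time of occurrence were independent. The sketch below shows only that expected-count step, in the usual formulation of the method as understood here: the expected count for a zone and day is the product of the zone total and the day total divided by the overall total. The scanning over candidate cylinders and the Monte Carlo significance test are not included, and the function name and the toy counts are hypothetical rather than taken from the study.

```python
def expected_counts(counts):
    """Expected cases per zone and day under space-time independence.

    counts[z][d] is the observed number of cases in zone z on day d.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no cases observed")
    zone_totals = [sum(row) for row in counts]
    day_totals = [sum(col) for col in zip(*counts)]
    return [[zt * dt / total for dt in day_totals] for zt in zone_totals]


# Hypothetical counts: 3 zones (rows) by 3 days (columns).
observed = [
    [1, 1, 6],
    [2, 2, 2],
    [1, 1, 0],
]
expected = expected_counts(observed)

# Zone 0 on day 2 has 6 observed cases against 4 expected, a candidate excess.
print(observed[0][2], expected[0][2])
```

Because the expectation is built from the case data alone, no population-at-risk denominator is needed, which suits practice-based records where the number of animals served is unknown.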
Tomás, Inmaculada; Regueira-Iglesias, Alba; López, Maria; Arias-Bujanda, Nora; Novoa, Lourdes; Balsa-Castro, Carlos; Tomás, Maria
2017-01-01
Currently, there is little evidence available on the development of predictive models for the diagnosis or prognosis of chronic periodontitis based on the qPCR quantification of subgingival pathobionts. Our objectives were to: (1) analyze and internally validate pathobiont-based models that could be used to distinguish different periodontal conditions at site-specific level within the same patient with chronic periodontitis; (2) develop nomograms derived from predictive models. Subgingival plaque samples were obtained from control and periodontal sites (probing pocket depth and clinical attachment loss <4 mm and >4 mm, respectively) from 40 patients with moderate-severe generalized chronic periodontitis. The samples were analyzed by qPCR using TaqMan probes and specific primers to determine the concentrations of Actinobacillus actinomycetemcomitans (Aa), Fusobacterium nucleatum (Fn), Parvimonas micra (Pm), Porphyromonas gingivalis (Pg), Prevotella intermedia (Pi), Tannerella forsythia (Tf), and Treponema denticola (Td). The pathobiont-based models were obtained using multivariate binary logistic regression. The best models were selected according to specified criteria. The discrimination was assessed using receiver operating characteristic curves and numerous classification measures were thus obtained. The nomograms were built based on the best predictive models. Eight bacterial cluster-based models showed an area under the curve (AUC) ≥0.760 and a sensitivity and specificity ≥75.0%. The PiTfFn cluster showed an AUC of 0.773 (sensitivity and specificity = 75.0%). When Pm and AaPm were incorporated in the TdPiTfFn cluster, we detected the two best predictive models with an AUC of 0.788 and 0.789, respectively (sensitivity and specificity = 77.5%). The TdPiTfAa cluster had an AUC of 0.785 (sensitivity and specificity = 75.0%). 
When Pm was incorporated in this cluster, a new predictive model appeared with better AUC and specificity values (0.787 and 80.0%, respectively). Distinct clusters formed by species with different etiopathogenic role (belonging to different Socransky’s complexes) had a good predictive accuracy for distinguishing a site with periodontal destruction in a periodontal patient. The predictive clusters with the lowest number of bacteria were PiTfFn and TdPiTfAa, while TdPiTfAaFnPm had the highest number. In all the developed nomograms, high concentrations of these clusters were associated with an increased probability of having a periodontal site in a patient with chronic periodontitis. PMID:28848499
NASA Astrophysics Data System (ADS)
Mlakar, P.
2004-11-01
SO2 pollution is still a significant problem in Slovenia, especially around large thermal power plants (TPPs), like the one at Šoštanj. The Šoštanj TPP is the exclusive source of SO2 in the area and is therefore a perfect example for air pollution studies. In order to understand air pollution around the Šoštanj TPP in detail, some analyses of emissions and ambient concentrations of SO2 at six automated monitoring stations in the surroundings of the TPP were made. Data from 1991 to 1993 were used, a period when no desulfurisation plants were in operation. Statistical analyses of the influence of the emissions from the three TPP stacks at different measuring points were made. The analyses show that the smallest stack (100 m) mainly pollutes villages and towns near the TPP within a radius of a few kilometres. The medium stack's (150 m) influence is noticed at shorter as well as at longer distances up to more than ten kilometres. The highest stack (230 m) pollutes mainly at longer distances, where the plume reaches the higher hills. Detailed analyses of ambient SO2 concentrations were made. They show the temporal and spatial distribution of different classes of SO2 concentrations from very low to alarming values. These analyses show that pollution patterns at a particular station remain the same if observed on a yearly basis, but can vary very much if observed on a monthly basis, mainly because of different weather patterns. Therefore the winds in the basin (as the most important feature influencing air pollution dispersion) were further analysed in detail to find clusters of similar patterns. For cluster analysis of ground-level wind patterns in the basin around the Šoštanj Thermal Power Plant, the Kohonen neural network and Leaders' method were used. Furthermore, the dependence of ambient SO2 concentrations on the clusters obtained was analysed. 
The results proved that effective cluster analysis can be a useful tool for compressing a huge wind database in order to find the correlation between winds and pollutant concentrations. The analyses made provide a better insight into air pollution over complex terrain.
Cluster and propensity based approximation of a network
2013-01-01
Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). 
Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424
Elastic K-means using posterior probability
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability with soft capability, where each data point can belong to multiple clusters fractionally, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, besides vector attribute information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model. PMID:29240756
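The idea of fractional (soft) cluster membership can be illustrated with a minimal sketch. This is not the EKM formulation from the paper: it is a generic soft K-means in which each point's membership is a softmax over negative squared distances to the centers. The stiffness parameter `beta`, the farthest-first initialization, and all function names are assumptions made for illustration.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic init (an assumption): start at X[0], then repeatedly add the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    return np.array(centers, dtype=float)

def responsibilities(X, centers, beta):
    """Posterior-like soft memberships: softmax of -beta * squared distance."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

def soft_kmeans(X, k, beta=1.0, n_iter=50):
    """Alternate soft assignment and membership-weighted center updates."""
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        resp = responsibilities(X, centers, beta)
        weights = np.maximum(resp.sum(axis=0), 1e-12)
        centers = (resp.T @ X) / weights[:, None]
    return centers, responsibilities(X, centers, beta)

if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    centers, resp = soft_kmeans(X, k=2)
    print(centers)
    print(resp.round(3))
```

As `beta` grows, the memberships approach the hard assignments of ordinary K-means; smaller values spread each point across several clusters.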
Kruschwitz, Johann D; Meyer-Lindenberg, Andreas; Veer, Ilya M; Wackerhagen, Carolin; Erk, Susanne; Mohnke, Sebastian; Pöhland, Lydia; Haddad, Leila; Grimm, Oliver; Tost, Heike; Romanczuk-Seiferth, Nina; Heinz, Andreas; Walter, Martin; Walter, Henrik
2015-10-01
The application of global signal regression (GSR) to resting-state functional magnetic resonance imaging data and its usefulness is a widely discussed topic. In this article, we report an observation of segregated distribution of amygdala resting-state functional connectivity (rs-FC) within the fusiform gyrus (FFG) as an effect of GSR in a multi-center-sample of 276 healthy subjects. Specifically, we observed that amygdala rs-FC was distributed within the FFG as distinct anterior versus posterior clusters delineated by positive versus negative rs-FC polarity when GSR was performed. To characterize this effect in more detail, post hoc analyses revealed the following: first, direct overlays of task-functional magnetic resonance imaging derived face sensitive areas and clusters of positive versus negative amygdala rs-FC showed that the positive amygdala rs-FC cluster corresponded best with the fusiform face area, whereas the occipital face area corresponded to the negative amygdala rs-FC cluster. Second, as expected from a hierarchical face perception model, these amygdala rs-FC defined clusters showed differential rs-FC with other regions of the visual stream. Third, dynamic connectivity analyses revealed that these amygdala rs-FC defined clusters also differed in their rs-FC variance across time to the amygdala. Furthermore, subsample analyses of three independent research sites confirmed reliability of the effect of GSR, as revealed by similar patterns of distinct amygdala rs-FC polarity within the FFG. In this article, we discuss the potential of GSR to segregate face sensitive areas within the FFG and furthermore discuss how our results may relate to the functional organization of the face-perception circuit. © 2015 Wiley Periodicals, Inc.
SAR image segmentation using skeleton-based fuzzy clustering
NASA Astrophysics Data System (ADS)
Cao, Yun Yi; Chen, Yan Qiu
2003-06-01
SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering upon characteristic descriptors extracted from local patches. The mixture model of the characteristic descriptor, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces appearing in SAR images automatically without any user input.
Mazcko, Christina; Cherba, David; Hendricks, William; Lana, Susan; Ehrhart, E. J.; Charles, Brad; Fehling, Heather; Kumar, Leena; Vail, David; Henson, Michael; Childress, Michael; Kitchell, Barbara; Kingsley, Christopher; Kim, Seungchan; Neff, Mark; Davis, Barbara
2014-01-01
Background Molecularly-guided trials (i.e. PMed) now seek to aid clinical decision-making by matching cancer targets with therapeutic options. Progress has been hampered by the lack of cancer models that account for individual-to-individual heterogeneity within and across cancer types. Naturally occurring cancers in pet animals are heterogeneous and thus provide an opportunity to answer questions about these PMed strategies and optimize translation to human patients. In order to realize this opportunity, it is now necessary to demonstrate the feasibility of conducting molecularly-guided analysis of tumors from dogs with naturally occurring cancer in a clinically relevant setting. Methodology A proof-of-concept study was conducted by the Comparative Oncology Trials Consortium (COTC) to determine if tumor collection, prospective molecular profiling, and PMed report generation within 1 week was feasible in dogs. Thirty-one dogs with cancers of varying histologies were enrolled. Twenty-four of 31 samples (77%) successfully met all predefined QA/QC criteria and were analyzed via Affymetrix gene expression profiling. A subsequent bioinformatics workflow transformed genomic data into a personalized drug report. Average turnaround from biopsy to report generation was 116 hours (4.8 days). Unsupervised clustering of canine tumor expression data clustered by cancer type, but supervised clustering of tumors based on the personalized drug report clustered by drug class rather than cancer type. Conclusions Collection and turnaround of high quality canine tumor samples, centralized pathology, analyte generation, array hybridization, and bioinformatic analyses matching gene expression to therapeutic options is achievable in a practical clinical window (<1 week). Clustering data show robust signatures by cancer type but also showed patient-to-patient heterogeneity in drug predictions. 
This lends further support to the inclusion of a heterogeneous population of dogs with cancer into the preclinical modeling of personalized medicine. Future comparative oncology studies optimizing the delivery of PMed strategies may aid cancer drug development. PMID:24637659
Paoloni, Melissa; Webb, Craig; Mazcko, Christina; Cherba, David; Hendricks, William; Lana, Susan; Ehrhart, E J; Charles, Brad; Fehling, Heather; Kumar, Leena; Vail, David; Henson, Michael; Childress, Michael; Kitchell, Barbara; Kingsley, Christopher; Kim, Seungchan; Neff, Mark; Davis, Barbara; Khanna, Chand; Trent, Jeffrey
2014-01-01
Molecularly-guided trials (i.e. PMed) now seek to aid clinical decision-making by matching cancer targets with therapeutic options. Progress has been hampered by the lack of cancer models that account for individual-to-individual heterogeneity within and across cancer types. Naturally occurring cancers in pet animals are heterogeneous and thus provide an opportunity to answer questions about these PMed strategies and optimize translation to human patients. In order to realize this opportunity, it is now necessary to demonstrate the feasibility of conducting molecularly-guided analysis of tumors from dogs with naturally occurring cancer in a clinically relevant setting. A proof-of-concept study was conducted by the Comparative Oncology Trials Consortium (COTC) to determine if tumor collection, prospective molecular profiling, and PMed report generation within 1 week was feasible in dogs. Thirty-one dogs with cancers of varying histologies were enrolled. Twenty-four of 31 samples (77%) successfully met all predefined QA/QC criteria and were analyzed via Affymetrix gene expression profiling. A subsequent bioinformatics workflow transformed genomic data into a personalized drug report. Average turnaround from biopsy to report generation was 116 hours (4.8 days). Unsupervised clustering of canine tumor expression data clustered by cancer type, but supervised clustering of tumors based on the personalized drug report clustered by drug class rather than cancer type. Collection and turnaround of high quality canine tumor samples, centralized pathology, analyte generation, array hybridization, and bioinformatic analyses matching gene expression to therapeutic options is achievable in a practical clinical window (<1 week). Clustering data show robust signatures by cancer type but also showed patient-to-patient heterogeneity in drug predictions. 
This lends further support to the inclusion of a heterogeneous population of dogs with cancer into the preclinical modeling of personalized medicine. Future comparative oncology studies optimizing the delivery of PMed strategies may aid cancer drug development.
Tsai, Jack; Harpaz-Rotem, Ilan; Armour, Cherie; Southwick, Steven M; Krystal, John H; Pietrzak, Robert H
2015-05-01
To evaluate the prevalence of DSM-5 posttraumatic stress disorder (PTSD) and factor structure of PTSD symptomatology in a nationally representative sample of US veterans and examine how PTSD symptom clusters are related to depression, anxiety, suicidal ideation, hostility, physical and mental health-related functioning, and quality of life. Data were analyzed from the National Health and Resilience in Veterans Study, a nationally representative survey of 1,484 US veterans conducted from September through October 2013. Confirmatory factor analyses were conducted to evaluate the factor structure of PTSD symptoms, and structural equation models were constructed to examine the association between PTSD symptom clusters and external correlates. 12.0% of veterans screened positive for lifetime PTSD and 5.2% for past-month PTSD. A 5-factor dysphoric arousal model and a newly proposed 6-factor model both fit the data significantly better than the 4-factor model of DSM-5. The 6-factor model fit the data best in the full sample, as well as in subsamples of female veterans and veterans with lifetime PTSD. The emotional numbing symptom cluster was more strongly related to depression (P < .001) and worse mental health-related functioning (P < .001) than other symptom clusters, while the externalizing behavior symptom cluster was more strongly related to hostility (P < .001). A total of 5.2% of US veterans screened positive for past-month DSM-5 PTSD. A 6-factor model of DSM-5 PTSD symptoms, which builds on extant models and includes a sixth externalizing behavior factor, provides the best dimensional representation of DSM-5 PTSD symptom clusters and demonstrates validity in assessing health outcomes of interest in this population. © Copyright 2015 Physicians Postgraduate Press, Inc.
McNally, Richard J Q; Rankin, Judith; Shirley, Mark D F; Rushton, Stephen P; Pless-Mulloli, Tanja
2008-10-01
Whilst maternal age is an established risk factor for Patau syndrome (trisomy 13), Edwards syndrome (trisomy 18) and Down syndrome (trisomy 21), the aetiology and contribution of genetic and environmental factors remains unclear. We analysed for space-time clustering using high quality fully population-based data from a geographically defined region. The study included all cases of Patau, Edwards and Down syndrome, delivered during 1985-2003 and resident in the former Northern Region of England, including terminations of pregnancy for fetal anomaly. We applied the K-function test for space-time clustering with fixed thresholds of close in space and time using residential addresses at time of delivery. The Knox test was used to indicate the range over which the clustering effect occurred. Tests were repeated using nearest neighbour (NN) thresholds to adjust for variable population density. The study analysed 116 cases of Patau syndrome, 240 cases of Edwards syndrome and 1084 cases of Down syndrome. There was evidence of space-time clustering for Down syndrome (fixed threshold of close in space: P = 0.01, NN threshold: P = 0.02), but little or no clustering for Patau (P = 0.57, P = 0.19) or Edwards (P = 0.37, P = 0.06) syndromes. Clustering of Down syndrome was associated with cases from more densely populated areas and evidence of clustering persisted when cases were restricted to maternal age <40 years. The highly novel space-time clustering for Down syndrome suggests an aetiological role for transient environmental factors, such as infections.
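The Knox test mentioned above counts pairs of cases that are close in both space and time and compares that count with what random pairings of times and places would give. The following is a minimal sketch of that idea only, not the study's analysis: the fixed thresholds `s_max` and `t_max`, the permutation scheme, and the function names are assumptions, and the nearest-neighbour thresholds and K-function test from the abstract are not reproduced.

```python
import math
import random
from itertools import combinations

def knox_statistic(coords, times, s_max, t_max):
    """Number of case pairs within s_max in space AND within t_max in time."""
    count = 0
    for i, j in combinations(range(len(coords)), 2):
        close_in_space = math.dist(coords[i], coords[j]) <= s_max
        close_in_time = abs(times[i] - times[j]) <= t_max
        if close_in_space and close_in_time:
            count += 1
    return count

def knox_permutation_p(coords, times, s_max, t_max, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle times over locations to break any space-time link."""
    rng = random.Random(seed)
    observed = knox_statistic(coords, times, s_max, t_max)
    shuffled = list(times)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if knox_statistic(coords, shuffled, s_max, t_max) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Hypothetical toy data: two tight space-time pairs.
    coords = [(0, 0), (0, 1), (10, 10), (10, 11)]
    times = [0, 1, 100, 101]
    print(knox_permutation_p(coords, times, s_max=2, t_max=2))
```

Fixed distance thresholds are sensitive to population density, which is the reason the abstract gives for repeating the tests with nearest-neighbour thresholds.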
Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J
2013-06-01
Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.
Development of an automated energy audit protocol for office buildings
NASA Astrophysics Data System (ADS)
Deb, Chirag
This study aims to enhance the building energy audit process, and bring about reduction in time and cost requirements in the conduction of a full physical audit. For this, a total of 5 Energy Service Companies in Singapore have collaborated and provided energy audit reports for 62 office buildings. Several statistical techniques are adopted to analyse these reports. These techniques comprise cluster analysis and development of prediction models to predict energy savings for buildings. The cluster analysis shows that there are 3 clusters of buildings experiencing different levels of energy savings. To understand the effect of building variables on the change in EUI, a robust iterative process for selecting the appropriate variables is developed. The results show that the 4 variables of GFA, non-air-conditioning energy consumption, average chiller plant efficiency and installed capacity of chillers should be taken for clustering. This analysis is extended to the development of prediction models using linear regression and artificial neural networks (ANN). An exhaustive variable selection algorithm is developed to select the input variables for the two energy saving prediction models. The results show that the ANN prediction model can predict the energy saving potential of a given building with an accuracy of +/-14.8%.
Modeling and analysis of collective cell migration in an in vivo three-dimensional environment
Dai, Wei; Prasad, Mohit; Luo, Junjie; Gov, Nir S.; Montell, Denise J.
2016-01-01
A long-standing question in collective cell migration has been what might be the relative advantage of forming a cluster over migrating individually. Does an increase in the size of a collectively migrating group of cells enable them to sample the chemical gradient over a greater distance because the difference between front and rear of a cluster would be greater than for single cells? We combined theoretical modeling with experiments to study collective migration of the border cells in-between nurse cells in the Drosophila egg chamber. We discovered that cluster size is positively correlated with migration speed, up to a particular point above which speed plummets. This may be due to the effect of viscous drag from surrounding nurse cells together with confinement of all of the cells within a stiff extracellular matrix. The model predicts no relationship between cluster size and velocity for cells moving on a flat surface, in contrast to movement within a 3D environment. Our analyses also suggest that the overall chemoattractant profile in the egg chamber is likely to be exponential, with the highest concentration in the oocyte. These findings provide insights into collective chemotaxis by combining theoretical modeling with experimentation. PMID:27035964
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A.; Marks, Jonathan A.; Haiser, Henry J.; Turnbaugh, Peter J.
2015-01-01
ABSTRACT Elucidation of the molecular mechanisms underlying the human gut microbiota’s effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. PMID:25873372
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ren, Huiying; Hou, Zhangshuan; Huang, Maoyi
The Community Land Model (CLM) represents physical, chemical, and biological processes of the terrestrial ecosystems that interact with climate across a range of spatial and temporal scales. As CLM includes numerous sub-models and associated parameters, the high-dimensional parameter space presents a formidable challenge for quantifying uncertainty and improving Earth system predictions needed to assess environmental changes and risks. This study aims to evaluate the potential of transferring hydrologic model parameters in CLM through sensitivity analyses and classification across watersheds from the Model Parameter Estimation Experiment (MOPEX) in the United States. The sensitivity of CLM-simulated water and energy fluxes to hydrological parameters across 431 MOPEX basins is first examined using an efficient stochastic sampling-based sensitivity analysis approach. Linear, interaction, and high-order nonlinear impacts are all identified via statistical tests and stepwise backward removal parameter screening. The basins are then classified according to their parameter sensitivity patterns (internal attributes), as well as their hydrologic indices/attributes (external hydrologic factors) separately, using a principal component analysis (PCA) and expectation-maximization (EM)-based clustering approach. Similarities and differences among the parameter sensitivity-based classification system (S-Class), the hydrologic indices-based classification (H-Class), and the Koppen climate classification systems (K-Class) are discussed. Within each S-class with similar parameter sensitivity characteristics, similar inversion modeling setups can be used for parameter calibration, and the parameters and their contribution or significance to water and energy cycling may also be more transferrable. This classification study provides guidance on identifiable parameters, and on parameterization and inverse model design for CLM, but the methodology is applicable to other models. Inverting parameters at representative sites belonging to the same class can significantly reduce parameter calibration efforts.
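A PCA step followed by EM-based clustering can be sketched in a few lines. This is an illustration under stated assumptions, not the study's workflow: the data are synthetic, only the first principal component is kept, the mixture has exactly two one-dimensional Gaussian components, and the function names and initialization are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=1):
    """Project mean-centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def em_two_gaussians(z, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture; returns hard labels and component means."""
    mu = np.array([z.min(), z.max()], dtype=float)  # assumed init: extremes of the scores
    var = np.full(2, z.var() + 1e-9)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point (log domain).
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (z[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (resp * z[:, None]).sum(axis=0) / nk
        var = (resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # variance floor
    return resp.argmax(axis=1), mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for per-basin sensitivity patterns: two groups in 3-D.
    X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
    labels, means = em_two_gaussians(pca_scores(X)[:, 0])
    print(labels)
```

In practice one would keep several components and let a model-selection criterion choose the number of mixture components; both are fixed here to keep the sketch short.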
The personality context of relational aggression: A Five-Factor Model profile analysis.
Reardon, Kathleen W; Tackett, Jennifer L; Lynam, Don
2018-05-01
Relational aggression (RAgg) is a form of behavior intended to damage the victim's social status or interpersonal relationships through the use of purposeful interpersonal manipulation or social exclusion (Archer & Coyne, 2005). RAgg is impairing, stable, and largely defined by dysfunctional patterns of interpersonal interactions-all of which invokes comparisons to personality and, more specifically, personality pathology. Leveraging research using the Five Factor Model (FFM) in personality disorder (PD) work, the present study aims to understand the personality context of RAgg by applying this FFM profile approach in 2 ways: (a) by compiling a personality profile of RAgg based on a thorough review of the relevant literature and (b) by compiling a personality profile of RAgg based on expert ratings (N = 19). We then compared these profiles to each other and to existing personality profiles of Cluster B PDs to examine how RAgg fits into the personality space represented by Cluster B PDs. These analyses indicate that both FFM profiles of RAgg show substantial overlap with the FFM profile of narcissistic PD. The present study has important implications for bridging disjointed domains of research on personality pathology and RAgg and underscores the relevance of RAgg for early emergence of PD characteristics. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Gay, Emilie; Senoussi, Rachid; Barnouin, Jacques
2007-01-01
Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached by continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposed to analyze spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34 142 French dairy herds for the year 2000, and important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with the presence of 3 clusters was highly significant, and the 3 clusters were attractive, i.e. closeness to cluster center increased the occurrence of high SCS. The three localizations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. The semi-parametric method based on spatial hazard modeling applies to continuous variables, and takes account of both risk factors and potential heterogeneity of the background population. This tool allows a quantitative detection but assumes a spatially specified form for clusters.
Generating clustered scale-free networks using Poisson based localization of edges
NASA Astrophysics Data System (ADS)
Türker, İlker
2018-05-01
We introduce a variety of network models using a Poisson-based edge localization strategy, which result in clustered scale-free topologies. We first verify the success of our localization strategy by realizing a variant of the well-known Watts-Strogatz model with an inverse approach, implying a small-world regime of rewiring from a random network through a regular one. We then apply the rewiring strategy to a pure Barabasi-Albert model and successfully achieve a small-world regime, with a limited capacity of scale-free property. To imitate the high clustering property of scale-free networks with higher accuracy, we adapted the Poisson-based wiring strategy to a growing network with the ingredients of both preferential attachment and local connectivity. To achieve the collocation of these properties, we used a routine of flattening the edges array, sorting it, and applying a mixing procedure to assemble both global connections with preferential attachment and local clusters. As a result, we achieved clustered scale-free networks in a computational fashion, diverging from recent studies by following a simple but efficient approach.
Chen, Yun; Yang, Hui
2016-01-01
In the era of big data, there is increasing interest in clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
Chen, Yun; Yang, Hui
2016-12-14
In the era of big data, there is increasing interest in clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.
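Why mutual information is preferred over linear correlation for nonlinear interdependence can be shown with a small sketch. It uses a simple histogram (plug-in) estimator, which is an assumption made here for brevity and not necessarily the estimator used in the paper; the bin count and function name are likewise hypothetical, and the Dirichlet process clustering step is not reproduced.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = pxy > 0                       # skip empty cells to avoid log(0)
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 2001)
    y = x ** 2                                        # fully dependent on x, but not linearly
    z = np.random.default_rng(0).normal(size=x.size)  # independent of x
    print("Pearson r(x, y):", np.corrcoef(x, y)[0, 1])
    print("MI(x, y):", mutual_information(x, y))
    print("MI(x, z):", mutual_information(x, z))
```

The pair (x, x²) has essentially zero Pearson correlation yet a large mutual information, while the independent pair yields only a small positive value caused by the estimator's finite-sample bias.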
Modeling tensional homeostasis in multicellular clusters.
Tam, Sze Nok; Smith, Michael L; Stamenović, Dimitrije
2017-03-01
Homeostasis of mechanical stress in cells, or tensional homeostasis, is essential for normal physiological function of tissues and organs and is protective against disease progression, including atherosclerosis and cancer. Recent experimental studies have shown that isolated cells are not capable of maintaining tensional homeostasis, whereas multicellular clusters are, with stability increasing with the size of the clusters. Here, we proposed simple mathematical models to interpret experimental results and to obtain insight into factors that determine homeostasis. Multicellular clusters were modeled as one-dimensional arrays of linearly elastic blocks that were either jointed or disjointed. Fluctuating forces that mimicked experimentally measured cell-substrate tractions were obtained from Monte Carlo simulations. These forces were applied to the cluster models, and the corresponding stress field in the cluster was calculated by solving the equilibrium equation. It was found that temporal fluctuations of the cluster stress field became attenuated with increasing cluster size, indicating that the cluster approached tensional homeostasis. These results were consistent with previously reported experimental data. Furthermore, the models revealed that key determinants of tensional homeostasis in multicellular clusters included the cluster size, the distribution of traction forces, and mechanical coupling between adjacent cells. Based on these findings, we concluded that tensional homeostasis was a multicellular phenomenon. Copyright © 2016 John Wiley & Sons, Ltd.
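The attenuation of temporal fluctuations with cluster size can be illustrated by a much simpler Monte Carlo experiment than the elastic-block model described above. The sketch below assumes that each cell contributes an independent Gaussian traction at every time step and ignores mechanical coupling and the equilibrium equation entirely; the mean, standard deviation, and function name are hypothetical. It shows only the averaging effect: the coefficient of variation of the summed traction falls roughly as one over the square root of the number of cells.

```python
import random
import statistics

def cv_of_cluster_traction(n_cells, n_steps=2000, mean=1.0, sd=0.3, seed=0):
    """Coefficient of variation over time of the total traction of n_cells independent cells."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_steps):
        # Assumed: each cell's traction is an independent Gaussian draw at every step.
        totals.append(sum(rng.gauss(mean, sd) for _ in range(n_cells)))
    return statistics.pstdev(totals) / statistics.fmean(totals)

if __name__ == "__main__":
    for n in (1, 4, 9, 25):
        print(n, round(cv_of_cluster_traction(n), 3))
```

Because the abstract identifies the distribution of traction forces and the coupling between adjacent cells as key determinants, this independence assumption should be read as a baseline and not as a substitute for the block model.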
Review of methods for handling confounding by cluster and informative cluster size in clustered data
Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew
2014-01-01
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
ERIC Educational Resources Information Center
Costa, Carolina; Alvelos, Helena; Teixeira, Leonor
2016-01-01
This study analyses and compares the use of Web 2.0 tools by students in both learning and leisure contexts. Data were collected based on a questionnaire applied to 234 students from the University of Aveiro (Portugal) and the results were analysed by using descriptive analysis, paired samples t-tests, cluster analyses and Kruskal-Wallis tests.…
Features of asthma which provide meaningful insights for understanding the disease heterogeneity.
Deliu, M; Yavuz, T S; Sperrin, M; Belgrave, D; Sahiner, U M; Sackesen, C; Kalayci, O; Custovic, A
2018-01-01
Data-driven methods such as hierarchical clustering (HC) and principal component analysis (PCA) have been used to identify asthma subtypes, with inconsistent results. To develop a framework for the discovery of stable and clinically meaningful asthma subtypes. We performed HC in a rich data set from 613 asthmatic children, using 45 clinical variables (Model 1), and after PCA dimensionality reduction (Model 2). Clinical experts then identified a set of asthma features/domains which informed clusters in the two analyses. In Model 3, we reclustered the data using these features to ascertain whether this improved the discovery process. Cluster stability was poor in Models 1 and 2. Clinical experts highlighted four asthma features/domains which differentiated the clusters in two models: age of onset, allergic sensitization, severity, and recent exacerbations. In Model 3 (HC using these four features), cluster stability improved substantially. The cluster assignment changed, providing more clinically interpretable results. In a 5-cluster model, we labelled the clusters as: "Difficult asthma" (n = 132); "Early-onset mild atopic" (n = 210); "Early-onset mild non-atopic" (n = 153); "Late-onset" (n = 105); and "Exacerbation-prone asthma" (n = 13). Multinomial regression demonstrated that lung function was significantly diminished among children with "Difficult asthma"; blood eosinophilia was a significant feature of "Difficult," "Early-onset mild atopic," and "Late-onset asthma." Children with moderate-to-severe asthma were present in each cluster. An integrative approach of blending the data with clinical expert domain knowledge identified four features, which may be informative for ascertaining asthma endotypes. These findings suggest that variables which are key determinants of asthma presence, severity, or control may not be the most informative for determining asthma subtypes.
Our results indicate that exacerbation-prone asthma may be a separate asthma endotype and that severe asthma is not a single entity, but an extreme end of the spectrum of several different asthma endotypes. © 2017 The Authors. Clinical & Experimental Allergy published by John Wiley & Sons Ltd.
Veldhuis, Anouk; Brouwer-Middelesch, Henriëtte; Marceau, Alexis; Madouasse, Aurélien; Van der Stede, Yves; Fourichon, Christine; Welby, Sarah; Wever, Paul; van Schaik, Gerdien
2016-02-01
This study aimed to evaluate the use of routinely collected reproductive and milk production data for the early detection of emerging vector-borne diseases in cattle in the Netherlands and the Flanders region of Belgium (i.e., the northern part of Belgium). Prospective space-time cluster analyses on residuals from a model on milk production were carried out to detect clusters of reduced milk yield. A CUSUM algorithm was used to detect temporal aberrations in model residuals of reproductive performance models on two indicators of gestation length. The Bluetongue serotype-8 (BTV-8) epidemics of 2006 and 2007 and the Schmallenberg virus (SBV) epidemic of 2011 were used as case studies to evaluate the sensitivity and timeliness of these methods. The methods investigated in this study did not result in a more timely detection of BTV-8 and SBV in the Netherlands and BTV-8 in Belgium given the surveillance systems in place when these viruses emerged. This could be due to (i) the large geographical units used in the analyses (country, region and province level), and (ii) the high level of sensitivity of the surveillance systems in place when these viruses emerged. Nevertheless, it might be worthwhile to use a syndromic surveillance system based on non-specific animal health data in real-time alongside regular surveillance, to increase the sense of urgency and to provide valuable quantitative information for decision makers in the initial phase of an emerging disease outbreak. Copyright © 2015 Elsevier B.V. All rights reserved.
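A CUSUM detector on model residuals can be sketched as follows. This is a generic one-sided (upper) tabular CUSUM on standardized residuals, not the specific configuration used in the study; the reference value `k`, the decision threshold `h`, the reset-after-alarm rule, and the example residuals are all illustrative assumptions.

```python
def cusum_alarms(residuals, k=0.5, h=4.0):
    """Return the indices at which a one-sided upper CUSUM exceeds threshold h.

    k is the allowance subtracted at each step; h is the alarm threshold.
    Both are hypothetical defaults chosen for illustration.
    """
    s = 0.0
    alarms = []
    for t, r in enumerate(residuals):
        # Accumulate only deviations above the allowance; never drop below zero
        s = max(0.0, s + r - k)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart the statistic after signalling
    return alarms

# Made-up standardized residuals with a sustained upward shift from index 5
residuals = [0.1, -0.3, 0.2, 0.0, -0.1, 1.8, 2.1, 1.9, 2.2, 0.1]
alarms = cusum_alarms(residuals)
```

To monitor a decrease instead (for example, reduced milk yield), the same function can be applied to the negated residuals.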
Saha, Abhijoy; Banerjee, Sayantan; Kurtek, Sebastian; Narang, Shivali; Lee, Joonsang; Rao, Ganesh; Martinez, Juan; Bharath, Karthik; Rao, Arvind U K; Baladandayuthapani, Veerabhadran
2016-01-01
Tumor heterogeneity is a crucial area of cancer research wherein inter- and intra-tumor differences are investigated to assess and monitor disease development and progression, especially in cancer. The proliferation of imaging and linked genomic data has enabled us to evaluate tumor heterogeneity on multiple levels. In this work, we examine magnetic resonance imaging (MRI) in patients with brain cancer to assess image-based tumor heterogeneity. Standard approaches to this problem use scalar summary measures (e.g., intensity-based histogram statistics) that do not adequately capture the complete and finer scale information in the voxel-level data. In this paper, we introduce a novel technique, DEMARCATE (DEnsity-based MAgnetic Resonance image Clustering for Assessing Tumor hEterogeneity) to explore the entire tumor heterogeneity density profiles (THDPs) obtained from the full tumor voxel space. THDPs are smoothed representations of the probability density function of the tumor images. We develop tools for analyzing such objects under the Fisher-Rao Riemannian framework that allows us to construct metrics for THDP comparisons across patients, which can be used in conjunction with standard clustering approaches. Our analyses of The Cancer Genome Atlas (TCGA) based Glioblastoma dataset reveal two significant clusters of patients with marked differences in tumor morphology, genomic characteristics and prognostic clinical outcomes. In addition, we see enrichment of image-based clusters with known molecular subtypes of glioblastoma multiforme, which further validates our representation of tumor heterogeneity and subsequent clustering techniques.
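The Fisher-Rao comparison of density profiles can be illustrated with a short sketch. Under the square-root representation, probability densities map to points on a unit sphere, and the Fisher-Rao geodesic distance becomes the arc length between them. The code below assumes densities discretized as histograms on a common set of bins, which is a simplification; the function name and example histograms are hypothetical and do not reproduce the DEMARCATE pipeline.

```python
import math

def fisher_rao_distance(p, q):
    """Geodesic distance between two discretized densities on common bins.

    Each histogram is normalized to sum to one, mapped to its square root,
    and the distance is the angle between the two resulting unit vectors.
    """
    sp, sq = sum(p), sum(q)
    inner = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    # Clamp to [-1, 1] to protect acos from floating-point overshoot
    return math.acos(min(1.0, max(-1.0, inner)))

same = fisher_rao_distance([1, 2, 3, 4], [2, 4, 6, 8])      # same shape, different scale
disjoint = fisher_rao_distance([1, 1, 0, 0], [0, 0, 1, 1])  # no shared support
```

A matrix of such pairwise distances across patients could then be passed to any standard clustering routine that accepts precomputed dissimilarities.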
Nghia, Nguyen Anh; Kadir, Jugah; Sunderasan, E; Puad Abdullah, Mohd; Malik, Adam; Napis, Suhaimi
2008-10-01
Morphological features and Inter Simple Sequence Repeat (ISSR) polymorphism were employed to analyse 21 Corynespora cassiicola isolates obtained from a number of Hevea clones grown in rubber plantations in Malaysia. The C. cassiicola isolates used in this study were collected from several states in Malaysia from 1998 to 2005. The morphology of the isolates was characteristic of that previously described for C. cassiicola. Variations in colony and conidial morphology were observed not only among isolates but also within a single isolate, with no inclination to either clonal or geographical origin of the isolates. ISSR analysis delineated the isolates into two distinct clusters. The dendrogram created from UPGMA analysis based on Nei and Li's coefficient (calculated from the binary matrix data of 106 amplified DNA bands generated from 8 ISSR primers) showed that cluster 1 encompasses 12 isolates from the states of Johor and Selangor (this cluster was further split into 2 sub clusters (1A, 1B); sub cluster 1B consists of a unique isolate, CKT05D), while cluster 2 comprises 9 isolates that were obtained from the other states. Detached leaf assay performed on selected Hevea clones showed that the pathogenicity of representative isolates from cluster 1 (with the exception of CKT05D) resembled that of race 1, and isolates in cluster 2 showed pathogenicity similar to race 2 of the fungus that was previously identified in Malaysia. The isolate CKT05D from sub cluster 1B showed pathogenicity dissimilar to either race 1 or race 2.
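Nei and Li's coefficient for a pair of isolates can be computed directly from their rows of the binary band matrix: twice the number of shared bands divided by the total number of bands in both isolates. The sketch below shows only this similarity step, not the UPGMA tree building; the function name and the four-band example profiles are hypothetical.

```python
def nei_li_similarity(a, b):
    """Nei and Li's (Dice) similarity between two 0/1 band-presence profiles."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Two profiles with no bands at all are treated as having zero similarity
    return 2 * shared / total if total else 0.0

isolate_a = [1, 1, 0, 1]  # made-up presence/absence of four bands
isolate_b = [1, 0, 0, 1]
similarity = nei_li_similarity(isolate_a, isolate_b)
```

One minus this similarity gives a dissimilarity that an average-linkage (UPGMA) hierarchical clustering can take as input.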
Diversity and Community Can Coexist.
Stivala, Alex; Robins, Garry; Kashima, Yoshihisa; Kirley, Michael
2016-03-01
We examine the (in)compatibility of diversity and sense of community by means of agent-based models based on the well-known Schelling model of residential segregation and Axelrod model of cultural dissemination. We find that diversity and highly clustered social networks, on the assumptions of social tie formation based on spatial proximity and homophily, are incompatible when agent features are immutable, and this holds even for multiple independent features. We include both mutable and immutable features into a model that integrates Schelling and Axelrod models, and we find that even for multiple independent features, diversity and highly clustered social networks can be incompatible on the assumptions of social tie formation based on spatial proximity and homophily. However, this incompatibility breaks down when cultural diversity can be sufficiently large, at which point diversity and clustering need not be negatively correlated. This implies that segregation based on immutable characteristics such as race can possibly be overcome by sufficient similarity on mutable characteristics based on culture, which are subject to a process of social influence, provided a sufficiently large "scope of cultural possibilities" exists. © Society for Community Research and Action 2016.
Saito, Norio; Cordier, Stéphane; Lemoine, Pierric; Ohsawa, Takeo; Wada, Yoshiki; Grasset, Fabien; Cross, Jeffrey S; Ohashi, Naoki
2017-06-05
The electronic and crystal structures of Cs2[Mo6X14] (X = Cl, Br, I) cluster-based compounds were investigated by density functional theory (DFT) simulations and experimental methods such as powder X-ray diffraction, ultraviolet-visible spectroscopy, and X-ray photoemission spectroscopy (XPS). The experimentally determined lattice parameters were in good agreement with theoretically optimized ones, indicating the usefulness of DFT calculations for the structural investigation of these clusters. The calculated band gaps of these compounds reproduced those experimentally determined by UV-vis reflectance within an error of a few tenths of an eV. Core-level XPS and effective charge analyses indicated bonding states of the halogens changed according to their sites. The XPS valence spectra were fairly well reproduced by simulations based on the projected electron density of states weighted with cross sections of Al Kα, suggesting that DFT calculations can predict the electronic properties of metal-cluster-based crystals with good accuracy.
Low Temperature Kinetics of the First Steps of Water Cluster Formation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bourgalais, J.; Roussel, V.; Capron, M.
2016-03-01
We present a combined experimental and theoretical low temperature kinetic study of water cluster formation. Water cluster growth takes place in low temperature (23-69 K) supersonic flows. The observed kinetics of formation of water clusters are reproduced with a kinetic model based on theoretical predictions for the first steps of clusterization. The temperature- and pressure-dependent association and dissociation rate coefficients are predicted with an ab initio transition state theory based master equation approach over a wide range of temperatures (20-100 K) and pressures (10^-6 to 10 bar).
Observing the clustering properties of galaxy clusters in dynamical dark-energy cosmologies
NASA Astrophysics Data System (ADS)
Fedeli, C.; Moscardini, L.; Bartelmann, M.
2009-06-01
We study the clustering properties of galaxy clusters expected to be observed by various forthcoming surveys both in the X-ray and sub-mm regimes by the thermal Sunyaev-Zel'dovich effect. Several different background cosmological models are assumed, including the concordance ΛCDM and various cosmologies with dynamical evolution of the dark energy. Particular attention is paid to models with a significant contribution of dark energy at early times which affects the process of structure formation. Past light cone and selection effects in cluster catalogs are carefully modeled by realistic scaling relations between cluster mass and observables and by properly taking into account the selection functions of the different instruments. The results show that early dark-energy models are expected to produce significantly lower values of effective bias and both spatial and angular correlation amplitudes with respect to the standard ΛCDM model. Among the cluster catalogs studied in this work, it turns out that those based on eRosita, Planck, and South Pole Telescope observations are the most promising for distinguishing between various dark-energy models.
DMINDA: an integrated web server for DNA motif identification and analyses
Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying
2014-01-01
DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419
A density-based clustering model for community detection in complex networks
NASA Astrophysics Data System (ADS)
Zhao, Xiang; Li, Yantao; Qu, Zehui
2018-04-01
Network clustering (or graph partitioning) is an important technique for uncovering the underlying community structures in complex networks, which has been widely applied in various fields including astronomy, bioinformatics, sociology, and bibliometrics. In this paper, we propose a density-based clustering model for community detection in complex networks (DCCN). The key idea is to find group centers with a higher density than their neighbors and a relatively large integrated-distance from nodes with higher density. The experimental results indicate that our approach is efficient and effective for community detection of complex networks.
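The center-selection idea (high local density, large distance from any denser node) can be sketched on a generic distance matrix. This is a minimal illustration in the spirit of density-peak clustering, not the DCCN model itself: the abstract's "integrated-distance" is not defined here, so a plain pairwise distance stands in for it, and the function names, cutoff, and toy data are assumptions.

```python
def density_peak_scores(dist, cutoff):
    """Return (rho, delta): local density and distance to the nearest denser node."""
    n = len(dist)
    # Local density: number of other nodes closer than the cutoff
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A node with no denser node gets its largest distance by convention
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta

def pick_centers(dist, cutoff, k):
    """Choose the k nodes with the largest rho * delta product as group centers."""
    rho, delta = density_peak_scores(dist, cutoff)
    ranked = sorted(range(len(dist)), key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(ranked[:k])

# Toy example: two well-separated groups of three nodes on a line
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
centers = pick_centers(dist, cutoff=1.5, k=2)
```

Once centers are chosen, each remaining node would typically be assigned to the group of its nearest denser neighbor; that assignment step is omitted here.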
Stars caught in the braking stage in young Magellanic Cloud clusters
NASA Astrophysics Data System (ADS)
D'Antona, Francesca; Milone, Antonino P.; Tailo, Marco; Ventura, Paolo; Vesperini, Enrico; di Criscienzo, Marcella
2017-08-01
The colour-magnitude diagrams of many Magellanic Cloud clusters (with ages up to 2 billion years) display extended turnoff regions where the stars leave the main sequence, suggesting the presence of multiple stellar populations with ages that may differ even by hundreds of millions of years [1,2,3]. A strongly debated question is whether such an extended turnoff is instead due to populations with different stellar rotations [3,4,5,6]. The recent discovery of a 'split' main sequence in some younger clusters (~80-400 Myr) added another piece to this puzzle. The blue side of the main sequence is consistent with slowly rotating stellar models, and the red side consistent with rapidly rotating models [7,8,9,10]. However, a complete theoretical characterization of the observed colour-magnitude diagram also seemed to require an age spread [9]. We show here that, in the three clusters so far analysed, if the blue main-sequence stars are interpreted with models in which the stars have always been slowly rotating, they must be ~30% younger than the rest of the cluster. If they are instead interpreted as stars that were initially rapidly rotating but have later slowed down, the age difference disappears, and this 'braking' also helps to explain the apparent age differences of the extended turnoff. The age spreads in Magellanic Cloud clusters are thus a manifestation of rotational stellar evolution. Observational tests are suggested.
Stopka, Thomas J; Brinkley-Rubinstein, Lauren; Johnson, Kendra; Chan, Philip A; Hutcheson, Marga; Crosby, Richard; Burke, Deirdre; Mena, Leandro; Nunn, Amy
2018-04-03
In recent years, more than half of new HIV infections in the United States occur among African Americans in the Southeastern United States. Spatial epidemiological analyses can inform public health responses in the Deep South by identifying HIV hotspots and community-level factors associated with clustering. The goal of this study was to identify and characterize HIV clusters in Mississippi through analysis of state-level HIV surveillance data. We used a combination of spatial epidemiology and statistical modeling to identify and characterize HIV hotspots in Mississippi census tracts (n=658) from 2008 to 2014. We conducted spatial analyses of all HIV infections, infections among men who have sex with men (MSM), and infections among African Americans. Multivariable logistic regression analyses identified community-level sociodemographic factors associated with HIV hotspots considering all cases. There were HIV hotspots for the entire population, MSM, and African American MSM identified in the Mississippi Delta region, Southern Mississippi, and in greater Jackson, including surrounding rural counties (P<.05). In multivariable models for all HIV cases, HIV hotspots were significantly more likely to include urban census tracts (adjusted odds ratio [AOR] 2.01, 95% CI 1.20-3.37) and census tracts that had a higher proportion of African Americans (AOR 3.85, 95% CI 2.23-6.65). The HIV hotspots were less likely to include census tracts with residents who had less than a high school education (AOR 0.95, 95% CI 0.92-0.98), census tracts with residents belonging to two or more racial/ethnic groups (AOR 0.46, 95% CI 0.30-0.70), and census tracts that had a higher percentage of the population living below the poverty level (AOR 0.51, 95% CI 0.28-0.92). We used spatial epidemiology and statistical modeling to identify and characterize HIV hotspots for the general population, MSM, and African Americans. HIV clusters concentrated in Jackson and the Mississippi Delta. 
African American race and urban location were positively associated with clusters, whereas having less than a high school education and having a higher percentage of the population living below the poverty level were negatively associated with clusters. Spatial epidemiological analyses can inform implementation science and public health response strategies, including improved HIV testing, targeted prevention and risk reduction education, and tailored preexposure prophylaxis to address HIV disparities in the South. ©Thomas J Stopka, Lauren Brinkley-Rubinstein, Kendra Johnson, Philip A Chan, Marga Hutcheson, Richard Crosby, Deirdre Burke, Leandro Mena, Amy Nunn. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.04.2018.
3D morphology-based clustering and simulation of human pyramidal cell dendritic spines.
Luengo-Sanchez, Sergio; Fernaud-Espinosa, Isabel; Bielza, Concha; Benavides-Piccione, Ruth; Larrañaga, Pedro; DeFelipe, Javier
2018-06-13
The dendritic spines of pyramidal neurons are the targets of most excitatory synapses in the cerebral cortex. They have a wide variety of morphologies, and their morphology appears to be critical from the functional point of view. To further characterize dendritic spine geometry, we used in this paper over 7,000 individually 3D reconstructed dendritic spines from human cortical pyramidal neurons to group dendritic spines using model-based clustering. This approach uncovered six separate groups of human dendritic spines. To better understand the differences between these groups, the discriminative characteristics of each group were identified as a set of rules. Model-based clustering was also useful for simulating accurate 3D virtual representations of spines that matched the morphological definitions of each cluster. This mathematical approach could provide a useful tool for theoretical predictions on the functional features of human pyramidal neurons based on the morphology of dendritic spines.
Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S.; Allard, Marc W.; Brown, Eric W.; Strain, Errol A.
2017-01-01
A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. 
The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents. PMID:28166293
Dark Energy Survey Year 1 Results: galaxy mock catalogues for BAO
NASA Astrophysics Data System (ADS)
Avila, S.; Crocce, M.; Ross, A. J.; García-Bellido, J.; Percival, W. J.; Banik, N.; Camacho, H.; Kokron, N.; Chan, K. C.; Andrade-Oliveira, F.; Gomes, R.; Gomes, D.; Lima, M.; Rosenfeld, R.; Salvador, A. I.; Friedrich, O.; Abdalla, F. B.; Annis, J.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Carrasco Kind, M.; Carretero, J.; Castander, F. J.; Cunha, C. E.; da Costa, L. N.; Davis, C.; De Vicente, J.; Doel, P.; Fosalba, P.; Frieman, J.; Gerdes, D. W.; Gruen, D.; Gruendl, R. A.; Gutierrez, G.; Hartley, W. G.; Hollowood, D.; Honscheid, K.; James, D. J.; Kuehn, K.; Kuropatkin, N.; Miquel, R.; Plazas, A. A.; Sanchez, E.; Scarpine, V.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Walker, A. R.; Dark Energy Survey Collaboration
2018-05-01
Mock catalogues are a crucial tool in the analysis of galaxy surveys data, both for the accurate computation of covariance matrices, and for the optimisation of analysis methodology and validation of data sets. In this paper, we present a set of 1800 galaxy mock catalogues designed to match the Dark Energy Survey Year-1 BAO sample (Crocce et al. 2017) in abundance, observational volume, redshift distribution and uncertainty, and redshift dependent clustering. The simulated samples were built upon HALOGEN (Avila et al. 2015) halo catalogues, based on a 2LPT density field with an empirical halo bias. For each of them, a lightcone is constructed by the superposition of snapshots in the redshift range 0.45 < z < 1.4. Uncertainties introduced by so-called photometric redshifts estimators were modelled with a double-skewed-Gaussian curve fitted to the data. We populate halos with galaxies by introducing a hybrid Halo Occupation Distribution - Halo Abundance Matching model with two free parameters. These are adjusted to achieve a galaxy bias evolution b(z_ph) that matches the data at the 1-σ level in the range 0.6 < z_ph < 1.0. We further analyse the galaxy mock catalogues and compare their clustering to the data using the angular correlation function w(θ), the comoving transverse separation clustering ξ_{μ<0.8}(s_⊥) and the angular power spectrum C_ℓ, finding them in agreement. This is the first large set of three-dimensional {ra,dec,z} galaxy mock catalogues able to simultaneously accurately reproduce the photometric redshift uncertainties and the galaxy clustering.
Deltour, Isabelle; Wiart, Joe; Taki, Masao; Wake, Kanako; Varsier, Nadège; Mann, Simon; Schüz, Joachim; Cardis, Elisabeth
2011-12-01
The three-dimensional distribution of the specific absorption rate of energy (SAR) in phantom models was analysed to detect clusters of mobile phones producing similar spatial deposition of energy in the head. The clusters' characteristics were described from the phones' external features, frequency band and communication protocol. Compliance measurements with phones in cheek and tilt positions, and on the left and right side of a physical phantom were used. Phones used the Personal Digital Cellular (PDC), Code division multiple access One (CdmaOne), Global System for Mobile Communications (GSM) and Nordic Mobile Telephony (NMT) communication systems, in the 800, 900, 1500 and 1800 MHz bands. Each phone's measurements were summarised by the half-ellipsoid in which the SAR values were above half the maximum value. Cluster analysis used the Partitioning Around Medoids algorithm. The dissimilarity measure was based on the overlap of the ellipsoids, and the Manhattan distance was used for robustness analysis. Within the 800 MHz frequency band, and in part within the 900 MHz and the 1800 MHz frequency bands, weak clustering was obtained for the handset shape (bar phone, flip with top and flip with central antennas), but only in specific positions (tilt or cheek). On measurements of 120 phones, the three-dimensional distribution of SAR in phantom models did not appear to be related to particular external phone characteristics or measurement characteristics, which could be used for refining the assessment of exposure to radiofrequency energy within the brain in epidemiological studies such as the Interphone. Copyright © 2011 Wiley Periodicals, Inc.
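Clustering around medoids from a precomputed dissimilarity matrix can be sketched as below. This is a simplified swap-only variant: Partitioning Around Medoids proper starts with a BUILD step, whereas this sketch uses a naive initialisation, and the ellipsoid-overlap dissimilarity of the study is replaced by a made-up one-dimensional distance. All names and data are hypothetical.

```python
from itertools import product

def total_cost(dist, medoids):
    """Sum over all items of the dissimilarity to their closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def k_medoids(dist, k):
    """Greedy swap search for k medoids on a precomputed dissimilarity matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive start (assumption; PAM uses a BUILD phase)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            # Try replacing the medoid at position pos with the candidate item
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    # Assign every item to its closest medoid
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
medoids, labels = k_medoids(dist, k=2)
```

Because only the dissimilarity matrix is needed, the same routine works with any pairwise measure, including an overlap-based one or the Manhattan distance mentioned for the robustness analysis.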
Grams, Vanessa; Wellmann, Robin; Preuß, Siegfried; Grashorn, Michael A; Kjaer, Jörgen B; Bessei, Werner; Bennewitz, Jörn
2015-09-30
Feather pecking (FP) in laying hens is a well-known and multi-factorial behaviour with a genetic background. In a selection experiment, two lines were developed for 11 generations for high (HFP) and low (LFP) feather pecking, respectively. Starting with the second generation of selection, there was a constant difference in mean number of FP bouts between both lines. We used the data from this experiment to perform a quantitative genetic analysis and to map selection signatures. Pedigree and phenotypic data were available for the last six generations of both lines. Univariate quantitative genetic analyses were conducted using mixed linear and generalized mixed linear models assuming a Poisson distribution. Selection signatures were mapped using 33,228 single nucleotide polymorphisms (SNPs) genotyped on 41 HFP and 34 LFP individuals of generation 11. For each SNP, we estimated Wright's fixation index (FST). We tested the null hypothesis that FST is driven purely by genetic drift against the alternative hypothesis that it is driven by genetic drift and selection. The mixed linear model failed to analyze the LFP data because of the large number of 0s in the observation vector. The Poisson model fitted the data well and revealed a small but continuous genetic trend in both lines. Most of the 17 genome-wide significant SNPs were located on chromosomes 3 and 4. Thirteen clusters with at least two significant SNPs within an interval of 3 Mb maximum were identified. Two clusters were mapped on chromosomes 3, 4, 8 and 19. Of the 17 genome-wide significant SNPs, 12 were located within the identified clusters. This indicates a non-random distribution of significant SNPs and points to the presence of selection sweeps. Data on FP should be analysed using generalised linear mixed models assuming a Poisson distribution, especially if the number of FP bouts is small and the distribution is heavily peaked at 0. 
The FST-based approach was suitable to map selection signatures that need to be confirmed by linkage or association mapping.
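As an illustration of the per-SNP fixation index mentioned in the abstract above, the following is a minimal sketch, assuming a biallelic SNP, two lines, and the textbook heterozygosity-based definition F_ST = (H_T - H_S) / H_T with an unweighted mean allele frequency. The abstract does not state which estimator the authors used, so the function name and the estimator choice are assumptions for illustration only.

```python
def fst_two_lines(p1: float, p2: float) -> float:
    """Heterozygosity-based F_ST for one biallelic SNP.

    p1, p2: frequency of the same allele in each of the two lines.
    Assumption: unweighted mean of the two lines (no sample-size correction).
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # SNP is monomorphic for the same allele in both lines.
        return 0.0
    # Mean expected heterozygosity within the lines.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t
```

Identical frequencies give 0 and alternative fixation (frequencies 1 and 0) gives 1; testing whether an observed value exceeds what drift alone would produce requires a separate null model that is not sketched here.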
Fraver, Shawn; D'Amato, Anthony W.; Bradford, John B.; Jonsson, Bengt Gunnar; Jönsson, Mari; Esseen, Per-Anders
2013-01-01
Question: What factors best characterize tree competitive environments in this structurally diverse old-growth forest, and do these factors vary spatially within and among stands? Location: Old-growth Picea abies forest of boreal Sweden. Methods: Using long-term, mapped permanent plot data augmented with dendrochronological analyses, we evaluated the effect of neighbourhood competition on focal tree growth by means of standard competition indices, each modified to include various metrics of tree size, neighbour mortality weighting (for neighbours that died during the inventory period), and within-neighbourhood tree clustering. Candidate models were evaluated using mixed-model linear regression analyses, with mean basal area increment as the response variable. We then analysed stand-level spatial patterns of competition indices and growth rates (via kriging) to determine if the relationship between these patterns could further elucidate factors influencing tree growth. Results: Inter-tree competition clearly affected growth rates, with crown volume being the size metric most strongly influencing the neighbourhood competitive environment. Including neighbour tree mortality weightings in models only slightly improved descriptions of competitive interactions. Although the within-neighbourhood clustering index did not improve model predictions, competition intensity was influenced by the underlying stand-level tree spatial arrangement: stand-level clustering locally intensified competition and reduced tree growth, whereas in the absence of such clustering, inter-tree competition played a lesser role in constraining tree growth. Conclusions: Our findings demonstrate that competition continues to influence forest processes and structures in an old-growth system that has not experienced major disturbances for at least two centuries. 
The finding that the underlying tree spatial pattern influenced the competitive environment suggests caution in interpreting traditional tree competition studies, in which tree spatial patterning is typically not taken into account. Our findings highlight the importance of forest structure – particularly the spatial arrangement of trees – in regulating inter-tree competition and growth in structurally diverse forests, and they provide insight into the causes and consequences of heterogeneity in this old-growth system.
Ruths, Troy; Nakhleh, Luay
2013-05-07
Cis-regulatory networks (CRNs) play a central role in cellular decision making. Like every other biological system, CRNs undergo evolution, which shapes their properties by a combination of adaptive and nonadaptive evolutionary forces. Teasing apart these forces is an important step toward functional analyses of the different components of CRNs, designing regulatory perturbation experiments, and constructing synthetic networks. Although tests of neutrality and selection based on molecular sequence data exist, no such tests are currently available based on CRNs. In this work, we present a unique genotype model of CRNs that is grounded in a genomic context and demonstrate its use in identifying portions of the CRN with properties explainable by neutral evolutionary forces at the system, subsystem, and operon levels. We leverage our model against experimentally derived data from Escherichia coli. The results of this analysis show statistically significant and substantial neutral trends in properties previously identified as adaptive in origin--degree distribution, clustering coefficient, and motifs--within the E. coli CRN. Our model captures the tightly coupled genome-interactome of an organism and enables analyses of how evolutionary events acting at the genome level, such as mutation, and at the population level, such as genetic drift, give rise to neutral patterns that we can quantify in CRNs.
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Cluster-based control of a separating flow over a smoothly contoured ramp
NASA Astrophysics Data System (ADS)
Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek
2017-12-01
The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
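The cluster-based Markov model described in the abstract above can be sketched as follows. This is a minimal illustration, assuming the flow snapshots have already been assigned to clusters (for example by k-means) and that a single fixed control law is in effect; the function names are hypothetical and the code is not the authors' implementation. It estimates a row-stochastic transition matrix from the label sequence and approximates the long-term (ergodic) cluster probabilities by power iteration.

```python
def transition_matrix(labels, k):
    """Estimate P[i][j] = Pr(next cluster = j | current cluster = i)."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Clusters with no observed exits get a uniform row (an assumption).
        matrix.append([c / total if total else 1.0 / k for c in row])
    return matrix


def stationary_distribution(matrix, iters=1000):
    """Approximate the stationary distribution by repeated propagation."""
    k = len(matrix)
    p = [1.0 / k] * k
    for _ in range(iters):
        p = [sum(p[i] * matrix[i][j] for i in range(k)) for j in range(k)]
    return p
```

In a control setting, one such matrix would be estimated per candidate control law, and the resulting stationary distributions compared against a cost defined on the clusters.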
Miething, Alexander; Rostila, Mikael; Edling, Christofer; Rydgren, Jens
2016-01-01
The present study examines how the composition of social networks and perceived relationship content influence peer clustering in smoking, and how the association changes during the transition from late adolescence to early adulthood. The analysis was based on a Swedish two-wave survey sample comprising ego-centric network data. Respondents were 19 years old in the initial wave, and 23 when the follow-up sample was conducted. 17,227 ego-alter dyads were included in the analyses, which corresponds to an average response rate of 48.7 percent. Random effects logistic regression models were performed to calculate gender-specific average marginal effects of social network characteristics on smoking. The association of egos' and alters' smoking behavior was confirmed and found to be stronger when correlated in the female sample. For females, the associations decreased between age 19 and 23. Interactions between network characteristics and peer clustering in smoking showed that intense social interactions with smokers increase egos' smoking probability. The influence of network structures on peer clustering in smoking decreased during the transition from late adolescence to early adulthood. The study confirmed peer clustering in smoking and revealed that females' smoking behavior in particular is determined by social interactions. Female smokers' propensity to interact with other smokers was found to be associated with the quality of peer relationships, frequent social interactions, and network density. The influence of social networks on peer clustering in smoking decreased during the transition from late adolescence to early adulthood.
Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.; ...
2017-11-23
We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at 0.1 < z < 0.35 to determine whether there is any preferential tendency for satellites to point radially towards cluster centres. Here, we analyse the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when limiting to a subsample of higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder in shape, higher bulge fraction, and distributed preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.
Rostila, Mikael; Edling, Christofer; Rydgren, Jens
2016-01-01
Objectives: The present study examines how the composition of social networks and perceived relationship content influence peer clustering in smoking, and how the association changes during the transition from late adolescence to early adulthood. Methods: The analysis was based on a Swedish two-wave survey sample comprising ego-centric network data. Respondents were 19 years old in the initial wave, and 23 when the follow-up sample was conducted. 17,227 ego-alter dyads were included in the analyses, which corresponds to an average response rate of 48.7 percent. Random effects logistic regression models were performed to calculate gender-specific average marginal effects of social network characteristics on smoking. Results: The association of egos’ and alters’ smoking behavior was confirmed and found to be stronger when correlated in the female sample. For females, the associations decreased between age 19 and 23. Interactions between network characteristics and peer clustering in smoking showed that intense social interactions with smokers increase egos’ smoking probability. The influence of network structures on peer clustering in smoking decreased during the transition from late adolescence to early adulthood. Conclusions: The study confirmed peer clustering in smoking and revealed that females’ smoking behavior in particular is determined by social interactions. Female smokers’ propensity to interact with other smokers was found to be associated with the quality of peer relationships, frequent social interactions, and network density. The influence of social networks on peer clustering in smoking decreased during the transition from late adolescence to early adulthood. PMID:27727314
Hsu, David
2015-09-27
Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
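The Jaccard-based stability measure named in the abstract above can be illustrated with a short sketch. It assumes clusters are represented as sets of building identifiers and that stability of an original cluster is scored as its best Jaccard match among the clusters found on a perturbed (for example resampled) dataset; this scoring rule and the function names are assumptions for illustration, not details taken from the paper.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(original, perturbed):
    """For each original cluster, the best Jaccard match in the perturbed run."""
    return [max(jaccard(c, d) for d in perturbed) for c in original]
```

Averaging these scores over many perturbed runs gives one stability value per cluster, with values near 1 indicating clusters that reappear almost unchanged.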
Some Psychometric and Design Implications of Game-Based Learning Analytics
ERIC Educational Resources Information Center
Gibson, David; Clarke-Midura, Jody
2013-01-01
The rise of digital game and simulation-based learning applications has led to new approaches in educational measurement that take account of patterns in time, high resolution paths of action, and clusters of virtual performance artifacts. The new approaches, which depart from traditional statistical analyses, include data mining, machine…
An Empirical Typology of Perfectionism in Academically Talented Children.
ERIC Educational Resources Information Center
Parker, Wayne D.
1997-01-01
A national sample of 820 academically talented children took the Multidimensional Perfectionism Scale. Cluster analyses of scores found a three-cluster solution. Further analyses indicated that these clusters were: nonperfectionistic (32.8%), healthy perfectionistic (41.7%), and dysfunctional perfectionistic (25.5%). The construct of perfectionism…
PuReD-MCL: a graph-based PubMed document clustering methodology.
Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A
2008-09-01
Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (Pubmed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources, available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while at the same time providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
Spatial, Temporal and Spatio-Temporal Patterns of Maritime Piracy.
Marchione, Elio; Johnson, Shane D
2013-11-01
To examine patterns in the timing and location of incidents of maritime piracy to see whether, like many urban crimes, attacks cluster in space and time. Data for all incidents of maritime piracy worldwide recorded by the National Geospatial Intelligence Agency are analyzed using time-series models and methods originally developed to detect disease contagion. At the macro level, analyses suggest that incidents of pirate attacks are concentrated in five subregions of the earth's oceans and that the time series for these different subregions differ. At the micro level, analyses suggest that for the last 16 years (or more), pirate attacks appear to cluster in space and time suggesting that patterns are not static but are also not random. Much like other types of crime, pirate attacks cluster in space, and following an attack at one location the risk of others at the same location or nearby is temporarily elevated. The identification of such regularities has implications for the understanding of maritime piracy and for predicting the future locations of attacks.
Analytical halo model of galactic conformity
NASA Astrophysics Data System (ADS)
Pahwa, Isha; Paranjape, Aseem
2017-09-01
We present a fully analytical halo model of colour-dependent clustering that incorporates the effects of galactic conformity in a halo occupation distribution framework. The model, based on our previous numerical work, describes conformity through a correlation between the colour of a galaxy and the concentration of its parent halo, leading to a correlation between central and satellite galaxy colours at fixed halo mass. The strength of the correlation is set by a tunable 'group quenching efficiency', and the model can separately describe group-level correlations between galaxy colour (1-halo conformity) and large-scale correlations induced by assembly bias (2-halo conformity). We validate our analytical results using clustering measurements in mock galaxy catalogues, finding that the model is accurate at the 10-20 per cent level for a wide range of luminosities and length-scales. We apply the formalism to interpret the colour-dependent clustering of galaxies in the Sloan Digital Sky Survey (SDSS). We find good overall agreement between the data and a model that has 1-halo conformity at a level consistent with previous results based on an SDSS group catalogue, although the clustering data require satellites to be redder than suggested by the group catalogue. Within our modelling uncertainties, however, we do not find strong evidence of 2-halo conformity driven by assembly bias in SDSS clustering.
Kurczynska, Monika; Kotulska, Malgorzata
2018-01-01
Mirror protein structures are often considered artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not yield appropriate clusters: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: the probability of an amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. 
The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
Riemannian multi-manifold modeling and clustering in brain networks
NASA Astrophysics Data System (ADS)
Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.
2017-08-01
This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brain-network time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points into the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
Spectral Modeling of the 0.4-2.5 μm Phobos CRISM dataset
NASA Astrophysics Data System (ADS)
Pajola, Maurizio; Roush, Ted; Dalle Ore, Cristina; Marzo, Giuseppe A.; Simioni, Emanuele
2017-04-01
We present the spectral modeling of the 0.4-2.5 μm MRO/CRISM Phobos dataset. After applying a statistical clustering technique, based on a K-means partitioning algorithm, we identified eight separate clusters in the Phobos CRISM data, extending the surface coverage beyond the previous analyses of Fraeman et al. (2012, 2014). Each resulting cluster is characterized by an average and its associated variability. We modeled these different spectra using a radiative transfer code based on the approach of Shkuratov et al. (1999). We used the optical constants of the model proposed by Pajola et al. (2013) in our effort, i.e. the Tagish Lake meteorite (TL) and the Mg-rich pyroxene glass (PM80). The Shkuratov model is used in an algorithm that iteratively and simultaneously changes the relative abundance and grain sizes of the selected components to minimize the differences between the model and observations using a chi-squared criterion. The best-fitting models were achieved with a simple intimate mixture showing that the relative percentages of TL and PM80 vary between 80-20% and 95-5%, respectively, and grain sizes for TL are 12-14 μm and 20-22 μm for PM80. This work aims to return a detailed picture of the surface properties of Phobos identifying specific areas that may be of interest for future planetary exploration, such as the proposed Japanese Mars Moon eXploration (MMX) sample return mission. Acknowledgements: We make use of the public NASA-Planetary Data System MRO-CRISM spectral data of Phobos. M.P. was supported for this research by an appointment to the National Aeronautics and Space Administration (NASA) Post-doctoral Program at the Ames Research Center administered by Universities Space Research Association (USRA) through a contract with NASA. References: Fraeman et al. 2012, J. Geophy. Res, E00J15, 10.1029/2012JE004137; Fraeman et al., 2014, Icarus, 229, 196-205, 10.1016/icarus.2013.11.021; Shkuratov, Y. et al. (1999), Icarus, 137, 235. 
Pajola et al., 2013, The Astrophysical Journal, 777:127, 10.1088/0004-637X/777/2/127.
Gaussian mixture clustering and imputation of microarray data.
Ouyang, Ming; Welsh, William J; Georgopoulos, Panos
2004-04-12
In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
Monoatomic and cluster beam effect on ToF-SIMS spectra of self-assembled monolayers on gold
NASA Astrophysics Data System (ADS)
Tuccitto, N.; Torrisi, V.; Delfanti, I.; Licciardello, A.
2008-12-01
Self-assembled monolayers are well-defined systems that provide a good model surface for studying the effect of primary ion beams used in secondary ion mass spectrometry. The effect of polyatomic primary beams on both aliphatic and aromatic self-assembled monolayers has been studied. In particular, we analysed the variation of the relative secondary ion yield of the substrate metal clusters (Au_n^-) in comparison with the molecular ions (M^-) and clusters (M_xAu_y^-) by using Bi^+, Bi_3^+ and Bi_5^+ beams. Moreover, the differences in the secondary ion generation efficiency are discussed. The main effect of the cluster beams is related to an increased formation of low-mass fragments and to the enhancement of the substrate-related gold clusters. The results show that, in contrast to many other cases, the static SIMS of self-assembled monolayers does not benefit from the use of polyatomic primary ions.
Marques, J M C; Pais, A A C C; Abreu, P E
2012-02-05
The efficiency of the so-called big-bang method for the optimization of atomic clusters is analysed in detail for Morse pair potentials with different ranges; here, we have used Morse potentials with four different ranges, from long- (ρ = 3) to short-ranged (ρ = 14) interactions. Specifically, we study the efficacy of the method in discovering low-energy structures, including the putative global minimum, as a function of the potential range and the cluster size. A new global minimum structure for long-ranged (ρ = 3) Morse potential at the cluster size of n = 240 is reported. The present results are useful to assess the maximum cluster size for each type of interaction where the global minimum can be discovered with a limited number of big-bang trials. Copyright © 2011 Wiley Periodicals, Inc.
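To make the role of the range parameter ρ in the abstract above concrete, here is a minimal sketch of a Morse cluster energy. It assumes the reduced-unit form commonly used in cluster optimization, V(r) = e^{ρ(1 - r)} (e^{ρ(1 - r)} - 2), with well depth and equilibrium distance both set to 1; the abstract does not give the exact parameterization, so this form and the function names are assumptions.

```python
import math
from itertools import combinations


def morse_pair(r: float, rho: float) -> float:
    """Morse pair energy in reduced units (well depth 1 at r = 1)."""
    x = math.exp(rho * (1.0 - r))
    return x * (x - 2.0)


def cluster_energy(coords, rho: float) -> float:
    """Total energy of a cluster: sum of pair energies over all atom pairs.

    coords: sequence of (x, y, z) tuples.
    """
    return sum(
        morse_pair(math.dist(a, b), rho) for a, b in combinations(coords, 2)
    )
```

A small ρ gives a wide, long-ranged well and a large ρ a narrow, short-ranged one. A global optimizer such as the big-bang method would repeatedly minimize `cluster_energy` starting from compressed random configurations; that search loop is not sketched here.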
Integrating Xgrid into the HENP distributed computing model
NASA Astrophysics Data System (ADS)
Hajdu, L.; Kocoloski, A.; Lauret, J.; Miller, M.
2008-07-01
Modern Macintosh computers feature Xgrid, a distributed computing architecture built directly into Apple's OS X operating system. While the approach is radically different from those generally expected by the Unix based Grid infrastructures (Open Science Grid, TeraGrid, EGEE), opportunistic computing on Xgrid is nonetheless a tempting and novel way to assemble a computing cluster with a minimum of additional configuration. In fact, it requires only the default operating system and authentication to a central controller from each node. OS X also implements arbitrarily extensible metadata, allowing an instantly updated file catalog to be stored as part of the filesystem itself. The low barrier to entry allows an Xgrid cluster to grow quickly and organically. This paper and presentation will detail the steps that can be taken to make such a cluster a viable resource for HENP research computing. We will further show how to provide users with a unified job submission framework by integrating Xgrid through the STAR Unified Meta-Scheduler (SUMS), putting task and job submission within easy reach of users who already use the tool for traditional Grid or local cluster job submission. We will discuss additional steps that can be taken to make an Xgrid cluster a full partner in grid computing initiatives, focusing on Open Science Grid integration. MIT's Xgrid system currently supports the work of multiple research groups in the Laboratory for Nuclear Science, and has become an important tool for generating simulations and conducting data analyses at the Massachusetts Institute of Technology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qin, Ling, E-mail: qinling@hfut.edu.cn; Jiangsu Engineering Technology Research Center of Environmental Cleaning Materials; State Key Laboratory of Coordination Chemistry, School of Chemistry and Chemical Engineering, Nanjing National Laboratory of Microstructures, Nanjing University, Nanjing 210093
2016-07-15
Two zinc coordination polymers, {[Zn_2(TPPBDA)(oba)_2]·DMF·1.5H_2O}_n (1) and {[Zn(TPPBDA)_{1/2}(tpdc)]·DMF}_n (2), have been synthesized from a zinc metal salt and a nanosized tetradentate pyridine ligand with flexible or rigid V-shaped carboxylate co-ligands. These complexes were characterized by elemental analyses and X-ray single-crystal diffraction analyses. Compound 1 is a 2-fold interpenetrated 3D framework with [Zn_2(CO_2)_4] clusters. Compound 2 can be defined as a 5-fold interpenetrating bbf topology with mononuclear Zn^{2+}. These mononuclear or dinuclear cluster units are linked by mixed ligands, resulting in various degrees of interpenetration. In addition, the photoluminescent properties of the TPPBDA ligand in different states and of the coordination polymers have been investigated in detail.
Clustering of dietary intake and sedentary behavior in 2-year-old children.
Gubbels, Jessica S; Kremers, Stef P J; Stafleu, Annette; Dagnelie, Pieter C; de Vries, Sanne I; de Vries, Nanne K; Thijs, Carel
2009-08-01
To examine clustering of energy balance-related behaviors (EBRBs) in young children. This is crucial because lifestyle habits are formed at an early age and track in later life. This study is the first to examine EBRB clustering in children as young as 2 years. Cross-sectional data originated from the Child, Parent and Health: Lifestyle and Genetic Constitution (KOALA) Birth Cohort Study. Parents of 2578 2-year-old children completed a questionnaire. Correlation analyses, principal component analyses, and linear regression analyses were performed to examine clustering of EBRBs. We found modest but consistent correlations in EBRBs. Two clusters emerged: a "sedentary-snacking cluster" and a "fiber cluster." Television viewing clustered with computer use and unhealthy dietary behaviors. Children who frequently consumed vegetables also consumed fruit and brown bread more often and white bread less often. Lower maternal education and maternal obesity were associated with high scores on the sedentary-snacking cluster, whereas higher educational level was associated with high fiber cluster scores. Obesity-prone behavioral clusters are already visible in 2-year-old children and are related to maternal characteristics. The findings suggest that obesity prevention should apply an integrated approach to physical activity and dietary intake in early childhood.
Text Summarization Model based on Facility Location Problem
NASA Astrophysics Data System (ADS)
Takamura, Hiroya; Okumura, Manabu
We propose a novel multi-document generic summarization model based on the budgeted median problem, which is a facility location problem. The summarization method based on our model is an extractive method, which selects sentences from the given document cluster and generates a summary. Each sentence in the document cluster will be assigned to one of the selected sentences, where the former sentence is supposed to be represented by the latter. Our method selects sentences to generate a summary that yields a good sentence assignment and hence covers the whole content of the document cluster. An advantage of this method is that it can incorporate asymmetric relations between sentences such as textual entailment. Through experiments, we showed that the proposed method yields good summaries on the dataset of DUC'04.
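The selection-and-assignment idea can be illustrated with a small greedy sketch. This is not the authors' solver: the greedy heuristic, the word-overlap `coverage` score (a stand-in for an asymmetric relation such as entailment), and the word-count budget are all assumptions made for illustration.

```python
def coverage(i_words, j_words):
    """Asymmetric score: share of sentence i's words that sentence j contains."""
    return len(i_words & j_words) / len(i_words) if i_words else 0.0

def greedy_summary(sentences, budget):
    """Greedily pick sentences within a word budget to maximize total coverage."""
    bags = [set(s.lower().split()) for s in sentences]
    costs = [len(s.split()) for s in sentences]
    n = len(sentences)
    best = [0.0] * n          # how well each sentence is represented so far
    chosen, used = [], 0
    while True:
        pick, pick_gain = None, 0.0
        for j in range(n):
            if j in chosen or used + costs[j] > budget:
                continue
            # Gain: improvement in representation summed over all sentences.
            gain = sum(max(coverage(bags[i], bags[j]) - best[i], 0.0) for i in range(n))
            if gain > pick_gain:
                pick, pick_gain = j, gain
        if pick is None:
            return [sentences[j] for j in sorted(chosen)]
        chosen.append(pick)
        used += costs[pick]
        best = [max(best[i], coverage(bags[i], bags[pick])) for i in range(n)]

docs = ["the cat sat", "the cat sat on the mat", "dogs bark loudly"]
summary = greedy_summary(docs, budget=9)
```

Here the longer sentence fully represents the shorter one but not the reverse, which is the kind of asymmetry the model is designed to exploit.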
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pizzuti, L.; Sartoris, B.; Borgani, S.
We perform a maximum likelihood kinematic analysis of the two dynamically relaxed galaxy clusters MACS J1206.2-0847 at z = 0.44 and RXC J2248.7-4431 at z = 0.35 to determine the total mass profile in modified gravity models, using a modified version of the MAMPOSSt code of Mamon, Biviano and Boué. Our work is based on the kinematic and lensing mass profiles derived using the data from the Cluster Lensing And Supernova survey with Hubble (hereafter CLASH) and the spectroscopic follow-up with the Very Large Telescope (hereafter CLASH-VLT). We assume a spherical Navarro-Frenk-White (NFW hereafter) profile in order to obtain a constraint on the fifth force interaction range λ for models in which the dependence of this parameter on the environment is negligible at the scale considered (i.e. λ = const) and fixing the fifth force strength to the value predicted in f(R) gravity. We then use information from lensing analysis to put a prior on the other NFW free parameters. In the case of MACS J1206 the joint kinematic+lensing analysis leads to an upper limit on the effective interaction range λ ≤ 1.61 Mpc at Δχ^2 = 2.71 on the marginalized distribution. For RXJ 2248 instead a possible tension with the ΛCDM model appears when adding lensing information, with a lower limit λ ≥ 0.14 Mpc at Δχ^2 = 2.71. This is a consequence of the slight difference between the lensing and kinematic data, appearing in GR for this cluster, that could in principle be explained in terms of modifications of gravity. We discuss the impact of systematics and the limits of our analysis as well as future improvements of the results obtained. This work has interesting implications in view of upcoming and future large imaging and spectroscopic surveys, that will deliver lensing and kinematic mass reconstruction for a large number of galaxy clusters.
Li, Xin; Yang, Zhong-Zhi
2005-05-12
We present a potential model for Li(+)-water clusters based on a combination of the atom-bond electronegativity equalization and molecular mechanics (ABEEM/MM) that is to take ABEEM charges of the cation and all atoms, bonds, and lone pairs of water molecules into the intermolecular electrostatic interaction term in molecular mechanics. The model allows point charges on cationic site and seven sites of an ABEEM-7P water molecule to fluctuate responding to the cluster geometry. The water molecules in the first sphere of Li(+) are strongly structured and there is obvious charge transfer between the cation and the water molecules; therefore, the charge constraint on the ionic cluster includes the charged constraint on the Li(+) and the first-shell water molecules and the charge neutrality constraint on each water molecule in the external hydration shells. The newly constructed potential model based on ABEEM/MM is first applied to ionic clusters and reproduces gas-phase state properties of Li(+)(H(2)O)(n) (n = 1-6 and 8) including optimized geometries, ABEEM charges, binding energies, frequencies, and so on, which are in fair agreement with those measured by available experiments and calculated by ab initio methods. Prospects and benefits introduced by this potential model are pointed out.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie
PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites which monitor the performance of Internet links in around 170 countries of the world. At present, the size of the compressed PingER data set is about 60 GB, comprising 100,000 flat files. The data is publicly available for valuable Internet performance analyses. However, the data sets suffer from missing values and anomalies due to congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) using the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Lastly, anomalies are further analyzed to identify the time frame and location of the hosts generating the major percentage of the anomalies in the PingER data set ranging from 1998 to 2016.
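The cluster-based scoring step can be sketched as follows, assuming cluster labels have already been produced (for example by k-means). This is a simplified reading of CBLOF, not the paper's pipeline: only a single coverage parameter `alpha` separates large from small clusters, and the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def cblof_scores(points, labels, alpha=0.9):
    """Simplified CBLOF: cluster size times distance to the relevant centroid."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    # The biggest clusters that together hold at least alpha of the data are "large".
    order = sorted(groups, key=lambda lab: len(groups[lab]), reverse=True)
    large, covered = [], 0
    for lab in order:
        if covered >= alpha * len(points):
            break
        large.append(lab)
        covered += len(groups[lab])
    centers = {lab: centroid(groups[lab]) for lab in groups}
    scores = []
    for p, lab in zip(points, labels):
        if lab in large:
            d = math.dist(p, centers[lab])            # distance to own centroid
        else:
            d = min(math.dist(p, centers[l]) for l in large)  # nearest large cluster
        scores.append(len(groups[lab]) * d)
    return scores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labs = [0, 0, 0, 0, 1]
scores = cblof_scores(pts, labs, alpha=0.75)
```

In this toy input the isolated point forms a small cluster far from the only large one, so it receives the highest score.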
Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie; ...
2018-05-28
NASA Astrophysics Data System (ADS)
Thölken, Sophia; Schrabback, Tim; Reiprich, Thomas H.; Lovisari, Lorenzo; Allen, Steven W.; Hoekstra, Henk; Applegate, Douglas; Buddendiek, Axel; Hicks, Amalia
2018-03-01
Context. Observations of relaxed, massive, and distant clusters can provide important tests of standard cosmological models, for example by using the gas mass fraction. To perform this test, the dynamical state of the cluster and its gas properties have to be investigated. X-ray analyses provide one of the best opportunities to access this information and to determine important properties such as temperature profiles, gas mass, and the total X-ray hydrostatic mass. For the last of these, weak gravitational lensing analyses are complementary independent probes that are essential in order to test whether X-ray masses could be biased. Aims: We study the very luminous, high redshift (z = 0.902) galaxy cluster Cl J120958.9+495352 using XMM-Newton data. We measure global cluster properties and study the temperature profile and the cooling time to investigate the dynamical status with respect to the presence of a cool core. We use Hubble Space Telescope (HST) weak lensing data to estimate its total mass and determine the gas mass fraction. Methods: We perform a spectral analysis using an XMM-Newton observation of 15 ks cleaned exposure time. As the treatment of the background is crucial, we use two different approaches to account for the background emission to verify our results. We account for point spread function effects and deproject our results to estimate the gas mass fraction of the cluster. We measure weak lensing galaxy shapes from mosaic HST imaging and select background galaxies photometrically in combination with imaging data from the William Herschel Telescope. Results: The X-ray luminosity of Cl J120958.9+495352 in the 0.1-2.4 keV band estimated from our XMM-Newton data is L_X = (13.4^{+1.2}_{-1.0}) × 10^{44} erg/s and thus it is one of the most X-ray luminous clusters known at similarly high redshift. We find clear indications for the presence of a cool core from the temperature profile and the central cooling time, which is very rare at such high redshifts. Based on the weak lensing analysis, we estimate a cluster mass of M_{500}/10^{14} M_⊙ = 4.4^{+2.2}_{-2.0} (stat.) ± 0.6 (sys.) and a gas mass fraction of f_{gas,2500} = 0.11^{+0.06}_{-0.03}, in good agreement with previous findings for high redshift and local clusters.
Barr, Kelly R.; Kus, Barbara E.; Preston, Kristine; Howell, Scarlett; Perkins, Emily; Vandergast, Amy
2015-01-01
Achieving long-term persistence of species in urbanized landscapes requires characterizing population genetic structure to understand and manage the effects of anthropogenic disturbance on connectivity. Urbanization over the past century in coastal southern California has caused both precipitous loss of coastal sage scrub habitat and declines in populations of the cactus wren (Campylorhynchus brunneicapillus). Using 22 microsatellite loci, we found that remnant cactus wren aggregations in coastal southern California comprised 20 populations based on strict exact tests for population differentiation, and 12 genetic clusters with hierarchical Bayesian clustering analyses. Genetic structure patterns largely mirrored underlying habitat availability, with cluster and population boundaries coinciding with fragmentation caused primarily by urbanization. Using a habitat model we developed, we detected stronger associations between habitat-based distances and genetic distances than Euclidean geographic distance. Within populations, we detected a positive association between available local habitat and allelic richness and a negative association with relatedness. Isolation-by-distance patterns varied over the study area, which we attribute to temporal differences in anthropogenic landscape development. We also found that genetic bottleneck signals were associated with wildfire frequency. These results indicate that habitat fragmentation and alterations have reduced genetic connectivity and diversity of cactus wren populations in coastal southern California. Management efforts focused on improving connectivity among remaining populations may help to ensure population persistence.
NASA Astrophysics Data System (ADS)
Howard, Emma; Meehan, Maria; Parnell, Andrew
2018-05-01
In Maths for Business, a mathematics module for non-mathematics specialists, students are given the choice of completing the module content via short online videos, live lectures or a combination of both. In this study, we identify students' specific usage patterns with both of these resources and discuss their reasons for the preferences they exhibit. In 2015-2016, we collected quantitative data on each student's resource usage (attendance at live lectures and access of online videos) for the entire class of 522 students and employed model-based clustering which identified four distinct resource usage patterns with lectures and/or videos. We also collected qualitative data on students' perceptions of resource usage through a survey administered at the end of the semester, to which 161 students responded. The 161 survey responses were linked to each cluster and analysed using thematic analysis. Perceived benefits of videos include flexibility of scheduling and pace, and avoidance of large, long lectures. In contrast, the main perceived advantages of lectures are the ability to engage in group tasks, to ask questions, and to learn 'gradually'. Students in the two clusters with high lecture attendance achieved, on average, higher marks in the module.
Temporal asymmetries in Interbank Market: an empirically grounded Agent-Based Model
NASA Astrophysics Data System (ADS)
Zlatic, Vinko; Popovic, Marko; Abraham, Hrvoje; Caldarelli, Guido; Iori, Giulia
2014-03-01
We analyse the changes in the topology of the structure of the E-mid interbank market in the period from September 1st 1999 to September 1st 2009. We uncover a type of temporal irreversibility in the growth of the largest component of the interbank trading network, which is not common to any of the usual network growth models. Such asymmetry, which is also detected in the growth of the clustering and reciprocity coefficients, reveals that the trading mechanism is driven by different dynamics at the beginning and at the end of the day. We are able to recover the complexity of the system by means of a simple Agent Based Model in which the probability of matching between counterparties depends on a time-varying vertex fitness (or attractiveness) describing banks' liquidity needs. We show that temporal irreversibility is associated with heterogeneity in the banking system and emerges when the distribution of liquidity shocks across banks is broad. We acknowledge support from FET project FOC-II.
Modeling online social signed networks
NASA Astrophysics Data System (ADS)
Li, Le; Gu, Ke; Zeng, An; Fan, Ying; Di, Zengru
2018-04-01
People's online rating behavior can be modeled directly by user-object bipartite networks. However, few works have been devoted to revealing the hidden relations between users, especially from the perspective of signed networks. We analyze the signed monopartite networks projected from the signed user-object bipartite networks, finding that the networks are highly clustered with obvious community structure. Interestingly, the positive clustering coefficient is remarkably higher than the negative clustering coefficient. Then, a Signed Growing Network model (SGN) based on local preferential attachment is proposed to generate a user's signed network that has community structure and a high positive clustering coefficient. Other structural properties of the modeled networks are also found to be similar to the empirical networks.
Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta
2017-01-01
Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic. PMID:28245222
Wu, Jibing; Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta
2017-01-01
Using Agent Base Models to Optimize Large Scale Network for Large System Inventories
NASA Technical Reports Server (NTRS)
Shameldin, Ramez Ahmed; Bowling, Shannon R.
2010-01-01
The aim of this paper is to use Agent Base Models (ABM) to optimize large scale network handling capabilities for large system inventories and to implement strategies for the purpose of reducing capital expenses. The models used in this paper rely either on computational algorithms or on procedures implemented in Matlab to simulate agent-based models, combining a principal programming language with mathematical theory. Clusters serve as a high-performance computing resource so that the program can run in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.
Cluster mass inference via random field theory.
Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D
2009-01-01
Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
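The difference between cluster extent and cluster mass can be shown on a one-dimensional toy signal. The sketch below takes mass to be the summed excess over the cluster-forming threshold, which is one common convention and an assumption here. It does not implement the random field theory inference itself, and the function name is illustrative.

```python
def cluster_stats(values, threshold):
    """Return (extent, mass) for each run of contiguous suprathreshold values."""
    clusters, extent, mass = [], 0, 0.0
    for v in values:
        if v > threshold:
            extent += 1                 # extent counts suprathreshold samples
            mass += v - threshold       # mass accumulates height above threshold
        elif extent:
            clusters.append((extent, mass))
            extent, mass = 0, 0.0
    if extent:                          # close a cluster that reaches the end
        clusters.append((extent, mass))
    return clusters

signal = [0.5, 3.0, 3.5, 1.0, 2.75, 0.25]
stats = cluster_stats(signal, threshold=2.5)
```

The first cluster is both wider and taller than the second, so it dominates on mass more strongly than on extent alone. This is how a mass statistic combines sensitivity to spatial extent and to intensity.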
Eijkenaar, Frank; van Vliet, René C J A; van Kleef, Richard C
2018-01-01
The risk-equalization (RE) model in the Dutch health insurance market has evolved to a sophisticated model containing direct proxies for health. However, it still has important imperfections, leaving incentives for risk selection. This paper focuses on refining an important health-based risk-adjuster in this model: the diagnosis-based costs groups (DCGs). The current (2017) DCGs are calibrated on "old" data of 2011/2012, are mutually exclusive, and are essentially clusters of about 200 diagnosis-groups ("dxgroups"). Hospital claims data (2013), administrative data (2014) on costs and risk-characteristics for the entire Dutch population (N≈16.9 million), and health survey data (2012, N≈387,000) are used. The survey data are used to identify subgroups of individuals in poor or in good health. The claims and administrative data are used to develop alternative DCG-modalities to examine the impact on individual-level and group-level fit of recalibrating the DCGs based on new data, of allowing patients to be classified in multiple DCGs, and of refraining from clustering. Recalibrating the DCGs and allowing enrolees to be classified into multiple DCGs lead to nontrivial improvements in individual-level and group-level fit (especially for cancer patients and people with comorbid conditions). The improvement resulting from refraining from clustering does not seem to justify the increase in model complexity this would entail. The performance of the sophisticated Dutch RE-model can be improved by allowing classification in multiple (clustered) DCGs and using new data. Irrespective of the modality used, however, various subgroups remain significantly undercompensated. Further improvement of the RE-model merits high priority.
Update of membership and mean proper motion of open clusters from UCAC5 catalog
NASA Astrophysics Data System (ADS)
Dias, W. S.; Monteiro, H.; Assafin, M.
2018-06-01
We present mean proper motions and membership probabilities of individual stars for optically visible open clusters, which have been determined using data from the UCAC5 catalog. This follows our previous studies with the UCAC2 and UCAC4 catalogs, but now using improved proper motions in the GAIA reference frame. In the present study results were obtained for a sample of 1108 open clusters. For five clusters, this is the first determination of mean proper motion, and for the whole sample, we present results with a much larger number of identified astrometric member stars than in previous studies. It is the last update of our Open cluster Catalog based on proper motion data only. Future updates will rely on astrometric, photometric and spectroscopic GAIA data as input for analyses.
Mateo, Rubén G.; Vanderpoorten, Alain; Muñoz, Jesús
2013-01-01
The definition of biogeographic regions provides a fundamental framework for a range of basic and applied questions in biogeography, evolutionary biology, systematics and conservation. Previous research suggested that environmental forcing results in highly congruent regionalization patterns across taxa, but that the size and number of regions depends on the dispersal ability of the taxa considered. We produced a biogeographic regionalization of European bryophytes and hypothesized that (1) regions defined for bryophytes would differ from those defined for other taxa due to the highly specific eco-physiology of the group and (2) their high dispersal ability would result in the resolution of few, large regions. Species distributions were recorded using 10,000 km2 MGRS pixels. Because of the lack of data across large portions of the area, species distribution models employing macroclimatic variables as predictors were used to determine the potential composition of empty pixels. K-means clustering analyses of the pixels based on their potential species composition were employed to define biogeographic regions. The optimal number of regions was determined by v-fold cross-validation and Moran’s I statistic. The spatial congruence of the regions identified from their potential bryophyte assemblages with large-scale vegetation patterns is at odds with our primary hypothesis. This reinforces the notion that post-glacial migration patterns might have been much more similar in bryophytes and vascular plants than previously thought. The substantially lower optimal number of clusters and the absence of nested patterns within the main biogeographic regions, as compared to identical analyses in vascular plants, support our second hypothesis. 
The modelling approach implemented here is, however, based on many assumptions that are discussed but can only be tested when additional data on species distributions become available, highlighting the substantial importance of developing integrated mapping projects for all taxa in key biogeographic areas of Europe, and the Mediterranean peninsulas in particular. PMID:23409015
The VLT LBG Redshift Survey - III. The clustering and dynamics of Lyman-break galaxies at z ˜ 3
NASA Astrophysics Data System (ADS)
Bielby, R.; Hill, M. D.; Shanks, T.; Crighton, N. H. M.; Infante, L.; Bornancini, C. G.; Francke, H.; Héraudeau, P.; Lambas, D. G.; Metcalfe, N.; Minniti, D.; Padilla, N.; Theuns, T.; Tummuangpak, P.; Weilbacher, P.
2013-03-01
We present a catalogue of 2135 galaxy redshifts from the VLT LBG Redshift Survey (VLRS), a spectroscopic survey of z ≈ 3 galaxies in wide fields centred on background quasi-stellar objects. We have used deep optical imaging to select galaxies via the Lyman-break technique. Spectroscopy of the Lyman-break galaxies (LBGs) was then made using the Very Large Telescope (VLT) Visible Multi-Object Spectrograph (VIMOS) instrument, giving a mean redshift of z = 2.79. We analyse the clustering properties of the VLRS sample and also of the VLRS sample combined with the smaller area Keck-based survey of Steidel et al. From the semiprojected correlation function, w_p(σ), for the VLRS and combined surveys, we find that the results are well fit with a single power-law model, with clustering scale lengths of r_0 = 3.46 ± 0.41 and 3.83 ± 0.24 h^{-1} Mpc, respectively. We note that the corresponding combined ξ(r) slope is flatter than for local galaxies at γ = 1.5-1.6 rather than γ = 1.8. This flat slope is confirmed by the z-space correlation function, ξ(s), and in the range 10 < s < 100 h^{-1} Mpc the VLRS shows an ≈2.5σ excess over the Λ cold dark matter (ΛCDM) linear prediction. This excess may be consistent with recent evidence for non-Gaussianity in clustering results at z ≈ 1. We then analyse the LBG z-space distortions using the 2D correlation function, ξ(σ, π), finding for the combined sample a large-scale infall parameter of β = 0.38 ± 0.19 and a velocity dispersion of \sqrt{\langle w_z^2 \rangle} = 420^{+140}_{-160} km s^{-1}. Based on our measured β, we are able to determine the gravitational growth rate, finding a value of f(z = 3) = 0.99 ± 0.50 (or fσ_8 = 0.26 ± 0.13), which is the highest redshift measurement of the growth rate via galaxy clustering and is consistent with ΛCDM. Finally, we constrain the mean halo mass for the LBG population, finding that the VLRS and combined sample suggest mean halo masses of log(M_{DM}/M_⊙) = 11.57 ± 0.15 and 11.73 ± 0.07, respectively.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore attracted wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields asymptotically similar results as its model size increases; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. We also propose a scalable semisupervised cluster ensemble algorithm that combines our fast CSC algorithm with dimensionality reduction by random projection in the process of spectral ensemble clustering. Theoretical analysis and empirical results show that the new cluster ensemble algorithm has advantages in efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved for the consensus clustering stage, also holds for weighted k-means clustering, and thus gives a theoretical guarantee for this special kind of k-means clustering in which each point has its own weight. PMID:29312447
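The dimensionality-reduction step mentioned in this abstract can be illustrated with a minimal sketch of a Gaussian random projection. This is not the authors' algorithm: the function names, the Gaussian matrix with 1/sqrt(target_dim) scaling, and the toy data are all assumptions chosen for illustration, and the landmark-based graph construction and constraint handling are not shown.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Project points (a list of equal-length lists) onto target_dim
    dimensions with a Gaussian random matrix scaled by 1/sqrt(target_dim).
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One random direction (matrix row) per output coordinate.
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 5 points in 200 dimensions (arbitrary values).
    data = [[rng.random() for _ in range(200)] for _ in range(5)]
    reduced = random_projection(data, target_dim=50)
    ratio = distance(reduced[0], reduced[1]) / distance(data[0], data[1])
    print(f"distance ratio after projection: {ratio:.2f}")
```

Because the projection is linear and roughly distance-preserving on average, a clustering step such as k-means can be run on the reduced vectors at lower cost; how closely accuracy is preserved depends on the target dimension.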
Analysis of Chromobacterium sp. natural isolates from different Brazilian ecosystems
Lima-Bittencourt, Cláudia I; Astolfi-Filho, Spartaco; Chartone-Souza, Edmar; Santos, Fabrício R; Nascimento, Andréa MA
2007-01-01
Background: Chromobacterium violaceum is a free-living bacterium able to survive under diverse environmental conditions. In this study we evaluate the genetic and physiological diversity of Chromobacterium sp. isolates from three Brazilian ecosystems: Brazilian Savannah (Cerrado), Atlantic Rain Forest and Amazon Rain Forest. We have analyzed the diversity with molecular approaches (16S rRNA gene sequences and amplified ribosomal DNA restriction analysis) and phenotypic surveys of antibiotic resistance and biochemistry profiles. Results: In general, the clusters based on physiological profiles included isolates from two or more geographical locations, indicating that they are not restricted to a single ecosystem. The isolates from Brazilian Savannah presented greater physiological diversity, and their biochemical profile was the most variable of all groupings. The isolates recovered from Amazon and Atlantic Rain Forests presented the biochemical characteristics most similar to those of the Chromobacterium violaceum ATCC 12472 strain. Clusters based on biochemical profiles were congruent with clusters obtained by the 16S rRNA gene tree. According to the phylogenetic analyses, isolates from the Amazon Rain Forest and Savannah displayed a closer relationship to Chromobacterium violaceum ATCC 12472. Furthermore, the 16S rRNA gene tree revealed a good correlation between phylogenetic clustering and geographic origin. Conclusion: The physiological analyses clearly demonstrate the high biochemical versatility found in the C. violaceum genome, and molecular methods allowed detection of the intra- and inter-population diversity of isolates from three Brazilian ecosystems. PMID:17584942
Pilhofer, Martin; Rappl, Kristina; Eckl, Christina; Bauer, Andreas Peter; Ludwig, Wolfgang; Schleifer, Karl-Heinz; Petroni, Giulio
2008-01-01
In the past, studies on the relationships of the bacterial phyla Planctomycetes, Chlamydiae, Lentisphaerae, and Verrucomicrobia using different phylogenetic markers have been controversial. Investigations based on 16S rRNA sequence analyses suggested a relationship of the four phyla, showing the branching order Planctomycetes, Chlamydiae, Verrucomicrobia/Lentisphaerae. Phylogenetic analyses of 23S rRNA genes in this study also support a monophyletic grouping and this branching order. This grouping is significant for understanding cell division, since the major bacterial cell division protein FtsZ is absent from members of two of the phyla, Chlamydiae and Planctomycetes. In Verrucomicrobia, knowledge about cell division is mainly restricted to the recent report of ftsZ in the closely related genera Prosthecobacter and Verrucomicrobium. In this study, genes of the conserved division and cell wall (dcw) cluster (ddl, ftsQ, ftsA, and ftsZ) were characterized in all verrucomicrobial subdivisions (1 to 4) with cultivable representatives. Sequence analyses and transcriptional analyses in Verrucomicrobia and genome data analyses in Lentisphaerae suggested that cell division is based on FtsZ in all verrucomicrobial subdivisions and possibly also in the sister phylum Lentisphaerae. Comprehensive sequence analyses of available genome data for representatives of Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes strongly indicate that their last common ancestor possessed a conserved, ancestral type of dcw gene cluster and an FtsZ-based cell division mechanism. This implies that Planctomycetes and Chlamydiae may have shifted independently to a non-FtsZ-based cell division mechanism after their separate branchings from their last common ancestor with Verrucomicrobia. PMID:18310338
Gurkan-Cavusoglu, Evren; Avadhani, Sriya; Liu, Lili; Kinsella, Timothy J; Loparo, Kenneth A
2013-04-01
Base excision repair (BER) is a major DNA repair pathway involved in the processing of exogenous non-bulky base damages from certain classes of cancer chemotherapy drugs as well as ionising radiation (IR). Methoxyamine (MX) is a small molecule chemical inhibitor of BER that is shown to enhance chemotherapy and/or IR cytotoxicity in human cancers. In this study, the authors have analysed the inhibitory effect of MX on the BER pathway kinetics using a computational model of the repair pathway. The inhibitory effect of MX depends on the BER efficiency. The authors have generated variable efficiency groups using different sets of protein concentrations generated by Latin hypercube sampling, and they have clustered simulation results into high, medium and low efficiency repair groups. From analysis of the inhibitory effect of MX on each of the three groups, it is found that the inhibition is most effective for high efficiency BER, and least effective for low efficiency repair.
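The Latin hypercube sampling step mentioned in this abstract can be sketched as follows. This is a generic illustration, not the authors' code: the function names, the parameter names protein_A to protein_C, and the concentration ranges are hypothetical placeholders. The key property is that each parameter's range is divided into as many equal strata as there are samples, and each stratum is used exactly once per parameter.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in the unit hypercube [0, 1)^n_dims such
    that, along every dimension, each of the n_samples equal-width strata
    contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        # Draw one uniform value inside each stratum.
        columns.append([(s + rng.random()) / n_samples for s in strata])
    # Transpose from per-dimension columns to per-sample rows.
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def scale(sample, bounds):
    """Map a unit-cube sample onto (low, high) ranges, one per dimension."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(sample, bounds)]


if __name__ == "__main__":
    # Hypothetical concentration ranges in arbitrary units (not from the paper).
    bounds = {"protein_A": (0.1, 1.0),
              "protein_B": (5.0, 50.0),
              "protein_C": (0.01, 0.2)}
    for unit_sample in latin_hypercube(5, len(bounds), seed=42):
        values = scale(unit_sample, list(bounds.values()))
        print(dict(zip(bounds, (round(v, 3) for v in values))))
```

Each printed row would serve as one set of protein concentrations for a simulation run; grouping the resulting repair efficiencies into high, medium and low classes is a separate clustering step not shown here.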
Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.
Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin
2018-05-03
Flying ad-hoc networks (FANETs) are currently a very active research area with many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. An optimal transmission range yields a minimum packet loss ratio (PLR) and better link quality, which ultimately saves the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm for the selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as the Ant Colony Optimization-based clustering algorithm and the Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in terms of number of clusters, cluster building time, cluster lifetime and energy consumption.
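The idea of selecting cluster heads by clustering node positions can be sketched with plain k-means (Lloyd's algorithm), choosing as head the node closest to each centroid. This is a deliberate simplification and an assumption: the paper's K-Means Density variant, its transmission-power adjustment, and any energy criteria are not described in the abstract and are not reproduced here. All names and coordinates below are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples.
    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct nodes
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every node to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


def pick_cluster_heads(points, centroids, assignment):
    """For each non-empty cluster, return the index of the node nearest
    to the cluster centroid (an assumed head-selection rule)."""
    heads = {}
    for c, centre in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            heads[c] = min(members, key=lambda i: math.dist(points[i], centre))
    return heads


if __name__ == "__main__":
    # Hypothetical UAV positions in metres (x, y).
    uavs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, assignment = kmeans(uavs, k=2)
    print("cluster of each UAV:", assignment)
    print("cluster heads (cluster -> UAV index):",
          pick_cluster_heads(uavs, centroids, assignment))
```

A head near the centroid keeps the average head-to-member distance short, which is one plausible reason such a rule would reduce transmission energy; a realistic scheme would also weigh residual battery and mobility.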
NASA Astrophysics Data System (ADS)
Fulmer, Leah M.; Gallagher, John S.; Hamann, Wolf-Rainer; Oskinova, Lida; Ramachandran, Varsha
2018-01-01
The low-density Wing of the Small Magellanic Cloud exhibits ongoing, active star formation despite a distinctive lack of dense ambient gas and dust, or resources from which to form stars. Our continued work in studying this region reveals that these paradoxical observations may be explained by a process of sequential star formation. We present photometric, clustering, and spatial analyses in support of this scenario, along with a proposed star formation history based on the following evidence: matches to isochrone models, stellar and ionized gas kinematics (VLT, SALT), and regional HI gas kinematics (ATCA, PKS).
Leading indicators of community-based violent events among adults with mental illness.
Van Dorn, R A; Grimm, K J; Desmarais, S L; Tueller, S J; Johnson, K L; Swartz, M S
2017-05-01
The public health, public safety and clinical implications of violent events among adults with mental illness are significant; however, the causes and consequences of violence and victimization among adults with mental illness are complex and not well understood, which limits the effectiveness of clinical interventions and risk management strategies. This study examined interrelationships between violence, victimization, psychiatric symptoms, substance use, homelessness and in-patient treatment over time. Available data were integrated from four longitudinal studies of adults with mental illness. Assessments took place at baseline, and at 1, 3, 6, 9, 12, 15, 18, 24, 30 and 36 months, depending on the parent studies' protocol. Data were analysed with the autoregressive cross-lag model. Violence and victimization were leading indicators of each other and affective symptoms were a leading indicator of both. Drug and alcohol use were leading indicators of violence and victimization, respectively. All psychiatric symptom clusters - affective, positive, negative, disorganized cognitive processing - increased the likelihood of experiencing at least one subsequent symptom cluster. Sensitivity analyses identified few group-based differences in the magnitude of effects in this heterogeneous sample. Violent events demonstrated unique and shared indicators and consequences over time. Findings indicate mechanisms for reducing violent events, including trauma-informed therapy, targeting internalizing and externalizing affective symptoms with cognitive-behavioral and psychopharmacological interventions, and integrating substance use and psychiatric care. Finally, mental illness and violence and victimization research should move beyond demonstrating concomitant relationships and instead focus on lagged effects with improved spatio-temporal contiguity.
Gottscho, Andrew D.; Wood, Dustin A.; Vandergast, Amy; Lemos Espinal, Julio A.; Gatesy, John; Reeder, Tod
2017-01-01
Multi-locus nuclear DNA data were used to delimit species of fringe-toed lizards of the Uma notata complex, which are specialized for living in wind-blown sand habitats in the deserts of southwestern North America, and to infer whether Quaternary glacial cycles or Tertiary geological events were important in shaping the historical biogeography of this group. We analyzed ten nuclear loci collected using Sanger sequencing and genome-wide sequence and single-nucleotide polymorphism (SNP) data collected using restriction-associated DNA (RAD) sequencing. A combination of species discovery methods (concatenated phylogenies, parametric and non-parametric clustering algorithms) and species validation approaches (coalescent-based species tree/isolation-with-migration models) was used to delimit species, infer phylogenetic relationships, and estimate effective population sizes, migration rates, and speciation times. Uma notata, U. inornata, U. cowlesi, and an undescribed species from Mohawk Dunes, Arizona (U. sp.) were supported as distinct in the concatenated analyses and by clustering algorithms, and all operational taxonomic units were decisively supported as distinct species by ranking hierarchical nested speciation models with Bayes factors based on coalescent-based species tree methods. However, significant unidirectional gene flow (2NM > 1) from U. cowlesi and U. notata into U. rufopunctata was detected under the isolation-with-migration model. Therefore, we conservatively delimit four species-level lineages within this complex (U. inornata, U. notata, U. cowlesi, and U. sp.), treating U. rufopunctata as a hybrid population (U. notata x cowlesi). Both concatenated and coalescent-based estimates of speciation times support the hypotheses that speciation within the complex occurred during the late Pleistocene, and that the geological evolution of the Colorado River delta during this period was an important process shaping the observed phylogeographic patterns.