Yokoyama, Eiji; Uchimura, Masako
2007-11-01
Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
Using Cluster Analysis to Examine Husband-Wife Decision Making
ERIC Educational Resources Information Center
Bonds-Raacke, Jennifer M.
2006-01-01
Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo
2017-12-01
To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.
Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard
2014-07-01
Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis results produced results equivalent to those obtained from Fourier and scar analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn
2014-07-15
Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: Aboutmore » 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). Conclusions: A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis results produced results equivalent to those obtained from Fourier and scar analysis.« less
Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G
2017-05-01
The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.
EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.
Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina
2009-04-01
In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
Stefurak, Tres; Calhoun, Georgia B
2007-01-01
The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analysis along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of found clusters. Only the effect for race was significant with the Anxious Prosocial and Depressed Intepersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia
NASA Astrophysics Data System (ADS)
Novak, Vibor; Renema, Willem
2018-01-01
To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples; clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 with dominance of BW samples, and cluster 5 showing a mixed composition with both MCS and BW samples. The results of cluster analysis were afterwards subjected to indicator species analysis resulting in the interpretation that generated three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
ERIC Educational Resources Information Center
DiStefano, Christine; Kamphaus, R. W.
2006-01-01
Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials
Diaz-Ordaz, Karla; Bartlett, Jonathan W
2016-01-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
NASA Astrophysics Data System (ADS)
Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue
2017-08-01
Amomum tsao-ko is a commercial plant that used for various purposes in medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Clusters analysis and 2-dimensional principal component analysis (PCA) was used to represent the genetic relations among Amomum tsao-ko by using simple sequence repeat (SSR) markers. Clustering analysis clearly distinguished the samples groups. Two major clusters were formed; first (Cluster I) consisted of 34 individuals, the second (Cluster II) consisted of 10 individuals, Cluster I as the main group contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals, PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on genetic relationship of Amomum tsao-ko germplasm resources in main producing areas, also provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.
Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.
Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut
2015-03-01
Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity also especially in the "fruits & vegetables" pattern. κ statistic revealed a fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis
ERIC Educational Resources Information Center
de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.
2006-01-01
K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
Cluster analysis in phenotyping a Portuguese population.
Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J
2015-09-03
Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
Clusters of Occupations Based on Systematically Derived Work Dimensions: An Exploratory Study.
ERIC Educational Resources Information Center
Cunningham, J. W.; And Others
The study explored the feasibility of deriving an educationally relevant occupational cluster structure based on Occupational Analysis Inventory (OAI) work dimensions. A hierarchical cluster analysis was applied to the factor score profiles of 814 occupations on 22 higher-order OAI work dimensions. From that analysis, 73 occupational clusters were…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.
2014-02-14
Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement formore » a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.« less
Steenbergen, K G; Gaston, N
2014-02-14
Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
Peterson, Leif E
2002-01-01
CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2015-01-01
Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
NASA Technical Reports Server (NTRS)
Wharton, S. W.; Turner, B. J.
1981-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.
Cross-scale analysis of cluster correspondence using different operational neighborhoods
NASA Astrophysics Data System (ADS)
Lu, Yongmei; Thill, Jean-Claude
2008-09-01
Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.
Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei
2010-08-01
During Raman spectroscopy analysis, the organic molecules and contaminations will obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tables, which are acquired from the BWTek i-Raman spectrometer. The background is corrected by R package baselineWavelet. Then principle component analysis and random forests are used to perform clustering analysis. Through analyzing the Raman spectra of two medicines, the accurate and validity of this background-correction algorithm is checked and the influences of fluorescence background on Raman spectra clustering analysis is discussed. Thus, it is concluded that it is important to correct fluorescence background for further analysis, and an effective background correction solution is provided for clustering or other analysis.
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involve decisions on how to; handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieves the highest adjusted Rand. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
Li, Hai-juan; Zhao, Xin; Jia, Qing-fei; Li, Tian-lai; Ning, Wei
2012-08-01
The achenes morphological and micro-morphological characteristics of six species of genus Taraxacum from northeastern China as well as SRAP cluster analysis were observed for their classification evidences. The achenes were observed by microscope and EPMA. Cluster analysis was given on the basis of the size, shape, cone proportion, color and surface sculpture of achenes. The Taraxacum inter-species achene shape characteristic difference is obvious, particularly spinulose distribution and size, achene color and achene size; with the Taraxacum plant achene shape the cluster method T. antungense Kitag. and the T. urbanum Kitag. should combine for the identical kind; the achene morphology cluster analysis and the SRAP tagged molecule systematics's cluster result retrieves in the table with "the Chinese flora". The class group to divide the result is consistent. Taraxacum plant achene shape characteristic stable conservative, may carry on the inter-species division and the sibship analysis according to the achene shape characteristic combination difference; the achene morphology cluster analysis as well as the SRAP tagged molecule systematics confirmation support dandelion classification result of "the Chinese flora".
A generalized analysis of hydrophobic and loop clusters within globular protein sequences
Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle
2007-01-01
Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy
2016-08-01
To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data of Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly by ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). During eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known by the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutes (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, John; Castillo, Andrew
2016-09-21
This software contains a set of python modules – input, search, cluster, analysis; these modules read input files containing spatial coordinates and associated attributes which can be used to perform nearest neighbor search (spatial indexing via kdtree), cluster analysis/identification, and calculation of spatial statistics for analysis.
Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.
Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren
2014-12-01
Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD, age, FEV(1) % predicted, BMI, history of severe exacerbations, mMRC, SpO(2), and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters from cluster A to cluster E. Assessment of the predictive validity of these clusters of COPD showed that cluster E patients had higher all cause mortality (HR 18.3, p < 0.0001), and respiratory cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all cause mortality (HR 14.3, p = 0.0002) and respiratory cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD patient with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations was a novel and distinct clinical phenotype predicting mortality in men with COPD.
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.
van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim
2017-01-01
In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight in the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principle component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response of external influences on their disease. However, both cluster outcomes based on this dataset showed a poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
Redman, Regina S.; Ranson, Judith; Rodriguez, Rusty J.
2006-01-01
Cantharellus formosus growing on the Olympic Peninsula of the Pacific Northwest was sampled from September – November 1995 for genetic analysis. A total of ninety-six basidiomes from five clusters separated from one another by 3 - 25 meters were genetically characterized by PCR analysis of 13 arbitrary loci and rDNA sequences. The number of basidiomes in each cluster varied from 15 to 25 and genetic analysis delineated 15 genets among the clusters. Analysis of variance utilizing thirteen apPCR generated genetic molecular markers and PCR amplification of the ribosomal ITS regions indicated that 81.41% of the genetic variation occurred between clusters and 18.59% within clusters. Proximity of the basidiomes within a cluster was not an indicator of genotypic similarity. The molecular profiles of each cluster were distinct and defined as unique populations containing 2 - 6 genets. The monitoring and analysis of this species through non-lethal sampling and future applications is discussed.
A scoping review of spatial cluster analysis techniques for point-event data.
Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott
2013-05-01
Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend researchers to consider; exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods, consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.
Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.
Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D
2017-06-01
Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. Two-step cluster approach, hierarchical clustering method and k-mean analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of diseases, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be an useful tool for phenotyping of disease and personalized approach to the treatment of patients.
Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O
2015-01-01
To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
clusterProfiler: an R package for comparing biological themes among gene clusters.
Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu
2012-05-01
Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
ERIC Educational Resources Information Center
Viriyangkura, Yuwadee
2014-01-01
Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis
ERIC Educational Resources Information Center
Hwang, Heungsun; Dillon, William R.
2010-01-01
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…
Cluster Analysis of Minnesota School Districts. A Research Report.
ERIC Educational Resources Information Center
Cleary, James
The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…
NASA Technical Reports Server (NTRS)
Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Lamb, James W.; Hawkins, David; Hennessy, Ryan;
2012-01-01
We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.
2014-01-01
Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
A Cluster of Legionella-Associated Pneumonia Cases in a Population of Military Recruits
2007-06-01
this cluster may suggest a previously unrecognized suscep- FIG. 1. Phylogenic analysis of the training center strain (represented by the MCRD consensus...military recruits during population- based surveillance for pneumonia pathogens. Results were confirmed by sequence analysis . Cases cluster tightly...17 April 2007 A Legionella cluster was identified through retrospective PCR analysis of 240 throat swab samples from X-ray-confirmed pneumonia cases
Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar
2015-03-01
Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. To identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rate. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from that in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting the clusters represent clinically and biologically different subtypes of COPD.
Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.
Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik
2017-11-01
Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease
Somatotyping using 3D anthropometry: a cluster analysis.
Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur
2013-01-01
Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means Cluster Analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and the ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis
ERIC Educational Resources Information Center
Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian
2006-01-01
Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…
Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.
Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun
2017-12-01
Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39items were grouped into 8 clusters. Each cluster had characteristic features. When compared with female, the male group tended to be sensitized more frequently to all tested allergens, except for fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize female group more frequently than male group.
Cluster Correspondence Analysis.
van de Velden, M; D'Enza, A Iodice; Palumbo, F
2017-03-01
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques onmore » two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.« less
A Survey of Popular R Packages for Cluster Analysis
ERIC Educational Resources Information Center
Flynt, Abby; Dean, Nema
2016-01-01
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…
Using Cluster Analysis for Data Mining in Educational Technology Research
ERIC Educational Resources Information Center
Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.
2012-01-01
Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…
Cluster Analysis of the Luria-Nebraska Neuropsychological Battery with Learning Disabled Adults.
ERIC Educational Resources Information Center
McCue, Michael; And Others
The study reports a cluster analysis of Luria-Nebraska Neuropsychological Battery sources of 25 learning disabled adults. The cluster analysis suggested the presence of three subgroups within this sample, one having high elevations on the Rhythm, Writing, Reading, and Arithmetic Rhythm scales, the second having an extremely high evelation on the…
Network Analysis Tools: from biological networks to clusters and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques
2008-01-01
Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?
Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J
2018-04-01
Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90% in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger dataset, and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.
Cluster analysis of the hot subdwarfs in the PG survey
NASA Technical Reports Server (NTRS)
Thejll, Peter; Charache, Darryl; Shipman, Harry L.
1989-01-01
Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters to subdivide the data into. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to it. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
Using Machine Learning Techniques in the Analysis of Oceanographic Data
NASA Astrophysics Data System (ADS)
Falcinelli, K. E.; Abuomar, S.
2017-12-01
Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.
2015-01-01
Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
NASA Technical Reports Server (NTRS)
Fomenkova, M. N.
1997-01-01
The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis. In such investigation cluster analysis plays a vital role to deal with the large scale data. There are many clustering techniques with different cluster analysis approach. But which approach suits a particular dataset is difficult to predict. To deal with this problem a grading approach is introduced over many clustering techniques to identify a stable technique. But the grading approach depends on the characteristic of dataset as well as on the validity indices. So a two stage grading approach is implemented. In this study the grading approach is implemented over five clustering techniques like hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of grading approach that a cluster technique is significant is also established by Nemenyi post-hoc hypothetical test.
Leung, S C; Fung, W K; Wong, K H
1999-01-01
The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.
Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang
2017-01-01
Clustering algorithm as a basis of data analysis is widely used in analysis systems. However, as for the high dimensions of the data, the clustering algorithm may overlook the business relation between these dimensions especially in the medical fields. As a result, usually the clustering result may not meet the business goals of the users. Then, in the clustering process, if it can combine the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfied. In this paper, we propose an interactive K -means clustering method to improve the user's satisfactions towards the result. The core of this method is to get the user's feedback of the clustering result, to optimize the clustering result. Then, a particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm to make it reflect the user's business preference as possible. After that, based on the parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we take an example in the breast cancer, to testify our method. The experiments show the better performance of our algorithm.
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John
2015-01-01
Objective We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify to highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion We identified 5 major clusters of SPTB based on a phenotype tool and hierarchal clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.
Williams, N J; Nasuto, S J; Saddy, J D
2015-07-30
The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise (SNR) ratio of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters are determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2015-01-01
Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.
Interactive visual exploration and refinement of cluster assignments.
Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R
2017-09-12
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
Spatiotemporal Analysis of the Ebola Hemorrhagic Fever in West Africa in 2014
NASA Astrophysics Data System (ADS)
Xu, M.; Cao, C. X.; Guo, H. F.
2017-09-01
Ebola hemorrhagic fever (EHF) is an acute hemorrhagic diseases caused by the Ebola virus, which is highly contagious. This paper aimed to explore the possible gathering area of EHF cases in West Africa in 2014, and identify endemic areas and their tendency by means of time-space analysis. We mapped distribution of EHF incidences and explored statistically significant space, time and space-time disease clusters. We utilized hotspot analysis to find the spatial clustering pattern on the basis of the actual outbreak cases. spatial-temporal cluster analysis is used to analyze the spatial or temporal distribution of agglomeration disease, examine whether its distribution is statistically significant. Local clusters were investigated using Kulldorff's scan statistic approach. The result reveals that the epidemic mainly gathered in the western part of Africa near north Atlantic with obvious regional distribution. For the current epidemic, we have found areas in high incidence of EVD by means of spatial cluster analysis.
Won, Jong Chul; Im, Yong-Jin; Lee, Ji-Hyun; Kim, Chong Hwa; Kwon, Hyuk Sang; Cha, Bong-Yun; Park, Tae Sun
2017-01-01
Patients with diabetic peripheral neuropathy (DPN) is the most common complication. However, patients are usually suffering from not only diverse sensory deficit but also neuropathy-related discomforts. The aim of this study is to identify distinct groups of patients with DPN with respect to its clinical impacts on symptom patterns and comorbidities. A hierarchical cluster analysis and factor analysis were performed to identify relevant subgroups of patients with DPN ( n = 1338) and symptom patterns. Patients with DPN were divided into three clusters: asymptomatic (cluster 1, n = 448, 33.5%), moderate symptoms with disturbed sleep (cluster 2, n = 562, 42.0%), and severe symptoms with decreased quality of life (cluster 3, n = 328, 24.5%). Patients in cluster 3, compared with clusters 1 and 2, were characterized by higher levels of HbA1c and more severe pain and physical impairments. Patients in cluster 2 had moderate pain levels but disturbed sleep patterns comparable to those in cluster 3. The frequency of symptoms on each item of MNSI by "painful" symptom pattern showed a similar distribution pattern with increasing intensities along the three clusters. Cluster and factor analysis endorsed the use of comprehensive and symptomatic subgrouping to individualize the evaluation of patients with DPN.
Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.
Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R
2014-06-01
Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic inflammation. Published by Mosby, Inc.
Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis
Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.
2013-01-01
Background Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation. PMID:24332216
Clustering analysis for muon tomography data elaboration in the Muon Portal project
NASA Astrophysics Data System (ADS)
Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.
2015-05-01
Clustering analysis is one of multivariate data analysis techniques which allows to gather statistical data units into groups, in order to minimize the logical distance within each group and to maximize the one between different groups. In these proceedings, the authors present a novel approach to the muontomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project that aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as it will be shown, as a filter for a preliminary analysis of the data.
Mixture modelling for cluster analysis.
McLachlan, G J; Chang, S U
2004-10-01
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
Aikawa, Ken; Kataoka, Masao; Ogawa, Soichiro; Akaihata, Hidenori; Sato, Yuichi; Yabe, Michihiro; Hata, Junya; Koguchi, Tomoyuki; Kojima, Yoshiyuki; Shiragasawa, Chihaya; Kobayashi, Toshimitsu; Yamaguchi, Osamu
2015-08-01
To present a new grouping of male patients with lower urinary tract symptoms (LUTS) based on symptom patterns and clarify whether the therapeutic effect of α1-blocker differs among the groups. We performed secondary analysis of anonymous data from 4815 patients enrolled in a postmarketing surveillance study of tamsulosin in Japan. Data on 7 International Prostate Symptom Score (IPSS) items at the initial visit were used in the cluster analysis. IPSS and quality of life (QOL) scores before and after tamsulosin treatment for 12 weeks were assessed in each cluster. Partial correlation coefficients were also obtained for IPSS and QOL scores based on changes before and after treatment. Five symptom groups were identified by cluster analysis of IPSS. On their symptom profile, each cluster was labeled as minimal type (cluster 1), multiple severe type (cluster 2), weak stream type (cluster 3), storage type (cluster 4), and voiding type (cluster 5). Prevalence and the mean symptom score were significantly improved in almost all symptoms in all clusters by tamsulosin treatment. Nocturia and weak stream had the strongest effect on QOL in clusters 1, 2, and 4 and clusters 3 and 5, respectively. The study clarified that 5 characteristic symptom patterns exist by cluster analysis of IPSS in male patients with LUTS. Tamsulosin improved various symptoms and QOL in each symptom group. The study reports many male patients with LUTS being satisfied with monotherapy using tamsulosin and suggests the usefulness of α1-blockers as a drug of first choice. Copyright © 2015 Elsevier Inc. All rights reserved.
The detection methods of dynamic objects
NASA Astrophysics Data System (ADS)
Knyazev, N. L.; Denisova, L. A.
2018-01-01
The article deals with the application of cluster analysis methods for solving the task of aircraft detection on the basis of distribution of navigation parameters selection into groups (clusters). The modified method of cluster analysis for search and detection of objects and then iterative combining in clusters with the subsequent count of their quantity for increase in accuracy of the aircraft detection have been suggested. The course of the method operation and the features of implementation have been considered. In the conclusion the noted efficiency of the offered method for exact cluster analysis for finding targets has been shown.
Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2014-04-29
To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
ERIC Educational Resources Information Center
Brown, S. J.; White, S.; Power, N.
2015-01-01
A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…
ERIC Educational Resources Information Center
van der Kloot, Willem A.; Spaans, Alexander M. J.; Heiser, Willem J.
2005-01-01
Hierarchical agglomerative cluster analysis (HACA) may yield different solutions under permutations of the input order of the data. This instability is caused by ties, either in the initial proximity matrix or arising during agglomeration. The authors recommend to repeat the analysis on a large number of random permutations of the rows and columns…
Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.
Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço
2017-11-01
The variations found in the elemental composition in ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with C4.5 decision tree to help us interpret the clustering results. We found a better number of two clusters within the data, which can refer to the approximated number of sources of the drug which supply the cities of seizures. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using the leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.
ERIC Educational Resources Information Center
Ho, Hsuan-Fu; Hung, Chia-Chi
2008-01-01
Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…
NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques
2008-07-01
The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, cliques discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.
Identification and characterization of near-fatal asthma phenotypes by cluster analysis.
Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V
2015-09-01
Near-fatal asthma (NFA) is a heterogeneous clinical entity and several profiles of patients have been described according to different clinical, pathophysiological and histological features. However, there are no previous studies that identify in a unbiased way--using statistical methods such as clusters analysis--different phenotypes of NFA. Therefore, the aim of the present study was to identify and to characterize phenotypes of near fatal asthma using a cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using two-steps algorithm was performed from data of 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with an high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by an insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, confirming in part previous findings observed in studies with a clinical approach. The identification of patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour
2018-01-12
Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responder or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
The Use of Cluster Analysis in Typological Research on Community College Students
ERIC Educational Resources Information Center
Bahr, Peter Riley; Bielby, Rob; House, Emily
2011-01-01
One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…
[Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].
Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés
2018-03-06
To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document (2, 4 and 5). Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Data Analysis and Visualization; nternational Research Training Group ``Visualization of Large and Unstructured Data Sets,'' University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
2008-05-12
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii)more » evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.« less
Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John
2015-09-01
We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ(2) analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, (P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarch clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.
2016-01-01
Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
Gonzalez, Robert; Suppes, Trisha; Zeitzer, Jamie; McClung, Colleen; Tamminga, Carol; Tohen, Mauricio; Forero, Angelica; Dwivedi, Alok; Alvarado, Andres
2018-02-19
Multiple types of chronobiological disturbances have been reported in bipolar disorder, including characteristics associated with general activity levels, sleep, and rhythmicity. Previous studies have focused on examining the individual relationships between affective state and chronobiological characteristics. The aim of this study was to conduct a variable cluster analysis in order to ascertain how mood states are associated with chronobiological traits in bipolar I disorder (BDI). We hypothesized that manic symptomatology would be associated with disturbances of rhythm. Variable cluster analysis identified five chronobiological clusters in 105 BDI subjects. Cluster 1, comprising subjective sleep quality was associated with both mania and depression. Cluster 2, which comprised variables describing the degree of rhythmicity, was associated with mania. Significant associations between mood state and cluster analysis-identified chronobiological variables were noted. Disturbances of mood were associated with subjectively assessed sleep disturbances as opposed to objectively determined, actigraphy-based sleep variables. No associations with general activity variables were noted. Relationships between gender and medication classes in use and cluster analysis-identified chronobiological characteristics were noted. Exploratory analyses noted that medication class had a larger impact on these relationships than the number of psychiatric medications in use. In a BDI sample, variable cluster analysis was able to group related chronobiological variables. The results support our primary hypothesis that mood state, particularly mania, is associated with chronobiological disturbances. Further research is required in order to define these relationships and to determine the directionality of the associations between mood state and chronobiological characteristics.
Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G
2015-10-01
Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
Elemental Abundances in the Intracluster Gas and the Hot Galactic Coronae in Cluster A194
NASA Technical Reports Server (NTRS)
Forman, William R.
1997-01-01
We have completed the analysis of observations of the Coma cluster and are continuing analysis of A1367 both of which are shown to be merging clusters. Also, we are analyzing observations of the Centaurus cluster which we see as a merger based in both its temperature and surface brightness distributions. Attachment: Another collision for the coma cluster.
Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje
2015-08-01
Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.
Vigre, Håkan; Domingues, Ana Rita Coutinho Calado; Pedersen, Ulrik Bo; Hald, Tine
2016-03-01
The aim of the project as the cluster analysis was to in part to develop a generic structured quantitative microbiological risk assessment (QMRA) model of human salmonellosis due to pork consumption in EU member states (MSs), and the objective of the cluster analysis was to group the EU MSs according to the relative contribution of different pathways of Salmonella in the farm-to-consumption chain of pork products. In the development of the model, by selecting a case study MS from each cluster the model was developed to represent different aspects of pig production, pork production, and consumption of pork products across EU states. The objective of the cluster analysis was to aggregate MSs into groups of countries with similar importance of different pathways of Salmonella in the farm-to-consumption chain using available, and where possible, universal register data related to the pork production and consumption in each country. Based on MS-specific information about distribution of (i) small and large farms, (ii) small and large slaughterhouses, (iii) amount of pork meat consumed, and (iv) amount of sausages consumed we used nonhierarchical and hierarchical cluster analysis to group the MSs. The cluster solutions were validated internally using statistic measures and externally by comparing the clustered MSs with an estimated human incidence of salmonellosis due to pork products in the MSs. Finally, each cluster was characterized qualitatively using the centroids of the clusters. © 2016 Society for Risk Analysis.
Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging
NASA Astrophysics Data System (ADS)
Spinelli, Antonello E.; Boschi, Federico
2011-12-01
Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). In order to investigate the performances of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with 32P-ATP and 18F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm has been applied to dCLI and was implemented using interactive data language 8.1. We show that cluster analysis allows us to obtain good agreement between the clustered and the corresponding emission regions like the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLIT and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.
Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat
2014-07-01
Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.
Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.
2016-01-01
Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796
Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R
2016-11-01
To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative ( q )-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q -EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q -EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.
McKenna, J.E.
2003-01-01
The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2012-05-04
Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity.We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phasemore » utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.« less
Another collision for the Coma cluster
NASA Technical Reports Server (NTRS)
Vikhlinin, A.; Forman, W.; Jones, C.
1996-01-01
The wavelet transform analysis of the Rosat position sensitive proportional counter (PSPC) images of the Coma cluster are presented. The analysis shows, on small scales, a substructure dominated by two extended sources surrounding the two bright clusters NGC 4874 and NGC 4889. On scales of about 2 arcmin to 3 arcmin, the analysis reveals a tail of X-ray emission originating near the cluster center, curving to the south and east for approximately 25 arcmin and ending near the galaxy NGC 4911. The results are interpreted in terms of a merger of a group, having a core mass of approximately 10(exp 13) solar mass, with the main body of the Coma cluster.
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis compliment more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different than lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
X-ray and optical substructures of the DAFT/FADA survey clusters
NASA Astrophysics Data System (ADS)
Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.
2013-04-01
We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.
Romay-Tallon, Raquel; Rivera-Baltanas, Tania; Allen, Josh; Olivares, Jose M; Kalynchuk, Lisa E; Caruncho, Hector J
2017-01-01
The pattern of serotonin transporter clustering on the plasma membrane of lymphocytes extracted from human whole blood samples has been identified as a putative biomarker of therapeutic efficacy in major depression. Here we evaluated the possibility of performing a similar analysis using blood smears obtained from rats, and from control human subjects and depression patients. We hypothesized that we could optimize a protocol to make the analysis of serotonin protein clustering in blood smears comparable to the analysis of serotonin protein clustering using isolated lymphocytes. Our data indicate that blood smears require a longer fixation time and longer times of incubation with primary and secondary antibodies. In addition, one needs to optimize the image analysis settings for the analysis of smears. When these steps are followed, the quantitative analysis of both the number and size of serotonin transporter clusters on the plasma membrane of lymphocytes is similar using both blood smears and isolated lymphocytes. The development of this novel protocol will greatly facilitate the collection of appropriate samples by eliminating the necessity and cost of specialized personnel for drawing blood samples, and by being a less invasive procedure. Therefore, this protocol will help us advance the validation of membrane protein clustering in lymphocytes as a biomarker of therapeutic efficacy in major depression, and bring it closer to its clinical application.
Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M
2016-09-01
To study the distinct clinical phenotype of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expired volume in one second(FEV1)/forced vital capacity(FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow(PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' different clinical phenotype by hierarchical cluster analysis and two-step cluster analysis was identified. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow curve(MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar(DLCO)/(VA)% pred, residual volume(RV)% pred, total serum IgE level, smoking history (pack-years), St.George's respiratory questionnaire(SGRQ) score, acute exacerbation in the past one year, PEF variability and allergic dermatitis (P<0.05). (2) Four clusters were also identified by two-step cluster analysis as followings, cluster 1, COPD patients with moderate to severe airflow limitation; cluster 2, asthma and COPD patients with heavy smoking, airflow limitation and increased airways reversibility; cluster 3, patients having less smoking and normal pulmonary function with wheezing but no chronic cough; cluster 4, chronic bronchitis patients with normal pulmonary function and chronic cough. Significant differences were revealed regarding gender distribution, respiratory symptoms, pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, MMEF% pred, DLCO/VA% pred, RV% pred, PEF variability, total serum IgE level, cumulative tobacco cigarette consumption (pack-years), and SGRQ score (P<0.05). By different cluster analyses, distinct clinical phenotypes of chronic airway diseases are identified. Thus, individualized treatments may guide doctors to provide based on different phenotypes.
Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations
NASA Astrophysics Data System (ADS)
Žibert, Janez; Pražnikar, Jure
2012-09-01
The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompasses more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak at 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 at 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia and has very strong tradition of bonfire parties. The clustering of the diurnal concentration showed that various different day-profiles are presented in a cold period, while this is not the case for the hot season. Additional analysis of ship traffic and rain fall data showed that there is no statistically significant difference between the ship gross (bruto) registered tonnage (BRT) values in the case of BC and PM10 clusters, but that there is statistically significant differences between the rain fall in the BC and PM10 clusters. The wind-rose for clusters which included most days in the sampling period indicating that emitted PM10 and BC from Port of Koper were manly transported in the west direction over the sea and in the east direction, where there is in no populated area. Presented analysis showed that both BC and PM10 concentrations were driven by rain intensity and wind speed.
Batch Computed Tomography Analysis of Projectiles
2016-05-01
error calculation. Projectiles are then grouped together according to the similarity of their components. Also discussed is graphical- cluster analysis...ballistic, armor, grouping, clustering 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT UU 18. NUMBER OF...Fig. 10 Graphical structure of 15 clusters of the jacket/core radii profiles with plots of the profiles contained within each cluster . The size of
ERIC Educational Resources Information Center
Raker, Jeffrey R.; Holme, Thomas A.
2014-01-01
A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…
Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.
Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B
2017-11-01
Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Kerr, Deirdre; Chung, Gregory K. W. K.; Iseli, Markus R.
2011-01-01
Analyzing log data from educational video games has proven to be a challenging endeavor. In this paper, we examine the feasibility of using cluster analysis to extract information from the log files that is interpretable in both the context of the game and the context of the subject area. If cluster analysis can be used to identify patterns of…
Dias, Claudia; Mendes, Luís
2018-01-01
Despite the importance of the literature on food quality labels in the European Union (PDO, PGI and TSG), our search did not find any review joining the various research topics on this subject. This study aims therefore to consolidate the state of academic research in this field, and so the methodological option was to elaborate a bibliometric analysis resorting to the term co-occurrence technique. Analysis was made of 501 articles on the ISI Web of Science database, covering publications up to 2016. The results of the bibliometric analysis allowed identification of four clusters: "Protected Geographical Indication", "Certification of Olive Oil and Cultivars", "Certification of Cheese and Milk" and "Certification and Chemical Composition". Unlike the other clusters, where the PDO label predominates, the "Protected Geographical Indication" cluster covers the study of PGI products, highlighting analysis of consumer behaviour in relation to this type of product. The focus of studies in the "Certification of Olive Oil and Cultivars" cluster and the "Certification of Cheese and Milk" cluster is the development of authentication methods for certified traditional products. In the "Certification and Chemical Composition" cluster, standing out is analysis of the profiles of fatty acids present in this type of product. Copyright © 2017 Elsevier Ltd. All rights reserved.
Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.
Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo
2018-07-01
Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiology and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12-month occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart
2016-03-01
In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards' method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. © The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart
2016-01-01
In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards’ method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. PMID:26024882
Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.
Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J
2016-04-01
Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Description and typology of intensive Chios dairy sheep farms in Greece.
Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G
2012-06-01
The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece, establishing a typology that may properly describe and characterize them. The study included the total of the 66 farms of the Chios sheep breeders' cooperative Macedonia. Data were collected using a structured direct questionnaire for in-depth interviews, including questions properly selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was used on the data to obtain the most appropriate typology. Initially, principal component analysis was used to produce uncorrelated variables (principal components), which would be used for the consecutive cluster analysis. The number of clusters was decided using hierarchical cluster analysis, whereas, the farms were allocated in 4 clusters using k-means cluster analysis. The identified clusters were described and afterward compared using one-way ANOVA or a chi-squared test. The main differences were evident on land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms and cluster 2 included well-established farms with balanced sheep and feed/crop production. In cluster 3 were assigned small flock farms focusing more on arable crops than on sheep farming with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs and choosing not to focus on feed/crop production. In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). Conducting a profitability analysis among different clusters, exploring and discovering the most beneficial levels of intensified management and capital investment should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu
2015-12-01
Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis has been applied to explore novel phenotypes, which are not based on any a priori hypotheses. To explore novel severe asthma phenotypes by cluster analysis when including cigarette smokers. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Twelve clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters were identified, including two characterized by high pack-year exposure to cigarette smoking and low FEV1/FVC. There were marked differences between the two clusters of cigarette smokers. One had high levels of circulating eosinophils, high IgE levels, and a high sinus disease score. The other was characterized by low levels of the same parameters. Sputum analysis revealed increased levels of IL-5 in the former cluster and increased levels of IL-6 and osteopontin in the latter. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 1 year later. This study reveals two distinct phenotypes of severe asthma in current and former cigarette smokers with potentially different biological pathways contributing to fixed airflow limitation. Clinical trial registered with www.umin.ac.jp (000003254).
Suicide in the oldest old: an observational study and cluster analysis.
Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth
2016-01-01
The older population are at a high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.
The dynamics of cyclone clustering in re-analysis and a high-resolution climate model
NASA Astrophysics Data System (ADS)
Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len
2017-04-01
Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45˚ N, 55˚ N, 65˚ N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55˚ N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65(45)˚ N is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.
NASA Astrophysics Data System (ADS)
Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan
2017-12-01
Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.
Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si
2017-07-01
Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. The cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected form coastal water of Bohai Sea and North Yellow Sea of China, and apply clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which are widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
Grimsley, Jasmine M S; Gadziola, Marie A; Wenstrup, Jeffrey J
2012-01-01
Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified 10 syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.
Micro-heterogeneity versus clustering in binary mixtures of ethanol with water or alkanes.
Požar, Martina; Lovrinčević, Bernarda; Zoranić, Larisa; Primorać, Tomislav; Sokolić, Franjo; Perera, Aurélien
2016-08-24
Ethanol is a hydrogen bonding liquid. When mixed in small concentrations with water or alkanes, it forms aggregate structures reminiscent of, respectively, the direct and inverse micellar aggregates found in emulsions, albeit at much smaller sizes. At higher concentrations, micro-heterogeneous mixing with segregated domains is found. We examine how different statistical methods, namely correlation function analysis, structure factor analysis and cluster distribution analysis, can describe efficiently these morphological changes in these mixtures. In particular, we explain how the neat alcohol pre-peak of the structure factor evolves into the domain pre-peak under mixing conditions, and how this evolution differs whether the co-solvent is water or alkane. This study clearly establishes the heuristic superiority of the correlation function/structure factor analysis to study the micro-heterogeneity, since cluster distribution analysis is insensitive to domain segregation. Correlation functions detect the domains, with a clear structure factor pre-peak signature, while the cluster techniques detect the cluster hierarchy within domains. The main conclusion is that, in micro-segregated mixtures, the domain structure is a more fundamental statistical entity than the underlying cluster structures. These findings could help better understand comparatively the radiation scattering experiments, which are sensitive to domains, versus the spectroscopy-NMR experiments, which are sensitive to clusters.
Variable Screening for Cluster Analysis.
ERIC Educational Resources Information Center
Donoghue, John R.
Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are analytically shown often to possess negative kurtosis. Two related measures, "m" and…
Spatial pattern recognition of seismic events in South West Colombia
NASA Astrophysics Data System (ADS)
Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber
2013-09-01
Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well founded means to obtain relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data of the South West Colombian seismic database. In this research, clustering tendency analysis validates whether seismic database possesses a clustering structure. A non-supervised fuzzy clustering algorithm creates groups of seismic events. Given the sensitivity of fuzzy clustering algorithms to centroid initial positions, we proposed a methodology to initialize centroids that generates stable partitions with respect to centroid initialization. As a result of this work, a public software tool provides the user with the routines developed for clustering methodology. The analysis of the seismogenic zones obtained reveals meaningful spatial patterns in South-West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretations of seismic activities in South West Colombia.
Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A
2016-08-05
Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Transcriptional and Chromatin Dynamics of Muscle Regeneration After Severe Trauma
2016-10-12
performed pathway analysis of the time-clustered RNA- Seq data16 and showed an initial burst of pro-inflammatory and immune-response transcripts in the...143 showed dynamic behavior (See Methods) and analysis of the dynamic miRNAs reinforced many of the results observed from the RNA-Seq datasets...excellent agreement was viewed. Hierarchical clustering of the datasets through time revealed 5 clusters, and gene ontology (GO) analysis of the
On-Line Pattern Analysis and Recognition System. OLPARS VI. Software Reference Manual,
1982-06-18
Discriminant Analysis Data Transformation, Feature Extraction, Feature Evaluation Cluster Analysis, Classification Computer Software 20Z. ABSTRACT... cluster /scatter cut-off value, (2) change the one-space bin factor, (3) change from long prompts to short prompts or vice versa, (4) change the...value, a cluster plot is displayed, otherwise a scatter plot is shown. if option 1 is selected, the program requests that a new value be input
Applications of cluster analysis to satellite soundings
NASA Technical Reports Server (NTRS)
Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.
1984-01-01
The advantages of the use of cluster analysis in the improvement of satellite temperature retrievals were evaluated since the use of natural clusters, which are associated with atmospheric temperature soundings characteristic of different types of air masses, has the potential for improving stratified regression schemes in comparison with currently used methods which stratify soundings based on latitude, season, and land/ocean. The method of discriminatory analysis was used. The correct cluster of temperature profiles from satellite measurements was located in 85% of the cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the clusters of temperature (weighted and nonweighted) in comparison with the control experiment and with the regression retrievals derived in the clusters of brightness temperatures of 3 MSU and 5 IR channels.
Craen, Saskia de; Commandeur, Jacques J F; Frank, Laurence E; Heiser, Willem J
2006-06-01
K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these populations showed a significant effect of lack of sphericity and group size. This effect was, however, not as large as expected, with still a recovery index of more than 0.5 in the "worst case scenario." An interaction effect between the two data aspects was also found. The decreasing trend in the recovery of clusters for increasing departures from sphericity is different for equal and unequal group sizes.
Kanno, Chihiro; Sakamoto, Kentaro Q; Yanagawa, Yojiro; Takahashi, Yoshiyuki; Katagiri, Seiji; Nagano, Masashi
2017-08-04
In the present study, bull sperm in the first and second ejaculates were divided into subpopulations based on their motility characteristics using a cluster analysis of data from computer-assisted sperm motility analysis (CASA). Semen samples were collected from 4 Japanese black bulls. Data from 9,228 motile sperm were classified into 4 clusters; 1) very rapid and progressively motile sperm, 2) rapid and circularly motile sperm with widely moving heads, 3) moderately motile sperm with heads moving frequently in a short length, and 4) poorly motile sperm. The percentage of cluster 1 varied between bulls. The first ejaculates had a higher proportion of cluster 2 and lower proportion of cluster 3 than the second ejaculates.
Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis
Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis
2016-01-01
Background The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230
Calibrating the Planck cluster mass scale with CLASH
NASA Astrophysics Data System (ADS)
Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.
2017-08-01
We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, bSZ, between true cluster mass, M500, and the Planck mass proxy, MPL, our analysis constrains 1-bSZ = 0.73 ± 0.10 when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.
Impact of Sampling Density on the Extent of HIV Clustering
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2014-01-01
Abstract Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
Using cluster analysis to organize and explore regional GPS velocities
Simpson, Robert W.; Thatcher, Wayne; Savage, James C.
2012-01-01
Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis
NASA Astrophysics Data System (ADS)
Fu, Pei-hua; Yin, Hong-bo
In this thesis, we introduced an evaluation model based on fuzzy cluster algorithm of logistics enterprises. First of all,we present the evaluation index system which contains basic information, management level, technical strength, transport capacity,informatization level, market competition and customer service. We decided the index weight according to the grades, and evaluated integrate ability of the logistics enterprises using fuzzy cluster analysis method. In this thesis, we introduced the system evaluation module and cluster analysis module in detail and described how we achieved these two modules. At last, we gave the result of the system.
Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson
2017-07-01
The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.
The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions
NASA Astrophysics Data System (ADS)
Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.
2018-04-01
The gas cluster ion beam technique was used for the silicon carbide crystal surface smoothing. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters were not reported yet. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to average roughness about 1 nm for both elements. It was shown that xenon as the working gas is more effective: sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.
Clustering analysis strategies for electron energy loss spectroscopy (EELS).
Torruella, Pau; Estrader, Marta; López-Ortega, Alberto; Baró, Maria Dolors; Varela, Maria; Peiró, Francesca; Estradé, Sònia
2018-02-01
In this work, the use of cluster analysis algorithms, widely applied in the field of big data, is proposed to explore and analyze electron energy loss spectroscopy (EELS) data sets. Three different data clustering approaches have been tested both with simulated and experimental data from Fe 3 O 4 /Mn 3 O 4 core/shell nanoparticles. The first method consists on applying data clustering directly to the acquired spectra. A second approach is to analyze spectral variance with principal component analysis (PCA) within a given data cluster. Lastly, data clustering on PCA score maps is discussed. The advantages and requirements of each approach are studied. Results demonstrate how clustering is able to recover compositional and oxidation state information from EELS data with minimal user input, giving great prospects for its usage in EEL spectroscopy. Copyright © 2017 Elsevier B.V. All rights reserved.
Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto
2018-01-01
Abstract Background A huge amount of literature suggests that adolescents’ health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. Methods The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students’ data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Results Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: ‘Alcohol drinking’, ‘Smoking’, ‘Physical activity’, ‘Screen time’, ‘Signs & symptoms’, ‘Healthy eating’, ‘Violence’ and ‘Sweet tooth’. These factors explained 67% of variance and underwent cluster analysis. A six-cluster κ-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Conclusions Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany. PMID:27908972
Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto
2018-03-01
A huge amount of literature suggests that adolescents' health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students' data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: 'Alcohol drinking', 'Smoking', 'Physical activity', 'Screen time', 'Signs & symptoms', 'Healthy eating', 'Violence' and 'Sweet tooth'. These factors explained 67% of variance and underwent cluster analysis. A six-cluster κ-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany.
Topic modeling for cluster analysis of large biological and medical datasets
2014-01-01
Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106
Topic modeling for cluster analysis of large biological and medical datasets.
Zhao, Weizhong; Zou, Wen; Chen, James J
2014-01-01
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of the complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Statistical Significance for Hierarchical Clustering
Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.
2017-01-01
Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
A Note on Cluster Effects in Latent Class Analysis
ERIC Educational Resources Information Center
Kaplan, David; Keller, Bryan
2011-01-01
This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…
Mining a Web Citation Database for Author Co-Citation Analysis.
ERIC Educational Resources Information Center
He, Yulan; Hui, Siu Cheung
2002-01-01
Proposes a mining process to automate author co-citation analysis based on the Web Citation Database, a data warehouse for storing citation indices of Web publications. Describes the use of agglomerative hierarchical clustering for author clustering and multidimensional scaling for displaying author cluster maps, and explains PubSearch, a…
Multiscale visual quality assessment for cluster analysis with self-organizing maps
NASA Astrophysics Data System (ADS)
Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias
2011-01-01
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.
Potential of SNP markers for the characterization of Brazilian cassava germplasm.
de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte
2014-06-01
High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and the multivariate analysis in structuring the genetic diversity in cassavas. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and molecular analysis of variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent for presenting higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions were not sufficient for providing clear differentiation between the clusters according to the AMOVA analysis. In contrast, the F ST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent alteration of the name of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs, and contribute to the maximization of genetic gains in breeding programs.
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
NASA Astrophysics Data System (ADS)
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of the Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre and post monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poor quality in drinking purpose compared to pre-monsoon. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock water interactions in the aquifer, effect of anthropogenic activities and ion exchange processes in water.
Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis
2016-10-01
AWARD NUMBER: W81XWH-15-2-0032 TITLE: Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis PRINCIPAL...4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis 5b...Public Release; Distribution Unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT The subject of the project is FY14 PRMRP Topic Area – Tinnitus . The broad
2017-01-30
dynamic structural time- history response analysis of flexible approach walls founded on clustered pile groups using Impact_Deck. In Preparation, ERDC...research (Ebeling et al. 2012) has developed simplified analysis procedures for flexible approach wall systems founded on clustered groups of vertical...history response analysis of flexible approach walls founded on clustered pile groups using Impact_Deck. In Preparation, ERDC/ITL TR-16-X. Vicksburg, MS
Toyoda, Hiromitsu; Takahashi, Shinji; Hoshino, Masatoshi; Takayama, Kazushi; Iseki, Kazumichi; Sasaoka, Ryuichi; Tsujio, Tadao; Yasuda, Hiroyuki; Sasaki, Takeharu; Kanematsu, Fumiaki; Kono, Hiroshi; Nakamura, Hiroaki
2017-09-23
This study demonstrated four distinct patterns in the course of back pain after osteoporotic vertebral fracture (OVF). Greater angular instability in the first 6 months after the baseline was one factor affecting back pain after OVF. Understanding the natural course of symptomatic acute OVF is important in deciding the optimal treatment strategy. We used latent class analysis to classify the course of back pain after OVF and identify the risk factors associated with persistent pain. This multicenter cohort study included 218 consecutive patients with ≤ 2-week-old OVFs who were enrolled at 11 institutions. Dynamic x-rays and back pain assessment with a visual analog scale (VAS) were obtained at enrollment and at 1-, 3-, and 6-month follow-ups. The VAS scores were used to characterize patient groups, using hierarchical cluster analysis. VAS for 128 patients was used for hierarchical cluster analysis. Analysis yielded four clusters representing different patterns of back pain progression. Cluster 1 patients (50.8%) had stable, mild pain. Cluster 2 patients (21.1%) started with moderate pain and progressed quickly to very low pain. Patients in cluster 3 (10.9%) had moderate pain that initially improved but worsened after 3 months. Cluster 4 patients (17.2%) had persistent severe pain. Patients in cluster 4 showed significant high baseline pain intensity, higher degree of angular instability, and higher number of previous OVFs, and tended to lack regular exercise. In contrast, patients in cluster 2 had significantly lower baseline VAS and less angular instability. We identified four distinct groups of OVF patients with different patterns of back pain progression. Understanding the course of back pain after OVF may help in its management and contribute to future treatment trials.
Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J
2013-06-01
Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.
Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033
NASA Astrophysics Data System (ADS)
Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.
2018-04-01
Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these type of system in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.
Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban
2017-05-01
Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
NASA Astrophysics Data System (ADS)
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-01
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ˜0.1 dex for GCs with metallicities as high as [Fe/H] = -0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Cluster analysis of multiple planetary flow regimes
NASA Technical Reports Server (NTRS)
Mo, Kingtse; Ghil, Michael
1988-01-01
A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
NASA Astrophysics Data System (ADS)
Hasan, Noor Haliza; Abdullah, M. T.
2008-01-01
The aim of the study is to use cluster analysis on morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method to describe the relationship among species within this genus. A total of 15 adult male individuals from genus Kerivoula taken from sampling trips around Borneo and specimens kept at the zoological museum of Universiti Malaysia Sarawak were examined. A total of 27 characters using dental, skull and external body measurements were recorded. Clustering analysis illustrated the grouping and morphometric relationships between the species of this genus. It has clearly separated each species from each other despite the overlapping of measurements of some species within the genus. Cluster analysis provides an alternative approach to make a preliminary identification of a species.
Principal Component Clustering Approach to Teaching Quality Discriminant Analysis
ERIC Educational Resources Information Center
Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan
2016-01-01
Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…
Clustering "N" Objects into "K" Groups under Optimal Scaling of Variables.
ERIC Educational Resources Information Center
van Buuren, Stef; Heiser, Willem J.
1989-01-01
A method based on homogeneity analysis (multiple correspondence analysis or multiple scaling) is proposed to reduce many categorical variables to one variable with "k" categories. The method is a generalization of the sum of squared distances cluster analysis problem to the case of mixed measurement level variables. (SLD)
NASA Astrophysics Data System (ADS)
Iswandhani, N.; Muhajir, M.
2018-03-01
This research was conducted in Department of Statistics Islamic University of Indonesia. The data used are primary data obtained by post @explorejogja instagram account from January until December 2016. In the @explorejogja instagram account found many tourist destinations that can be visited by tourists both in the country and abroad, Therefore it is necessary to form a cluster of existing tourist destinations based on the number of likes from user instagram assumed as the most popular. The purpose of this research is to know the most popular distribution of tourist spot, the cluster formation of tourist destinations, and central popularity of tourist destinations based on @explorejogja instagram account in 2016. Statistical analysis used is descriptive statistics, k-means clustering, and social network analysis. The results of this research were obtained the top 10 most popular destinations in Yogyakarta, map of html-based tourist destination distribution consisting of 121 tourist destination points, formed 3 clusters each consisting of cluster 1 with 52 destinations, cluster 2 with 9 destinations and cluster 3 with 60 destinations, and Central popularity of tourist destinations in the special region of Yogyakarta by district.
Nowrousian, Minou
2009-04-01
During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome comprises another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.
Longo, Dario Livio; Dastrù, Walter; Consolino, Lorena; Espak, Miklos; Arigoni, Maddalena; Cavallo, Federica; Aime, Silvio
2015-07-01
The objective of this study was to compare a clustering approach to conventional analysis methods for assessing changes in pharmacokinetic parameters obtained from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) during antiangiogenic treatment in a breast cancer model. BALB/c mice bearing established transplantable her2+ tumors were treated with a DNA-based antiangiogenic vaccine or with an empty plasmid (untreated group). DCE-MRI was carried out by administering a dose of 0.05 mmol/kg of Gadocoletic acid trisodium salt, a Gd-based blood pool contrast agent (CA) at 1T. Changes in pharmacokinetic estimates (K(trans) and vp) in a nine-day interval were compared between treated and untreated groups on a voxel-by-voxel analysis. The tumor response to therapy was assessed by a clustering approach and compared with conventional summary statistics, with sub-regions analysis and with histogram analysis. Both the K(trans) and vp estimates, following blood-pool CA injection, showed marked and spatial heterogeneous changes with antiangiogenic treatment. Averaged values for the whole tumor region, as well as from the rim/core sub-regions analysis were unable to assess the antiangiogenic response. Histogram analysis resulted in significant changes only in the vp estimates (p<0.05). The proposed clustering approach depicted marked changes in both the K(trans) and vp estimates, with significant spatial heterogeneity in vp maps in response to treatment (p<0.05), provided that DCE-MRI data are properly clustered in three or four sub-regions. This study demonstrated the value of cluster analysis applied to pharmacokinetic DCE-MRI parametric maps for assessing tumor response to antiangiogenic therapy. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-10
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clustersmore » may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.« less
ERIC Educational Resources Information Center
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics
ERIC Educational Resources Information Center
Chan, Julia Y. K.; Bauer, Christopher F.
2014-01-01
The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…
Analysis of large-scale gene expression data.
Sherlock, G
2000-04-01
The advent of cDNA and oligonucleotide microarray technologies has led to a paradigm shift in biological investigation, such that the bottleneck in research is shifting from data generation to data analysis. Hierarchical clustering, divisive clustering, self-organizing maps and k-means clustering have all been recently used to make sense of this mass of data.
ERIC Educational Resources Information Center
Hale, Robert L.; Dougherty, Donna
1988-01-01
Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…
Murray, Nicholas P; Hunfalvay, Melissa
2017-02-01
Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high, middle-and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle-and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants'.
Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA
NASA Astrophysics Data System (ADS)
Perren, G. I.; Piatti, A. E.; Vázquez, R. A.
2017-06-01
Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automatized and a fully reproducible treatment, together with a statistically based analysis of their fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data becomes available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89
How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan
ERIC Educational Resources Information Center
Liu, Eric Zhi-Feng; Hou, Huei-Tse
2013-01-01
The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…
Pérez-Rodrigo, Carmen; Gil, Ángel; González-Gross, Marcela; Ortega, Rosa M.; Serra-Majem, Lluis; Varela-Moreiras, Gregorio; Aranceta-Bartrina, Javier
2015-01-01
Weight gain has been associated with behaviors related to diet, sedentary lifestyle, and physical activity. We investigated dietary patterns and possible meaningful clustering of physical activity, sedentary behavior, and sleep time in Spanish children and adolescents and whether the identified clusters could be associated with overweight. Analysis was based on a subsample (n = 415) of the cross-sectional ANIBES study in Spain. We performed exploratory factor analysis and subsequent cluster analysis of dietary patterns, physical activity, sedentary behaviors, and sleep time. Logistic regression analysis was used to explore the association between the cluster solutions and overweight. Factor analysis identified four dietary patterns, one reflecting a profile closer to the traditional Mediterranean diet. Dietary patterns, physical activity behaviors, sedentary behaviors and sleep time on weekdays in Spanish children and adolescents clustered into two different groups. A low physical activity-poorer diet lifestyle pattern, which included a higher proportion of girls, and a high physical activity, low sedentary behavior, longer sleep duration, healthier diet lifestyle pattern. Although increased risk of being overweight was not significant, the Prevalence Ratios (PRs) for the low physical activity-poorer diet lifestyle pattern were >1 in children and in adolescents. The healthier lifestyle pattern included lower proportions of children and adolescents from low socioeconomic status backgrounds. PMID:26729155
A Bayesian cluster analysis method for single-molecule localization microscopy data.
Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick
2016-12-01
Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
NASA Astrophysics Data System (ADS)
Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.
2017-01-01
This study aimed to obtain information on the population of the countries which is have similarities with Indonesia based on three characteristics, that is the democratic atmosphere, rice consumption and purchasing power of rice. It is useful as a reference material for research which tested the strength and predictability of the rice crisis indicators Unprecedented Restlessness (UR). The similarities countries with Indonesia were conducted using multivariate analysis that is non-hierarchical cluster analysis k-Means with 38 countries as the data population. This analysis is done repeatedly until the obtainment number of clusters which is capable to show the differentiator power of the three characteristics and describe the high similarity within clusters. Based on the results, it turns out with 6 clusters can describe the differentiator power of characteristics of formed clusters. However, to answer the purpose of the study, only one cluster which will be taken accordance with the criteria of success for the population of countries that have similarities with Indonesia that cluster contain Indonesia therein, there are countries which is sustain crisis and non-crisis of rice in 2008, and cluster which is have the largest member among them. This criterion is met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, Korea South, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
Wang, Z; Wang, W H; Wang, S L; Jin, J; Song, Y W; Liu, Y P; Ren, H; Fang, H; Tang, Y; Chen, B; Qi, S N; Lu, N N; Li, N; Tang, Y; Liu, X F; Yu, Z H; Li, Y X
2016-06-23
To find phenotypic subgroups of patients with pT1-2N0 invasive breast cancer by means of cluster analysis and estimate the prognosis and clinicopathological features of these subgroups. From 1999 to 2013, 4979 patients with pT1-2N0 invasive breast cancer were recruited for hierarchical clustering analysis. Age (≤40, 41-70, 70+ years), size of primary tumor, pathological type, grade of differentiation, microvascular invasion, estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER-2) were chosen as distance metric between patients. Hierarchical cluster analysis was performed using Ward's method. Cophenetic correlation coefficient (CPCC) and Spearman correlation coefficient were used to validate clustering structures. The CPCC was 0.603. The Spearman correlation coefficient was 0.617 (P<0.001), which indicated a good fit of hierarchy to the data. A twelve-cluster model seemed to best illustrate our patient cohort. Patients in cluster 5, 9 and 12 had best prognosis and were characterized by age >40 years, smaller primary tumor, lower histologic grade, positive ER and PR status, and mainly negative HER-2. Patients in the cluster 1 and 11 had the worst prognosis, The cluster 1 was characterized by a larger tumor, higher grade and negative ER and PR status, while the cluster 11 was characterized by positive microvascular invasion. Patients in other 7 clusters had a moderate prognosis, and patients in each cluster had distinctive clinicopathological features and recurrent patterns. This study identified distinctive clinicopathologic phenotypes in a large cohort of patients with pT1-2N0 breast cancer through hierarchical clustering and revealed different prognosis. This integrative model may help physicians to make more personalized decisions regarding adjuvant therapy.
Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B
2016-01-01
Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.
Almeida, Suzana C; George, Steven Z; Leite, Raquel D V; Oliveira, Anamaria S; Chaves, Thais C
2018-05-17
We aimed to empirically derive psychosocial and pain sensitivity subgroups using cluster analysis within a sample of individuals with chronic musculoskeletal pain (CMP) and to investigate derived subgroups for differences in pain and disability outcomes. Eighty female participants with CMP answered psychosocial and disability scales and were assessed for pressure pain sensitivity. A cluster analysis was used to derive subgroups, and analysis of variance (ANOVA) was used to investigate differences between subgroups. Psychosocial factors (kinesiophobia, pain catastrophizing, anxiety, and depression) and overall pressure pain threshold (PPT) were entered into the cluster analysis. Three subgroups were empirically derived: cluster 1 (high pain sensitivity and high psychosocial distress; n = 12) characterized by low overall PPT and high psychosocial scores; cluster 2 (high pain sensitivity and intermediate psychosocial distress; n = 39) characterized by low overall PPT and intermediate psychosocial scores; and cluster 3 (low pain sensitivity and low psychosocial distress; n = 29) characterized by high overall PPT and low psychosocial scores compared to the other subgroups. Cluster 1 showed higher values for mean pain intensity (F (2,77) = 10.58, p < 0.001) compared with cluster 3, and cluster 1 showed higher values for disability (F (2,77) = 3.81, p = 0.03) compared with both clusters 2 and 3. Only cluster 1 was distinct from cluster 3 according to both pain and disability outcomes. Pain catastrophizing, depression, and anxiety were the psychosocial variables that best differentiated the subgroups. Overall, these results call attention to the importance of considering pain sensitivity and psychosocial variables to obtain a more comprehensive characterization of CMP patients' subtypes.
Jurencák, Roman; Fritzler, Marvin; Tyrrell, Pascal; Hiraki, Linda; Benseler, Susanne; Silverman, Earl
2009-02-01
(1) To evaluate the spectrum of serum autoantibodies in pediatric-onset systemic lupus erythematosus (pSLE) with a focus on ethnic differences; (2) using cluster analysis, to identify patients with similar autoantibody patterns and to determine their clinical associations. A single-center cohort study of all patients with newly diagnosed pSLE seen over an 8-year period was performed. Ethnicity, clinical, and serological data were prospectively collected from 156/169 patients (92%). The frequencies of 10 selected autoantibodies among ethnic groups were compared. Cluster analysis identified groups of patients with similar autoantibody profiles. Associations of these groups with clinical and laboratory features of pSLE were examined. Among our 5 ethnic groups, there were differences only in the prevalence of anti-U1RNP and anti-Sm antibodies, which occurred more frequently in non-Caucasian patients (p < 0.0001, p < 0.01, respectively). Cluster analysis revealed 3 autoantibody clusters. Cluster 1 consisted of anti-dsDNA antibodies. Cluster 2 consisted of anti-dsDNA, antichromatin, antiribosomal P, anti-U1RNP, anti-Sm, anti-Ro and anti-La autoantibody. Cluster 3 consisted of anti-dsDNA, anti-RNP, and anti-Sm autoantibody. The highest proportion of Caucasians was in cluster 1 (p < 0.05), which was characterized by a mild disease with infrequent major organ involvement compared to cluster 2, which had the highest frequency of nephritis, renal failure, serositis, and hemolytic anemia, or cluster 3, which was characterized by frequent neuropsychiatric disease and nephritis. We observed ethnic differences in autoantibody profiles in pSLE. Autoantibodies tended to cluster together and these clusters were associated with different clinical courses.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks
Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin
2017-01-01
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Cluster analysis for determining distribution center location
NASA Astrophysics Data System (ADS)
Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian
2017-12-01
Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.
cluML: A markup language for clustering and cluster validity assessment of microarray data.
Bolshakova, Nadia; Cunningham, Pádraig
2005-01-01
cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can be effectively used for the representation of clustering and for the validation of other biomedical and physical data that has no limitations.
NASA Astrophysics Data System (ADS)
Zhang, Rui; Jiang, Shuai; Liu, Yi-Rong; Wen, Hui; Feng, Ya-Juan; Huang, Teng; Huang, Wei
2018-05-01
Despite the very important role of atmospheric aerosol nucleation in climate change and air quality, the detailed aerosol nucleation mechanism is still unclear. Here we investigated the formic acid (FA) involved multicomponent nucleation molecular clusters including sulfuric acid (SA), dimethylamine (DMA) and water (W) through a quantum chemical method. The thermodynamics and kinetics analysis was based on the global minima given by Basin-Hopping (BH) algorithm coupled with Density Functional Theory (DFT) and subsequent benchmarked calculations. Then the interaction analysis based on ElectroStatic Potential (ESP), Topological and Atomic Charges analysis was made to characterize the binding features of the clusters. The results show that FA binds weakly with the other molecules in the cluster while W binds more weakly. Further kinetic analysis about the time evolution of the clusters show that even though the formic acid's weak interaction with other nucleation precursors, its effect on sulfuric acid dimer steady state concentration cannot be neglected due to its high concentration in the atmosphere.
a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data
NASA Astrophysics Data System (ADS)
Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.
2017-09-01
Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.
ERIC Educational Resources Information Center
Gartstein, Maria A.; Prokasky, Amanda; Bell, Martha Ann; Calkins, Susan; Bridgett, David J.; Braungart-Rieker, Julia; Leerkes, Esther; Cheatham, Carol L.; Eiden, Rina D.; Mize, Krystal D.; Jones, Nancy Aaron; Mireault, Gina; Seamon, Erich
2017-01-01
There is renewed interest in person-centered approaches to understanding the structure of temperament. However, questions concerning temperament types are not frequently framed in a developmental context, especially during infancy. In addition, the most common person-centered techniques, cluster analysis (CA) and latent profile analysis (LPA),…
Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials
ERIC Educational Resources Information Center
Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric
2015-01-01
This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…
Ruzik, L; Obarski, N; Papierz, A; Mojski, M
2015-06-01
High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
Jadhav, Rohit R; Ye, Zhenqing; Huang, Rui-Lan; Liu, Joseph; Hsu, Pei-Yin; Huang, Yi-Wen; Rangel, Leticia B; Lai, Hung-Cheng; Roa, Juan Carlos; Kirma, Nameer B; Huang, Tim Hui-Ming; Jin, Victor X
2015-01-01
Recent genome-wide analysis has shown that DNA methylation spans long stretches of chromosome regions consisting of clusters of contiguous CpG islands or gene families. Hypermethylation of various gene clusters has been reported in many types of cancer. In this study, we conducted methyl-binding domain capture (MBDCap) sequencing (MBD-seq) analysis on a breast cancer cohort consisting of 77 patients and 10 normal controls, as well as a panel of 38 breast cancer cell lines. Bioinformatics analysis determined seven gene clusters with a significant difference in overall survival (OS) and further revealed a distinct feature that the conservation of a large gene cluster (approximately 70 kb) metallothionein-1 (MT1) among 45 species is much lower than the average of all RefSeq genes. Furthermore, we found that DNA methylation is an important epigenetic regulator contributing to gene repression of MT1 gene cluster in both ERα positive (ERα+) and ERα negative (ERα-) breast tumors. In silico analysis revealed much lower gene expression of this cluster in The Cancer Genome Atlas (TCGA) cohort for ERα + tumors. To further investigate the role of estrogen, we conducted 17β-estradiol (E2) and demethylating agent 5-aza-2'-deoxycytidine (DAC) treatment in various breast cancer cell types. Cell proliferation and invasion assays suggested MT1F and MT1M may play an anti-oncogenic role in breast cancer. Our data suggests that DNA methylation in large contiguous gene clusters can be potential prognostic markers of breast cancer. Further investigation of these clusters revealed that estrogen mediates epigenetic repression of MT1 cluster in ERα + breast cancer cell lines. In all, our studies identify thousands of breast tumor hypermethylated regions for the first time, in particular, discovering seven large contiguous hypermethylated gene clusters.
NASA Astrophysics Data System (ADS)
Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.
2017-11-01
In order to select effective samples in the large number of data of PV power generation years and improve the accuracy of PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes forecasting model based on neural network. Based on three different types of weather on sunny, cloudy and rainy days, this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using screened data as training data. Then, compare the six types of photovoltaic power generation prediction models before and after the data screening. Results show that the prediction model combining with clustering analysis and BP neural networks is an effective method to improve the precision of photovoltaic power generation.
Identification and validation of asthma phenotypes in Chinese population using cluster analysis.
Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang
2017-10-01
Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P < .0001 for all comparisons). We identified 3 clinical clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.
ERIC Educational Resources Information Center
Carter, Randy L.; And Others
1989-01-01
The partitioning of squared Euclidean--E(sup 2)--distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
ERIC Educational Resources Information Center
Hofmann, Richard J.
A very general model for the computation of independent cluster solutions in factor analysis is presented. The model is discussed as being either orthogonal or oblique. Furthermore, it is demonstrated that for every orthogonal independent cluster solution there is an oblique analog. Using three illustrative examples, certain generalities are made…
A Constraint-Based Approach to Acquisition of Word-Final Consonant Clusters in Turkish Children
ERIC Educational Resources Information Center
Gokgoz-Kurt, Burcu
2017-01-01
The current study provides a constraint-based analysis of L1 word-final consonant cluster acquisition in Turkish child language, based on the data originally presented by Topbas and Kopkalli-Yavuz (2008). The present analysis was done using [?]+obstruent consonant cluster acquisition. A comparison of Gradual Learning Algorithm (GLA) under…
Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein
2014-11-01
Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Text grouping in patent analysis using adaptive K-means clustering algorithm
NASA Astrophysics Data System (ADS)
Shanie, Tiara; Suprijadi, Jadi; Zulhanif
2017-03-01
Patents are one of the Intellectual Property. Analyzing patent is one requirement in knowing well the development of technology in each country and in the world now. This study uses the patent document coming from the Espacenet server about Green Tea. Patent documents related to the technology in the field of tea is still widespread, so it will be difficult for users to information retrieval (IR). Therefore, it is necessary efforts to categorize documents in a specific group of related terms contained therein. This study uses titles patent text data with the proposed Green Tea in Statistical Text Mining methods consists of two phases: data preparation and data analysis stage. The data preparation phase uses Text Mining methods and data analysis stage is done by statistics. Statistical analysis in this study using a cluster analysis algorithm, the Adaptive K-Means Clustering Algorithm. Results from this study showed that based on the maximum value Silhouette, generate 87 clusters associated fifteen terms therein that can be utilized in the process of information retrieval needs.
Structural evolution in the crystallization of rapid cooling silver melt
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Z.A., E-mail: ze.tian@gmail.com; Laboratory for Simulation and Modelling of Particulate Systems School of Materials Science and Engineering, University of New South Wales, Sydney, NSW 2052; Dong, K.J.
2015-03-15
The structural evolution in a rapid cooling process of silver melt has been investigated at different scales by adopting several analysis methods. The results testify Ostwald’s rule of stages and Frank conjecture upon icosahedron with many specific details. In particular, the cluster-scale analysis by a recent developed method called LSCA (the Largest Standard Cluster Analysis) clarified the complex structural evolution occurred in crystallization: different kinds of local clusters (such as ico-like (ico is the abbreviation of icosahedron), ico-bcc like (bcc, body-centred cubic), bcc, bcc-like structures) in turn have their maximal numbers as temperature decreases. And in a rather wide temperaturemore » range the icosahedral short-range order (ISRO) demonstrates a saturated stage (where the amount of ico-like structures keeps stable) that breeds metastable bcc clusters. As the precursor of crystallization, after reaching the maximal number bcc clusters finally decrease, resulting in the final solid being a mixture mainly composed of fcc/hcp (face-centred cubic and hexagonal-closed packed) clusters and to a less degree, bcc clusters. This detailed geometric picture for crystallization of liquid metal is believed to be useful to improve the fundamental understanding of liquid–solid phase transition. - Highlights: • A comprehensive structural analysis is conducted focusing on crystallization. • The involved atoms in our analysis are more than 90% for all samples concerned. • A series of distinct intermediate states are found in crystallization of silver melt. • A novelty icosahedron-saturated state breeds the metastable bcc state.« less
Analysis of correlated mutations in HIV-1 protease using spectral clustering.
Liu, Ying; Eyal, Eran; Bahar, Ivet
2008-05-15
The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second, included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
Analysis of Tropical Cyclone Tracks in the North Indian Ocean
NASA Astrophysics Data System (ADS)
Patwardhan, A.; Paliwal, M.; Mohapatra, M.
2011-12-01
Cyclones are regarded as one of the most dangerous meteorological phenomena of the tropical region. The probability of landfall of a tropical cyclone depends on its movement (trajectory). Analysis of trajectories of tropical cyclones could be useful for identifying potentially predictable characteristics. There is long history of analysis of tropical cyclones tracks. A common approach is using different clustering techniques to group the cyclone tracks on the basis of certain characteristics. Various clustering method have been used to study the tropical cyclones in different ocean basins like western North Pacific ocean (Elsner and Liu, 2003; Camargo et al., 2007), North Atlantic Ocean (Elsner, 2003; Gaffney et al. 2007; Nakamura et al., 2009). In this study, tropical cyclone tracks in the North Indian Ocean basin, for the period 1961-2010 have been analyzed and grouped into clusters based on their spatial characteristics. A tropical cyclone trajectory is approximated as an open curve and described by its first two moments. The resulting clusters have different centroid locations and also differently shaped variance ellipses. These track characteristics are then used in the standard clustering algorithms which allow the whole track shape, length, and location to be incorporated into the clustering methodology. The resulting clusters have different genesis locations and trajectory shapes. We have also examined characteristics such as life span, maximum sustained wind speed, landfall, seasonality, many of which are significantly different across the identified clusters. The clustering approach groups cyclones with higher maximum wind speed and longest life span in to one cluster. Another cluster includes short duration cyclonic events that are mostly deep depressions and significant for rainfall over Eastern and Central India. The clustering approach is likely to prove useful for analysis of events of significance with regard to impacts.
Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T
2010-01-01
Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.
Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.
Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J
2017-12-01
Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the limitations, trade-offs, and considerations for the sensitivities of variables and the biological interpretations of results. The Cluster app is available as a free download for Apple computers at iTunes, with a link to a user guide website.
A formal concept analysis approach to consensus clustering of multi-experiment expression data
2014-01-01
Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce good quality clustering solution that is representative for the whole set of expression matrices. PMID:24885407
Atomistic cluster alignment method for local order mining in liquids and glasses
NASA Astrophysics Data System (ADS)
Fang, X. W.; Wang, C. Z.; Yao, Y. X.; Ding, Z. J.; Ho, K. M.
2010-11-01
An atomistic cluster alignment method is developed to identify and characterize the local atomic structural order in liquids and glasses. With the “order mining” idea for structurally disordered systems, the method can detect the presence of any type of local order in the system and can quantify the structural similarity between a given set of templates and the aligned clusters in a systematic and unbiased manner. Moreover, population analysis can also be carried out for various types of clusters in the system. The advantages of the method in comparison with other previously developed analysis methods are illustrated by performing the structural analysis for four prototype systems (i.e., pure Al, pure Zr, Zr35Cu65 , and Zr36Ni64 ). The results show that the cluster alignment method can identify various types of short-range orders (SROs) in these systems correctly while some of these SROs are difficult to capture by most of the currently available analysis methods (e.g., Voronoi tessellation method). Such a full three-dimensional atomistic analysis method is generic and can be applied to describe the magnitude and nature of noncrystalline ordering in many disordered systems.
Application of a Self-Similar Pressure Profile to Sunyaev-Zeldovich Effect Data from Galaxy Clusters
NASA Technical Reports Server (NTRS)
Mroczkowski, Tony; Bonamente, Max; Carlstrom, John E.; Culverhouse, Thomas L.; Greer, Christopher; Hawkins, David; Hennessy, Ryan; Joy, Marshall; Lamb, James W.; Leitch, Erik M.;
2009-01-01
We investigate the utility of a new, self-similar pressure profile for fitting Sunyaev-Zel'dovich (SZ) effect observations of galaxy clusters. Current SZ imaging instruments-such as the Sunyaev-Zel'dovich Array (SZA)- are capable of probing clusters over a large range in a physical scale. A model is therefore required that can accurately describe a cluster's pressure profile over a broad range of radii from the core of the cluster out to a significant fraction of the virial radius. In the analysis presented here, we fit a radial pressure profile derived from simulations and detailed X-ray analysis of relaxed clusters to SZA observations of three clusters with exceptionally high-quality X-ray data: A1835, A1914, and CL J1226.9+3332. From the joint analysis of the SZ and X-ray data, we derive physical properties such as gas mass, total mass, gas fraction and the intrinsic, integrated Compton y-parameter. We find that parameters derived from the joint fit to the SZ and X-ray data agree well with a detailed, independent X-ray-only analysis of the same clusters. In particular, we find that, when combined with X-ray imaging data, this new pressure profile yields an independent electron radial temperature profile that is in good agreement with spectroscopic X-ray measurements.
Cluster analysis of multiple planetary flow regimes
NASA Technical Reports Server (NTRS)
Mo, Kingtse; Ghil, Michael
1987-01-01
A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of point in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.
Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun
2017-01-01
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar
2018-06-07
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.
Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A
2018-01-01
Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date, however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and to reflect previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
Liu, Shelley H; Li, Yan; Liu, Bian
2018-05-17
Chronic kidney disease is a leading cause of death in the United States. We used cluster analysis to explore patterns of chronic kidney disease in 500 of the largest US cities. After adjusting for socio-demographic characteristics, we found that unhealthy behaviors, prevention measures, and health outcomes related to chronic kidney disease differ between cities in Utah and those in the rest of the United States. Cluster analysis can be useful for identifying geographic regions that may have important policy implications for preventing chronic kidney disease.
The composite sequential clustering technique for analysis of multispectral scanner data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
Comprehensive cluster analysis with Transitivity Clustering.
Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan
2011-03-01
Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
Analysis of 3D vortex motion in a dusty plasma
NASA Astrophysics Data System (ADS)
Mulsow, M.; Himpel, M.; Melzer, A.
2017-12-01
Dust clusters of about 50-1000 particles have been confined near the sheath region of a gaseous radio-frequency plasma discharge. These compact clusters exhibit a vortex motion which has been reconstructed in full three dimensions from stereoscopy. Smaller clusters are found to show a competition between solid-like cluster structure and vortex motion, whereas larger clusters feature very pronounced vortices. From the three-dimensional analysis, the dust flow field has been found to be nearly incompressible. The vortices in all observed clusters are essentially poloidal. The dependence of the vorticity on the cluster size is discussed. Finally, the vortex motion has been quantitatively attributed to radial gradients of the ion drag force.
Patterns of Dysmorphic Features in Schizophrenia
Scutt, L.E.; Chow, E.W.C.; Weksberg, R.; Honer, W.G.; Bassett, Anne S.
2011-01-01
Congenital dysmorphic features are prevalent in schizophrenia and may reflect underlying neurodevelopmental abnormalities. A cluster analysis approach delineating patterns of dysmorphic features has been used in genetics to classify individuals into more etiologically homogeneous subgroups. In the present study, this approach was applied to schizophrenia, using a sample with a suspected genetic syndrome as a testable model. Subjects (n = 159) with schizophrenia or schizoaffective disorder were ascertained from chronic patient populations (random, n=123) or referred with possible 22q11 deletion syndrome (referred, n = 36). All subjects were evaluated for presence or absence of 70 reliably assessed dysmorphic features, which were used in a three-step cluster analysis. The analysis produced four major clusters with different patterns of dysmorphic features. Significant between-cluster differences were found for rates of 37 dysmorphic features (P < 0.05), median number of dysmorphic features (P = 0.0001), and validating features not used in the cluster analysis: mild mental retardation (P = 0.001) and congenital heart defects (P = 0.002). Two clusters (1 and 4) appeared to represent more developmental subgroups of schizophrenia with elevated rates of dysmorphic features and validating features. Cluster 1 (n = 27) comprised mostly referred subjects. Cluster 4 (n= 18) had a different pattern of dysmorphic features; one subject had a mosaic Turner syndrome variant. Two other clusters had lower rates and patterns of features consistent with those found in previous studies of schizophrenia. Delineating patterns of dysmorphic features may help identify subgroups that could represent neurodevelopmental forms of schizophrenia with more homogeneous origins. PMID:11803519
COVARIATE-ADAPTIVE CLUSTERING OF EXPOSURES FOR AIR POLLUTION EPIDEMIOLOGY COHORTS*
Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A.
2017-01-01
Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 μg/m3 difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP. PMID:28572869
Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep
2015-05-01
The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.
Shin, Sang Soo; Shin, Young-Jeon
2016-01-01
With an increasing number of studies highlighting regional social capital (SC) as a determinant of health, many studies are using multi-level analysis with merged and averaged scores of community residents' survey responses calculated from community SC data. Sufficient examination is required to validate if the merged and averaged data can represent the community. Therefore, this study analyzes the validity of the selected indicators and their applicability in multi-level analysis. Within and between analysis (WABA) was performed after creating community variables using merged and averaged data of community residents' responses from the 2013 Community Health Survey in Korea, using subjective self-rated health assessment as a dependent variable. Further analysis was performed following the model suggested by WABA result. Both E-test results (1) and WABA results (2) revealed that single-level analysis needs to be performed using qualitative SC variable with cluster mean centering. Through single-level multivariate regression analysis, qualitative SC with cluster mean centering showed positive effect on self-rated health (0.054, p<0.001), although there was no substantial difference in comparison to analysis using SC variables without cluster mean centering or multi-level analysis. As modification in qualitative SC was larger within the community than between communities, we validate that relational analysis of individual self-rated health can be performed within the group, using cluster mean centering. Other tests besides the WABA can be performed in the future to confirm the validity of using community variables and their applicability in multi-level analysis.
ERIC Educational Resources Information Center
Lake County Area Vocational Center, Grayslake, IL.
This document contains a task analysis for health occupations (professional nurse) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…
ERIC Educational Resources Information Center
Lake County Area Vocational Center, Grayslake, IL.
This document contains a task analysis for health occupations (home health aid) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…
On the blind use of statistical tools in the analysis of globular cluster stars
NASA Astrophysics Data System (ADS)
D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco
2018-04-01
As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.
Schultz, K K; Bennett, T B; Nordlund, K V; Döpfer, D; Cook, N B
2016-09-01
Transition cow management has been tracked via the Transition Cow Index (TCI; AgSource Cooperative Services, Verona, WI) since 2006. Transition Cow Index was developed to measure the difference between actual and predicted milk yield at first test day to evaluate the relative success of the transition period program. This project aimed to assess TCI in relation to all commonly used Dairy Herd Improvement (DHI) metrics available through AgSource Cooperative Services. Regression analysis was used to isolate variables that were relevant to TCI, and then principal components analysis and network analysis were used to determine the relative strength and relatedness among variables. Finally, cluster analysis was used to segregate herds based on similarity of relevant variables. The DHI data were obtained from 2,131 Wisconsin dairy herds with test-day mean ≥30 cows, which were tested ≥10 times throughout the 2014 calendar year. The original list of 940 DHI variables was reduced through expert-driven selection and regression analysis to 23 variables. The K-means cluster analysis produced 5 distinct clusters. Descriptive statistics were calculated for the 23 variables per cluster grouping. Using principal components analysis, cluster analysis, and network analysis, 4 parameters were isolated as most relevant to TCI; these were energy-corrected milk, 3 measures of intramammary infection (dry cow cure rate, linear somatic cell count score in primiparous cows, and new infection rate), peak ratio, and days in milk at peak milk production. These variables together with cow and newborn calf survival measures form a group of metrics that can be used to assist in the evaluation of overall transition period performance. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Subgroups of physically abusive parents based on cluster analysis of parenting behavior and affect.
Haskett, Mary E; Smith Scott, Susan; Sabourin Ward, Caryn
2004-10-01
Cluster analysis of observed parenting and self-reported discipline was used to categorize 83 abusive parents into subgroups. A 2-cluster solution received support for validity. Cluster 1 parents were relatively warm, positive, sensitive, and engaged during interactions with their children, whereas Cluster 2 parents were relatively negative, disengaged or intrusive, and insensitive. Further, clusters differed in emotional health, parenting stress, perceptions of children, and problem solving. Children of parents in the 2 clusters differed on several indexes of social adjustment. Cluster 1 parents were similar to nonabusive parents (n = 66) on parenting and related constructs, but Cluster 2 parents differed from nonabusive parents on all clustering variables and many validation variables. Results highlight clinically relevant diversity in parenting practices and functioning among abusive parents. ((c) 2004 APA, all rights reserved).
Dimensional assessment of personality pathology in patients with eating disorders.
Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L
1999-02-22
This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed highest scores in factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis -- a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.
van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A
2009-06-01
Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need of an extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.
ERIC Educational Resources Information Center
Miyamoto, S.; Nakayama, K.
1983-01-01
A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to same set of articles are compared. Ten references are…
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
ERIC Educational Resources Information Center
Firdausiah Mansur, Andi Besse; Yusof, Norazah
2013-01-01
Clustering on Social Learning Network still not explored widely, especially when the network focuses on e-learning system. Any conventional methods are not really suitable for the e-learning data. SNA requires content analysis, which involves human intervention and need to be carried out manually. Some of the previous clustering techniques need…
Bayesian network meta-analysis for cluster randomized trials with binary outcomes.
Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard
2017-06-01
Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin
2015-01-01
Clustering analysis methods have been widely applied to identifying the functional brain networks of a multitask paradigm. However, the previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study a novel method, called SOM-SAPC that combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC), is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM was first performed to process fMRI data and SAPC is further utilized for clustering the patterns of functional networks. As a result, SOM-SAPC is able to significantly reduce the computational cost for brain network analysis. Simulation and clinical tests involving ME and MI were conducted based on SOM-SAPC, and the analysis results indicated that functional brain networks were clearly identified with different response patterns and reduced computational cost. In particular, three activation clusters were clearly revealed, which include parts of the visual, ME and MI functional networks. These findings validated that SOM-SAPC is an effective and robust method to analyze the fMRI data with multitasks.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
The groundwater samples from Rapur area were collected from different sites to evaluate the major ion chemistry. The large number of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This has resulted two important clusters viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS) which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of water quality of a study area. From PCA, it is clear that the first factor (factor 1), accounted for 36.2% of the total variance, was high positive loading in EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.
Distant Massive Clusters and Cosmology
NASA Technical Reports Server (NTRS)
Donahue, Megan
1999-01-01
We present a status report of our X-ray study and analysis of a complete sample of distant (z=0.5-0.8), X-ray luminous clusters of galaxies. We have obtained ASCA and ROSAT observations of the five brightest Extended Medium Sensitivity (EMSS) clusters with z > 0.5. We have constructed an observed temperature function for these clusters, and measured iron abundances for all of these clusters. We have developed an analytic expression for the behavior of the mass-temperature relation in a low-density universe. We use this mass-temperature relation together with a Press-Schechter-based model to derive the expected temperature function for different values of Omega-M. We combine this analysis with the observed temperature functions at redshifts from 0 - 0.8 to derive maximum likelihood estimates for the value of Omega-M. We report preliminary results of this analysis.
Surface Analysis Cluster Tool | Materials Science | NREL
spectroscopic ellipsometry during film deposition. The cluster tool can be used to study the effect of various prior to analysis. Here we illustrate the surface cleaning effect of an aqueous ammonia treatment on a
Gathering Real World Evidence with Cluster Analysis for Clinical Decision Support.
Xia, Eryu; Liu, Haifeng; Li, Jing; Mei, Jing; Li, Xuejun; Xu, Enliang; Li, Xiang; Hu, Gang; Xie, Guotong; Xu, Meilin
2017-01-01
Clinical decision support systems are information technology systems that assist clinical decision-making tasks, which have been shown to enhance clinical performance. Cluster analysis, which groups similar patients together, aims to separate patient cases into phenotypically heterogenous groups and defining therapeutically homogeneous patient subclasses. Useful as it is, the application of cluster analysis in clinical decision support systems is less reported. Here, we describe the usage of cluster analysis in clinical decision support systems, by first dividing patient cases into similar groups and then providing diagnosis or treatment suggestions based on the group profiles. This integration provides data for clinical decisions and compiles a wide range of clinical practices to inform the performance of individual clinicians. We also include an example usage of the system under the scenario of blood lipid management in type 2 diabetes. These efforts represent a step toward promoting patient-centered care and enabling precision medicine.
Feng, Sujuan; Qian, Xiaosong; Li, Han; Zhang, Xiaodong
2017-12-01
The aim of the present study was to investigate the effectiveness of the miR-17-92 cluster as a disease progression marker in prostate cancer (PCa). Reverse transcription-quantitative polymerase chain reaction analysis was used to detect the microRNA (miR)-17-92 cluster expression levels in tissues from patients with PCa or benign prostatic hyperplasia (BPH), in addition to in PCa and BPH cell lines. Spearman correlation was used for comparison and estimation of correlations between miRNA expression levels and clinicopathological characteristics such as the Gleason score and prostate-specific antigen (PSA). Receiver operating curve (ROC) analysis was performed for evaluation of specificity and sensitivity of miR-17-92 cluster expression levels for discriminating patients with PCa from patients with BPH. Kaplan-Meier analysis was plotted to investigate the predictive potential of miR-17-92 cluster for PCa biochemical recurrence. Expression of the majority of miRNAs in the miR-17-92 cluster was identified to be significantly increased in PCa tissues and cell lines. Bivariate correlation analysis indicated that the high expression of unregulated miRNAs was positively correlated with Gleason grade, but had no significant association with PSA. ROC curves demonstrated that high expression of miR-17-92 cluster predicted a higher diagnostic accuracy compared with PSA. Improved discriminating quotients were observed when combinations of unregulated miRNAs with PSA were used. Survival analysis confirmed a high combined miRNA score of miR-17-92 cluster was associated with shorter biochemical recurrence interval. miR-17-92 cluster could be a potential diagnostic and prognostic biomarker for PCa, and the combination of the miR-17-92 cluster and serum PSA may enhance the accuracy for diagnosis of PCa.
Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.
Fiero, Mallorie H; Huang, Shuang; Oren, Eyal; Bell, Melanie L
2016-02-09
Cluster randomized trials (CRTs) randomize participants in groups, rather than as individuals and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. Researchers and applied statisticians should carry out appropriate missing data methods, which are valid under plausible assumptions in order to increase statistical power in trials and reduce the possibility of bias. Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.
Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.
Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah
2014-06-01
Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ(2) analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, which is related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights reserved.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
Modica, Maddalena; Carabalona, Roberta; Spezzaferri, Rosa; Tavanelli, Monica; Torri, A; Ripamonti, Vittorino; Castiglioni, Paolo; De Maria, Renata; Ferratini, Maurizio
2012-03-01
To evaluate the psychological characteristics of coronary heart disease (CHD) patients after coronary artery bypass grafting (CABG) by cluster analysis of Minnesota Multiphasic Personality Inventory (MMPI-2) questionnaires and to assess the impact of the profiles obtained on long-term outcome. 229 CHD patients admitted to cardiac rehabilitation filled in self-administered MMPI-2 questionnaires early after CABG. We assessed the relation between MMPI-2 profiles derived by cluster analysis, clinical characteristics and outcome at 3-year follow-up. Among the 215 patients (76% men, median age 66 years) with valid criteria in control scales, we identified 3 clusters (G) with homogenous psychological characteristics: G1 patients (N = 75) presented somatoform complaints but overall minimal psychological distress. G2 patients (N=72) presented type D personality traits. G3 subjects (N=68) showed a trend to cynicism, mild increases in anger, social introversion and hostility. Clusters overlapped for clinical characteristics such as smoking (G1 21%, G2 24%, G3 24%, p ns), previous myocardial infarction (G1 43%, G2 47%, G3 49% p ns), LV ejection fraction (G1 60 [51-60]; G2 58 [49-60]; G3 60 [55-60], p ns), 3-vessel-disease prevalence (G1 69%, G2 65%, G3 71%, p ns). Three-year event rates were comparable (G1 15%; G2 18%; G3 15%) and Kaplan-Meier curves overlapped among clusters (p ns). After CABG, the interpretation of MMPI-2 by cluster analysis is useful for the psychological and personological diagnosis to direct psychological assistance. Conversely, results from cluster analysis of MMPI-2 do not seem helpful to the clinician to predict long term outcome.
Laursen, Jens; Milman, Nils; Pind, Niels; Pedersen, Henrik; Mulvad, Gert
2014-01-01
Meta-analysis of previous studies evaluating associations between content of elements sulphur (S), chlorine (Cl), potassium (K), iron (Fe), copper (Cu), zinc (Zn) and bromine (Br) in normal and cirrhotic autopsy liver tissue samples. Normal liver samples from 45 Greenlandic Inuit, median age 60 years and from 71 Danes, median age 61 years. Cirrhotic liver samples from 27 Danes, median age 71 years. Element content was measured using X-ray fluorescence spectrometry. Dual hierarchical clustering analysis, creating a dual dendrogram, one clustering element contents according to calculated similarities, one clustering elements according to correlation coefficients between the element contents, both using Euclidian distance and Ward Procedure. One dendrogram separated subjects in 7 clusters showing no differences in ethnicity, gender or age. The analysis discriminated between elements in normal and cirrhotic livers. The other dendrogram clustered elements in four clusters: sulphur and chlorine; copper and bromine; potassium and zinc; iron. There were significant correlations between the elements in normal liver samples: S was associated with Cl, K, Br and Zn; Cl with S and Br; K with S, Br and Zn; Cu with Br. Zn with S and K. Br with S, Cl, K and Cu. Fe did not show significant associations with any other element. In contrast to simple statistical methods, which analyses content of elements separately one by one, dual hierarchical clustering analysis incorporates all elements at the same time and can be used to examine the linkage and interplay between multiple elements in tissue samples. Copyright © 2013 Elsevier GmbH. All rights reserved.
Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo
2012-03-01
Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B cluster (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay herein described is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in perspective, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.
Data depth based clustering analysis
Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...
2016-01-01
Here, this paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, based on Euclidean distance, can be sensitive to noises because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, themore » proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noises because using data depth can measure centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters by varying the parameter. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the ro-bustness to noises of DBSCAN or HDBSCAN. The robust-ness to parameter selection is also demonstrated through the case study of clustering twitter data.« less
Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie
2011-11-01
The present study is to investigate the feasibility of multi-elements analysis in determination of the geographical origin of sea cucumber Apostichopus japonicus, and to make choice of the effective tracers in sea cucumber Apostichopus japonicus geographical origin assessment. The content of the elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin were determined by means of ICP-MS. The results were used for the development of elements database. Cluster analysis(CA) and principal component analysis (PCA) were applied to differentiate the sea cucumber Apostichopus japonicus geographical origin. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, the classification results were significantly associated with the marine distribution of the sea cucumber Apostichopus japonicus samples. The CA and PCA were the effective methods for elements analysis of sea cucumber Apostichopus japonicus samples. The content of the mineral elements in sea cucumber Apostichopus japonicus samples was good chemical descriptors for differentiating their geographical origins.
A Study of Pupil Control Ideology: A Person-Oriented Approach to Data Analysis
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2010-01-01
Responses of urban school teachers to the Pupil Control Ideology questionnaire were studied using Latent Class Analysis. The results of the analysis suggest that the best fitting model to the data is a two-cluster solution. In particular, the pupil control ideology of the sample delineates into two clusters of teachers, those with humanistic and…
Level-2 Milestone 4797: Early Users on Max, Sequoia Visualization Cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cupps, Kim C.
This report documents the fact that an early user has run successfully on Max, the Sequoia visualization cluster, ASC L2 milestone 4797: Early Users on Sequoia Visualization System (Max), due December 31, 2013. The Max visualization and data analysis cluster will provide Sequoia users with compute cycles and an interactive option for data exploration and analysis. The system will be integrated in the first quarter of FY14 and the system is expected to be moved to the classified network by the second quarter of FY14. The goal of this milestone is to have early users running their visualization and datamore » analysis work on the Max cluster on the classified network.« less
Gorzalczany, Marian B; Rudzinski, Filip
2017-06-07
This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.
Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.
Conley, Samantha
2017-12-01
The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.
A method of using cluster analysis to study statistical dependence in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis
Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.
2012-01-01
Objective Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360
Using cluster analysis for medical resource decision making.
Dilts, D; Khamalah, J; Plotkin, A
1995-01-01
Escalating costs of health care delivery have in the recent past often made the health care industry investigate, adapt, and apply those management techniques relating to budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products" or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis in generating these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized selection methodologies for choosing a technique for solving a particular problem, hence they often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F; Perez, Danny
2017-10-21
Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.
NASA Astrophysics Data System (ADS)
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F.; Perez, Danny
2017-10-01
Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-07-01
In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×106 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this likely due to errors arising from misatrribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed yielding an explict cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
Freud: a software suite for high-throughput simulation analysis
NASA Astrophysics Data System (ADS)
Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon
Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C + + analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
Cluster analysis of Southeastern U.S. climate stations
NASA Astrophysics Data System (ADS)
Stooksbury, D. E.; Michaels, P. J.
1991-09-01
A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.
Clustering stocks using partial correlation coefficients
NASA Astrophysics Data System (ADS)
Jung, Sean S.; Chang, Woojin
2016-11-01
A partial correlation analysis is performed on the Korean stock market (KOSPI). The difference between Pearson correlation and the partial correlation is analyzed and it is found that when conditioned on the market return, Pearson correlation coefficients are generally greater than those of the partial correlation, which implies that the market return tends to drive up the correlation between stock returns. A clustering analysis is then performed to study the market structure given by the partial correlation analysis and the members of the clusters are compared with the Global Industry Classification Standard (GICS). The initial hypothesis is that the firms in the same GICS sector are clustered together since they are in a similar business and environment. However, the result is inconsistent with the hypothesis and most clusters are a mix of multiple sectors suggesting that the traditional approach of using sectors to determine the proximity between stocks may not be sufficient enough to diversify a portfolio.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-11-12
The increase and the complexity of data caused by the uncertain environment is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.
Ogden, Lorraine G; Stroebele, Nanette; Wyatt, Holly R; Catenacci, Victoria A; Peters, John C; Stuht, Jennifer; Wing, Rena R; Hill, James O
2012-10-01
The National Weight Control Registry (NWCR) is the largest ongoing study of individuals successful at maintaining weight loss; the registry enrolls individuals maintaining a weight loss of at least 13.6 kg (30 lb) for a minimum of 1 year. The current report uses multivariate latent class cluster analysis to identify unique clusters of individuals within the NWCR that have distinct experiences, strategies, and attitudes with respect to weight loss and weight loss maintenance. The cluster analysis considers weight and health history, weight control behaviors and strategies, effort and satisfaction with maintaining weight, and psychological and demographic characteristics. The analysis includes 2,228 participants enrolled between 1998 and 2002. Cluster 1 (50.5%) represents a weight-stable, healthy, exercise conscious group who are very satisfied with their current weight. Cluster 2 (26.9%) has continuously struggled with weight since childhood; they rely on the greatest number of resources and strategies to lose and maintain weight, and report higher levels of stress and depression. Cluster 3 (12.7%) represents a group successful at weight reduction on the first attempt; they were least likely to be overweight as children, are maintaining the longest duration of weight loss, and report the least difficulty maintaining weight. Cluster 4 (9.9%) represents a group less likely to use exercise to control weight; they tend to be older, eat fewer meals, and report more health problems. Further exploration of the unique characteristics of these clusters could be useful for tailoring future weight loss and weight maintenance programs to the specific characteristics of an individual.
Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M
2016-01-01
Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.
Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc
2018-01-01
In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we - a team of visualization scientists and meteorologists-deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
An improved K-means clustering algorithm in agricultural image segmentation
NASA Astrophysics Data System (ADS)
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.
A latent profile analysis of Asian American men's and women's adherence to cultural values.
Wong, Y Joel; Nguyen, Chi P; Wang, Shu-Yi; Chen, Weilin; Steinfeldt, Jesse A; Kim, Bryan S K
2012-07-01
The goal of this study was to identify diverse profiles of Asian American women's and men's adherence to values that are salient in Asian cultures (i.e., conformity to norms, family recognition through achievement, emotional self-control, collectivism, and humility). To this end, the authors conducted a latent profile analysis using the 5 subscales of the Asian American Values Scale-Multidimensional in a sample of 214 Asian Americans. The analysis uncovered a four-cluster solution. In general, Clusters 1 and 2 were characterized by relatively low and moderate levels of adherence to the 5 dimensions of cultural values, respectively. Cluster 3 was characterized by the highest level of adherence to the cultural value of family recognition through achievement, whereas Cluster 4 was typified by the highest levels of adherence to collectivism, emotional self-control, and humility. Clusters 3 and 4 were associated with higher levels of depressive symptoms than Cluster 1. Furthermore, Asian American women and Asian American men had lower odds of being in Cluster 4 and Cluster 3, respectively. These findings attest to the importance of identifying specific patterns of adherence to cultural values when examining the relationship between Asian Americans' cultural orientation and mental health status.
Phylogenetic relationship of Ornithobacterium rhinotracheale strains.
DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo
2018-04-10
The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include available sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, the phylogenetic analysis of O. rhinotracheale strains, based on the 16S rRNA gene, is a suitable tool for epidemiologic studies.
Clinical phenotypes and survival of pre-capillary pulmonary hypertension in systemic sclerosis.
Launay, David; Montani, David; Hassoun, Paul M; Cottin, Vincent; Le Pavec, Jérôme; Clerson, Pierre; Sitbon, Olivier; Jaïs, Xavier; Savale, Laurent; Weatherald, Jason; Sobanski, Vincent; Mathai, Stephen C; Shafiq, Majid; Cordier, Jean-François; Hachulla, Eric; Simonneau, Gérald; Humbert, Marc
2018-01-01
Pre-capillary pulmonary hypertension (PH) in systemic sclerosis (SSc) is a heterogeneous condition with an overall bad prognosis. The objective of this study was to identify and characterize homogeneous phenotypes by a cluster analysis in SSc patients with PH. Patients were identified from two prospective cohorts from the US and France. Clinical, pulmonary function, high-resolution chest tomography, hemodynamic and survival data were extracted. We performed cluster analysis using the k-means method and compared survival between clusters using Cox regression analysis. Cluster analysis of 200 patients identified four homogenous phenotypes. Cluster C1 included patients with mild to moderate risk pulmonary arterial hypertension (PAH) with limited or no interstitial lung disease (ILD) and low DLCO with a 3-year survival of 81.5% (95% CI: 71.4-88.2). C2 had pre-capillary PH due to extensive ILD and worse 3-year survival compared to C1 (adjusted hazard ratio [HR] 3.14; 95% CI 1.66-5.94; p = 0.0004). C3 had severe PAH and a trend towards worse survival (HR 2.53; 95% CI 0.99-6.49; p = 0.052). Cluster C4 and C1 were similar with no difference in survival (HR 0.65; 95% CI 0.19-2.27, p = 0.507) but with a higher DLCO in C4. PH in SSc can be characterized into distinct clusters that differ in prognosis.
Snell, Deborah L; Surgenor, Lois J; Hay-Smith, E Jean C; Williman, Jonathan; Siegert, Richard J
2015-01-01
Outcomes after mild traumatic brain injury (MTBI) vary, with slow or incomplete recovery for a significant minority. This study examines whether groups of cases with shared psychological factors but with different injury outcomes could be identified using cluster analysis. This is a prospective observational study following 147 adults presenting to a hospital-based emergency department or concussion services in Christchurch, New Zealand. This study examined associations between baseline demographic, clinical, psychological variables (distress, injury beliefs and symptom burden) and outcome 6 months later. A two-step approach to cluster analysis was applied (Ward's method to identify clusters, K-means to refine results). Three meaningful clusters emerged (high-adapters, medium-adapters, low-adapters). Baseline cluster-group membership was significantly associated with outcomes over time. High-adapters appeared recovered by 6-weeks and medium-adapters revealed improvements by 6-months. The low-adapters continued to endorse many symptoms, negative recovery expectations and distress, being significantly at risk for poor outcome more than 6-months after injury (OR (good outcome) = 0.12; CI = 0.03-0.53; p < 0.01). Cluster analysis supported the notion that groups could be identified early post-injury based on psychological factors, with group membership associated with differing outcomes over time. Implications for clinical care providers regarding therapy targets and cases that may benefit from different intensities of intervention are discussed.
Development of small scale cluster computer for numerical analysis
NASA Astrophysics Data System (ADS)
Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.
2017-09-01
In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
A nonparametric clustering technique which estimates the number of clusters
NASA Technical Reports Server (NTRS)
Ramey, D. B.
1983-01-01
In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven
2017-01-01
Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313
Connick, Mark J; Beckman, Emma; Vanlandewijck, Yves; Malone, Laurie A; Blomqvist, Sven; Tweedy, Sean M
2017-11-25
The Para athletics wheelchair-racing classification system employs best practice to ensure that classes comprise athletes whose impairments cause a comparable degree of activity limitation. However, decision-making is largely subjective and scientific evidence which reduces this subjectivity is required. To evaluate whether isometric strength tests were valid for the purposes of classifying wheelchair racers and whether cluster analysis of the strength measures produced a valid classification structure. Thirty-two international level, male wheelchair racers from classes T51-54 completed six isometric strength tests evaluating elbow extensors, shoulder flexors, trunk flexors and forearm pronators and two wheelchair performance tests-Top-Speed (0-15 m) and Top-Speed (absolute). Strength tests significantly correlated with wheelchair performance were included in a cluster analysis and the validity of the resulting clusters was assessed. All six strength tests correlated with performance (r=0.54-0.88). Cluster analysis yielded four clusters with reasonable overall structure (mean silhouette coefficient=0.58) and large intercluster strength differences. Six athletes (19%) were allocated to clusters that did not align with their current class. While the mean wheelchair racing performance of the resulting clusters was unequivocally hierarchical, the mean performance of current classes was not, with no difference between current classes T53 and T54. Cluster analysis of isometric strength tests produced classes comprising athletes who experienced a similar degree of activity limitation. The strength tests reported can provide the basis for a new, more transparent, less subjective wheelchair racing classification system, pending replication of these findings in a larger, representative sample. This paper also provides guidance for development of evidence-based systems in other Para sports. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plagues this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcomes the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook
NASA Astrophysics Data System (ADS)
Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.
2012-12-01
The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as matlab, IDL and NCO is not well developed and rarely used. However parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and which provides parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser; maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters for yield estimation for wine and tablegrape industry. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology, based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R(2) between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
A Taxonomic Approach to the Gestalt Theory of Perls
ERIC Educational Resources Information Center
Raming, Henry E.; Frey, David H.
1974-01-01
This study applied content analysis and cluster analysis to the ideas of Fritz Perls to develop a taxonomy of Gestalt processes and goals. Summaries of the typal groups or clusters were written and the implications of taxonomic research in counseling discussed. (Author)
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value
Presentation on systems cluster research
NASA Technical Reports Server (NTRS)
Morgenthaler, George W.
1989-01-01
This viewgraph presentation presents an overview of systems cluster research performed by the Center for Space Construction. The goals of the research are to develop concepts, insights, and models for space construction and to develop systems engineering/analysis curricula for training future aerospace engineers. The following topics are covered: CSC systems analysis/systems engineering (SIMCON) model, CSC systems cluster schedule, system life-cycle, model optimization techniques, publications, cooperative efforts, and sponsored research.
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...
2017-06-06
There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases require the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our systemmore » detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during various the stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.« less
Tse, Herman; Chen, Jonathan H.K.; Tang, Ying; Lau, Susanna K.P.; Woo, Patrick C.Y.
2014-01-01
Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. PMID:25331233
Teng, Jade L L; Huang, Yi; Tse, Herman; Chen, Jonathan H K; Tang, Ying; Lau, Susanna K P; Woo, Patrick C Y
2014-10-20
Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the "sanguinis group." As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the "mitis group." On the basis of the findings, we propose a novel group, named "sinensis group," to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data
Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio
2010-01-01
Background The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and verify if three-dimensional gait data are able to distinguish diabetic gait patterns from one of the control subjects. Method The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1(4.4) years, mean body mass index (BMI) 24.0 (2.8), and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joints and segments (trunk, hip, knee, ankle) angles, and moments. Results Cluster analysis classification led to definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a prior knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
NASA Astrophysics Data System (ADS)
Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.
2017-07-01
Dengue virus consists of 10 different constituent proteins and are classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering against 30 protein sequences of dengue virus taken from Virus Pathogen Database and Analysis Resource (VIPR) using Regularized Markov Clustering (R-MCL) algorithm and then we analyze the result. By using Python program 3.4, R-MCL algorithm produces 8 clusters with more than one centroid in several clusters. The number of centroid shows the density level of interaction. Protein interactions that are connected in a tissue, form a complex protein that serves as a specific biological process unit. The analysis of result shows the R-MCL clustering produces clusters of dengue virus family based on the similarity role of their constituent protein, regardless of serotypes.
Graph-Theoretic Analysis of Monomethyl Phosphate Clustering in Ionic Solutions.
Han, Kyungreem; Venable, Richard M; Bryant, Anne-Marie; Legacy, Christopher J; Shen, Rong; Li, Hui; Roux, Benoît; Gericke, Arne; Pastor, Richard W
2018-02-01
All-atom molecular dynamics simulations combined with graph-theoretic analysis reveal that clustering of monomethyl phosphate dianion (MMP 2- ) is strongly influenced by the types and combinations of cations in the aqueous solution. Although Ca 2+ promotes the formation of stable and large MMP 2- clusters, K + alone does not. Nonetheless, clusters are larger and their link lifetimes are longer in mixtures of K + and Ca 2+ . This "synergistic" effect depends sensitively on the Lennard-Jones interaction parameters between Ca 2+ and the phosphorus oxygen and correlates with the hydration of the clusters. The pronounced MMP 2- clustering effect of Ca 2+ in the presence of K + is confirmed by Fourier transform infrared spectroscopy. The characterization of the cation-dependent clustering of MMP 2- provides a starting point for understanding cation-dependent clustering of phosphoinositides in cell membranes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Belles, Randy J.; Omitaomu, Olufemi A.
2014-09-01
Geographic information systems (GIS) technology was applied to analyze federal energy demand across the contiguous US. Several federal energy clusters were previously identified, including Hampton Roads, Virginia, which was subsequently studied in detail. This study provides an analysis of three additional diverse federal energy clusters. The analysis shows that there are potential sites in various federal energy clusters that could be evaluated further for placement of an integral pressurized-water reactor (iPWR) to support meeting federal clean energy goals.
RSAT 2015: Regulatory Sequence Analysis Tools
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-01-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review
Morris, Tom; Gray, Laura
2017-01-01
Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
Bible, Joe; Beck, James D.; Datta, Somnath
2016-01-01
Summary Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information is dependent on the outcome. In the analysis of cluster data we say that a condition for informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data varies from that of inference drawn on observed data. Much work has been done in order to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possess temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for addressing clustered longitudinal data with TVCS. The principal motivation for our present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497–505). Longitudinal periodontal data often exhibits both ICS and TVCS as the number of teeth possessed by participants at the onset of study is not constant and teeth as well as individuals may be displaced throughout the study. PMID:26682911
Descriptive Statistics and Cluster Analysis for Extreme Rainfall in Java Island
NASA Astrophysics Data System (ADS)
E Komalasari, K.; Pawitan, H.; Faqih, A.
2017-03-01
This study aims to describe regional pattern of extreme rainfall based on maximum daily rainfall for period 1983 to 2012 in Java Island. Descriptive statistics analysis was performed to obtain centralization, variation and distribution of maximum precipitation data. Mean and median are utilized to measure central tendency data while Inter Quartile Range (IQR) and standard deviation are utilized to measure variation of data. In addition, skewness and kurtosis used to obtain shape the distribution of rainfall data. Cluster analysis using squared euclidean distance and ward method is applied to perform regional grouping. Result of this study show that mean (average) of maximum daily rainfall in Java Region during period 1983-2012 is around 80-181mm with median between 75-160mm and standard deviation between 17 to 82. Cluster analysis produces four clusters and show that western area of Java tent to have a higher annual maxima of daily rainfall than northern area, and have more variety of annual maximum value.
Orsi, Rebecca
2017-02-01
Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ye, Weimin; Robbins, R. T.
2004-01-01
Hierarchical cluster analysis based on female morphometric character means including body length, distance from vulva opening to anterior end, head width, odontostyle length, esophagus length, body width, tail length, and tail width were used to examine the morphometric relationships and create dendrograms for (i) 62 populations belonging to 9 Longidorus species from Arkansas, (ii) 137 published Longidorus species, and (iii) 137 published Longidorus species plus 86 populations of 16 Longidorus species from Arkansas and various other locations by using JMP 4.02 software (SAS Institute, Cary, NC). Cluster analysis dendograms visually illustrated the grouping and morphometric relationships of the species and populations. It provided a computerized statistical approach to assist by helping to identify and distinguish species, by indicating morphometric relationships among species, and by assisting with new species diagnosis. The preliminary species identification can be accomplished by running cluster analysis for unknown species together with the data matrix of known published Longidorus species. PMID:19262809
Cluster signal-to-noise analysis for evaluation of the information content in an image.
Weerawanich, Warangkana; Shimizu, Mayumi; Takeshita, Yohei; Okamura, Kazutoshi; Yoshida, Shoko; Yoshiura, Kazunori
2018-01-01
(1) To develop an observer-free method of analysing image quality related to the observer performance in the detection task and (2) to analyse observer behaviour patterns in the detection of small mass changes in cone-beam CT images. 13 observers detected holes in a Teflon phantom in cone-beam CT images. Using the same images, we developed a new method, cluster signal-to-noise analysis, to detect the holes by applying various cut-off values using ImageJ and reconstructing cluster signal-to-noise curves. We then evaluated the correlation between cluster signal-to-noise analysis and the observer performance test. We measured the background noise in each image to evaluate the relationship with false positive rates (FPRs) of the observers. Correlations between mean FPRs and intra- and interobserver variations were also evaluated. Moreover, we calculated true positive rates (TPRs) and accuracies from background noise and evaluated their correlations with TPRs from observers. Cluster signal-to-noise curves were derived in cluster signal-to-noise analysis. They yield the detection of signals (true holes) related to noise (false holes). This method correlated highly with the observer performance test (R 2 = 0.9296). In noisy images, increasing background noise resulted in higher FPRs and larger intra- and interobserver variations. TPRs and accuracies calculated from background noise had high correlation with actual TPRs from observers; R 2 was 0.9244 and 0.9338, respectively. Cluster signal-to-noise analysis can simulate the detection performance of observers and thus replace the observer performance test in the evaluation of image quality. Erroneous decision-making increased with increasing background noise.
Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.
Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves
2011-08-01
The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005 to identify a possible statistical significant trend in both populations. Separate descriptive statistics and univariate analysis were carried out and the parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared and the monthly distribution of the most common PT, isolated in both populations, was evaluated. The time cluster analysis revealed significant clusters during the months May and June for layers and May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible of these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found for the monthly trend evolution of both PT in both populations based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during the year 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer time and slightly delayed in time (humans after layers). These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.
Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro
2014-01-01
The increasing number of dengue virus (DENV) genome sequences available allows identifying the contributing factors to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistics methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide position in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our analysis of a large scale reveals new feature on DENV genomic evolution. PMID:25136631
Jarboe, G R; Gates, R H; McDaniel, C D
1990-01-01
Healthcare providers of multiple option plans may be confronted with special market segmentation problems. This study demonstrates how cluster analysis may be used for discovering distinct patterns of preference for multiple option plans. The availability of metric, as opposed to categorical or ordinal, data provides the ability to use sophisticated analysis techniques which may be superior to frequency distributions and cross-tabulations in revealing preference patterns.
Fuzzy cluster analysis of air quality in Beijing district
NASA Astrophysics Data System (ADS)
Liu, Hongkai
2018-02-01
The principle of fuzzy clustering analysis is applied in this article, by using the method of transitive closure, the main air pollutants in 17 districts of Beijing from 2014 to 2016 were classified. The results of the analysis reflects the nearly three year’s changes of the main air pollutants in Beijing. This can provide the scientific for atmospheric governance in the Beijing area and digital support.
NASA Astrophysics Data System (ADS)
Hozé, Nathanaël; Holcman, David
2012-01-01
We develop a coagulation-fragmentation model to study a system composed of a small number of stochastic objects moving in a confined domain, that can aggregate upon binding to form local clusters of arbitrary sizes. A cluster can also dissociate into two subclusters with a uniform probability. To study the statistics of clusters, we combine a Markov chain analysis with a partition number approach. Interestingly, we obtain explicit formulas for the size and the number of clusters in terms of hypergeometric functions. Finally, we apply our analysis to study the statistical physics of telomeres (ends of chromosomes) clustering in the yeast nucleus and show that the diffusion-coagulation-fragmentation process can predict the organization of telomeres.
Determining the trophic guilds of fishes and macroinvertebrates in a seagrass food web
Luczkovich, J.J.; Ward, G.P.; Johnson, J.C.; Christian, R.R.; Baird, D.; Neckles, H.; Rizzo, W.M.
2002-01-01
We established trophic guilds of macroinvertebrate and fish taxa using correspondence analysis and a hierarchical clustering strategy for a seagrass food web in winter in the northeastern Gulf of Mexico. To create the diet matrix, we characterized the trophic linkages of macroinvertebrate and fish taxa present in Halodule wrightii seagrass habitat areas within the St. Marks National Wildlife Refuge (Florida) using binary data, combining dietary links obtained from relevant literature for macroinvertebrates with stomach analysis of common fishes collected during January and February of 1994. Heirarchical average-linkage cluster analysis of the 73 taxa of fishes and macroinvertebrates in the diet matrix yielded 14 clusters with diet similarity ??? 0.60. We then used correspondence analysis with three factors to jointly plot the coordinates of the consumers (identified by cluster membership) and of the 33 food sources. Correspondence analysis served as a visualization tool for assigning each taxon to one of eight trophic guilds: herbivores, detritivores, suspension feeders, omnivores, molluscivores, meiobenthos consumers, macrobenthos consumers, and piscivores. These trophic groups, cross-classified with major taxonomic groups, were further used to develop consumer compartments in a network analysis model of carbon flow in this seagrass ecosystem. The method presented here should greatly improve the development of future network models of food webs by providing an objective procedure for aggregating trophic groups.
Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian
2015-01-01
In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.
Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F
2017-02-01
Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients are in the chronic phase and out of mood episode at the time of assessment and most of the assessment were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Lowe cluster suggest that relatively preserved social cognition may be important to identify disease process distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
Egorov, A D; Stepantsov, V I; Nosovskiĭ, A M; Shipov, A A
2009-01-01
Cluster analysis was applied to evaluate locomotion training (running and running intermingled with walking) of 13 cosmonauts on long-term ISS missions by the parameters of duration (min), distance (m) and intensity (km/h). Based on the results of analyses, the cosmonauts were distributed into three steady groups of 2, 5 and 6 persons. Distance and speed showed a statistical rise (p < 0.03) from group 1 to group 3. Duration of physical locomotion training was not statistically different in the groups (p = 0.125). Therefore, cluster analysis is an adequate method of evaluating fitness of cosmonauts on long-term missions.
Samsir, Sri A'jilah; Bunawan, Hamidun; Yen, Choong Chee; Noor, Normah Mohd
2016-09-01
In this dataset, we distinguish 15 accessions of Garcinia mangostana from Peninsular Malaysia using Fourier transform-infrared spectroscopy coupled with chemometric analysis. We found that the position and intensity of characteristic peaks at 3600-3100 cm(-) (1) in IR spectra allowed discrimination of G. mangostana from different locations. Further principal component analysis (PCA) of all the accessions suggests the two main clusters were formed: samples from Johor, Melaka, and Negeri Sembilan (South) were clustered together in one group while samples from Perak, Kedah, Penang, Selangor, Kelantan, and Terengganu (North and East Coast) were in another clustered group.
Ratinaud, Pierre; Andersson, Gerhard
2018-01-01
Background When people with health conditions begin to manage their health issues, one important issue that emerges is the question as to what exactly do they do with the information that they have obtained through various sources (eg, news media, social media, health professionals, friends, and family). The information they gather helps form their opinions and, to some degree, influences their attitudes toward managing their condition. Objective This study aimed to understand how tinnitus is represented in the US newspaper media and in Facebook pages (ie, social media) using text pattern analysis. Methods This was a cross-sectional study based upon secondary analyses of publicly available data. The 2 datasets (ie, text corpuses) analyzed in this study were generated from US newspaper media during 1980-2017 (downloaded from the database US Major Dailies by ProQuest) and Facebook pages during 2010-2016. The text corpuses were analyzed using the Iramuteq software using cluster analysis and chi-square tests. Results The newspaper dataset had 432 articles. The cluster analysis resulted in 5 clusters, which were named as follows: (1) brain stimulation (26.2%), (2) symptoms (13.5%), (3) coping (19.8%), (4) social support (24.2%), and (5) treatment innovation (16.4%). A time series analysis of clusters indicated a change in the pattern of information presented in newspaper media during 1980-2017 (eg, more emphasis on cluster 5, focusing on treatment inventions). The Facebook dataset had 1569 texts. The cluster analysis resulted in 7 clusters, which were named as: (1) diagnosis (21.9%), (2) cause (4.1%), (3) research and development (13.6%), (4) social support (18.8%), (5) challenges (11.1%), (6) symptoms (21.4%), and (7) coping (9.2%). A time series analysis of clusters indicated no change in information presented in Facebook pages on tinnitus during 2011-2016. Conclusions The study highlights the specific aspects about tinnitus that the US newspaper media and Facebook pages focus on, as well as how these aspects change over time. These findings can help health care providers better understand the presuppositions that tinnitus patients may have. More importantly, the findings can help public health experts and health communication experts in tailoring health information about tinnitus to promote self-management, as well as assisting in appropriate choices of treatment for those living with tinnitus. PMID:29739734
Šonka, Karel; Šusta, Marek; Billiard, Michel
2015-02-01
The successive editions of the International Classification of Sleep Disorders (ICSD) reflect the evolution of the concepts of various sleep disorders. This is particularly the case for central disorders of hypersomnolence, with continuous changes in terminology and divisions of narcolepsy, idiopathic hypersomnia, and recurrent hypersomnia. According to the ICSD 2nd Edition (ICSD-2), narcolepsy with cataplexy (NwithC), narcolepsy without cataplexy (Nw/oC), idiopathic hypersomnia with long sleep time (IHwithLST), and idiopathic hypersomnia without long sleep time (IHw/oLST) are four, well-defined hypersomnias of central origin. However, in the absence of biological markers, doubts have been raised as to the relevance of a division of idiopathic hypersomnia into two forms, and it is not yet clear whether Nw/oC and IHw/oLST are two distinct entities. With this in mind, it was decided to empirically review the ICSD-2 classification by using a hierarchical cluster analysis to see whether this division has some relevance, even though the terms "with long sleep time" and "without long sleep time" are inappropriate. The cluster analysis differentiated three main clusters: Cluster 1, "combined monosymptomatic hypersomnia/narcolepsy type 2" (people initially diagnosed with IHw/oLST and Nw/oC); Cluster 2 "polysymptomatic hypersomnia" (people initially diagnosed with IHwithLST); and Cluster 3, narcolepsy type 1 (people initially diagnosed with NwithC). Cluster analysis confirmed that narcolepsy type 1 and polysymptomatic hypersomnia are independent sleep disorders. People who were initially diagnosed with Nw/oC and IHw/oLST formed a single cluster, referred to as "combined monosymptomatic hypersomnia/narcolepsy type 2." Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Fučkar, Neven-Stjepan; Guemas, Virginie; Massonnet, François; Doblas-Reyes, Francisco
2015-04-01
Over the modern observational era, the northern hemisphere sea ice concentration, age and thickness have experienced a sharp long-term decline superimposed with strong internal variability. Hence, there is a crucial need to identify robust patterns of Arctic sea ice variability on interannual timescales and disentangle them from the long-term trend in noisy datasets. The principal component analysis (PCA) is a versatile and broadly used method for the study of climate variability. However, the PCA has several limiting aspects because it assumes that all modes of variability have symmetry between positive and negative phases, and suppresses nonlinearities by using a linear covariance matrix. Clustering methods offer an alternative set of dimension reduction tools that are more robust and capable of taking into account possible nonlinear characteristics of a climate field. Cluster analysis aggregates data into groups or clusters based on their distance, to simultaneously minimize the distance between data points in a given cluster and maximize the distance between the centers of the clusters. We extract modes of Arctic interannual sea-ice variability with nonhierarchical K-means cluster analysis and investigate the mechanisms leading to these modes. Our focus is on the sea ice thickness (SIT) as the base variable for clustering because SIT holds most of the climate memory for variability and predictability on interannual timescales. We primarily use global reconstructions of sea ice fields with a state-of-the-art ocean-sea-ice model, but we also verify the robustness of determined clusters in other Arctic sea ice datasets. Applied cluster analysis over the 1958-2013 period shows that the optimal number of detrended SIT clusters is K=3. Determined SIT cluster patterns and their time series of occurrence are rather similar between different seasons and months. Two opposite thermodynamic modes are characterized with prevailing negative or positive SIT anomalies over the Arctic basin. The intermediate mode, with negative anomalies centered on the East Siberian shelf and positive anomalies along the North American side of the basin, has predominately dynamic characteristics. The associated sea ice concentration (SIC) clusters vary more between different seasons and months, but the SIC patterns are physically framed by the SIT cluster patterns.
Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C
2011-04-01
The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.
Person mobility in the design and analysis of cluster-randomized cohort prevention trials.
Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard
2012-06-01
Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinic, cities, state) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.
NASA Technical Reports Server (NTRS)
Ballew, G.
1977-01-01
The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.
Identification and functional analysis of the aspergillic acid gene cluster in Aspergillus flavus
USDA-ARS?s Scientific Manuscript database
Aspergillus flavus can colonize important food staples and produces aflatoxins, toxic and carcinogenic secondary metabolites. In silico analysis of the A. flavus genome revealed 56 gene clusters encoding for secondary metabolites. How these many of these metabolites affect fungal development, surviv...
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...
2016-11-29
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic genemore » clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic genemore » clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.« less
Cluster analysis as a prediction tool for pregnancy outcomes.
Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L
2015-03-01
Considering specific physiology changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women at early pregnancy can be considered as crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis method is a statistical method which makes possible to group individuals based on sets of identifying variables. The method was chosen in order to determine possibility for classification of pregnant women at early pregnancy to analyze unknown correlations between different variables so that the certain outcomes could be predicted. 222 pregnant women from two general obstetric offices' were recruited. The main orient was set on characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis gained a 94.1% classification accuracy rate with three branch- es or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results are showing that pregnant women both of older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering baby of higher birth weight but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have significantly higher probability for complications during pregnancy (gestosis) and higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women at early pregnancy to predict certain outcomes.
Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J
2018-06-04
Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBPs monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results shows that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THMs and HAAs monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.
López-Contreras, María José; López, Maria Ángeles; Canteras, Manuel; Candela, María Emilia; Zamora, Salvador; Pérez-Llamas, Francisca
2014-03-01
To apply a cluster analysis to groups of individuals of similar characteristics in an attempt to identify undernutrition or the risk of undernutrition in this population. A cross-sectional study. Seven public nursing homes in the province of Murcia, on the Mediterranean coast of Spain. 205 subjects aged 65 and older (131 women and 74 men). Dietary intake (energy and nutrients), anthropometric (body mass index, skinfold thickness, mid-arm muscle circumference, mid-arm muscle area, corrected arm muscle area, waist to hip ratio) and biochemical and haematological (serum albumin, transferrin, total cholesterol, total lymphocyte count). Variables were analyzed by cluster analysis. The results of the cluster analysis, including intake, anthropometric and analytical data showed that, of the 205 elderly subjects, 66 (32.2%) were over - weight/obese, 72 (35.1%) had an adequate nutritional status and 67 (32.7%) were undernourished or at risk of undernutrition. The undernourished or at risk of undernutrition group showed the lowest values for dietary intake and the anthropometric and analytical parameters measured. Our study shows that cluster analysis is a useful statistical method for assessing the nutritional status of institutionalized elderly populations. In contrast, use of the specific reference values frequently described in the literature might fail to detect real cases of undernourishment or those at risk of undernutrition. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong
2015-08-07
Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.
Cluster and constraint analysis in tetrahedron packings
NASA Astrophysics Data System (ADS)
Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang
2015-04-01
The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.
Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.
Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G
2012-10-01
The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was a 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Amirnasr, Elham
It is widely recognized that nonwoven basis weight non-uniformity affects various properties of nonwovens. However, few studies can be found in this topic. The development of uniformity definition and measurement methods and the study of their impact on various web properties such as filtration properties and air permeability would be beneficial both in industrial applications and in academia. They can be utilized as a quality control tool and would provide insights about nonwoven behaviors that cannot be solely explained by average values. Therefore, for quantifying nonwoven web basis weight uniformity we purse to develop an optical analytical tool. The quadrant method and clustering analysis was utilized in an image analysis scheme to help define "uniformity" and its spatial variation. Implementing the quadrant method in an image analysis system allows the establishment of a uniformity index that can be used to quantify the degree of uniformity. Clustering analysis has also been modified and verified using uniform and random simulated images with known parameters. Number of clusters and cluster properties such as cluster size, member and density was determined. We also utilized this new measurement method to evaluate uniformity of nonwovens produced with different processes and investigated impacts of uniformity on filtration and permeability. The results of quadrant method shows that uniformity index computed from quadrant method demonstrate a good range for non-uniformity of nonwoven webs. Clustering analysis is also been applied on reference nonwoven with known visual uniformity. From clustering analysis results, cluster size is promising to be used as uniformity parameter. It is been shown that non-uniform nonwovens has provide lager cluster size than uniform nonwovens. It was been tried to find a relationship between web properties and uniformity index (as a web characteristic). To achieve this, filtration properties, air permeability, solidity and uniformity index of meltblown and spunbond samples was measured. Results for filtration test show some deviation between theoretical and experimental filtration efficiency by considering different types of fiber diameter. This deviation can occur due to variation in basis weight non-uniformity. So an appropriate theory is required to predict the variation of filtration efficiency with respect to non-uniformity of nonwoven filter media. And the results for air permeability test showed that uniformity index determined by quadrant method and measured properties have some relationship. In the other word, air permeability decreases as uniformity index on nonwoven web increase.
Andridge, Rebecca. R.
2011-01-01
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309
The application of cluster analysis in the intercomparison of loop structures in RNA.
Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E
2005-04-01
We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
The application of cluster analysis in the intercomparison of loop structures in RNA
HUANG, HUNG-CHUNG; NAGASWAMY, UMA; FOX, GEORGE E.
2005-01-01
We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence. PMID:15769871
Chen, Ling; Feng, Yanqin; Sun, Jianguo
2017-10-01
This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.
Patterns of victimization between and within peer clusters in a high school social network.
Swartz, Kristin; Reyns, Bradford W; Wilcox, Pamela; Dunham, Jessica R
2012-01-01
This study presents a descriptive analysis of patterns of violent victimization between and within the various cohesive clusters of peers comprising a sample of more than 500 9th-12th grade students from one high school. Social network analysis techniques provide a visualization of the overall friendship network structure and allow for the examination of variation in victimization across the various peer clusters within the larger network. Social relationships among clusters with varying levels of victimization are also illustrated so as to provide a sense of possible spatial clustering or diffusion of victimization across proximal peer clusters. Additionally, to provide a sense of the sorts of peer clusters that support (or do not support) victimization, characteristics of clusters at both the high and low ends of the victimization scale are discussed. Finally, several of the peer clusters at both the high and low ends of the victimization continuum are "unpacked", allowing examination of within-network individual-level differences in victimization for these select clusters.
MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence
Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta
2014-01-01
Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499
Ortholog-based screening and identification of genes related to intracellular survival.
Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin
2018-04-20
Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. The total 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of intracellular pathogens characteristics and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis
NASA Astrophysics Data System (ADS)
Yen, Chi-Fu; Sivasankar, Sanjeevi
2018-03-01
Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events are limited and not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-01-01
The increase and the complexity of data caused by the uncertain environment is today’s reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006–2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality. PMID:26569283
[A spatial adaptive algorithm for endmember extraction on multispectral remote sensing image].
Zhu, Chang-Ming; Luo, Jian-Cheng; Shen, Zhan-Feng; Li, Jun-Li; Hu, Xiao-Dong
2011-10-01
Due to the problem that the convex cone analysis (CCA) method can only extract limited endmember in multispectral imagery, this paper proposed a new endmember extraction method by spatial adaptive spectral feature analysis in multispectral remote sensing image based on spatial clustering and imagery slice. Firstly, in order to remove spatial and spectral redundancies, the principal component analysis (PCA) algorithm was used for lowering the dimensions of the multispectral data. Secondly, iterative self-organizing data analysis technology algorithm (ISODATA) was used for image cluster through the similarity of the pixel spectral. And then, through clustering post process and litter clusters combination, we divided the whole image data into several blocks (tiles). Lastly, according to the complexity of image blocks' landscape and the feature of the scatter diagrams analysis, the authors can determine the number of endmembers. Then using hourglass algorithm extracts endmembers. Through the endmember extraction experiment on TM multispectral imagery, the experiment result showed that the method can extract endmember spectra form multispectral imagery effectively. What's more, the method resolved the problem of the amount of endmember limitation and improved accuracy of the endmember extraction. The method has provided a new way for multispectral image endmember extraction.
NASA Astrophysics Data System (ADS)
Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.
2015-03-01
Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) was implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of, biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples were tested using the PLSDA models and consequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably, and in some cases better than, the other data types with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
Sherman, Recinda L; Henry, Kevin A; Tannenbaum, Stacey L; Feaster, Daniel J; Kobetz, Erin; Lee, David J
2014-03-20
Epidemiologists are gradually incorporating spatial analysis into health-related research as geocoded cases of disease become widely available and health-focused geospatial computer applications are developed. One health-focused application of spatial analysis is cluster detection. Using cluster detection to identify geographic areas with high-risk populations and then screening those populations for disease can improve cancer control. SaTScan is a free cluster-detection software application used by epidemiologists around the world to describe spatial clusters of infectious and chronic disease, as well as disease vectors and risk factors. The objectives of this article are to describe how spatial analysis can be used in cancer control to detect geographic areas in need of colorectal cancer screening intervention, identify issues commonly encountered by SaTScan users, detail how to select the appropriate methods for using SaTScan, and explain how method selection can affect results. As an example, we used various methods to detect areas in Florida where the population is at high risk for late-stage diagnosis of colorectal cancer. We found that much of our analysis was underpowered and that no single method detected all clusters of statistical or public health significance. However, all methods detected 1 area as high risk; this area is potentially a priority area for a screening intervention. Cluster detection can be incorporated into routine public health operations, but the challenge is to identify areas in which the burden of disease can be alleviated through public health intervention. Reliance on SaTScan's default settings does not always produce pertinent results.
First CCD UBVI photometric analysis of six open cluster candidates
NASA Astrophysics Data System (ADS)
Piatti, A. E.; Clariá, J. J.; Ahumada, A. V.
2011-04-01
We have obtained CCD UBVIKC photometry down to V ˜ 22 for the open cluster candidates Haffner 3, Haffner 5, NGC 2368, Haffner 25, Hogg 3 and Hogg 4 and their surrounding fields. None of these objects have been photometrically studied so far. Our analysis shows that these stellar groups are not genuine open clusters since no clear main sequences or other meaningful features can be seen in their colour-magnitude and colour-colour diagrams. We checked for possible differential reddening across the studied fields that could be hiding the characteristics of real open clusters. However, the dust in the directions to these objects appears to be uniformly distributed. Moreover, star counts carried out within and outside the open cluster candidate fields do not support the hypothesis that these objects are real open clusters or even open cluster remnants.
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-11-01
In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis
Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-01-01
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.
Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve
2015-01-01
Symptom clusters are gaining importance given HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, P<0.001). CD4 count group below 200 was associated with the all high occurrence rate symptom cluster (Fisher's exact=41, P<0.001). Symptom clusters have a differential, affect HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high-symptom occurrence rates having a higher risk of poorer outcomes. Identification of symptom clusters could provide insights into commonly co-occurring symptoms that should be jointly targeted for management in patients with multiple complaints.
Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard
2015-01-01
Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.
Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.
Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan
2017-01-01
Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4 increased in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.
Clustering of Health Behaviors and Cardiorespiratory Fitness Among U.S. Adolescents.
Hartz, Jacob; Yingling, Leah; Ayers, Colby; Adu-Brimpong, Joel; Rivers, Joshua; Ahuja, Chaarushi; Powell-Wiley, Tiffany M
2018-05-01
Decreased cardiorespiratory fitness (CRF) is associated with an increased risk of cardiovascular disease. However, little is known how the interaction of diet, physical activity (PA), and sedentary time (ST) affects CRF among adolescents. By using a nationally representative sample of U.S. adolescents, we used cluster analysis to investigate the interactions of these behaviors with CRF. We hypothesized that distinct clustering patterns exist and that less healthy clusters are associated with lower CRF. We used 2003-2004 National Health and Nutrition Examination Survey data for persons aged 12-19 years (N = 1,225). PA and ST were measured objectively by an accelerometer, and the American Heart Association Healthy Diet Score quantified diet quality. Maximal oxygen consumption (V˙O 2 max) was measured by submaximal treadmill exercise test. We performed cluster analysis to identify sex-specific clustering of diet, PA, and ST. Adjusting for accelerometer wear time, age, body mass index, race/ethnicity, and the poverty-to-income ratio, we performed sex-stratified linear regression analysis to evaluate the association of cluster with V˙O 2 max. Three clusters were identified for girls and boys. For girls, there was no difference across clusters for age (p = .1), weight (p = .3), and BMI (p = .5), and no relationship between clusters and V˙O 2 max. For boys, the youngest cluster (p < .01) had three healthy behaviors, weighed less, and was associated with a higher V˙O 2 max compared with the two older clusters. We observed clustering of diet, PA, and ST in U.S. adolescents. Specific patterns were associated with lower V˙O 2 max for boys, suggesting that our clusters may help identify adolescent boys most in need of interventions. Published by Elsevier Inc.
USDA-ARS?s Scientific Manuscript database
SUMMARY Comparative analysis of 207 genomes representing 159 species of the fungus Fusarium detected 9403 known and putative secondary metabolite (SM) biosynthetic gene clusters. The clusters included those responsible for synthesis of mycotoxins, plant hormones and pigments, and varied in distribut...
Li, Weizhong
2018-02-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
The Equivalence of Three Statistical Packages for Performing Hierarchical Cluster Analysis
ERIC Educational Resources Information Center
Blashfield, Roger
1977-01-01
Three different software programs which contain hierarchical agglomerative cluster analysis procedures were shown to generate different solutions on the same data set using apparently the same options. The basis for the differences in the solutions was the formulae used to calculate Euclidean distance. (Author/JKS)
Cluster Analysis of International Information and Social Development.
ERIC Educational Resources Information Center
Lau, Jesus
1990-01-01
Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…
Identifying Peer Institutions Using Cluster Analysis
ERIC Educational Resources Information Center
Boronico, Jess; Choksi, Shail S.
2012-01-01
The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…
Differences in Pedaling Technique in Cycling: A Cluster Analysis.
Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A
2016-10-01
To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO VT2 ) compared with cycling at their maximal power output (PO MAX ). Twenty athletes performed an incremental cycling test to determine their power output (PO MAX and PO VT2 ; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in PO MAX and PO VT2 . Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO VT2 and PO MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO VT2 vs PO MAX , cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.
Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.
Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D; Ober, Carole; Nicolae, Dan L; Barnes, Kathleen C; London, Stephanie J; Gilliland, Frank; Weiss, Scott T; Raby, Benjamin A; Cohn, Lauren; Chupp, Geoffrey L
2015-05-15
The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10(-6)) and hospitalization (P = 0.01), respectively. There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.
Atlas-guided cluster analysis of large tractography datasets.
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
NASA Astrophysics Data System (ADS)
Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.
2018-02-01
This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.
NASA Astrophysics Data System (ADS)
Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann
2017-07-01
Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
NASA Astrophysics Data System (ADS)
Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.
2014-04-01
An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.
Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma
Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F.; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D.; Ober, Carole; Nicolae, Dan L.; Barnes, Kathleen C.; London, Stephanie J.; Gilliland, Frank; Weiss, Scott T.; Raby, Benjamin A.; Cohn, Lauren
2015-01-01
Rationale: The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. Objectives: We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Methods: Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Measurements and Main Results: Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10−6) and hospitalization (P = 0.01), respectively. Conclusions: There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma. PMID:25763605
Going beyond Clustering in MD Trajectory Analysis: An Application to Villin Headpiece Folding
Rajan, Aruna; Freddolino, Peter L.; Schulten, Klaus
2010-01-01
Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous. PMID:20419160
Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.
Rajan, Aruna; Freddolino, Peter L; Schulten, Klaus
2010-04-15
Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.
Large and Small Magellanic Clouds age-metallicity relationships
NASA Astrophysics Data System (ADS)
Perren, G. I.; Piatti, A. E.; Vázquez, R. A.
2017-10-01
We present a new determination of the age-metallicity relation for both Magellanic Clouds, estimated through the homogeneous analysis of 239 observed star clusters. All clusters in our set were observed with the filters of the Washington photometric system. The Automated Stellar cluster Analysis package (ASteCA) was employed to derive the cluster's fundamental parameters, in particular their ages and metallicities, through an unassisted process. We find that our age-metallicity relations (AMRs) can not be fully matched to any of the estimations found in twelve previous works, and are better explained by a combination of several of them in different age intervals.
Vernon J. LaBau; John W. Hazard
2000-01-01
During an inventory to assess spruce bark beetle impact on the Kenai Peninsula in south-central Alaska, 5-year mortality estimates were made for all growing-stock trees on 0.6 ha areas, on 0.4 ha areas, and on a cluster of four 1/60-ha subplots. The analysis of the results of the comparison between cluster data and the larger plot data highlighted some of the problems...
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Cappelletti, C. A.
1982-01-01
A stratification oriented to crop area and yield estimation problems was performed using an algorithm of clustering. The variables used were a set of agroclimatological characteristics measured in each one of the 232 municipalities of the State of Rio Grande do Sul, Brazil. A nonhierarchical cluster analysis was used and the pseudo F-statistics criterion was implemented for determining the "cut point" in the number of strata.
NASA Astrophysics Data System (ADS)
Black, Joshua A.; Knowles, Peter J.
2018-06-01
The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.
Application of microarray analysis on computer cluster and cloud platforms.
Bernau, C; Boulesteix, A-L; Knaus, J
2013-01-01
Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
Nursing home care quality: a cluster analysis.
Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit
2017-02-13
Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents ( n=103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ 2 tests and one-way between-groups ANOVA were performed to characterise the clusters ( p<0.05). Findings Two clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing homes residents' care quality perceptions.
REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Glascoe, L G; Glaser, R E; Chin, H S
2004-06-17
The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goalmore » of this study is to use a clustering method to classify the winds of a gridded data set, i.e, from meteorological simulations generated by a forecast model.« less
VanderKnyff, Jeremy; Friedman, Daniela B; Tanner, Andrea
2015-01-01
Using a sample of YouTube videos posted on the YouTube channels of organ procurement organizations, a content analysis was conducted to identify the frames used to strategically communicate prodonation messages. A total of 377 videos were coded for general characteristics, format, speaker characteristics, organs discussed, structure, problem definition, and treatment. Principal components analysis identified message frames, and k-means cluster analysis established distinct groupings of videos on the basis of the strength of their relationship to message frames. Analysis of these frames and clusters found that organ procurement organizations present multiple, and sometimes competing, video types and message frames on YouTube. This study serves as important formative research that will inform future studies to measure the effectiveness of the distinct message frames and clusters identified.
Factor Analysis and Counseling Research
ERIC Educational Resources Information Center
Weiss, David J.
1970-01-01
Topics discussed include factor analysis versus cluster analysis, analysis of Q correlation matrices, ipsativity and factor analysis, and tests for the significance of a correlation matrix prior to application of factor analytic techniques. Techniques for factor extraction discussed include principal components, canonical factor analysis, alpha…
Spatio-Temporal Analysis of Smear-Positive Tuberculosis in the Sidama Zone, Southern Ethiopia
Dangisso, Mesay Hailu; Datiko, Daniel Gemechu; Lindtjørn, Bernt
2015-01-01
Background Tuberculosis (TB) is a disease of public health concern, with a varying distribution across settings depending on socio-economic status, HIV burden, availability and performance of the health system. Ethiopia is a country with a high burden of TB, with regional variations in TB case notification rates (CNRs). However, TB program reports are often compiled and reported at higher administrative units that do not show the burden at lower units, so there is limited information about the spatial distribution of the disease. We therefore aim to assess the spatial distribution and presence of the spatio-temporal clustering of the disease in different geographic settings over 10 years in the Sidama Zone in southern Ethiopia. Methods A retrospective space–time and spatial analysis were carried out at the kebele level (the lowest administrative unit within a district) to identify spatial and space-time clusters of smear-positive pulmonary TB (PTB). Scan statistics, Global Moran’s I, and Getis and Ordi (Gi*) statistics were all used to help analyze the spatial distribution and clusters of the disease across settings. Results A total of 22,545 smear-positive PTB cases notified over 10 years were used for spatial analysis. In a purely spatial analysis, we identified the most likely cluster of smear-positive PTB in 192 kebeles in eight districts (RR= 2, p<0.001), with 12,155 observed and 8,668 expected cases. The Gi* statistic also identified the clusters in the same areas, and the spatial clusters showed stability in most areas in each year during the study period. The space-time analysis also detected the most likely cluster in 193 kebeles in the same eight districts (RR= 1.92, p<0.001), with 7,584 observed and 4,738 expected cases in 2003-2012. Conclusion The study found variations in CNRs and significant spatio-temporal clusters of smear-positive PTB in the Sidama Zone. The findings can be used to guide TB control programs to devise effective TB control strategies for the geographic areas characterized by the highest CNRs. Further studies are required to understand the factors associated with clustering based on individual level locations and investigation of cases. PMID:26030162
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu; Hammond, Karl D.
We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He{sub n}, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes ofmore » helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He{sub 4} and He{sub 5} clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.« less
Freitas-Vilela, Ana Amélia; Smith, Andrew D A C; Kac, Gilberto; Pearson, Rebecca M; Heron, Jon; Emond, Alan; Hibbeln, Joseph R; Castro, Maria Beatriz Trindade; Emmett, Pauline M
2017-04-01
Little is known about how dietary patterns of mothers and their children track over time. The objectives of this study are to obtain dietary patterns in pregnancy using cluster analysis, to examine women's mean nutrient intakes in each cluster and to compare the dietary patterns of mothers to those of their children. Pregnant women (n = 12 195) from the Avon Longitudinal Study of Parents and Children reported their frequency of consumption of 47 foods and food groups. These data were used to obtain dietary patterns during pregnancy by cluster analysis. The absolute and energy-adjusted nutrient intakes were compared between clusters. Women's dietary patterns were compared with previously derived clusters of their children at 7 years of age. Multinomial logistic regression was performed to evaluate relationships comparing maternal and offspring clusters. Three maternal clusters were identified: 'fruit and vegetables', 'meat and potatoes' and 'white bread and coffee'. After energy adjustment women in the 'fruit and vegetables' cluster had the highest mean nutrient intakes. Mothers in the 'fruit and vegetables' cluster were more likely than mothers in 'meat and potatoes' (adjusted odds ratio [OR]: 2.00; 95% Confidence Interval [CI]: 1.69-2.36) or 'white bread and coffee' (OR: 2.18; 95% CI: 1.87-2.53) clusters to have children in a 'plant-based' cluster. However the majority of children were in clusters unrelated to their mother dietary pattern. Three distinct dietary patterns were obtained in pregnancy; the 'fruit and vegetables' pattern being the most nutrient dense. Mothers' dietary patterns were associated with but did not dominate offspring dietary patterns. © 2016 The Authors. Maternal & Child Nutrition published by John Wiley & Sons Ltd.
Cluster analysis of sputum cytokine-high profiles reveals diversity in T(h)2-high asthma patients.
Seys, Sven F; Scheers, Hans; Van den Brande, Paul; Marijsse, Gudrun; Dilissen, Ellen; Van Den Bergh, Annelies; Goeminne, Pieter C; Hellings, Peter W; Ceuppens, Jan L; Dupont, Lieven J; Bullens, Dominique M A
2017-02-23
Asthma is characterized by a heterogeneous inflammatory profile and can be subdivided into T(h)2-high and T(h)2-low airway inflammation. Profiling of a broader panel of airway cytokines in large unselected patient cohorts is lacking. Patients (n = 205) were defined as being "cytokine-low/high" if sputum mRNA expression of a particular cytokine was outside the respective 10 th /90 th percentile range of the control group (n = 80). Unsupervised hierarchical clustering was used to determine clusters based on sputum cytokine profiles. Half of patients (n = 108; 52.6%) had a classical T(h)2-high ("IL-4-, IL-5- and/or IL-13-high") sputum cytokine profile. Unsupervised cluster analysis revealed 5 clusters. Patients with an "IL-4- and/or IL-13-high" pattern surprisingly did not cluster but were equally distributed among the 5 clusters. Patients with an "IL-5-, IL-17A-/F- and IL-25- high" profile were restricted to cluster 1 (n = 24) with increased sputum eosinophil as well as neutrophil counts and poor lung function parameters at baseline and 2 years later. Four other clusters were identified: "IL-5-high or IL-10-high" (n = 16), "IL-6-high" (n = 8), "IL-22-high" (n = 25). Cluster 5 (n = 132) consists of patients without "cytokine-high" pattern or patients with only high IL-4 and/or IL-13. We identified 5 unique asthma molecular phenotypes by biological clustering. Type 2 cytokines cluster with non-type 2 cytokines in 4 out of 5 clusters. Unsupervised analysis thus not supports a priori type 2 versus non-type 2 molecular phenotypes. www.clinicaltrials.gov NCT01224938. Registered 18 October 2010.
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.
Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J
2008-06-18
Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.
InCHlib - interactive cluster heatmap for web applications.
Skuta, Ctibor; Bartůněk, Petr; Svozil, Daniel
2014-12-01
Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap. We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust . The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool which application domain is not limited to the life sciences only.
Marshall, Sarah A.; Yang, Christopher C.; Ping, Qing; Zhao, Mengnan; Avis, Nancy E.
2016-01-01
Purpose User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study. Methods Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study. Results The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data. Conclusions The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations. PMID:26476836
Wardenaar, K J; van Loo, H M; Cai, T; Fava, M; Gruber, M J; Li, J; de Jonge, P; Nierenberg, A A; Petukhova, M V; Rose, S; Sampson, N A; Schoevers, R A; Wilcox, M A; Alonso, J; Bromet, E J; Bunting, B; Florescu, S E; Fukao, A; Gureje, O; Hu, C; Huang, Y Q; Karam, A N; Levinson, D; Medina Mora, M E; Posada-Villa, J; Scott, K M; Taib, N I; Viana, M C; Xavier, M; Zarkov, Z; Kessler, R C
2014-11-01
Although variation in the long-term course of major depressive disorder (MDD) is not strongly predicted by existing symptom subtype distinctions, recent research suggests that prediction can be improved by using machine learning methods. However, it is not known whether these distinctions can be refined by added information about co-morbid conditions. The current report presents results on this question. Data came from 8261 respondents with lifetime DSM-IV MDD in the World Health Organization (WHO) World Mental Health (WMH) Surveys. Outcomes included four retrospectively reported measures of persistence/severity of course (years in episode; years in chronic episodes; hospitalization for MDD; disability due to MDD). Machine learning methods (regression tree analysis; lasso, ridge and elastic net penalized regression) followed by k-means cluster analysis were used to augment previously detected subtypes with information about prior co-morbidity to predict these outcomes. Predicted values were strongly correlated across outcomes. Cluster analysis of predicted values found three clusters with consistently high, intermediate or low values. The high-risk cluster (32.4% of cases) accounted for 56.6-72.9% of high persistence, high chronicity, hospitalization and disability. This high-risk cluster had both higher sensitivity and likelihood ratio positive (LR+; relative proportions of cases in the high-risk cluster versus other clusters having the adverse outcomes) than in a parallel analysis that excluded measures of co-morbidity as predictors. Although the results using the retrospective data reported here suggest that useful MDD subtyping distinctions can be made with machine learning and clustering across multiple indicators of illness persistence/severity, replication with prospective data is needed to confirm this preliminary conclusion.
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Marshall, Sarah A; Yang, Christopher C; Ping, Qing; Zhao, Mengnan; Avis, Nancy E; Ip, Edward H
2016-03-01
User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study. Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study. The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data. The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations.
NASA Astrophysics Data System (ADS)
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases
NASA Astrophysics Data System (ADS)
Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.
2017-09-01
The fine management for cities is the important way to realize the smart city. The data mining which uses spatial clustering analysis for urban management cases can be used in the evaluation of urban public facilities deployment, and support the policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that DBSCAN algorithm which is based on the density-clustering can not realize parameter adaptive determination, this paper proposed the optimizing method of parameter adaptive determination based on the spatial analysis. Firstly, making analysis of the function Ripley's K for the data set to realize adaptive determination of global parameter MinPts, which means setting the maximum aggregation scale as the range of data clustering. Calculating every point object's highest frequency K value in the range of Eps which uses K-D tree and setting it as the value of clustering density to realize the adaptive determination of global parameter MinPts. Then, the R language was used to optimize the above process to accomplish the precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing shows that: The new DBSCAN clustering algorithm this paper presents takes full account of the data's spatial and statistical characteristic which has obvious clustering feature, and has a better applicability and high quality. The results of the study are not only helpful for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also to other cities and related fields.
RSAT 2015: Regulatory Sequence Analysis Tools.
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-07-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Pattern Activity Clustering and Evaluation (PACE)
NASA Astrophysics Data System (ADS)
Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna
2012-06-01
With the vast amount of network information available on activities of people (i.e. motions, transportation routes, and site visits) there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking, learning and clustering approaches; as well as utilize information theoretical metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.
Ofner, Johannes; Kamilli, Katharina A; Eitenberger, Elisabeth; Friedbacher, Gernot; Lendl, Bernhard; Held, Andreas; Lohninger, Hans
2015-09-15
The chemometric analysis of multisensor hyperspectral data allows a comprehensive image-based analysis of precipitated atmospheric particles. Atmospheric particulate matter was precipitated on aluminum foils and analyzed by Raman microspectroscopy and subsequently by electron microscopy and energy dispersive X-ray spectroscopy. All obtained images were of the same spot of an area of 100 × 100 μm(2). The two hyperspectral data sets and the high-resolution scanning electron microscope images were fused into a combined multisensor hyperspectral data set. This multisensor data cube was analyzed using principal component analysis, hierarchical cluster analysis, k-means clustering, and vertex component analysis. The detailed chemometric analysis of the multisensor data allowed an extensive chemical interpretation of the precipitated particles, and their structure and composition led to a comprehensive understanding of atmospheric particulate matter.
Onda, Kyle; Crocker, Jonny; Kayser, Georgia Lyn; Bartram, Jamie
2013-01-01
The fields of global health and international development commonly cluster countries by geography and income to target resources and describe progress. For any given sector of interest, a range of relevant indicators can serve as a more appropriate basis for classification. We create a new typology of country clusters specific to the water and sanitation (WatSan) sector based on similarities across multiple WatSan-related indicators. After a literature review and consultation with experts in the WatSan sector, nine indicators were selected. Indicator selection was based on relevance to and suggested influence on national water and sanitation service delivery, and to maximize data availability across as many countries as possible. A hierarchical clustering method and a gap statistic analysis were used to group countries into a natural number of relevant clusters. Two stages of clustering resulted in five clusters, representing 156 countries or 6.75 billion people. The five clusters were not well explained by income or geography, and were unique from existing country clusters used in international development. Analysis of these five clusters revealed that they were more compact and well separated than United Nations and World Bank country clusters. This analysis and resulting country typology suggest that previous geography- or income-based country groupings can be improved upon for applications in the WatSan sector by utilizing globally available WatSan-related indicators. Potential applications include guiding and discussing research, informing policy, improving resource targeting, describing sector progress, and identifying critical knowledge gaps in the WatSan sector. PMID:24054545
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger amounts of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces effects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacteria's L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) by combining PC analysis with subsequent clustering valuable dynamic and conformational information can be obtained.
Determining the trophic guilds of fishes and macroinvertebrates in a seagrass food web
Luczkovich, J.J.; Ward, G.P.; Johnson, J.C.; Christian, R.R.; Baird, D.; Neckles, H.; Rizzo, W.M.
2002-01-01
We established trophic guilds of macroinvertebrate and fish taxa using correspondence analysis and a hierarchical clustering strategy for a seagrass food web in winter in the northeastern Gulf of Mexico. To create the diet matrix, we characterized the trophic linkages of macroinvertebrate and fish taxa. present in Hatodule wrightii seagrass habitat areas within the St. Marks National Wildlife Refuge (Florida) using binary data, combining dietary links obtained from relevant literature for macroinvertebrates with stomach analysis of common fishes collected during January and February of 1994. Heirarchical average-linkage cluster analysis of the 73 taxa of fishes and macroinvertebrates in the diet matrix yielded 14 clusters with diet similarity greater than or equal to 0.60. We then used correspondence analysis with three factors to jointly plot the coordinates of the consumers (identified by cluster membership) and of the 33 food sources. Correspondence analysis served as a visualization tool for assigning each taxon to one of eight trophic guilds: herbivores, detritivores, suspension feeders, omnivores, molluscivores, meiobenthos consumers, macrobenthos consumers, and piscivores. These trophic groups, cross-classified with major taxonomic groups, were further used to develop consumer compartments in a network analysis model of carbon flow in this seagrass ecosystem. The method presented here should greatly improve the development of future network models of food webs by providing an objective procedure for aggregating trophic groups.
NASA Astrophysics Data System (ADS)
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted, due to this the protection of this resource is important because is the only source of potable water of the entire State. Through the assessment of groundwater quality we can gain some knowledge about the main processes governing water chemistry as well as spatial patterns which are important to establish protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hidrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied in groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are due sea water intrusion and the interaction of the water with the carbonate rocks of the system and some pollution processes. The cluster analysis shows that the data can be divided in four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
Common factor analysis versus principal component analysis: choice for symptom cluster research.
Kim, Hee-Ju
2008-03-01
The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.
Bayesian multivariate hierarchical transformation models for ROC analysis.
O'Malley, A James; Zou, Kelly H
2006-02-15
A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
Bayesian multivariate hierarchical transformation models for ROC analysis
O'Malley, A. James; Zou, Kelly H.
2006-01-01
SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
Characterization of spatial and temporal variability in hydrochemistry of Johor Straits, Malaysia.
Abdullah, Pauzi; Abdullah, Sharifah Mastura Syed; Jaafar, Othman; Mahmud, Mastura; Khalik, Wan Mohd Afiq Wan Mohd
2015-12-15
Characterization of hydrochemistry changes in Johor Straits within 5 years of monitoring works was successfully carried out. Water quality data sets (27 stations and 19 parameters) collected in this area were interpreted subject to multivariate statistical analysis. Cluster analysis grouped all the stations into four clusters ((Dlink/Dmax) × 100<90) and two clusters ((Dlink/Dmax) × 100<80) for site and period similarities. Principal component analysis rendered six significant components (eigenvalue>1) that explained 82.6% of the total variance of the data set. Classification matrix of discriminant analysis assigned 88.9-92.6% and 83.3-100% correctness in spatial and temporal variability, respectively. Times series analysis then confirmed that only four parameters were not significant over time change. Therefore, it is imperative that the environmental impact of reclamation and dredging works, municipal or industrial discharge, marine aquaculture and shipping activities in this area be effectively controlled and managed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes
ERIC Educational Resources Information Center
Pell, Tony; Hargreaves, Linda
2011-01-01
Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of the Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching yet its implications for practice are rarely implemented. It has been subject also to negative…
Student Motivation and Learning in Mathematics and Science: A Cluster Analysis
ERIC Educational Resources Information Center
Ng, Betsy L. L.; Liu, W. C.; Wang, John C. K.
2016-01-01
The present study focused on an in-depth understanding of student motivation and self-regulated learning in mathematics and science through cluster analysis. It examined the different learning profiles of motivational beliefs and self-regulatory strategies in relation to perceived teacher autonomy support, basic psychological needs (i.e. autonomy,…
A Cluster Analysis of Personality Style in Adults with ADHD
ERIC Educational Resources Information Center
Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita
2008-01-01
Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…
An algol program for dissimilarity analysis: a divisive-omnithetic clustering technique
Tipper, J.C.
1979-01-01
Clustering techniques are used properly to generate hypotheses about patterns in data. Of the hierarchical techniques, those which are divisive and omnithetic possess many theoretically optimal properties. One such method, dissimilarity analysis, is implemented here in ALGOL 60, and determined to be competitive computationally with most other methods. ?? 1979.
Using Data Mining Results to Improve Educational Video Game Design
ERIC Educational Resources Information Center
Kerr, Deirdre
2015-01-01
This study uses information about in-game strategy use, identified through cluster analysis of actions in an educational video game, to make data-driven modifications to the game in order to reduce construct-irrelevant behavior. The examination of student strategies identified through cluster analysis indicated that (a) it was common for students…
Language Learner Motivational Types: A Cluster Analysis Study
ERIC Educational Resources Information Center
Papi, Mostafa; Teimouri, Yasser
2014-01-01
The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…
NASA Astrophysics Data System (ADS)
Kosasih, Wilson; Salomon, Lithrone Laricha; Hutomo, Reynaldo
2017-08-01
This paper discusses the development of new products of Micro, Small and Medium Entreprises (SMEs) to identify what attributes are considered by consumers, as well as combinations of attributes that need to be analyzed into the main preferences of consumers. The purpose of this research is to increase the added value and competitiveness of SMEs through product innovation. The object of this study is banana chips produced by SMEs from the province of Lampung which it considered to be unique souvenirs of the province. The research data were collected by distributing questionnaires in Jakarta which has heterogeneous population, in order to develop banana chip's marketing and increase its market share in Indonesia. Data processing was performed using conjoint analysis and cluster analysis. Segmentation was performed using conjoint analysis based on the importance level of attributes and part-worth of level attributes of each cluster. Finally, characteristics and consumer preferences of each cluster will be a consideration in determining the product development and marketing strategies.
Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty
2015-11-01
In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.
Combining Multiobjective Optimization and Cluster Analysis to Study Vocal Fold Functional Morphology
Palaparthi, Anil; Riede, Tobias
2017-01-01
Morphological design and the relationship between form and function have great influence on the functionality of a biological organ. However, the simultaneous investigation of morphological diversity and function is difficult in complex natural systems. We have developed a multiobjective optimization (MOO) approach in association with cluster analysis to study the form-function relation in vocal folds. An evolutionary algorithm (NSGA-II) was used to integrate MOO with an existing finite element model of the laryngeal sound source. Vocal fold morphology parameters served as decision variables and acoustic requirements (fundamental frequency, sound pressure level) as objective functions. A two-layer and a three-layer vocal fold configuration were explored to produce the targeted acoustic requirements. The mutation and crossover parameters of the NSGA-II algorithm were chosen to maximize a hypervolume indicator. The results were expressed using cluster analysis and were validated against a brute force method. Results from the MOO and the brute force approaches were comparable. The MOO approach demonstrated greater resolution in the exploration of the morphological space. In association with cluster analysis, MOO can efficiently explore vocal fold functional morphology. PMID:24771563
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369
Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.
X-ray morphological study of galaxy cluster catalogues
NASA Astrophysics Data System (ADS)
Democles, Jessica; Pierre, Marguerite; Arnaud, Monique
2016-07-01
Context : The intra-cluster medium distribution as probed by X-ray morphology based analysis gives good indication of the system dynamical state. In the race for the determination of precise scaling relations and understanding their scatter, the dynamical state offers valuable information. Method : We develop the analysis of the centroid-shift so that it can be applied to characterize galaxy cluster surveys such as the XXL survey or high redshift cluster samples. We use it together with the surface brightness concentration parameter and the offset between X-ray peak and brightest cluster galaxy in the context of the XXL bright cluster sample (Pacaud et al 2015) and a set of high redshift massive clusters detected by Planck and SPT and observed by both XMM-Newton and Chandra observatories. Results : Using the wide redshift coverage of the XXL sample, we see no trend between the dynamical state of the systems with the redshift.
Statistical analysis of catalogs of extragalactic objects. II - The Abell catalog of rich clusters
NASA Technical Reports Server (NTRS)
Hauser, M. G.; Peebles, P. J. E.
1973-01-01
The results of a power-spectrum analysis are presented for the distribution of clusters in the Abell catalog. Clear and direct evidence is found for superclusters with small angular scale, in agreement with the recent study of Bogart and Wagoner (1973). It is also found that the degree and angular scale of the apparent superclustering varies with distance in the manner expected if the clustering is intrinsic to the spatial distribution rather than a consequence of patchy local obscuration.
Who are the healthy active seniors? A cluster analysis.
Lai, Claudia K Y; Chan, Engle Angela; Chin, Kenny C W
2014-12-01
This paper reports a cluster analysis of a sample recruited from a randomized controlled trial that explored the effect of using a life story work approach to improve the psychological outcomes of older people in the community. 238 subjects from community centers were included in this analysis. After statistical testing, 169 seniors were assigned to the active ageing (AG) cluster and 69 to the inactive ageing (IG) cluster. Those in the AG were younger and healthier, with fewer chronic diseases and fewer depressive symptoms than those in the IG. They were more satisfied with their lives, and had higher self-esteem. They met with their family members more frequently, they engaged in more leisure activities and were more likely to have the ability to move freely. In summary, active ageing was observed in people with better health and functional performance. Our results echoed the limited findings reported in the literature.
Generating a Magellanic star cluster catalog with ASteCA
NASA Astrophysics Data System (ADS)
Perren, G. I.; Piatti, A. E.; Vázquez, R. A.
2016-08-01
An increasing number of software tools have been employed in the recent years for the automated or semi-automated processing of astronomical data. The main advantages of using these tools over a standard by-eye analysis include: speed (particularly for large databases), homogeneity, reproducibility, and precision. At the same time, they enable a statistically correct study of the uncertainties associated with the analysis, in contrast with manually set errors, or the still widespread practice of simply not assigning errors. We present a catalog comprising 210 star clusters located in the Large and Small Magellanic Clouds, observed with Washington photometry. Their fundamental parameters were estimated through an homogeneous, automatized and completely unassisted process, via the Automated Stellar Cluster Analysis package ( ASteCA). Our results are compared with two types of studies on these clusters: one where the photometry is the same, and another where the photometric system is different than that employed by ASteCA.
Eher, R; Windhaber, J; Rau, H; Schmitt, M; Kellner, E
2000-05-01
Conflict and conflict resolution in intimate relationships are not only among the most important factors influencing relationship satisfaction but are also seen in association with clinical symptoms. Styles of conflict will be assessed in patients suffering from panic disorder with and without agoraphobia, in alcoholics and in patients suffering from rheumatoid arthritis. 176 patients and healthy controls filled out the Styles of Conflict Inventory and questionnaires concerning severity of clinical symptoms. A cluster analysis revealed 5 types of conflict management. Healthy controls showed predominantely assertive and constructive styles, patients with panic disorder showed high levels of cognitive and/or behavioral aggression. Alcoholics showed high levels of repressed aggression, and patients with rheumatoid arthritis often did not exhibit any aggression during conflict. 5 Clusters of conflict pattern have been identified by cluster analysis. Each patient group showed considerable different patterns of conflict management.
Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D
2014-09-01
This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.
Kim, Hyeyoung; Kim, Bora; Kim, Se Hyun; Park, C Hyung Keun; Kim, Eun Young; Ahn, Yong Min
2018-08-01
It is essential to understand the latent structure of the population of suicide attempters for effective suicide prevention. The aim of this study was to identify subgroups among Korean suicide attempters in terms of the details of the suicide attempt. A total of 888 people who attempted suicide and were subsequently treated in the emergency rooms of 17 medical centers between May and November of 2013 were included in the analysis. The variables assessed included demographic characteristics, clinical information, and details of the suicide attempt assessed by the Suicide Intent Scale (SIS) and Columbia-Suicide Severity Rating Scale (C-SSRS). Cluster analysis was performed using the Ward method. Of the participants, 85.4% (n = 758) fell into a cluster characterized by less planning, low lethality methods, and ambivalence towards death ("impulsive"). The other cluster (n = 130) involved a more severe and well-planned attempt, used highly lethal methods, and took more precautions to avoid being interrupted ("planned"). The first cluster was dominated by women, while the second cluster was associated more with men, older age, and physical illness. We only included participants who visited the emergency department after their suicide attempt and had no missing values for SIS or C-SSRS. Cluster analysis extracted two distinct subgroups of Korean suicide attempters showing different patterns of suicidal behaviors. Understanding that a significant portion of suicide attempts occur impulsively calls for new prevention strategies tailored to differing subgroup profiles. Copyright © 2018 Elsevier B.V. All rights reserved.
Wavelet-based clustering of resting state MRI data in the rat.
Medda, Alessio; Hoffmann, Lukas; Magnuson, Matthew; Thompson, Garth; Pan, Wen-Ju; Keilholz, Shella
2016-01-01
While functional connectivity has typically been calculated over the entire length of the scan (5-10min), interest has been growing in dynamic analysis methods that can detect changes in connectivity on the order of cognitive processes (seconds). Previous work with sliding window correlation has shown that changes in functional connectivity can be observed on these time scales in the awake human and in anesthetized animals. This exciting advance creates a need for improved approaches to characterize dynamic functional networks in the brain. Previous studies were performed using sliding window analysis on regions of interest defined based on anatomy or obtained from traditional steady-state analysis methods. The parcellation of the brain may therefore be suboptimal, and the characteristics of the time-varying connectivity between regions are dependent upon the length of the sliding window chosen. This manuscript describes an algorithm based on wavelet decomposition that allows data-driven clustering of voxels into functional regions based on temporal and spectral properties. Previous work has shown that different networks have characteristic frequency fingerprints, and the use of wavelets ensures that both the frequency and the timing of the BOLD fluctuations are considered during the clustering process. The method was applied to resting state data acquired from anesthetized rats, and the resulting clusters agreed well with known anatomical areas. Clusters were highly reproducible across subjects. Wavelet cross-correlation values between clusters from a single scan were significantly higher than the values from randomly matched clusters that shared no temporal information, indicating that wavelet-based analysis is sensitive to the relationship between areas. Copyright © 2015 Elsevier Inc. All rights reserved.
Benefits of off-campus education for students in the health sciences: a text-mining analysis.
Nakagawa, Kazumasa; Asakawa, Yasuyoshi; Yamada, Keiko; Ushikubo, Mitsuko; Yoshida, Tohru; Yamaguchi, Haruyasu
2012-08-28
In Japan, few community-based approaches have been adopted in health-care professional education, and the appropriate content for such approaches has not been clarified. In establishing community-based education for health-care professionals, clarification of its learning effects is required. A community-based educational program was started in 2009 in the health sciences course at Gunma University, and one of the main elements in this program is conducting classes outside school. The purpose of this study was to investigate using text-analysis methods how the off-campus program affects students. In all, 116 self-assessment worksheets submitted by students after participating in the off-campus classes were decomposed into words. The extracted words were carefully selected from the perspective of contained meaning or content. With the selected terms, the relations to each word were analyzed by means of cluster analysis. Cluster analysis was used to select and divide 32 extracted words into four clusters: cluster 1-"actually/direct," "learn/watch/hear," "how," "experience/participation," "local residents," "atmosphere in community-based clinical care settings," "favorable," "communication/conversation," and "study"; cluster 2-"work of staff member" and "role"; cluster 3-"interaction/communication," "understanding," "feel," "significant/important/necessity," and "think"; and cluster 4-"community," "confusing," "enjoyable," "proactive," "knowledge," "academic knowledge," and "class." The students who participated in the program achieved different types of learning through the off-campus classes. They also had a positive impression of the community-based experience and interaction with the local residents, which is considered a favorable outcome. Off-campus programs could be a useful educational approach for students in health sciences.
Modified multidimensional scaling approach to analyze financial markets.
Yin, Yi; Shang, Pengjian
2014-06-01
Detrended cross-correlation coefficient (σDCCA) and dynamic time warping (DTW) are introduced as the dissimilarity measures, respectively, while multidimensional scaling (MDS) is employed to translate the dissimilarities between daily price returns of 24 stock markets. We first propose MDS based on σDCCA dissimilarity and MDS based on DTW dissimilarity creatively, while MDS based on Euclidean dissimilarity is also employed to provide a reference for comparisons. We apply these methods in order to further visualize the clustering between stock markets. Moreover, we decide to confront MDS with an alternative visualization method, "Unweighed Average" clustering method, for comparison. The MDS analysis and "Unweighed Average" clustering method are employed based on the same dissimilarity. Through the results, we find that MDS gives us a more intuitive mapping for observing stable or emerging clusters of stock markets with similar behavior, while the MDS analysis based on σDCCA dissimilarity can provide more clear, detailed, and accurate information on the classification of the stock markets than the MDS analysis based on Euclidean dissimilarity. The MDS analysis based on DTW dissimilarity indicates more knowledge about the correlations between stock markets particularly and interestingly. Meanwhile, it reflects more abundant results on the clustering of stock markets and is much more intensive than the MDS analysis based on Euclidean dissimilarity. In addition, the graphs, originated from applying MDS methods based on σDCCA dissimilarity and DTW dissimilarity, may also guide the construction of multivariate econometric models.
Hebels, Dennie G A J; Rasche, Axel; Herwig, Ralf; van Westen, Gerard J P; Jennen, Danyel G J; Kleinjans, Jos C S
2016-01-01
When evaluating compound similarity, addressing multiple sources of information to reach conclusions about common pharmaceutical and/or toxicological mechanisms of action is a crucial strategy. In this chapter, we describe a systems biology approach that incorporates analyses of hepatotoxicant data for 33 compounds from three different sources: a chemical structure similarity analysis based on the 3D Tanimoto coefficient, a chemical structure-based protein target prediction analysis, and a cross-study/cross-platform meta-analysis of in vitro and in vivo human and rat transcriptomics data derived from public resources (i.e., the diXa data warehouse). Hierarchical clustering of the outcome scores of the separate analyses did not result in a satisfactory grouping of compounds considering their known toxic mechanism as described in literature. However, a combined analysis of multiple data types may hypothetically compensate for missing or unreliable information in any of the single data types. We therefore performed an integrated clustering analysis of all three data sets using the R-based tool iClusterPlus. This indeed improved the grouping results. The compound clusters that were formed by means of iClusterPlus represent groups that show similar gene expression while simultaneously integrating a similarity in structure and protein targets, which corresponds much better with the known mechanism of action of these toxicants. Using an integrative systems biology approach may thus overcome the limitations of the separate analyses when grouping liver toxicants sharing a similar mechanism of toxicity.
HICOSMO - X-ray analysis of a complete sample of galaxy clusters
NASA Astrophysics Data System (ADS)
Schellenberger, G.; Reiprich, T.
2017-10-01
Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.
Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing
2017-10-27
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For examples, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results contribute to increase our appreciation of PD diversity and our knowledge of how PDs are used in species, yielding implications on species evolution.
NASA Technical Reports Server (NTRS)
Ballew, G.
1977-01-01
The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed using Johnson's HICLUS program. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.
Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae
Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira
2011-01-01
Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.
Kristunas, Caroline; Morris, Tom; Gray, Laura
2017-11-15
To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Epidemiologic Surveillance of Teenage Birth Rates in the United States, 2006-2012.
Amin, Raid; Decesare, Julie Zemaitis; Hans, Jennifer; Roussos-Ross, Kay
2017-06-01
To investigate the geographic variation in the average teenage birth rates by county in the contiguous United States. Data from the National Center for Health Statistics were used in this retrospective cohort to count the total number of live births to females aged 15-19 years by county between 2006 and 2012. Software for disease surveillance and spatial cluster analysis was used to identify clusters of high or low teenage births in counties or areas of greater than 100,000 teenage females. The analysis was then adjusted for percentage of poverty and high school diploma achievement. The unadjusted analysis identified the top 10 clusters of teenage births. The cluster with the highest rate was a city and the surrounding 40 counties, demonstrating an average teen birth rate of 67 per 1,000 females in the age range, 87% higher than the rate in the contiguous United States. Adjustments for poverty rates and high school diploma achievement shifted the top clusters to other areas. Despite an overall national decline in the teenage birth rate, clusters of elevated teenage birth rates remain. These clusters are not random and remain higher than expected when adjusted for poverty and education. This data set provides a framework to focus targeted interventions to reduce teenage birth rates in this high-risk population.
Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N
2017-01-04
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Justen, Gisele C; Espinoza-Quiñones, Fernando R; Módenes, Aparecido Nivaldo; Bergamasco, Rosangela
2012-01-01
In this work the analysis of elements concentration in groundwater was performed using the synchrotron radiation total-reflection X-ray fluorescence (SR-TXRF) technique. A set of nine tube-wells with serious risk of contamination was chosen to monitor the mean concentration of elements in groundwater from the North Serra Geral aquifer in Santa Helena, Brazil, during 1 year. Element concentrations were determined applying a SR-TXRF methodology. The accuracy of SR-TXRF technique was validated by analysis of a certified reference material. As the groundwater composition in the North Serra Geral aquifer showed heterogeneity in the spatial distribution of eight major elements, a hierarchical clustering to the data was performed. By a similarity in their compositions, two of the nine wells were grouped in a first cluster, while the other seven were grouped in a second cluster. Calcium was the major element in all wells, with higher Ca concentration in the second cluster than in the first cluster. However, concentrations of Ti, V, Cr in the first cluster are slightly higher than those in the second cluster. The findings of this study within a monitoring program of tube-wells could provide a useful assessment of controls over groundwater composition and support management at regional level.
[Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].
Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige
2015-12-01
We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests with the cut-off distance yielding a similar classification capability showed favorable results--when the MIC method was used, and minimum inhibitory concentration (MIC) values were used directly in the method, the level of agreement with PFGE was 74.2% when 15 drugs were tested. The Unweighted Pair Group Method with Arithmetic mean (UPGMA) method was effective when the cut-off distance was 16. Using the SIR method in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility of the results, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis for drug susceptibility tests is useful for the epidemiological analysis of MRSA.
Luciano, Juan V; Forero, Carlos G; Cerdà-Lafont, Marta; Peñarrubia-María, María Teresa; Fernández-Vergel, Rita; Cuesta-Vargas, Antonio I; Ruíz, José M; Rozadilla-Sacanell, Antoni; Sirvent-Alierta, Elena; Santo-Panero, Pilar; García-Campayo, Javier; Serrano-Blanco, Antoni; Pérez-Aranda, Adrián; Rubio-Valera, María
2016-10-01
Although fibromyalgia syndrome (FM) is considered a heterogeneous condition, there is no generally accepted subgroup typology. We used hierarchical cluster analysis and latent profile analysis to replicate Giesecke's classification in Spanish FM patients. The second aim was to examine whether the subgroups differed in sociodemographic characteristics, functional status, quality of life, and in direct and indirect costs. A total of 160 FM patients completed the following measures for cluster derivation: the Center for Epidemiological Studies-Depression Scale, the Trait Anxiety Inventory, the Pain Catastrophizing Scale, and the Control over Pain subscale. Pain threshold was measured with a sphygmomanometer. In addition, the Fibromyalgia Impact Questionnaire-Revised, the EuroQoL-5D-3L, and the Client Service Receipt Inventory were administered for cluster validation. Two distinct clusters were identified using hierarchical cluster analysis ("hypersensitive" group, 69.8% and "functional" group, 30.2%). In contrast, the latent profile analysis goodness-of-fit indices supported the existence of 3 FM patient profiles: (1) a "functional" profile (28.1%) defined as moderate tenderness, distress, and pain catastrophizing; (2) a "dysfunctional" profile (45.6%) defined by elevated tenderness, distress, and pain catastrophizing; and (3) a "highly dysfunctional and distressed" profile (26.3%) characterized by elevated tenderness and extremely high distress and catastrophizing. We did not find significant differences in sociodemographic characteristics between the 2 clusters or among the 3 profiles. The functional profile was associated with less impairment, greater quality of life, and lower health care costs. We identified 3 distinct profiles which accounted for the heterogeneity of FM patients. Our findings might help to design tailored interventions for FM patients.
Chalmet, Kristen; Staelens, Delfien; Blot, Stijn; Dinakis, Sylvie; Pelgrom, Jolanda; Plum, Jean; Vogelaers, Dirk; Vandekerckhove, Linos; Verhofstede, Chris
2010-09-07
The number of HIV-1 infected individuals in the Western world continues to rise. More in-depth understanding of regional HIV-1 epidemics is necessary for the optimal design and adequate use of future prevention strategies. The use of a combination of phylogenetic analysis of HIV sequences, with data on patients' demographics, infection route, clinical information and laboratory results, will allow a better characterization of individuals responsible for local transmission. Baseline HIV-1 pol sequences, obtained through routine drug-resistance testing, from 506 patients, newly diagnosed between 2001 and 2009, were used to construct phylogenetic trees and identify transmission-clusters. Patients' demographics, laboratory and clinical data, were retrieved anonymously. Statistical analysis was performed to identify subtype-specific and transmission-cluster-specific characteristics. Multivariate analysis showed significant differences between the 59.7% of individuals with subtype B infection and the 40.3% non-B infected individuals, with regard to route of transmission, origin, infection with Chlamydia (p = 0.01) and infection with Hepatitis C virus (p = 0.017). More and larger transmission-clusters were identified among the subtype B infections (p < 0.001). Overall, in multivariate analysis, clustering was significantly associated with Caucasian origin, infection through homosexual contact and younger age (all p < 0.001). Bivariate analysis additionally showed a correlation between clustering and syphilis (p < 0.001), higher CD4 counts (p = 0.002), Chlamydia infection (p = 0.013) and primary HIV (p = 0.017). Combination of phylogenetics with demographic information, laboratory and clinical data, revealed that HIV-1 subtype B infected Caucasian men-who-have-sex-with-men with high prevalence of sexually transmitted diseases, account for the majority of local HIV-transmissions. This finding elucidates observed epidemiological trends through molecular analysis, and justifies sustained focus in prevention on this high risk group.
Nara, Ayako; Hashimoto, Takuya; Komatsu, Mamoru; Nishiyama, Makoto; Kuzuyama, Tomohisa; Ikeda, Haruo
2017-05-01
Bafilomycins A 1 , C 1 and B 1 (setamycin) produced by Kitasatospora setae KM-6054 belong to the plecomacrolide family, which exhibit antibacterial, antifungal, antineoplastic and immunosuppressive activities. An analysis of gene clusters from K. setae KM-6054 governing the biosynthesis of bafilomycins revealed that it contains five large open reading frames (ORFs) encoding the multifunctional polypeptides of bafilomycin polyketide synthases (PKSs). These clustered PKS genes, which are responsible for bafilomycin biosynthesis, together encode 11 homologous sets of enzyme activities, each catalyzing a specific round of polyketide chain elongation. The region contains an additional 13 ORFs spanning a distance of 73 287 bp, some of which encode polypeptides governing other key steps in bafilomycin biosynthesis. Five ORFs, BfmB, BfmC, BfmD, BfmE and BfmF, were involved in the formation of methoxymalonyl-acyl carrier protein (ACP). Two possible regulatory genes, bfmR and bfmH, were found downstream of the above genes. A gene-knockout analysis revealed that BfmR was only a transcriptional regulator for the transcription of bafilomycin biosynthetic genes. Two genes, bfmI and bfmJ, were found downstream of bfmH. An analysis of these gene-disruption mutants in addition to an enzymatic analysis of BfmI and BfmJ revealed that BfmJ activated fumarate and BfmI functioned as a catalyst to form a fumaryl ester at the C21 hydroxyl residue of bafilomycin A 1 . A comparative analysis of bafilomycin gene clusters in K. setae KM-6054, Streptomyces lohii JCM 14114 and Streptomyces griseus DSM 2608 revealed that each ORF of both gene clusters in two Streptomyces strains were quite similar to each other. However, each ORF of gene cluster in K. setae KM-6054 was of lower similarity to that of corresponding ORF in the two Streptomyces species.
Yang, Guang; Raschke, Felix; Barrick, Thomas R; Howe, Franklyn A
2015-09-01
To investigate whether nonlinear dimensionality reduction improves unsupervised classification of (1) H MRS brain tumor data compared with a linear method. In vivo single-voxel (1) H magnetic resonance spectroscopy (55 patients) and (1) H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With (1) H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of (1) H MRSI data after cluster analysis. © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Schaefer, A. M.; Daniell, J. E.; Wenzel, F.
2014-12-01
Earthquake clustering tends to be an increasingly important part of general earthquake research especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is taken into account using a point based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have been mostly applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are mitigated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data of New Zealand and the United States is used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.
Persistent Topology and Metastable State in Conformational Dynamics
Chang, Huang-Wei; Bacallado, Sergio; Pande, Vijay S.; Carlsson, Gunnar E.
2013-01-01
The large amount of molecular dynamics simulation data produced by modern computational models brings big opportunities and challenges to researchers. Clustering algorithms play an important role in understanding biomolecular kinetics from the simulation data, especially under the Markov state model framework. However, the ruggedness of the free energy landscape in a biomolecular system makes common clustering algorithms very sensitive to perturbations of the data. Here, we introduce a data-exploratory tool which provides an overview of the clustering structure under different parameters. The proposed Multi-Persistent Clustering analysis combines insights from recent studies on the dynamics of systems with dominant metastable states with the concept of multi-dimensional persistence in computational topology. We propose to explore the clustering structure of the data based on its persistence on scale and density. The analysis provides a systematic way to discover clusters that are robust to perturbations of the data. The dominant states of the system can be chosen with confidence. For the clusters on the borderline, the user can choose to do more simulation or make a decision based on their structural characteristics. Furthermore, our multi-resolution analysis gives users information about the relative potential of the clusters and their hierarchical relationship. The effectiveness of the proposed method is illustrated in three biomolecules: alanine dipeptide, Villin headpiece, and the FiP35 WW domain. PMID:23565139
Liang, Xianrui; Ma, Meiling; Su, Weike
2013-01-01
Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaves samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them was identified as rutin) in precision, repeatability and stability test were less than 3%, and the method of fingerprint analysis was validated to be suitable for the Hibiscus mutabilis L. leaves. Conclusions: The chromatographic fingerprints showed abundant diversity of chemical constituents qualitatively in the 10 batches of Hibiscus mutabilis L. leaves samples from different locations by similarity analysis on basis of calculating the correlation coefficients between each two fingerprints. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent to the SA result to some extent. PMID:23930008
Stephens, D.W.; Wangsgard, J.B.
1988-01-01
A computer program, Numerical Taxonomy System of Multivariate Statistical Programs (NTSYS), was used with interfacing software to perform cluster analyses of phytoplankton data stored in the biological files of the U.S. Geological Survey. The NTSYS software performs various types of statistical analyses and is capable of handling a large matrix of data. Cluster analyses were done on phytoplankton data collected from 1974 to 1981 at four national Stream Quality Accounting Network stations in the Tennessee River basin. Analysis of the changes in clusters of phytoplankton genera indicated possible changes in the water quality of the French Broad River near Knoxville, Tennessee. At this station, the most common diatom groups indicated a shift in dominant forms with some of the less common diatoms being replaced by green and blue-green algae. There was a reduction in genera variability between 1974-77 and 1979-81 sampling periods. Statistical analysis of chloride and dissolved solids confirmed that concentrations of these substances were smaller in 1974-77 than in 1979-81. At Pickwick Landing Dam, the furthest downstream station used in the study, there was an increase in the number of genera of ' rare ' organisms with time. The appearance of two groups of green and blue-green algae indicated that an increase in temperature or nutrient concentrations occurred from 1974 to 1981, but this could not be confirmed using available water quality data. Associations of genera forming the phytoplankton communities at three stations on the Tennessee River were found to be seasonal. Nodal analysis of combined data from all four stations used in the study did not identify any seasonal or temporal patterns during 1974-81. Cluster analysis using the NYSYS programs was effective in reducing the large phytoplankton data set to a manageable size and provided considerable insight into the structure of phytoplankton communities in the Tennessee River basin. Problems encountered using cluster analysis were the subjectivity introduced in the definition of meaningful clusters, and the lack of taxonomic identification to the species level. (Author 's abstract)
Atlas-Guided Cluster Analysis of Large Tractography Datasets
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
Stynes, Siobhán; Konstantinou, Kika; Ogollah, Reuben; Hay, Elaine M; Dunn, Kate M
2018-04-01
Traditionally, low back-related leg pain (LBLP) is diagnosed clinically as referred leg pain or sciatica (nerve root involvement). However, within the spectrum of LBLP, we hypothesised that there may be other unrecognised patient subgroups. This study aimed to identify clusters of patients with LBLP using latent class analysis and describe their clinical course. The study population was 609 LBLP primary care consulters. Variables from clinical assessment were included in the latent class analysis. Characteristics of the statistically identified clusters were compared, and their clinical course over 1 year was described. A 5 cluster solution was optimal. Cluster 1 (n = 104) had mild leg pain severity and was considered to represent a referred leg pain group with no clinical signs, suggesting nerve root involvement (sciatica). Cluster 2 (n = 122), cluster 3 (n = 188), and cluster 4 (n = 69) had mild, moderate, and severe pain and disability, respectively, and response to clinical assessment items suggested categories of mild, moderate, and severe sciatica. Cluster 5 (n = 126) had high pain and disability, longer pain duration, and more comorbidities and was difficult to map to a clinical diagnosis. Most improvement for pain and disability was seen in the first 4 months for all clusters. At 12 months, the proportion of patients reporting recovery ranged from 27% for cluster 5 to 45% for cluster 2 (mild sciatica). This is the first study that empirically shows the variability in profile and clinical course of patients with LBLP including sciatica. More homogenous groups were identified, which could be considered in future clinical and research settings.
Manchaiah, Vinaya; Ratinaud, Pierre; Andersson, Gerhard
2018-05-08
When people with health conditions begin to manage their health issues, one important issue that emerges is the question as to what exactly do they do with the information that they have obtained through various sources (eg, news media, social media, health professionals, friends, and family). The information they gather helps form their opinions and, to some degree, influences their attitudes toward managing their condition. This study aimed to understand how tinnitus is represented in the US newspaper media and in Facebook pages (ie, social media) using text pattern analysis. This was a cross-sectional study based upon secondary analyses of publicly available data. The 2 datasets (ie, text corpuses) analyzed in this study were generated from US newspaper media during 1980-2017 (downloaded from the database US Major Dailies by ProQuest) and Facebook pages during 2010-2016. The text corpuses were analyzed using the Iramuteq software using cluster analysis and chi-square tests. The newspaper dataset had 432 articles. The cluster analysis resulted in 5 clusters, which were named as follows: (1) brain stimulation (26.2%), (2) symptoms (13.5%), (3) coping (19.8%), (4) social support (24.2%), and (5) treatment innovation (16.4%). A time series analysis of clusters indicated a change in the pattern of information presented in newspaper media during 1980-2017 (eg, more emphasis on cluster 5, focusing on treatment inventions). The Facebook dataset had 1569 texts. The cluster analysis resulted in 7 clusters, which were named as: (1) diagnosis (21.9%), (2) cause (4.1%), (3) research and development (13.6%), (4) social support (18.8%), (5) challenges (11.1%), (6) symptoms (21.4%), and (7) coping (9.2%). A time series analysis of clusters indicated no change in information presented in Facebook pages on tinnitus during 2011-2016. The study highlights the specific aspects about tinnitus that the US newspaper media and Facebook pages focus on, as well as how these aspects change over time. These findings can help health care providers better understand the presuppositions that tinnitus patients may have. More importantly, the findings can help public health experts and health communication experts in tailoring health information about tinnitus to promote self-management, as well as assisting in appropriate choices of treatment for those living with tinnitus. ©Vinaya Manchaiah, Pierre Ratinaud, Gerhard Andersson. Originally published in the Interactive Journal of Medical Research (http://www.i-jmr.org/), 08.05.2018.
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
A Bimodal Hybrid Model for Time-Dependent Probabilistic Seismic Hazard Analysis
NASA Astrophysics Data System (ADS)
Yaghmaei-Sabegh, Saman; Shoaeifar, Nasser; Shoaeifar, Parva
2018-03-01
The evaluation of evidence provided by geological studies and historical catalogs indicates that in some seismic regions and faults, multiple large earthquakes occur in cluster. Then, the occurrences of large earthquakes confront with quiescence and only the small-to-moderate earthquakes take place. Clustering of large earthquakes is the most distinguishable departure from the assumption of constant hazard of random occurrence of earthquakes in conventional seismic hazard analysis. In the present study, a time-dependent recurrence model is proposed to consider a series of large earthquakes that occurs in clusters. The model is flexible enough to better reflect the quasi-periodic behavior of large earthquakes with long-term clustering, which can be used in time-dependent probabilistic seismic hazard analysis with engineering purposes. In this model, the time-dependent hazard results are estimated by a hazard function which comprises three parts. A decreasing hazard of last large earthquake cluster and an increasing hazard of the next large earthquake cluster, along with a constant hazard of random occurrence of small-to-moderate earthquakes. In the final part of the paper, the time-dependent seismic hazard of the New Madrid Seismic Zone at different time intervals has been calculated for illustrative purpose.
Graph analysis of cell clusters forming vascular networks
NASA Astrophysics Data System (ADS)
Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.
2018-03-01
This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth were thresholded, and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessels density, number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics such as: the degree distribution, average clustering coefficient, average short path length and assortativity, among others. This analysis allows to monitor how the largest connected cluster of the vascular network evolves in time and provides with quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
Miyamoto, Yuki; Momose, Takamasa; Kanamori, Hideto
2012-11-21
Infrared absorption spectra of methyl fluoride with ortho-hydrogen (ortho-H(2)) clusters in a solid para-hydrogen (para-H(2)) crystal at 3.6 K were studied in the C-H stretching fundamental region (~3000 cm(-1)) using an FTIR spectrometer. As shown previously, the ν(3) C-F stretching fundamental band of CH(3)F-(ortho-H(2))(n) (n = 0, 1, 2, ...) clusters at 1040 cm(-1) shows a series of n discrete absorption lines, which correspond to different-sized clusters. We observed three unresolved broad peaks in the C-H stretching region and applied this cluster model to them assuming the same intensity distribution function as the ν(3) band. A fitting analysis successfully gave us the linewidth and lineshift of the components in each vibrational band. It was found that the separately determined linewidth, matrix shift of the band origin, and cluster shift are dependent on the vibrational mode. From the transition intensities of the monomer component derived from the fitting analysis, we discuss the mixing ratio of the vibrational modes due to Fermi resonance.
Wan, B; Yarbrough, J W; Schultz, T W
2008-01-01
This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours and gene expressions profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to the ones outside the cluster. One-methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorine. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, a significant analysis of Microarrays (SAM) identified 598 genes being modulated by tested chemicals with a variety of biological processes, such as cell cycle, metabolism, and protein binding and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.
Space-time analysis of pneumonia hospitalisations in the Netherlands.
Benincà, Elisa; van Boven, Michiel; Hagenaars, Thomas; van der Hoek, Wim
2017-01-01
Community acquired pneumonia is a major global public health problem. In the Netherlands there are 40,000-50,000 hospital admissions for pneumonia per year. In the large majority of these hospital admissions the etiologic agent is not determined and a real-time surveillance system is lacking. Localised and temporal increases in hospital admissions for pneumonia are therefore only detected retrospectively and the etiologic agents remain unknown. Here, we perform spatio-temporal analyses of pneumonia hospital admission data in the Netherlands. To this end, we scanned for spatial clusters on yearly and seasonal basis, and applied wavelet cluster analysis on the time series of five main regions. The pneumonia hospital admissions show strong clustering in space and time superimposed on a regular yearly cycle with high incidence in winter and low incidence in summer. Cluster analysis reveals a heterogeneous pattern, with most significant clusters occurring in the western, highly urbanised, and in the eastern, intensively farmed, part of the Netherlands. Quantitatively, the relative risk (RR) of the significant clusters for the age-standardised incidence varies from a minimum of 1.2 to a maximum of 2.2. We discuss possible underlying causes for the patterns observed, such as variations in air pollution.
Cluster randomised trials in the medical literature: two bibliometric surveys
Bland, J Martin
2004-01-01
Background Several reviews of published cluster randomised trials have reported that about half did not take clustering into account in the analysis, which was thus incorrect and potentially misleading. In this paper I ask whether cluster randomised trials are increasing in both number and quality of reporting. Methods Computer search for papers on cluster randomised trials since 1980, hand search of trial reports published in selected volumes of the British Medical Journal over 20 years. Results There has been a large increase in the numbers of methodological papers and of trial reports using the term 'cluster random' in recent years, with about equal numbers of each type of paper. The British Medical Journal contained more such reports than any other journal. In this journal there was a corresponding increase over time in the number of trials where subjects were randomised in clusters. In 2003 all reports showed awareness of the need to allow for clustering in the analysis. In 1993 and before clustering was ignored in most such trials. Conclusion Cluster trials are becoming more frequent and reporting is of higher quality. Perhaps statistician pressure works. PMID:15310402
Applied anatomic site study of palatal anchorage implants using cone beam computed tomography.
Lai, Ren-fa; Zou, Hui; Kong, Wei-dong; Lin, Wei
2010-06-01
The purpose of this study was to conduct quantitative research on bone height and bone mineral density of palatal implant sites for implantation, and to provide reference sites for safe and stable palatal implants. Three-dimensional reformatting images were reconstructed by cone beam computed tomography (CBCT) in 34 patients, aged 18 to 35 years, using EZ Implant software. Bone height was measured at 20 sites of interest on the palate. Bone mineral density was measured at the 10 sites with the highest implantation rate, classified using K-mean cluster analysis based on bone height and bone mineral density. According to the cluster analysis, 10 sites were classified into three clusters. Significant differences in bone height and bone mineral density were detected between these three clusters (P<0.05). The greatest bone height was obtained in cluster 2, followed by cluster 1 and cluster 3. The highest bone mineral density was found in cluster 3, followed by cluster 1 and cluster 2. CBCT plays an important role in pre-surgical treatment planning. CBCT is helpful in identifying safe and stable implantation sites for palatal anchorage.
Alexander, Nathan; Woetzel, Nils; Meiler, Jens
2011-02-01
Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.
Network module detection: Affinity search technique with the multi-node topological overlap measure
Li, Ai; Horvath, Steve
2009-01-01
Background Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: PMID:19619323
Network module detection: Affinity search technique with the multi-node topological overlap measure.
Li, Ai; Horvath, Steve
2009-07-20
Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
On selecting a prior for the precision parameter of Dirichlet process mixture models
Dorazio, R.M.
2009-01-01
In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter ?? and a base probability measure G0. In problems where ?? is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for ??. In this paper an approach is developed for computing a prior for the precision parameter ?? that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
Preliminary Comparisons of the Information Content and Utility of TM Versus MSS Data
NASA Technical Reports Server (NTRS)
Markham, B. L.
1984-01-01
Comparisons were made between subscenes from the first TM scene acquired of the Washington, D.C. area and a MSS scene acquired approximately one year earlier. Three types of analyses were conducted to compare TM and MSS data: a water body analysis, a principal components analysis and a spectral clustering analysis. The water body analysis compared the capability of the TM to the MSS for detecting small uniform targets. Of the 59 ponds located on aerial photographs 34 (58%) were detected by the TM with six commission errors (15%) and 13 (22%) were detected by the MSS with three commission errors (19%). The smallest water body detected by the TM was 16 meters; the smallest detected by the MSS was 40 meters. For the principal components analysis, means and covariance matrices were calculated for each subscene, and principal components images generated and characterized. In the spectral clustering comparison each scene was independently clustered and the clusters were assigned to informational classes. The preliminary comparison indicated that TM data provides enhancements over MSS in terms of (1) small target detection and (2) data dimensionality (even with 4-band data). The extra dimension, partially resultant from TM band 1, appears useful for built-up/non-built-up area separation.
Manchaiah, Vinaya; Zhao, Fei; Oladeji, Susan; Ratinaud, Pierre
2018-01-01
Purpose: The current study was aimed at understanding the patterns in the social representation of loud music reported by young adults in different countries. Materials and Methods: The study included a sample of 534 young adults (18–25 years) from India, Iran, Portugal, United Kingdom, and United States. Participants were recruited using a convince sampling, and data were collected using the free association task. Participants were asked to provide up to five words or phrases that come to mind when thinking about “loud music.” The data were first analyzed using the qualitative content analysis. This was followed by quantitative cluster analysis and chi-square analysis. Results: The content analysis suggested 19 main categories of responses related to loud music. The cluster analysis resulted in for main clusters, namely: (1) emotional oriented perception; (2) problem oriented perception; (3) music and enjoyment oriented perception; and (4) positive emotional and recreation-oriented perception. Country of origin was associated with the likelihood of participants being in each of these clusters. Conclusion: The current study highlights the differences and similarities in young adults’ perception of loud music. These results may have implications to hearing health education to facilitate healthy listening habits. PMID:29457602
Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang
2017-03-07
Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information of plaque morphology are necessary to further compare the predicative power between BP dipping pattern and carotid plaque.
Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang
2017-01-01
Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information of plaque morphology are necessary to further compare the predicative power between BP dipping pattern and carotid plaque. PMID:28266630
Geographic atrophy phenotype identification by cluster analysis.
Monés, Jordi; Biarnés, Marc
2018-03-01
To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm 2 /year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Berenguer, Roberto; Pastor-Juan, María Del Rosario; Canales-Vázquez, Jesús; Castro-García, Miguel; Villas, María Victoria; Legorburo, Francisco Mansilla; Sabater, Sebastià
2018-04-24
Purpose To identify the reproducible and nonredundant radiomics features (RFs) for computed tomography (CT). Materials and Methods Two phantoms were used to test RF reproducibility by using test-retest analysis, by changing the CT acquisition parameters (hereafter, intra-CT analysis), and by comparing five different scanners with the same CT parameters (hereafter, inter-CT analysis). Reproducible RFs were selected by using the concordance correlation coefficient (as a measure of the agreement between variables) and the coefficient of variation (defined as the ratio of the standard deviation to the mean). Redundant features were grouped by using hierarchical cluster analysis. Results A total of 177 RFs including intensity, shape, and texture features were evaluated. The test-retest analysis showed that 91% (161 of 177) of the RFs were reproducible according to concordance correlation coefficient. Reproducibility of intra-CT RFs, based on coefficient of variation, ranged from 89.3% (151 of 177) to 43.1% (76 of 177) where the pitch factor and the reconstruction kernel were modified, respectively. Reproducibility of inter-CT RFs, based on coefficient of variation, also showed large material differences, from 85.3% (151 of 177; wood) to only 15.8% (28 of 177; polyurethane). Ten clusters were identified after the hierarchical cluster analysis and one RF per cluster was chosen as representative. Conclusion Many RFs were redundant and nonreproducible. If all the CT parameters are fixed except field of view, tube voltage, and milliamperage, then the information provided by the analyzed RFs can be summarized in only 10 RFs (each representing a cluster) because of redundancy. © RSNA, 2018 Online supplemental material is available for this article.
Fuzzy cluster analysis of high-field functional MRI data.
Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald
2003-11-01
Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast today is an established brain research method and quickly gains acceptance for complementary clinical diagnosis. However, neither the basic mechanisms like coupling between neuronal activation and haemodynamic response are known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts as well as quantification of expected, i.e. stimulus correlated, and novel information on brain activity is important for both, new insights in neuroscience and future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiate temporal patterns in MRI using (a) a test object with static and dynamic parts, (b) artifacts due to gross head motion artifacts. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) a fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may help to identify artifacts and add novel and unexpected information valuable for interpretation, classification and characterization of functional MRI data which can be used to design new data acquisition schemes, stimulus presentations, neuro(physio)logical paradigms, as well as to improve quantitative biophysical models.
Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2014-01-01
Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences undergo subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patient registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals with their known risk as heterosexual contacts. The other four large clusters showed later tMRCAs between 2000 and 2002 with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals infected with HIV strains spread in East and South-eastern Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CFR01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900
A CLIPS expert system for clinical flow cytometry data analysis
NASA Technical Reports Server (NTRS)
Salzman, G. C.; Duque, R. E.; Braylan, R. C.; Stewart, C. C.
1990-01-01
An expert system is being developed using CLIPS to assist clinicians in the analysis of multivariate flow cytometry data from cancer patients. Cluster analysis is used to find subpopulations representing various cell types in multiple datasets each consisting of four to five measurements on each of 5000 cells. CLIPS facts are derived from results of the clustering. CLIPS rules are based on the expertise of Drs. Stewart, Duque, and Braylan. The rules incorporate certainty factors based on case histories.
Clustering in analytical chemistry.
Drab, Klaudia; Daszykowski, Michal
2014-01-01
Data clustering plays an important role in the exploratory analysis of analytical data, and the use of clustering methods has been acknowledged in different fields of science. In this paper, principles of data clustering are presented with a direct focus on clustering of analytical data. The role of the clustering process in the analytical workflow is underlined, and its potential impact on the analytical workflow is emphasized.
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.; Henson, Robert
2012-01-01
A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…
Characteristics of airflow and particle deposition in COPD current smokers
NASA Astrophysics Data System (ADS)
Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long
2017-11-01
A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.
From virtual clustering analysis to self-consistent clustering analysis: a mathematical study
NASA Astrophysics Data System (ADS)
Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam
2018-03-01
In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of the Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduce the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected even higher than the recently proposed SCA algorithm.
Zhang, Zhi-Guo; Song, Chang-Heng; Zhang, Fang-Zhen; Chen, Yan-Jing; Xiang, Li-Hua; Xiao, Gary Guishan; Ju, Da-Hong
2016-06-01
Rhizoma Dioscoreae extract (RDE) exhibits a protective effect on alveolar bone loss in ovariectomized (OVX) rats. The aim of this study was to predict the pathways or targets that are regulated by RDE, by re‑assessing our previously reported data and conducting a protein‑protein interaction (PPI) network analysis. In total, 383 differentially expressed genes (≥3‑fold) between alveolar bone samples from the RDE and OVX group rats were identified, and a PPI network was constructed based on these genes. Furthermore, four molecular clusters (A‑D) in the PPI network with the smallest P‑values were detected by molecular complex detection (MCODE) algorithm. Using Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA) tools, two molecular clusters (A and B) were enriched for biological process in Gene Ontology (GO). Only cluster A was associated with biological pathways in the IPA database. GO and pathway analysis results showed that cluster A, associated with cell cycle regulation, was the most important molecular cluster in the PPI network. In addition, cyclin‑dependent kinase 1 (CDK1) may be a key molecule achieving the cell‑cycle‑regulatory function of cluster A. From the PPI network analysis, it was predicted that delayed cell cycle progression in excessive alveolar bone remodeling via downregulation of CDK1 may be another mechanism underling the anti‑osteopenic effect of RDE on alveolar bone.
Frejo, L; Martin-Sanz, E; Teggi, R; Trinidad, G; Soto-Varela, A; Santos-Perez, S; Manrique, R; Perez, N; Aran, I; Almeida-Branco, M S; Batuecas-Caletrio, A; Fraile, J; Espinosa-Sanchez, J M; Perez-Guillen, V; Perez-Garrigues, H; Oliva-Dominguez, M; Aleman, O; Benitez, J; Perez, P; Lopez-Escamez, J A
2017-12-01
To define clinical subgroups by cluster analysis in patients with unilateral Meniere disease (MD) and to compare them with the clinical subgroups found in bilateral MD. A cross-sectional study with a two-step cluster analysis. A tertiary referral multicenter study. Nine hundred and eighty-eight adult patients with unilateral MD. best predictors to define clinical subgroups with potential different aetiologies. We established five clusters in unilateral MD. Group 1 is the most frequently found, includes 53% of patients, and it is defined as the sporadic, classic MD without migraine and without autoimmune disorder (AD). Group 2 is found in 8% of patients, and it is defined by hearing loss, which antedates the vertigo episodes by months or years (delayed MD), without migraine or AD in most of cases. Group 3 involves 13% of patients, and it is considered familial MD, while group 4, which includes 15% of patients, is linked to the presence of migraine in all cases. Group 5 is found in 11% of patients and is defined by a comorbid AD. We found significant differences in the distribution of AD in clusters 3, 4 and 5 between patients with uni- and bilateral MD. Cluster analysis defines clinical subgroups in MD, and it extends the phenotype beyond audiovestibular symptoms. This classification will help to improve the phenotyping in MD and facilitate the selection of patients for randomised clinical trials. © 2017 John Wiley & Sons Ltd.
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.
Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana
2016-08-31
Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed filtering strategies allow to identify and exclude such peripheral proteins limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of details to be obtained and data redundancy eliminated while keeping biologically interesting variations.
Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne
2015-10-01
The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.
Schacht, Julia; Gaston, Nicola
2016-10-18
The electronic properties of doped thiolate-protected gold clusters are often referred to as tunable, but their study to date, conducted at different levels of theory, does not allow a systematic evaluation of this claim. Here, using density functional theory, the applicability of the superatomic model to these clusters is critically evaluated, and related to the degree of structural distortion and electronic inhomogeneity in the differently doped clusters, with dopant atoms Pd, Pt, Cu, and Ag. The effect of electron number is systematically evaluated by varying the charge on the overall cluster, and the nominal number of delocalized electrons, employed in the superatomic model, is compared to the numbers obtained from Bader analysis of individual atomic charges. We find that the superatomic model is highly applicable to all of these clusters, and is able to predict and explain the changing electronic structure as a function of charge. However, significant perturbations of the model arise due to doping, due to distortions of the core structure of the Au 13 [RS(AuSR) 2 ] 6 - cluster. In addition, analysis of the electronic structure indicates that the superatomic character is distributed further across the ligand shell in the case of the doped clusters, which may have implications for the self-assembly of these clusters into materials. The prediction of appropriate clusters for such superatomic solids relies critically on such quantitative analysis of the tunability of the electronic structure. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Subgroups of advanced cancer patients clustered by their symptom profiles: quality-of-life outcomes.
Husain, Amna; Myers, Jeff; Selby, Debbie; Thomson, Barbara; Chow, Edward
2011-11-01
Symptom cluster analysis is a new frontier of research in symptom management. This study clustered patients by their symptom profiles to identify subgroups that may be at higher risk for poor quality of life (QOL) and that may, therefore, benefit most from targeted interventions. Longitudinal study of metastatic cancer patients using the Edmonton Symptom Assessment Scale (ESAS). We generated two-, three-, and four-cluster subgroups and examined the relationship of cluster membership with patient outcomes. To address the problem of missing longitudinal data, we developed a novel outcome variable (QualTime) that measures both QOL and time in study. Two hundred and twenty-one patients with a mean Palliative Performance Scale (PPS) of 59.1 were enrolled. The three-cluster model was chosen for further analysis. The low-burden subgroup had all low severity symptom scores. The intermediate subgroup separates from the low-burden group on the "debility" profile of fatigue, drowsiness, appetite, and well-being. The high-burden group separates from the intermediate-burden group on pain, depression, and anxiety. At baseline, PPS (p=0.0003) and cluster membership (p<0.0001) contributed significantly to global QOL. In univariate analysis, cluster membership was related to the longitudinal outcome, QualTime. In a multivariate model, the relationship of PPS to QualTime was still significant (p=0.0002), but subgroup membership was no longer significant (p=0.1009). PPS is a stronger predictor of the longitudinal variable than cluster subgroups; however, cluster subgroups provide a target for clinical interventions that may improve QOL.
ERIC Educational Resources Information Center
Kung, Hsiang-Te; And Others
1993-01-01
In spite of rapid progress achieved in the methodological research underlying environmental impact assessment (EIA), the problem of weighting various parameters has not yet been solved. This paper presents a new approach, fuzzy clustering analysis, which is illustrated with an EIA case study on Baoshan-Wusong District in Shanghai, China. (Author)
USDA-ARS?s Scientific Manuscript database
The mechanisms as well the genetics underlying bioavailability and metabolism of carotenoids in humans remains unclear. The individual temporal response of plasma carotenoids was analyzed in adults who consumed carotenoid-containing juices on a controlled-diet study using cluster analysis. Treatmen...
Profiles of More and Less Successful L2 Learners: A Cluster Analysis Study
ERIC Educational Resources Information Center
Sparks, Richard L.; Patton, Jon; Ganschow, Leonore
2012-01-01
This retrospective study examined L1 achievement, intelligence, L2 aptitude, and L2 proficiency profiles of 208 students completing two years of high school L2 courses. A cluster analysis was performed to determine whether distinct cognitive and achievement profiles of more and less successful L2 learners would emerge. The results of…
Cluster Analysis of Junior High School Students' Cognitive Structures
ERIC Educational Resources Information Center
Dan, Youngjun; Geng, Leisha; Li, Meng
2017-01-01
This study aimed to explore students' cognitive patterns based on their knowledge and levels. Participants were seventh graders from a junior high school in China. Three relatively distinct groups were specified by Cluster Analysis: high knowledge and low ability, low knowledge and low ability, and high knowledge and high ability. The group of low…
Cluster Analysis of Assessment in Anatomy and Physiology for Health Science Undergraduates
ERIC Educational Resources Information Center
Brown, Stephen; White, Sue; Power, Nicola
2016-01-01
Academic content common to health science programs is often taught to a mixed group of students; however, content assessment may be consistent for each discipline. This study used a retrospective cluster analysis on such a group, first to identify high and low achieving students, and second, to determine the distribution of students within…
Toward a Richer View of the Scientific Method: The Role of Conceptual Analysis
ERIC Educational Resources Information Center
Machado, Armando; Silva, Francisco J.
2007-01-01
Within the complex set of activities that comprise the scientific method, three clusters of activities can be recognized: experimentation, mathematization, and conceptual analysis. In psychology, the first two of these clusters are well-known and valued, but the third seems less known and valued. The authors show the value of these three clusters…
Lipoprotein lipase S447X variant associated with VLDL, LDL and HDL diameter clustering in the MetS
USDA-ARS?s Scientific Manuscript database
Previous analysis clustered 1,238 individuals from the general population Genetics of Lipid Lowering Drugs Network (GOLDN) study by the size of their fasting very low-density, low-density and high-density lipoproteins (VLDL, LDL, HDL) using latent class analysis. From two of the eight identified gro...
ERIC Educational Resources Information Center
Donoghue, John R.
A Monte Carlo study compared the usefulness of six variable weighting methods for cluster analysis. Data were 100 bivariate observations from 2 subgroups, generated according to a finite normal mixture model. Subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids were manipulated. Of the clustering…
Use of LANDSAT imagery for wildlife habitat mapping in northeast and eastcentral Alaska
NASA Technical Reports Server (NTRS)
Lent, P. C. (Principal Investigator)
1976-01-01
The author has identified the following significant results. There is strong indication that spatially rare feature classes may be missed in clustering classifications based on 2% random sampling. Therefore, it seems advisable to augment random sampling for cluster analysis with directed sampling of any spatially rare features which are relevant to the analysis.
ERIC Educational Resources Information Center
Cuccaro, Michael L.; Tuchman, Roberto F.; Hamilton, Kara L.; Wright, Harry H.; Abramson, Ruth K.; Haines, Jonathan L.; Gilbert, John R.; Pericak-Vance, Margaret
2012-01-01
Epilepsy co-occurs frequently in autism spectrum disorders (ASD). Understanding this co-occurrence requires a better understanding of the ASD-epilepsy phenotype (or phenotypes). To address this, we conducted latent class cluster analysis (LCCA) on an ASD dataset (N = 577) which included 64 individuals with epilepsy. We identified a 5-cluster…
Student Motivational Profiles in an Introductory MIS Course: An Exploratory Cluster Analysis
ERIC Educational Resources Information Center
Nelson, Klara
2014-01-01
This study profiles students in an introductory MIS course according to a variety of variables associated with choice of academic major. The data were collected through a survey administered to 12 sections of the course. A two-step cluster analysis was performed with gender as a categorical variable and students' perceptions of task value…
2 x 2 Achievement Goals and Achievement Emotions: A Cluster Analysis of Students' Motivation
ERIC Educational Resources Information Center
Jang, Leong Yeok; Liu, Woon Chia
2012-01-01
This study sought to better understand the adoption of multiple achievement goals at an intra-individual level, and its links to emotional well-being, learning, and academic achievement. Participants were 480 Secondary Two students (aged between 13 and 14 years) from two coeducational government schools. Hierarchical cluster analysis revealed the…
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that a totally different criterion, labile trace element contents - hence thermal histories - or 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
Comparative analysis on the selection of number of clusters in community detection
NASA Astrophysics Data System (ADS)
Kawamoto, Tatsuro; Kabashima, Yoshiyuki
2018-02-01
We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
NASA Astrophysics Data System (ADS)
Pillepich, Annalisa; Porciani, Cristiano; Reiprich, Thomas H.
2012-05-01
Starting in late 2013, the eRosita telescope will survey the X-ray sky with unprecedented sensitivity. Assuming a detection limit of 50 photons in the (0.5-2.0) keV energy band with a typical exposure time of 1.6 ks, we predict that eRosita will detect ˜9.3 × 104 clusters of galaxies more massive than 5 × 1013 h-1 M⊙, with the currently planned all-sky survey. Their median redshift will be z≃ 0.35. We perform a Fisher-matrix analysis to forecast the constraining power of ? on the Λ cold dark matter (ΛCDM) cosmology and, simultaneously, on the X-ray scaling relations for galaxy clusters. Special attention is devoted to the possibility of detecting primordial non-Gaussianity. We consider two experimental probes: the number counts and the angular clustering of a photon-count limited sample of clusters. We discuss how the cluster sample should be split to optimize the analysis and we show that redshift information of the individual clusters is vital to break the strong degeneracies among the model parameters. For example, performing a 'tomographic' analysis based on photometric-redshift estimates and combining one- and two-point statistics will give marginal 1σ errors of Δσ8≃ 0.036 and ΔΩm≃ 0.012 without priors, and improve the current estimates on the slope of the luminosity-mass relation by a factor of 3. Regarding primordial non-Gaussianity, ? clusters alone will give ΔfNL≃ 9, 36 and 144 for the local, orthogonal and equilateral model, respectively. Measuring redshifts with spectroscopic accuracy would further tighten the constraints by nearly 40 per cent (barring fNL which displays smaller improvements). Finally, combining ? data with the analysis of temperature anisotropies in the cosmic microwave background by the Planck satellite should give sensational constraints on both the cosmology and the properties of the intracluster medium.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6.44.1' N, 91.56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.
The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in jobmore » queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis is discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.« less
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of data, type of variables, and purpose of the analysis. Different measurement scales are studied in details and statistical comparison, modeling, and data mining methods are studied based upon using several medical examples. We have presented two ordinal–variables clustering examples, as more challenging variable in analysis, using Wisconsin Breast Cancer Data (WBCD). Ordinal-to-Interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is granted. Moreover, descriptive and inferential statistics in addition to modeling approach must be selected based on the scale of the variables. PMID:24672565
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.
Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain
2015-10-01
Clustering is a set of techniques of the statistical learning aimed at finding structures of heterogeneous partitions grouping homogenous data called clusters. There are several fields in which clustering was successfully applied, such as medicine, biology, finance, economics, etc. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on the Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, the problems in nature, especially in medicine, are often based on heterogeneous-type qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing to simultaneously handle quantitative and qualitative data. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia
2016-04-01
Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.
A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.
Brusco, Michael J; Shireman, Emilie; Steinley, Douglas
2017-09-01
The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Aftershock identification problem via the nearest-neighbor analysis for marked point processes
NASA Astrophysics Data System (ADS)
Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.
2007-12-01
The centennial observations on the world seismicity have revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide most reliable information about the earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time- oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
Functional Connectivity Parcellation of the Human Thalamus by Independent Component Analysis.
Zhang, Sheng; Li, Chiang-Shan R
2017-11-01
As a key structure to relay and integrate information, the thalamus supports multiple cognitive and affective functions through the connectivity between its subnuclei and cortical and subcortical regions. Although extant studies have largely described thalamic regional functions in anatomical terms, evidence accumulates to suggest a more complex picture of subareal activities and connectivities of the thalamus. In this study, we aimed to parcellate the thalamus and examine whole-brain connectivity of its functional clusters. With resting state functional magnetic resonance imaging data from 96 adults, we used independent component analysis (ICA) to parcellate the thalamus into 10 components. On the basis of the independence assumption, ICA helps to identify how subclusters overlap spatially. Whole brain functional connectivity of each subdivision was computed for independent component's time course (ICtc), which is a unique time series to represent an IC. For comparison, we computed seed-region-based functional connectivity using the averaged time course across all voxels within a thalamic subdivision. The results showed that, at p < 10 -6 , corrected, 49% of voxels on average overlapped among subdivisions. Compared with seed-region analysis, ICtc analysis revealed patterns of connectivity that were more distinguished between thalamic clusters. ICtc analysis demonstrated thalamic connectivity to the primary motor cortex, which has eluded the analysis as well as previous studies based on averaged time series, and clarified thalamic connectivity to the hippocampus, caudate nucleus, and precuneus. The new findings elucidate functional organization of the thalamus and suggest that ICA clustering in combination with ICtc rather than seed-region analysis better distinguishes whole-brain connectivities among functional clusters of a brain region.
Generalized Analysis Tools for Multi-Spacecraft Missions
NASA Astrophysics Data System (ADS)
Chanteur, G. M.
2011-12-01
Analysis tools for multi-spacecraft missions like CLUSTER or MMS have been designed since the end of the 90's to estimate gradients of fields or to characterize discontinuities crossed by a cluster of spacecraft. Different approaches have been presented and discussed in the book "Analysis Methods for Multi-Spacecraft Data" published as Scientific Report 001 of the International Space Science Institute in Bern, Switzerland (G. Paschmann and P. Daly Eds., 1998). On one hand the approach using methods of least squares has the advantage to apply to any number of spacecraft [1] but is not convenient to perform analytical computation especially when considering the error analysis. On the other hand the barycentric approach is powerful as it provides simple analytical formulas involving the reciprocal vectors of the tetrahedron [2] but appears limited to clusters of four spacecraft. Moreover the barycentric approach allows to derive theoretical formulas for errors affecting the estimators built from the reciprocal vectors [2,3,4]. Following a first generalization of reciprocal vectors proposed by Vogt et al [4] and despite the present lack of projects with more than four spacecraft we present generalized reciprocal vectors for a cluster made of any number of spacecraft : each spacecraft is given a positive or nul weight. The non-coplanarity of at least four spacecraft with strictly positive weights is a necessary and sufficient condition for this analysis to be enabled. Weights given to spacecraft allow to minimize the influence of some spacecraft if its location or the quality of its data are not appropriate, or simply to extract subsets of spacecraft from the cluster. Estimators presented in [2] are generalized within this new frame except for the error analysis which is still under investigation. References [1] Harvey, C. C.: Spatial Gradients and the Volumetric Tensor, in: Analysis Methods for Multi-Spacecraft Data, G. Paschmann and P. Daly (eds.), pp. 307-322, ISSI SR-001, 1998. [2] Chanteur, G.: Spatial Interpolation for Four Spacecraft: Theory, in: Analysis Methods for Multi-Spacecraft Data, G. Paschmann and P. Daly (eds.), pp. 371-393, ISSI SR-001, 1998. [3] Chanteur, G.: Accuracy of field gradient estimations by Cluster: Explanation of its dependency upon elongation and planarity of the tetrahedron, pp. 265-268, ESA SP-449, 2000. [4] Vogt, J., Paschmann, G., and Chanteur, G.: Reciprocal Vectors, pp. 33-46, ISSI SR-008, 2008.
On the Distribution of Orbital Poles of Milky Way Satellites
NASA Astrophysics Data System (ADS)
Palma, Christopher; Majewski, Steven R.; Johnston, Kathryn V.
2002-01-01
In numerous studies of the outer Galactic halo some evidence for accretion has been found. If the outer halo did form in part or wholly through merger events, we might expect to find coherent streams of stars and globular clusters following orbits similar to those of their parent objects, which are assumed to be present or former Milky Way dwarf satellite galaxies. We present a study of this phenomenon by assessing the likelihood of potential descendant ``dynamical families'' in the outer halo. We conduct two analyses: one that involves a statistical analysis of the spatial distribution of all known Galactic dwarf satellite galaxies (DSGs) and globular clusters, and a second, more specific analysis of those globular clusters and DSGs for which full phase space dynamical data exist. In both cases our methodology is appropriate only to members of descendant dynamical families that retain nearly aligned orbital poles today. Since the Sagittarius dwarf (Sgr) is considered a paradigm for the type of merger/tidal interaction event for which we are searching, we also undertake a case study of the Sgr system and identify several globular clusters that may be members of its extended dynamical family. In our first analysis, the distribution of possible orbital poles for the entire sample of outer (Rgc>8 kpc) halo globular clusters is tested for statistically significant associations among globular clusters and DSGs. Our methodology for identifying possible associations is similar to that used by Lynden-Bell & Lynden-Bell, but we put the associations on a more statistical foundation. Moreover, we study the degree of possible dynamical clustering among various interesting ensembles of globular clusters and satellite galaxies. Among the ensembles studied, we find the globular cluster subpopulation with the highest statistical likelihood of association with one or more of the Galactic DSGs to be the distant, outer halo (Rgc>25 kpc), second-parameter globular clusters. The results of our orbital pole analysis are supported by the great circle cell count methodology of Johnston, Hernquist, & Bolte. The space motions of the clusters Pal 4, NGC 6229, NGC 7006, and Pyxis are predicted to be among those most likely to show the clusters to be following stream orbits, since these clusters are responsible for the majority of the statistical significance of the association between outer halo, second-parameter globular clusters and the Milky Way DSGs. In our second analysis, we study the orbits of the 41 globular clusters and six Milky Way-bound DSGs having measured proper motions to look for objects with both coplanar orbits and similar angular momenta. Unfortunately, the majority of globular clusters with measured proper motions are inner halo clusters that are less likely to retain memory of their original orbit. Although four potential globular cluster/DSG associations are found, we believe three of these associations involving inner halo clusters to be coincidental. While the present sample of objects with complete dynamical data is small and does not include many of the globular clusters that are more likely to have been captured by the Milky Way, the methodology we adopt will become increasingly powerful as more proper motions are measured for distant Galactic satellites and globular clusters, and especially as results from the Space Interferometry Mission (SIM) become available.
NASA Astrophysics Data System (ADS)
Atherton, Daniel
Early detection of disease and insect infestation within crops and precise application of pesticides can help reduce potential production losses, reduce environmental risk, and reduce the cost of farming. The goal of this study was the advanced detection of early blight (Alternaria solani) in potato (Solanum tuberosum) plants using hyperspectral remote sensing data captured with a handheld spectroradiometer. Hyperspectral reflectance spectra were captured 10 times over five weeks from plants grown to the vegetative and tuber bulking growth stages. The spectra were analyzed using principal component analysis (PCA), spectral change (ratio) analysis, partial least squares (PLS), cluster analysis, and vegetative indices. PCA successfully distinguished more heavily diseased plants from healthy and minimally diseased plants using two principal components. Spectral change (ratio) analysis provided wavelengths (490-510, 640, 665-670, 690, 740-750, and 935 nm) most sensitive to early blight infection followed by ANOVA results indicating a highly significant difference (p < 0.0001) between disease rating group means. In the majority of the experiments, comparisons of diseased plants with healthy plants using Fisher's LSD revealed more heavily diseased plants were significantly different from healthy plants. PLS analysis demonstrated the feasibility of detecting early blight infected plants, finding four optimal factors for raw spectra with the predictor variation explained ranging from 93.4% to 94.6% and the response variation explained ranging from 42.7% to 64.7%. Cluster analysis successfully distinguished healthy plants from all diseased plants except for the most mildly diseased plants, showing clustering analysis was an effective method for detection of early blight. Analysis of the reflectance spectra using the simple ratio (SR) and the normalized difference vegetative index (NDVI) was effective at differentiating all diseased plants from healthy plants, except for the most mildly diseased plants. Of the analysis methods attempted, cluster analysis and vegetative indices were the most promising. The results show the potential of hyperspectral remote sensing for the detection of early blight in potato plants.
Symptom clusters and quality of life among patients with advanced heart failure
Yu, Doris SF; Chan, Helen YL; Leung, Doris YP; Hui, Elsie; Sit, Janet WH
2016-01-01
Objectives To identify symptom clusters among patients with advanced heart failure (HF) and the independent relationships with their quality of life (QoL). Methods This is the secondary data analysis of a cross-sectional study which interviewed 119 patients with advanced HF in the geriatric unit of a regional hospital in Hong Kong. The symptom profile and QoL were assessed by using the Edmonton Symptom Assessment Scale (ESAS) and the McGill QoL Questionnaire. Exploratory factor analysis was used to identify the symptom clusters. Hierarchical regression analysis was used to examine the independent relationships with their QoL, after adjusting the effects of age, gender, and comorbidities. Results The patients were at an advanced age (82.9 ± 6.5 years). Three distinct symptom clusters were identified: they were the distress cluster (including shortness of breath, anxiety, and depression), the decondition cluster (fatigue, drowsiness, nausea, and reduced appetite), and the discomfort cluster (pain, and sense of generalized discomfort). These three symptom clusters accounted for 63.25% of variance of the patients' symptom experience. The small to moderate correlations between these symptom clusters indicated that they were rather independent of one another. After adjusting the age, gender and comorbidities, the distress (β = −0.635, P < 0.001), the decondition (β = −0.148, P = 0.01), and the discomfort (β = −0.258, P < 0.001) symptom clusters independently predicted their QoL. Conclusions This study identified the distinctive symptom clusters among patients with advanced HF. The results shed light on the need to develop palliative care interventions for optimizing the symptom control for this life-limiting disease. PMID:27403150
Onda, Kyle; Crocker, Jonny; Kayser, Georgia Lyn; Bartram, Jamie
2014-03-01
The fields of global health and international development commonly cluster countries by geography and income to target resources and describe progress. For any given sector of interest, a range of relevant indicators can serve as a more appropriate basis for classification. We create a new typology of country clusters specific to the water and sanitation (WatSan) sector based on similarities across multiple WatSan-related indicators. After a literature review and consultation with experts in the WatSan sector, nine indicators were selected. Indicator selection was based on relevance to and suggested influence on national water and sanitation service delivery, and to maximize data availability across as many countries as possible. A hierarchical clustering method and a gap statistic analysis were used to group countries into a natural number of relevant clusters. Two stages of clustering resulted in five clusters, representing 156 countries or 6.75 billion people. The five clusters were not well explained by income or geography, and were distinct from existing country clusters used in international development. Analysis of these five clusters revealed that they were more compact and well separated than United Nations and World Bank country clusters. This analysis and resulting country typology suggest that previous geography- or income-based country groupings can be improved upon for applications in the WatSan sector by utilizing globally available WatSan-related indicators. Potential applications include guiding and discussing research, informing policy, improving resource targeting, describing sector progress, and identifying critical knowledge gaps in the WatSan sector. Copyright © 2013 Elsevier GmbH. All rights reserved.
Anandakrishnan, Ramu; Onufriev, Alexey
2008-03-01
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates, tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here, is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We how that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application-computation of average charge of ionizable amino-acids in proteins-is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I
2016-02-01
Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at-risk groups. Finally, determining predictors of comorbidity for the moderate and severe strata of these phenotypes implies a need to take these factors into account when considering obstructive sleep apnea syndrome treatment options. © 2015 The Authors. Journal of Sleep Research published by John Wiley & Sons Ltd on behalf of European Sleep Research Society.
Chang, Ni-Bin; Wimberly, Brent; Xuan, Zhemin
2012-03-01
This study presents an integrated k-means clustering and gravity model (IKCGM) for investigating the spatiotemporal patterns of nutrient and associated dissolved oxygen levels in Tampa Bay, Florida. By using a k-means clustering analysis to first partition the nutrient data into a user-specified number of subsets, it is possible to discover the spatiotemporal patterns of nutrient distribution in the bay and capture the inherent linkages of hydrodynamic and biogeochemical features. Such patterns may then be combined with a gravity model to link the nutrient source contribution from each coastal watershed to the generated clusters in the bay to aid in the source proportion analysis for environmental management. The clustering analysis was carried out based on 1 year (2008) water quality data composed of 55 sample stations throughout Tampa Bay collected by the Environmental Protection Commission of Hillsborough County. In addition, hydrological and river water quality data of the same year were acquired from the United States Geological Survey's National Water Information System to support the gravity modeling analysis. The results show that the k-means model with 8 clusters is the optimal choice, in which cluster 2 at Lower Tampa Bay had the minimum values of total nitrogen (TN) concentrations, chlorophyll a (Chl-a) concentrations, and ocean color values in every season as well as the minimum concentration of total phosphorus (TP) in three consecutive seasons in 2008. The datasets indicate that Lower Tampa Bay is an area with limited nutrient input throughout the year. Cluster 5, located in Middle Tampa Bay, displayed elevated TN concentrations, ocean color values, and Chl-a concentrations, suggesting that high values of colored dissolved organic matter are linked with some nutrient sources. The data presented by the gravity modeling analysis indicate that the Alafia River Basin is the major contributor of nutrients in terms of both TP and TN values in all seasons. With this new integration, improvements for environmental monitoring and assessment were achieved to advance our understanding of sea-land interactions and nutrient cycling in a critical coastal bay, the Gulf of Mexico. This journal is © The Royal Society of Chemistry 2012
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110
Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.
Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey
2017-09-01
We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.
Bennett, Robert M; Russell, Jon; Cappelleri, Joseph C; Bushmakin, Andrew G; Zlateva, Gergana; Sadosky, Alesia
2010-06-28
The purpose of this study was to determine whether some of the clinical features of fibromyalgia (FM) that patients would like to see improved aggregate into definable clusters. Seven hundred and eighty-eight patients with clinically confirmed FM and baseline pain > or =40 mm on a 100 mm visual analogue scale ranked 5 FM clinical features that the subjects would most like to see improved after treatment (one for each priority quintile) from a list of 20 developed during focus groups. For each subject, clinical features were transformed into vectors with rankings assigned values 1-5 (lowest to highest ranking). Logistic analysis was used to create a distance matrix and hierarchical cluster analysis was applied to identify cluster structure. The frequency of cluster selection was determined, and cluster importance was ranked using cluster scores derived from rankings of the clinical features. Multidimensional scaling was used to visualize and conceptualize cluster relationships. Six clinical features clusters were identified and named based on their key characteristics. In order of selection frequency, the clusters were Pain (90%; 4 clinical features), Fatigue (89%; 4 clinical features), Domestic (42%; 4 clinical features), Impairment (29%; 3 functions), Affective (21%; 3 clinical features), and Social (9%; 2 functional). The "Pain Cluster" was ranked of greatest importance by 54% of subjects, followed by Fatigue, which was given the highest ranking by 28% of subjects. Multidimensional scaling mapped these clusters to two dimensions: Status (bounded by Physical and Emotional domains), and Setting (bounded by Individual and Group interactions). Common clinical features of FM could be grouped into 6 clusters (Pain, Fatigue, Domestic, Impairment, Affective, and Social) based on patient perception of relevance to treatment. Furthermore, these 6 clusters could be charted in the 2 dimensions of Status and Setting, thus providing a unique perspective for interpretation of FM symptomatology.