Sample records for comparative cluster analysis

  1. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072

  2. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    PubMed

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39items were grouped into 8 clusters. Each cluster had characteristic features. When compared with female, the male group tended to be sensitized more frequently to all tested allergens, except for fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize female group more frequently than male group.

  3. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    PubMed

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  4. Clinical Phenotype of Diabetic Peripheral Neuropathy and Relation to Symptom Patterns: Cluster and Factor Analysis in Patients with Type 2 Diabetes in Korea.

    PubMed

    Won, Jong Chul; Im, Yong-Jin; Lee, Ji-Hyun; Kim, Chong Hwa; Kwon, Hyuk Sang; Cha, Bong-Yun; Park, Tae Sun

    2017-01-01

    Patients with diabetic peripheral neuropathy (DPN) is the most common complication. However, patients are usually suffering from not only diverse sensory deficit but also neuropathy-related discomforts. The aim of this study is to identify distinct groups of patients with DPN with respect to its clinical impacts on symptom patterns and comorbidities. A hierarchical cluster analysis and factor analysis were performed to identify relevant subgroups of patients with DPN ( n = 1338) and symptom patterns. Patients with DPN were divided into three clusters: asymptomatic (cluster 1, n = 448, 33.5%), moderate symptoms with disturbed sleep (cluster 2, n = 562, 42.0%), and severe symptoms with decreased quality of life (cluster 3, n = 328, 24.5%). Patients in cluster 3, compared with clusters 1 and 2, were characterized by higher levels of HbA1c and more severe pain and physical impairments. Patients in cluster 2 had moderate pain levels but disturbed sleep patterns comparable to those in cluster 3. The frequency of symptoms on each item of MNSI by "painful" symptom pattern showed a similar distribution pattern with increasing intensities along the three clusters. Cluster and factor analysis endorsed the use of comprehensive and symptomatic subgrouping to individualize the evaluation of patients with DPN.

  5. Interactive visual exploration and refinement of cluster assignments.

    PubMed

    Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R

    2017-09-12

    With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.

  6. Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

    PubMed

    Stefurak, Tres; Calhoun, Georgia B

    2007-01-01

    The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analysis along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of found clusters. Only the effect for race was significant with the Anxious Prosocial and Depressed Intepersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.

  7. Dietary patterns by cluster analysis in pregnant women: relationship with nutrient intakes and dietary patterns in 7-year-old offspring.

    PubMed

    Freitas-Vilela, Ana Amélia; Smith, Andrew D A C; Kac, Gilberto; Pearson, Rebecca M; Heron, Jon; Emond, Alan; Hibbeln, Joseph R; Castro, Maria Beatriz Trindade; Emmett, Pauline M

    2017-04-01

    Little is known about how dietary patterns of mothers and their children track over time. The objectives of this study are to obtain dietary patterns in pregnancy using cluster analysis, to examine women's mean nutrient intakes in each cluster and to compare the dietary patterns of mothers to those of their children. Pregnant women (n = 12 195) from the Avon Longitudinal Study of Parents and Children reported their frequency of consumption of 47 foods and food groups. These data were used to obtain dietary patterns during pregnancy by cluster analysis. The absolute and energy-adjusted nutrient intakes were compared between clusters. Women's dietary patterns were compared with previously derived clusters of their children at 7 years of age. Multinomial logistic regression was performed to evaluate relationships comparing maternal and offspring clusters. Three maternal clusters were identified: 'fruit and vegetables', 'meat and potatoes' and 'white bread and coffee'. After energy adjustment women in the 'fruit and vegetables' cluster had the highest mean nutrient intakes. Mothers in the 'fruit and vegetables' cluster were more likely than mothers in 'meat and potatoes' (adjusted odds ratio [OR]: 2.00; 95% Confidence Interval [CI]: 1.69-2.36) or 'white bread and coffee' (OR: 2.18; 95% CI: 1.87-2.53) clusters to have children in a 'plant-based' cluster. However the majority of children were in clusters unrelated to their mother dietary pattern. Three distinct dietary patterns were obtained in pregnancy; the 'fruit and vegetables' pattern being the most nutrient dense. Mothers' dietary patterns were associated with but did not dominate offspring dietary patterns. © 2016 The Authors. Maternal & Child Nutrition published by John Wiley & Sons Ltd.

  8. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    PubMed

    Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B

    2016-01-01

    Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.

  9. Cluster subgroups based on overall pressure pain sensitivity and psychosocial factors in chronic musculoskeletal pain: Differences in clinical outcomes.

    PubMed

    Almeida, Suzana C; George, Steven Z; Leite, Raquel D V; Oliveira, Anamaria S; Chaves, Thais C

    2018-05-17

    We aimed to empirically derive psychosocial and pain sensitivity subgroups using cluster analysis within a sample of individuals with chronic musculoskeletal pain (CMP) and to investigate derived subgroups for differences in pain and disability outcomes. Eighty female participants with CMP answered psychosocial and disability scales and were assessed for pressure pain sensitivity. A cluster analysis was used to derive subgroups, and analysis of variance (ANOVA) was used to investigate differences between subgroups. Psychosocial factors (kinesiophobia, pain catastrophizing, anxiety, and depression) and overall pressure pain threshold (PPT) were entered into the cluster analysis. Three subgroups were empirically derived: cluster 1 (high pain sensitivity and high psychosocial distress; n = 12) characterized by low overall PPT and high psychosocial scores; cluster 2 (high pain sensitivity and intermediate psychosocial distress; n = 39) characterized by low overall PPT and intermediate psychosocial scores; and cluster 3 (low pain sensitivity and low psychosocial distress; n = 29) characterized by high overall PPT and low psychosocial scores compared to the other subgroups. Cluster 1 showed higher values for mean pain intensity (F (2,77)  = 10.58, p < 0.001) compared with cluster 3, and cluster 1 showed higher values for disability (F (2,77)  = 3.81, p = 0.03) compared with both clusters 2 and 3. Only cluster 1 was distinct from cluster 3 according to both pain and disability outcomes. Pain catastrophizing, depression, and anxiety were the psychosocial variables that best differentiated the subgroups. Overall, these results call attention to the importance of considering pain sensitivity and psychosocial variables to obtain a more comprehensive characterization of CMP patients' subtypes.

  10. Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J

    2016-04-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  11. clusterProfiler: an R package for comparing biological themes among gene clusters.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

    2012-05-01

    Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.

  12. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×106 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this likely due to errors arising from misatrribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed yielding an explict cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.

  13. Molecular Analysis of Bacterial Community Dynamics During Bioaugmentation Studies in a Soil Column and at a Field Test Site

    DTIC Science & Technology

    2004-06-03

    82 4.14 A GelComparII-generated UPGMA clustering dendrogram and corresponding normalized restriction...A GelComparII-generated UPGMA clustering dendrogram and corresponding normalized restriction profiles from the community...A GelComparII-generated UPGMA clustering dendrogram and corresponding normalized restriction profiles from the community

  14. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    PubMed

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  15. Comparative genomic analysis in the fungus Fusarium for production of toxins of concern to food safety

    USDA-ARS?s Scientific Manuscript database

    SUMMARY Comparative analysis of 207 genomes representing 159 species of the fungus Fusarium detected 9403 known and putative secondary metabolite (SM) biosynthetic gene clusters. The clusters included those responsible for synthesis of mycotoxins, plant hormones and pigments, and varied in distribut...

  16. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  17. Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

    NASA Astrophysics Data System (ADS)

    Black, Joshua A.; Knowles, Peter J.

    2018-06-01

    The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.

  18. Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

    PubMed

    Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

    2017-05-01

    Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.

  19. ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

    NASA Technical Reports Server (NTRS)

    Wharton, S. W.; Turner, B. J.

    1981-01-01

    An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.

  20. Customized recommendations for production management clusters of North American automatic milking systems.

    PubMed

    Tremblay, Marlène; Hess, Justin P; Christenson, Brock M; McIntyre, Kolby K; Smink, Ben; van der Kamp, Arjen J; de Jong, Lisanne G; Döpfer, Dörte

    2016-07-01

    Automatic milking systems (AMS) are implemented in a variety of situations and environments. Consequently, there is a need to characterize individual farming practices and regional challenges to streamline management advice and objectives for producers. Benchmarking is often used in the dairy industry to compare farms by computing percentile ranks of the production values of groups of farms. Grouping for conventional benchmarking is commonly limited to the use of a few factors such as farms' geographic region or breed of cattle. We hypothesized that herds' production data and management information could be clustered in a meaningful way using cluster analysis and that this clustering approach would yield better peer groups of farms than benchmarking methods based on criteria such as country, region, breed, or breed and region. By applying mixed latent-class model-based cluster analysis to 529 North American AMS dairy farms with respect to 18 significant risk factors, 6 clusters were identified. Each cluster (i.e., peer group) represented unique management styles, challenges, and production patterns. When compared with peer groups based on criteria similar to the conventional benchmarking standards, the 6 clusters better predicted milk produced (kilograms) per robot per day. Each cluster represented a unique management and production pattern that requires specialized advice. For example, cluster 1 farms were those that recently installed AMS robots, whereas cluster 3 farms (the most northern farms) fed high amounts of concentrates through the robot to compensate for low-energy feed in the bunk. In addition to general recommendations for farms within a cluster, individual farms can generate their own specific goals by comparing themselves to farms within their cluster. This is very comparable to benchmarking but adds the specific characteristics of the peer group, resulting in better farm management advice. The improvement that cluster analysis allows for is characterized by the multivariable approach and the fact that comparisons between production units can be accomplished within a cluster and between clusters as a choice. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  1. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  2. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    ERIC Educational Resources Information Center

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

    Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…

  3. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.

  4. Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement formore » a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.« less

  5. Two worlds collide: image analysis methods for quantifying structural variation in cluster molecular dynamics.

    PubMed

    Steenbergen, K G; Gaston, N

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.

  6. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    PubMed Central

    2015-01-01

    Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893

  7. Autoantibodies in pediatric systemic lupus erythematosus: ethnic grouping, cluster analysis, and clinical correlations.

    PubMed

    Jurencák, Roman; Fritzler, Marvin; Tyrrell, Pascal; Hiraki, Linda; Benseler, Susanne; Silverman, Earl

    2009-02-01

    (1) To evaluate the spectrum of serum autoantibodies in pediatric-onset systemic lupus erythematosus (pSLE) with a focus on ethnic differences; (2) using cluster analysis, to identify patients with similar autoantibody patterns and to determine their clinical associations. A single-center cohort study of all patients with newly diagnosed pSLE seen over an 8-year period was performed. Ethnicity, clinical, and serological data were prospectively collected from 156/169 patients (92%). The frequencies of 10 selected autoantibodies among ethnic groups were compared. Cluster analysis identified groups of patients with similar autoantibody profiles. Associations of these groups with clinical and laboratory features of pSLE were examined. Among our 5 ethnic groups, there were differences only in the prevalence of anti-U1RNP and anti-Sm antibodies, which occurred more frequently in non-Caucasian patients (p < 0.0001, p < 0.01, respectively). Cluster analysis revealed 3 autoantibody clusters. Cluster 1 consisted of anti-dsDNA antibodies. Cluster 2 consisted of anti-dsDNA, antichromatin, antiribosomal P, anti-U1RNP, anti-Sm, anti-Ro and anti-La autoantibody. Cluster 3 consisted of anti-dsDNA, anti-RNP, and anti-Sm autoantibody. The highest proportion of Caucasians was in cluster 1 (p < 0.05), which was characterized by a mild disease with infrequent major organ involvement compared to cluster 2, which had the highest frequency of nephritis, renal failure, serositis, and hemolytic anemia, or cluster 3, which was characterized by frequent neuropsychiatric disease and nephritis. We observed ethnic differences in autoantibody profiles in pSLE. Autoantibodies tended to cluster together and these clusters were associated with different clinical courses.

  8. Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

    PubMed

    Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

    2017-12-01

    To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Cluster analysis of quantitative parametric maps from DCE-MRI: application in evaluating heterogeneity of tumor response to antiangiogenic treatment.

    PubMed

    Longo, Dario Livio; Dastrù, Walter; Consolino, Lorena; Espak, Miklos; Arigoni, Maddalena; Cavallo, Federica; Aime, Silvio

    2015-07-01

    The objective of this study was to compare a clustering approach to conventional analysis methods for assessing changes in pharmacokinetic parameters obtained from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) during antiangiogenic treatment in a breast cancer model. BALB/c mice bearing established transplantable her2+ tumors were treated with a DNA-based antiangiogenic vaccine or with an empty plasmid (untreated group). DCE-MRI was carried out by administering a dose of 0.05 mmol/kg of Gadocoletic acid trisodium salt, a Gd-based blood pool contrast agent (CA) at 1T. Changes in pharmacokinetic estimates (K(trans) and vp) in a nine-day interval were compared between treated and untreated groups on a voxel-by-voxel analysis. The tumor response to therapy was assessed by a clustering approach and compared with conventional summary statistics, with sub-regions analysis and with histogram analysis. Both the K(trans) and vp estimates, following blood-pool CA injection, showed marked and spatial heterogeneous changes with antiangiogenic treatment. Averaged values for the whole tumor region, as well as from the rim/core sub-regions analysis were unable to assess the antiangiogenic response. Histogram analysis resulted in significant changes only in the vp estimates (p<0.05). The proposed clustering approach depicted marked changes in both the K(trans) and vp estimates, with significant spatial heterogeneity in vp maps in response to treatment (p<0.05), provided that DCE-MRI data are properly clustered in three or four sub-regions. This study demonstrated the value of cluster analysis applied to pharmacokinetic DCE-MRI parametric maps for assessing tumor response to antiangiogenic therapy. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Cluster Correspondence Analysis.

    PubMed

    van de Velden, M; D'Enza, A Iodice; Palumbo, F

    2017-03-01

    A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.

  11. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques onmore » two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.« less

  12. Application of microarray analysis on computer cluster and cloud platforms.

    PubMed

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  13. Comparative study of two protocols for quantitative image-analysis of serotonin transporter clustering in lymphocytes, a putative biomarker of therapeutic efficacy in major depression.

    PubMed

    Romay-Tallon, Raquel; Rivera-Baltanas, Tania; Allen, Josh; Olivares, Jose M; Kalynchuk, Lisa E; Caruncho, Hector J

    2017-01-01

    The pattern of serotonin transporter clustering on the plasma membrane of lymphocytes extracted from human whole blood samples has been identified as a putative biomarker of therapeutic efficacy in major depression. Here we evaluated the possibility of performing a similar analysis using blood smears obtained from rats, and from control human subjects and depression patients. We hypothesized that we could optimize a protocol to make the analysis of serotonin protein clustering in blood smears comparable to the analysis of serotonin protein clustering using isolated lymphocytes. Our data indicate that blood smears require a longer fixation time and longer times of incubation with primary and secondary antibodies. In addition, one needs to optimize the image analysis settings for the analysis of smears. When these steps are followed, the quantitative analysis of both the number and size of serotonin transporter clusters on the plasma membrane of lymphocytes is similar using both blood smears and isolated lymphocytes. The development of this novel protocol will greatly facilitate the collection of appropriate samples by eliminating the necessity and cost of specialized personnel for drawing blood samples, and by being a less invasive procedure. Therefore, this protocol will help us advance the validation of membrane protein clustering in lymphocytes as a biomarker of therapeutic efficacy in major depression, and bring it closer to its clinical application.

  14. Comparative analysis of prophages in Streptococcus mutans genomes

    PubMed Central

    Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

    2017-01-01

    Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986

  15. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

    PubMed

    Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

    2018-01-01

    Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date, however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and to reflect previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.

  16. A Technique of Two-Stage Clustering Applied to Environmental and Civil Engineering and Related Methods of Citation Analysis.

    ERIC Educational Resources Information Center

    Miyamoto, S.; Nakayama, K.

    1983-01-01

    A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to same set of articles are compared. Ten references are…

  17. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    ERIC Educational Resources Information Center

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  18. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    PubMed

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  19. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    PubMed

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  20. The composite sequential clustering technique for analysis of multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.

  1. Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

    PubMed

    Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

    2015-03-01

    Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity also especially in the "fruits & vegetables" pattern. κ statistic revealed a fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

    PubMed Central

    Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2016-01-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885

  3. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

    PubMed

    Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2017-06-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.

  4. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  5. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    PubMed

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.

  6. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    PubMed

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. Two-step cluster approach, hierarchical clustering method and k-mean analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of diseases, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be an useful tool for phenotyping of disease and personalized approach to the treatment of patients.

  7. Cluster Subcutaneous Allergen Specific Immunotherapy for the Treatment of Allergic Rhinitis: A Systematic Review and Meta-Analysis

    PubMed Central

    Sun, Yueqi; Luo, Xi; Li, Huabin

    2014-01-01

    Background Although allergen specific immunotherapy (SIT) represents the only immune- modifying and curative option available for patients with allergic rhinitis (AR), the optimal schedule for specific subcutaneous immunotherapy (SCIT) is still unknown. The objective of this study is to systematically assess the efficacy and safety of cluster SCIT for patients with AR. Methods By searching PubMed, EMBASE and the Cochrane clinical trials database from 1980 through May 10th, 2013, we collected and analyzed the randomized controlled trials (RCTs) of cluster SCIT to assess its efficacy and safety. Results Eight trials involving 567 participants were included in this systematic review. Our meta-analysis showed that cluster SCIT have similar effect in reduction of both rhinitis symptoms and the requirement for anti-allergic medication compared with conventional SCIT, but when comparing cluster SCIT with placebo, no statistic significance were found in reduction of symptom scores or medication scores. Some caution is required in this interpretation as there was significant heterogeneity between studies. Data relating to Rhinoconjunctivitis Quality of Life Questionnaire (RQLQ) in 3 included studies were analyzed, which consistently point to the efficacy of cluster SCIT in improving quality of life compared to placebo. To assess the safety of cluster SCIT, meta-analysis showed that no differences existed in the incidence of either local adverse reaction or systemic adverse reaction between the cluster group and control group. Conclusion Based on the current limited evidence, we still could not conclude affirmatively that cluster SCIT was a safe and efficacious option for the treatment of AR patients. Further large-scale, well-designed RCTs on this topic are still needed. PMID:24489740

  8. Clinical phenotypes and survival of pre-capillary pulmonary hypertension in systemic sclerosis.

    PubMed

    Launay, David; Montani, David; Hassoun, Paul M; Cottin, Vincent; Le Pavec, Jérôme; Clerson, Pierre; Sitbon, Olivier; Jaïs, Xavier; Savale, Laurent; Weatherald, Jason; Sobanski, Vincent; Mathai, Stephen C; Shafiq, Majid; Cordier, Jean-François; Hachulla, Eric; Simonneau, Gérald; Humbert, Marc

    2018-01-01

    Pre-capillary pulmonary hypertension (PH) in systemic sclerosis (SSc) is a heterogeneous condition with an overall bad prognosis. The objective of this study was to identify and characterize homogeneous phenotypes by a cluster analysis in SSc patients with PH. Patients were identified from two prospective cohorts from the US and France. Clinical, pulmonary function, high-resolution chest tomography, hemodynamic and survival data were extracted. We performed cluster analysis using the k-means method and compared survival between clusters using Cox regression analysis. Cluster analysis of 200 patients identified four homogenous phenotypes. Cluster C1 included patients with mild to moderate risk pulmonary arterial hypertension (PAH) with limited or no interstitial lung disease (ILD) and low DLCO with a 3-year survival of 81.5% (95% CI: 71.4-88.2). C2 had pre-capillary PH due to extensive ILD and worse 3-year survival compared to C1 (adjusted hazard ratio [HR] 3.14; 95% CI 1.66-5.94; p = 0.0004). C3 had severe PAH and a trend towards worse survival (HR 2.53; 95% CI 0.99-6.49; p = 0.052). Cluster C4 and C1 were similar with no difference in survival (HR 0.65; 95% CI 0.19-2.27, p = 0.507) but with a higher DLCO in C4. PH in SSc can be characterized into distinct clusters that differ in prognosis.

  9. The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

    NASA Astrophysics Data System (ADS)

    Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

    2018-04-01

    The gas cluster ion beam technique was used for the silicon carbide crystal surface smoothing. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters were not reported yet. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to average roughness about 1 nm for both elements. It was shown that xenon as the working gas is more effective: sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.

  10. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    PubMed

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  11. Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

    PubMed Central

    Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2010-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138

  12. Comparative analysis on the selection of number of clusters in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  13. A graph-Laplacian-based feature extraction algorithm for neural spike sorting.

    PubMed

    Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos

    2009-01-01

    Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.

  14. The Cluster Sensitivity Index: A Basic Measure of Classification Robustness

    ERIC Educational Resources Information Center

    Hom, Willard C.

    2010-01-01

    Analysts of institutional performance have occasionally used a peer grouping approach in which they compared institutions only to other institutions with similar characteristics. Because analysts historically have used cluster analysis to define peer groups (i.e., the group of comparable institutions), the author proposes and demonstrates with…

  15. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    PubMed

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight in the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principle component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response of external influences on their disease. However, both cluster outcomes based on this dataset showed a poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.

  16. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

    PubMed

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.

  17. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    PubMed

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  18. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  19. The adiposity of children is associated with their lifestyle behaviours: a cluster analysis of school-aged children from 12 nations.

    PubMed

    Dumuid, Dorothea; Olds, T; Lewis, L K; Martin-Fernández, J A; Barreira, T; Broyles, S; Chaput, J-P; Fogelholm, M; Hu, G; Kuriyan, R; Kurpad, A; Lambert, E V; Maia, J; Matsudo, V; Onywera, V O; Sarmiento, O L; Standage, M; Tremblay, M S; Tudor-Locke, C; Zhao, P; Katzmarzyk, P; Gillison, F; Maher, C

    2018-02-01

    The relationship between children's adiposity and lifestyle behaviour patterns is an area of growing interest. The objectives of this study are to identify clusters of children based on lifestyle behaviours and compare children's adiposity among clusters. Cross-sectional data from the International Study of Childhood Obesity, Lifestyle and the Environment were used. the participants were children (9-11 years) from 12 nations (n = 5710). 24-h accelerometry and self-reported diet and screen time were clustering input variables. Objectively measured adiposity indicators were waist-to-height ratio, percent body fat and body mass index z-scores. sex-stratified analyses were performed on the global sample and repeated on a site-wise basis. Cluster analysis (using isometric log ratios for compositional data) was used to identify common lifestyle behaviour patterns. Site representation and adiposity were compared across clusters using linear models. Four clusters emerged: (1) Junk Food Screenies, (2) Actives, (3) Sitters and (4) All-Rounders. Countries were represented differently among clusters. Chinese children were over-represented in Sitters and Colombian children in Actives. Adiposity varied across clusters, being highest in Sitters and lowest in Actives. Children from different sites clustered into groups of similar lifestyle behaviours. Cluster membership was linked with differing adiposity. Findings support the implementation of activity interventions in all countries, targeting both physical activity and sedentary time. © 2016 World Obesity Federation.

  20. Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.

    PubMed

    Nixon, R M; Thompson, S G

    2003-09-15

    Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.

  1. Cluster-based upper body marker models for three-dimensional kinematic analysis: Comparison with an anatomical model and reliability analysis.

    PubMed

    Boser, Quinn A; Valevicius, Aïda M; Lavoie, Ewen B; Chapman, Craig S; Pilarski, Patrick M; Hebert, Jacqueline S; Vette, Albert H

    2018-04-27

    Quantifying angular joint kinematics of the upper body is a useful method for assessing upper limb function. Joint angles are commonly obtained via motion capture, tracking markers placed on anatomical landmarks. This method is associated with limitations including administrative burden, soft tissue artifacts, and intra- and inter-tester variability. An alternative method involves the tracking of rigid marker clusters affixed to body segments, calibrated relative to anatomical landmarks or known joint angles. The accuracy and reliability of applying this cluster method to the upper body has, however, not been comprehensively explored. Our objective was to compare three different upper body cluster models with an anatomical model, with respect to joint angles and reliability. Non-disabled participants performed two standardized functional upper limb tasks with anatomical and cluster markers applied concurrently. Joint angle curves obtained via the marker clusters with three different calibration methods were compared to those from an anatomical model, and between-session reliability was assessed for all models. The cluster models produced joint angle curves which were comparable to and highly correlated with those from the anatomical model, but exhibited notable offsets and differences in sensitivity for some degrees of freedom. Between-session reliability was comparable between all models, and good for most degrees of freedom. Overall, the cluster models produced reliable joint angles that, however, cannot be used interchangeably with anatomical model outputs to calculate kinematic metrics. Cluster models appear to be an adequate, and possibly advantageous alternative to anatomical models when the objective is to assess trends in movement behavior. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Symptom clusters predict mortality among dialysis patients in Norway: a prospective observational cohort study.

    PubMed

    Amro, Amin; Waldum, Bård; von der Lippe, Nanna; Brekke, Fredrik Barth; Dammen, Toril; Miaskowski, Christine; Os, Ingrid

    2015-01-01

    Patients with end-stage renal disease on dialysis have reduced survival rates compared with the general population. Symptoms are frequent in dialysis patients, and a symptom cluster is defined as two or more related co-occurring symptoms. The aim of this study was to explore the associations between symptom clusters and mortality in dialysis patients. In a prospective observational cohort study of dialysis patients (n = 301), Kidney Disease and Quality of Life Short Form and Beck Depression Inventory questionnaires were administered. To generate symptom clusters, principal component analysis with varimax rotation was used on 11 kidney-specific self-reported physical symptoms. A Beck Depression Inventory score of 16 or greater was defined as clinically significant depressive symptoms. Physical and mental component summary scores were generated from Short Form-36. Multivariate Cox regression analysis was used for the survival analysis, Kaplan-Meier curves and log-rank statistics were applied to compare survival rates between the groups. Three different symptom clusters were identified; one included loading of several uremic symptoms. In multivariate analyses and after adjustment for health-related quality of life and depressive symptoms, the worst perceived quartile of the "uremic" symptom cluster independently predicted all-cause mortality (hazard ratio 2.47, 95% CI 1.44-4.22, P = 0.001) compared with the other quartiles during a follow-up period that ranged from four to 52 months. The two other symptom clusters ("neuromuscular" and "skin") or the individual symptoms did not predict mortality. Clustering of uremic symptoms predicted mortality. Assessing co-occurring symptoms rather than single symptoms may help to identify dialysis patients at high risk for mortality. Copyright © 2015 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  3. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    PubMed

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.

  4. Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

    PubMed

    Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

    2017-11-01

    Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  5. Retrospective Benefit-Cost Evaluation of DOE Investment in Photovoltaic Energy Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Connor, Alan C.; Loomis, Ross J.; Braun, Fern M.

    2010-08-01

    This study is a retrospective analysis of net benefits accruing from DOE's investment in photovoltaic (PV) technology development. The study employed a technology cluster approach. That is, benefits measured for a subset of technologies in a meaningful cluster, or portfolio, of technologies were compared to the total investment in the cluster to provide a lower bound measure of return for the entire cluster.

  6. An analysis of mortality inventory tally using large plots compared to tally using small plot clusters

    Treesearch

    Vernon J. LaBau; John W. Hazard

    2000-01-01

    During an inventory to assess spruce bark beetle impact on the Kenai Peninsula in south-central Alaska, 5-year mortality estimates were made for all growing-stock trees on 0.6 ha areas, on 0.4 ha areas, and on a cluster of four 1/60-ha subplots. The analysis of the results of the comparison between cluster data and the larger plot data highlighted some of the problems...

  7. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    PubMed

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  8. An assessment of fatigue in patients with postural orthostatic tachycardia syndrome.

    PubMed

    Wise, Shelby; Ross, Amanda; Brown, Abigail; Evans, Meredyth; Jason, Leonard

    2017-05-01

    Individuals with postural orthostatic tachycardia syndrome share many symptoms with those who have chronic fatigue syndrome; one of which is severe fatigue. Previous literature found that those with chronic fatigue syndrome experience many forms of fatigue. The goal of this study was to investigate whether individuals with postural orthostatic tachycardia syndrome also experience multidimensional fatigue and whether these individuals can be clustered into subgroups based on the types of fatigue they endorse. A convenience sample of 138 participants (aged 14-29) with postural orthostatic tachycardia syndrome completed questionnaires that assessed fatigue, brain fog symptom severity, activities that improve brain fog, and brain fog-related disability. An exploratory factor analysis was conducted on the Fatigue Types Questionnaire, and a three-factor solution was produced. Factor scores were then used to cluster the patients into groups using a TwoStep cluster analysis. This resulted in two clusters, a high severity group and a low severity group. The clusters were then compared on a number of items related to symptom expression. Individuals within the more severe cluster had significantly more brain fog at the beginning and end of the survey when compared to cluster two. Those in the more severe cluster also described more activity impairment as well as more frequent, more severe, and more debilitation from postural orthostatic tachycardia syndrome and brain fog. The findings of the factor analysis suggest that patients with postural orthostatic tachycardia syndrome experience fatigue as a multidimensional construct and they also can be subgrouped based on symptom severity.

  9. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species

    USDA-ARS?s Scientific Manuscript database

    Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...

  10. Comparative analysis of ArnCl2 (2 ? n ? 30) clusters taking into account molecular relaxation effects

    NASA Astrophysics Data System (ADS)

    Ferreira, G. G.; Borges, E.; Braga, J. P.; Belchior, J. C.

    Cluster structures are discussed in a nonrigid analysis, using a modified minima search method based on stochastic processes and classical dynamics simulations. The relaxation process is taken into account considering the internal motion of the Cl2 molecule. Cluster structures are compared with previous works in which the Cl2 molecule is assumed to be rigid. The interactions are modeled using pair potentials: the Aziz and Lennard-Jones potentials for the Ar==Ar interaction, a Morse potential for the Cl==Cl interaction, and a fully spherical/anisotropic Morse-Spline-van der Waals (MSV) potential for the Ar==Cl interaction. As expected, all calculated energies are lower than those obtained in a rigid approximation; one reason may be attributed to the nonrigid contributions of the internal motion of the Cl2 molecule. Finally, the growing processes in molecular clusters are discussed, and it is pointed out that the growing mechanism can be affected due to the nonrigid initial conditions of smaller clusters such as ArnCl2 (n ? 4 or 5), which are seeds for higher-order clusters.

  11. Hierarchical Spatio-temporal Visual Analysis of Cluster Evolution in Electrocorticography Data

    DOE PAGES

    Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...

    2016-10-02

    Here, we present ECoG ClusterFlow, a novel interactive visual analysis tool for the exploration of high-resolution Electrocorticography (ECoG) data. Our system detects and visualizes dynamic high-level structures, such as communities, using the time-varying spatial connectivity network derived from the high-resolution ECoG data. ECoG ClusterFlow provides a multi-scale visualization of the spatio-temporal patterns underlying the time-varying communities using two views: 1) an overview summarizing the evolution of clusters over time and 2) a hierarchical glyph-based technique that uses data aggregation and small multiples techniques to visualize the propagation of clusters in their spatial domain. ECoG ClusterFlow makes it possible 1) tomore » compare the spatio-temporal evolution patterns across various time intervals, 2) to compare the temporal information at varying levels of granularity, and 3) to investigate the evolution of spatial patterns without occluding the spatial context information. Lastly, we present case studies done in collaboration with neuroscientists on our team for both simulated and real epileptic seizure data aimed at evaluating the effectiveness of our approach.« less

  12. Comparing the performance of biomedical clustering methods.

    PubMed

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-11-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.

  13. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    PubMed

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiology and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12-month occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome.

    PubMed

    Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard

    2014-07-01

    Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis results produced results equivalent to those obtained from Fourier and scar analysis.

  15. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: Aboutmore » 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). Conclusions: A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis results produced results equivalent to those obtained from Fourier and scar analysis.« less

  16. Activation and comparative analysis of cryptic xiamycin gene cluster from marine-derived Streptomyces sp. FXJ 7.388.

    PubMed

    Uhong Lü, Yuhong; Liu, Xiaoli; Wang, Miao; Li, Yuanyuan; Liu, Ning; Bao, Yuxin; Liu, Minghao; Li, Xiaoqian; Wang, Yinyin; Qian, Shenyan; Yue, Changwu; Huang, Ying

    2016-09-01

    In order to obtain the natural products synthesized by the three putative xiamycin biosynthesis gene clusters which were predicted via antiSMASH during the genome mining of marine Streptomyces sp. FXJ 7.388, Streptomyces sp. FXJ 8.012, and Streptomyces olivaceus FXJ 7.023. Sixteen genes involved in xiamycin assembly, modification, and regulation with higher identity than the newest reported xiamycin biosynthetic gene cluster from marine Streptomyces sp. SCSIO 02999, Streptomyces sp. HKI0576, and Streptomyces sp. FXJ 7.388 were discovered via gene cluster comparative analysis. A ribosome engineering strategy was adopted to activate such cryptic gene clusters with different final concentrations antibiotics that act on the ribosome, and two indolosesquiterpenes were isolated from idlethaldose streptomycin-resistant Streptomyces sp. FXJ 7.388 strains. However, no such product was detected in Streptomyces sp. FXJ 8.012 and Streptomyces olivaceus FXJ 7.023 under the same treatment. This result suggested that these genes might hold the least gene content for xiamycin biosynthesis.

  17. Calibrating the Planck cluster mass scale with CLASH

    NASA Astrophysics Data System (ADS)

    Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.

    2017-08-01

    We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, bSZ, between true cluster mass, M500, and the Planck mass proxy, MPL, our analysis constrains 1-bSZ = 0.73 ± 0.10 when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.

  18. Comparative Analysis of Enrollment and Financial Strength of Private Institutions. AIR 1989 Annual Forum Paper.

    ERIC Educational Resources Information Center

    Glover, Robert H.; Mills, Michael R.

    A research design, decision support system, and results of a comparative analysis of enrollment and financial strength (of private institutions granting masters and doctoral degrees) are presented. Cluster analysis, discriminant analysis, multiple regression, and an interactive decision support system are used to compare the enrollment and…

  19. Clusters of midlife women by physical activity and their racial/ethnic differences.

    PubMed

    Im, Eun-Ok; Ko, Young; Chee, Eunice; Chee, Wonshik; Mao, Jun James

    2017-04-01

    The purpose of this study was to identify clusters of midlife women by physical activity and to determine racial/ethnic differences in physical activities in each cluster. This was a secondary analysis of the data from 542 women (157 non-Hispanic [NH] Whites, 127 Hispanics, 135 NH African Americans, and 123 NH Asian) in a larger Internet study on midlife women's attitudes toward physical activity. The instruments included the Barriers to Health Activities Scale, the Physical Activity Assessment Inventory, the Questions on Attitudes toward Physical Activity, Subjective Norm, Perceived Behavioral Control, and Behavioral Intention, and the Kaiser Physical Activity Survey. The data were analyzed using hierarchical cluster analyses, analysis of variance, and multinominal logistic analyses. A three-cluster solution was adopted: cluster 1 (high active living and sports/exercise activity group; 48%), cluster 2 (high household/caregiving and occupational activity group; 27%), and cluster 3 (low active living and sports/exercise activity group; 26%). There were significant racial/ethnic differences in occupational activities of clusters 1 and 3 (all P < 0.01). Compared with cluster 1, cluster 2 tended to have lower family income, less access to health care, higher unemployment, higher perceived barriers scores, and lower social influences scores (all P < 0.01). Compared with cluster 1, cluster 3 tended to have greater obesity, less access to health care, higher perceived barriers scores, more negative attitudes toward physical activity, and lower self-efficacy scores (all P < 0.01). Midlife women's unique patterns of physical activity and their associated factors need to be considered in future intervention development.

  20. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    PubMed Central

    Liu, Jingxian; Wu, Kefeng

    2017-01-01

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with traditional spectral clustering and fast affinity propagation clustering. Experimental results have illustrated its superior performance in terms of quantitative and qualitative evaluations. PMID:28777353

  1. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    PubMed

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

    PubMed Central

    Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

    2011-01-01

    Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716

  3. Symptom clusters in women with breast cancer: an analysis of data from social media and a research study

    PubMed Central

    Marshall, Sarah A.; Yang, Christopher C.; Ping, Qing; Zhao, Mengnan; Avis, Nancy E.

    2016-01-01

    Purpose User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study. Methods Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study. Results The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data. Conclusions The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations. PMID:26476836

  4. Symptom clusters in women with breast cancer: an analysis of data from social media and a research study.

    PubMed

    Marshall, Sarah A; Yang, Christopher C; Ping, Qing; Zhao, Mengnan; Avis, Nancy E; Ip, Edward H

    2016-03-01

    User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study. Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study. The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data. The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations.

  5. An Approach to Cluster EU Member States into Groups According to Pathways of Salmonella in the Farm-to-Consumption Chain for Pork Products.

    PubMed

    Vigre, Håkan; Domingues, Ana Rita Coutinho Calado; Pedersen, Ulrik Bo; Hald, Tine

    2016-03-01

    The aim of the project as the cluster analysis was to in part to develop a generic structured quantitative microbiological risk assessment (QMRA) model of human salmonellosis due to pork consumption in EU member states (MSs), and the objective of the cluster analysis was to group the EU MSs according to the relative contribution of different pathways of Salmonella in the farm-to-consumption chain of pork products. In the development of the model, by selecting a case study MS from each cluster the model was developed to represent different aspects of pig production, pork production, and consumption of pork products across EU states. The objective of the cluster analysis was to aggregate MSs into groups of countries with similar importance of different pathways of Salmonella in the farm-to-consumption chain using available, and where possible, universal register data related to the pork production and consumption in each country. Based on MS-specific information about distribution of (i) small and large farms, (ii) small and large slaughterhouses, (iii) amount of pork meat consumed, and (iv) amount of sausages consumed we used nonhierarchical and hierarchical cluster analysis to group the MSs. The cluster solutions were validated internally using statistic measures and externally by comparing the clustered MSs with an estimated human incidence of salmonellosis due to pork products in the MSs. Finally, each cluster was characterized qualitatively using the centroids of the clusters. © 2016 Society for Risk Analysis.

  6. Density-cluster NMA: A new protein decomposition technique for coarse-grained normal mode analysis.

    PubMed

    Demerdash, Omar N A; Mitchell, Julie C

    2012-07-01

    Normal mode analysis has emerged as a useful technique for investigating protein motions on long time scales. This is largely due to the advent of coarse-graining techniques, particularly Hooke's Law-based potentials and the rotational-translational blocking (RTB) method for reducing the size of the force-constant matrix, the Hessian. Here we present a new method for domain decomposition for use in RTB that is based on hierarchical clustering of atomic density gradients, which we call Density-Cluster RTB (DCRTB). The method reduces the number of degrees of freedom by 85-90% compared with the standard blocking approaches. We compared the normal modes from DCRTB against standard RTB using 1-4 residues in sequence in a single block, with good agreement between the two methods. We also show that Density-Cluster RTB and standard RTB perform well in capturing the experimentally determined direction of conformational change. Significantly, we report superior correlation of DCRTB with B-factors compared with 1-4 residue per block RTB. Finally, we show significant reduction in computational cost for Density-Cluster RTB that is nearly 100-fold for many examples. Copyright © 2012 Wiley Periodicals, Inc.

  7. Impact of Sampling Density on the Extent of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2014-01-01

    Abstract Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430

  8. Molecular Eigensolution Symmetry Analysis and Fine Structure

    PubMed Central

    Harter, William G.; Mitchell, Justin C.

    2013-01-01

    Spectra of high-symmetry molecules contain fine and superfine level cluster structure related to J-tunneling between hills and valleys on rovibronic energy surfaces (RES). Such graphic visualizations help disentangle multi-level dynamics, selection rules, and state mixing effects including widespread violation of nuclear spin symmetry species. A review of RES analysis compares it to that of potential energy surfaces (PES) used in Born–Oppenheimer approximations. Both take advantage of adiabatic coupling in order to visualize Hamiltonian eigensolutions. RES of symmetric and D2 asymmetric top rank-2-tensor Hamiltonians are compared with Oh spherical top rank-4-tensor fine-structure clusters of 6-fold and 8-fold tunneling multiplets. Then extreme 12-fold and 24-fold multiplets are analyzed by RES plots of higher rank tensor Hamiltonians. Such extreme clustering is rare in fundamental bands but prevalent in hot bands, and analysis of its superfine structure requires more efficient labeling and a more powerful group theory. This is introduced using elementary examples involving two groups of order-6 (C6 and D3~C3v), then applied to families of Oh clusters in SF6 spectra and to extreme clusters. PMID:23344041

  9. Dimensional assessment of personality pathology in patients with eating disorders.

    PubMed

    Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L

    1999-02-22

    This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed highest scores in factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis -- a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.

  10. A Preliminary Comparison of the Effectiveness of Cluster Analysis Weighting Procedures for Within-Group Covariance Structure.

    ERIC Educational Resources Information Center

    Donoghue, John R.

    A Monte Carlo study compared the usefulness of six variable weighting methods for cluster analysis. Data were 100 bivariate observations from 2 subgroups, generated according to a finite normal mixture model. Subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids were manipulated. Of the clustering…

  11. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    NASA Astrophysics Data System (ADS)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) was implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of, biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples were tested using the PLSDA models and consequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably, and in some cases better than, the other data types with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.

  12. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

    PubMed

    Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

    2016-11-01

    To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative ( q )-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q -EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q -EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.

  13. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    NASA Astrophysics Data System (ADS)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples in the large number of data of PV power generation years and improve the accuracy of PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes forecasting model based on neural network. Based on three different types of weather on sunny, cloudy and rainy days, this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using screened data as training data. Then, compare the six types of photovoltaic power generation prediction models before and after the data screening. Results show that the prediction model combining with clustering analysis and BP neural networks is an effective method to improve the precision of photovoltaic power generation.

  14. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  15. Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel-based "mouse pup syllable classification calculator".

    PubMed

    Grimsley, Jasmine M S; Gadziola, Marie A; Wenstrup, Jeffrey J

    2012-01-01

    Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified 10 syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.

  16. Micro-heterogeneity versus clustering in binary mixtures of ethanol with water or alkanes.

    PubMed

    Požar, Martina; Lovrinčević, Bernarda; Zoranić, Larisa; Primorać, Tomislav; Sokolić, Franjo; Perera, Aurélien

    2016-08-24

    Ethanol is a hydrogen bonding liquid. When mixed in small concentrations with water or alkanes, it forms aggregate structures reminiscent of, respectively, the direct and inverse micellar aggregates found in emulsions, albeit at much smaller sizes. At higher concentrations, micro-heterogeneous mixing with segregated domains is found. We examine how different statistical methods, namely correlation function analysis, structure factor analysis and cluster distribution analysis, can describe efficiently these morphological changes in these mixtures. In particular, we explain how the neat alcohol pre-peak of the structure factor evolves into the domain pre-peak under mixing conditions, and how this evolution differs whether the co-solvent is water or alkane. This study clearly establishes the heuristic superiority of the correlation function/structure factor analysis to study the micro-heterogeneity, since cluster distribution analysis is insensitive to domain segregation. Correlation functions detect the domains, with a clear structure factor pre-peak signature, while the cluster techniques detect the cluster hierarchy within domains. The main conclusion is that, in micro-segregated mixtures, the domain structure is a more fundamental statistical entity than the underlying cluster structures. These findings could help better understand comparatively the radiation scattering experiments, which are sensitive to domains, versus the spectroscopy-NMR experiments, which are sensitive to clusters.

  17. Identification of atypical flight patterns

    NASA Technical Reports Server (NTRS)

    Statler, Irving C. (Inventor); Ferryman, Thomas A. (Inventor); Amidan, Brett G. (Inventor); Whitney, Paul D. (Inventor); White, Amanda M. (Inventor); Willse, Alan R. (Inventor); Cooley, Scott K. (Inventor); Jay, Joseph Griffith (Inventor); Lawrence, Robert E. (Inventor); Mosbrucker, Chris (Inventor)

    2005-01-01

    Method and system for analyzing aircraft data, including multiple selected flight parameters for a selected phase of a selected flight, and for determining when the selected phase of the selected flight is atypical, when compared with corresponding data for the same phase for other similar flights. A flight signature is computed using continuous-valued and discrete-valued flight parameters for the selected flight parameters and is optionally compared with a statistical distribution of other observed flight signatures, yielding atypicality scores for the same phase for other similar flights. A cluster analysis is optionally applied to the flight signatures to define an optimal collection of clusters. A level of atypicality for a selected flight is estimated, based upon an index associated with the cluster analysis.

  18. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study

    PubMed Central

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637

  19. Clustering stocks using partial correlation coefficients

    NASA Astrophysics Data System (ADS)

    Jung, Sean S.; Chang, Woojin

    2016-11-01

    A partial correlation analysis is performed on the Korean stock market (KOSPI). The difference between Pearson correlation and the partial correlation is analyzed and it is found that when conditioned on the market return, Pearson correlation coefficients are generally greater than those of the partial correlation, which implies that the market return tends to drive up the correlation between stock returns. A clustering analysis is then performed to study the market structure given by the partial correlation analysis and the members of the clusters are compared with the Global Industry Classification Standard (GICS). The initial hypothesis is that the firms in the same GICS sector are clustered together since they are in a similar business and environment. However, the result is inconsistent with the hypothesis and most clusters are a mix of multiple sectors suggesting that the traditional approach of using sectors to determine the proximity between stocks may not be sufficient enough to diversify a portfolio.

  20. KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

    PubMed

    Laetsch, Dominik R; Blaxter, Mark L

    2017-10-05

    The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined, groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

  1. Termination of seizure clusters is related to the duration of focal seizures.

    PubMed

    Ferastraoaru, Victor; Schulze-Bonhage, Andreas; Lipton, Richard B; Dümpelmann, Matthias; Legatt, Alan D; Blumberg, Julie; Haut, Sheryl R

    2016-06-01

    Clustered seizures are characterized by shorter than usual interseizure intervals and pose increased morbidity risk. This study examines the characteristics of seizures that cluster, with special attention to the final seizure in a cluster. This is a retrospective analysis of long-term inpatient monitoring data from the EPILEPSIAE project. Patients underwent presurgical evaluation from 2002 to 2009. Seizure clusters were defined by the occurrence of at least two consecutive seizures with interseizure intervals of <4 h. Other definitions of seizure clustering were examined in a sensitivity analysis. Seizures were classified into three contextually defined groups: isolated seizures (not meeting clustering criteria), terminal seizure (last seizure in a cluster), and intracluster seizures (any other seizures within a cluster). Seizure characteristics were compared among the three groups in terms of duration, type (focal seizures remaining restricted to one hemisphere vs. evolving bilaterally), seizure origin, and localization concordance among pairs of consecutive seizures. Among 92 subjects, 77 (83%) had at least one seizure cluster. The intracluster seizures were significantly shorter than the last seizure in a cluster (p = 0.011), whereas the last seizure in a cluster resembled the isolated seizures in terms of duration. Although focal only (unilateral), seizures were shorter than seizures that evolved bilaterally and there was no correlation between the seizure type and the seizure position in relation to a cluster (p = 0.762). Frontal and temporal lobe seizures were more likely to cluster compared with other localizations (p = 0.009). Seizure pairs that are part of a cluster were more likely to have a concordant origin than were isolated seizures. Results were similar for the 2 h definition of clustering, but not for the 8 h definition of clustering. We demonstrated that intracluster seizures are short relative to isolated seizures and terminal seizures. Frontal and temporal lobe seizures are more likely to cluster. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.

  2. Identification of five clusters of comorbidities in a longitudinal Japanese chronic obstructive pulmonary disease cohort.

    PubMed

    Chubachi, Shotaro; Sato, Minako; Kameyama, Naofumi; Tsutsumi, Akihiro; Sasaki, Mamoru; Tateno, Hiroki; Nakamura, Hidetoshi; Asano, Koichiro; Betsuyaku, Tomoko

    2016-08-01

    Patients with chronic obstructive pulmonary disease (COPD) frequently suffer from various comorbidities. Recently, cluster analysis has been proposed to examine the phenotypic heterogeneity in COPD. In order to comprehensively understand the comorbidities of COPD in Japan, we conducted multicenter, longitudinal cohort study, called the Keio COPD Comorbidity Research (K-CCR). In this cohort, comorbid diagnoses were established by both objective examination and review of clinical records, in addition to self-report. We aimed to investigate the clustering of nineteen clinically relevant comorbidities and the meaningful outcomes of the clusters over a two-year follow-up period. The present study analyzed data from COPD patients whose data of comorbidities were completed (n = 311). Cluster analysis was performed using Ward's minimum-variance method. Five comorbidity clusters were identified: less comorbidity; malignancy; metabolic and cardiovascular; gastroesophageal reflux disease (GERD) and psychological; and underweight and anemic. FEV1 did not differ among the clusters. GERD and psychological cluster had worse COPD assessment test (CAT) and Saint George's respiratory questionnaire (SGRQ) at baseline compared to the other clusters (CAT: p = 0.0003 and SGRQ: p = 0.00046). The rate of change in these scores did not differ within 2 years. The underweight and anemic cluster included subjects with lower baseline ratio of predicted diffusing capacity (DLco/VA) compared to the malignancy cluster (p = 0.036). Five clusters of comorbidities were identified in Japanese COPD patients. The clinical characteristics and health-related quality of life were different among these clusters during a follow-up of two years. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

    NASA Astrophysics Data System (ADS)

    Yasid, A.

    2018-01-01

    Clustering analysis is important in datamining for unsupervised data, cause no adequate prior knowledge. One of the important tasks is defining the number of clusters without user involvement that is known as automatic clustering. This study intends on acquiring cluster number automatically utilizing forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely: constant parameter and variable parameter are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the result is compared with other state of the art automatic clustering methods. The experiment results evidence that AC-FSDE is better or competitive with other existing automatic clustering algorithm.

  4. A Survey on Node Clustering in Cognitive Radio Wireless Sensor Networks.

    PubMed

    Joshi, Gyanendra Prasad; Kim, Sung Won

    2016-09-10

    Cognitive radio wireless sensor networks (CR-WSNs) have attracted a great deal of attention recently due to the emerging spectrum scarcity issue. This work attempts to provide a detailed analysis of the role of node clustering in CR-WSNs. We outline the objectives, requirements, and advantages of node clustering in CR-WSNs. We describe how a CR-WSN with node clustering differs from conventional wireless sensor networks, and we discuss its characteristics, architecture, and topologies. We survey the existing clustering algorithms and compare their objectives and features. We suggest how clustering issues and challenges can be handled.

  5. Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

    PubMed Central

    Liu, Wenfen

    2017-01-01

    Constrained spectral clustering (CSC) method can greatly improve the clustering accuracy with the incorporation of constraint information into spectral clustering and thus has been paid academic attention widely. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm has the similar results with the increase of its model size asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and has a wider range of suitable data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate by presenting theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of random projection in clustering accuracy proved in the stage of consensus clustering is also suitable for the weighted k-means clustering and thus gives the theoretical guarantee to this special kind of k-means clustering where each point has its corresponding weight. PMID:29312447

  6. Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

    PubMed Central

    Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

    2014-01-01

    Introduction HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182

  7. Suicide in the oldest old: an observational study and cluster analysis.

    PubMed

    Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth

    2016-01-01

    The older population are at a high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.

  8. Ten-year results of a ponderosa pine progeny test in the Black Hills

    Treesearch

    Wayne D. Shepperd; Sue E. McElderry

    1986-01-01

    Ten-year survival and growth of seedlings from 77 parent trees from throughout the Black Hills were compared, using a cluster-analysis technique. Five clusters were identified that account for most of the variability in survival and growth of the open-pollinated families. One cluster, containing 6 families, exhibited exceptional survival and growth. Another, containing...

  9. Associations of Streptococcus suis Serotype 2 Ribotype Profiles with Clinical Disease and Antimicrobial Resistance

    PubMed Central

    Rasmussen, S. R.; Aarestrup, F. M.; Jensen, N. E.; Jorsal, S. E.

    1999-01-01

    A total of 122 Streptococcus suis serotype 2 strains were characterized thoroughly by comparing clinical and pathological observations, ribotype profiles, and antimicrobial resistance. Twenty-one different ribotype profiles were found and compared by cluster analysis, resulting in the identification of three ribotype clusters. A total of 58% of all strains investigated were of two ribotypes belonging to different ribotype clusters. A remarkable relationship existed between the observed ribotype profiles and the clinical-pathological observations because strains of one of the two dominant ribotypes were almost exclusively isolated from pigs with meningitis, while strains of the other dominant ribotype were never associated with meningitis. This second ribotype was isolated only from pigs with pneumonia, endocarditis, pericarditis, or septicemia. Cluster analysis revealed that strains belonging to the same ribotype cluster as one of the dominant ribotypes came from pigs that showed clinical signs similar to those of pigs infected with strains with the respective dominant ribotype profiles. Furthermore, strains belonging to different ribotype clusters had totally different patterns of resistance to antibiotics because strains isolated from pigs with meningitis were resistant to sulfamethazoxazole and strains isolated from pigs with pneumonia, endocarditis, pericarditis, or septicemia were resistant to tetracycline. PMID:9889228

  10. Characteristics of airflow and particle deposition in COPD current smokers

    NASA Astrophysics Data System (ADS)

    Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

    2017-11-01

    A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.

  11. An unsupervised classification technique for multispectral remote sensing data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Cummings, R. E.

    1973-01-01

    Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.

  12. Unsupervised classification of earth resources data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.

    1972-01-01

    A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by existing supervised maximum liklihood classification technique.

  13. Comparative genomics reveals phylogenetic distribution patterns of secondary metabolites in Amycolatopsis species.

    PubMed

    Adamek, Martina; Alanjary, Mohammad; Sales-Ortells, Helena; Goodfellow, Michael; Bull, Alan T; Winkler, Anika; Wibberg, Daniel; Kalinowski, Jörn; Ziemert, Nadine

    2018-06-01

    Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary. Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis' strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes. Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.

  14. Toward An Understanding of Cluster Evolution: A Deep X-Ray Selected Cluster Catalog from ROSAT

    NASA Technical Reports Server (NTRS)

    Jones, Christine; Oliversen, Ronald (Technical Monitor)

    2002-01-01

    In the past year, we have focussed on studying individual clusters found in this sample with Chandra, as well as using Chandra to measure the luminosity-temperature relation for a sample of distant clusters identified through the ROSAT study, and finally we are continuing our study of fossil groups. For the luminosity-temperature study, we compared a sample of nearby clusters with a sample of distant clusters and, for the first time, measured a significant change in the relation as a function of redshift (Vikhlinin et al. in final preparation for submission to Cape). We also used our ROSAT analysis to select and propose for Chandra observations of individual clusters. We are now analyzing the Chandra observations of the distant cluster A520, which appears to have undergone a recent merger. Finally, we have completed the analysis of the fossil groups identified in ROM observations. In the past few months, we have derived X-ray fluxes and luminosities as well as X-ray extents for an initial sample of 89 objects. Based on the X-ray extents and the lack of bright galaxies, we have identified 16 fossil groups. We are comparing their X-ray and optical properties with those of optically rich groups. A paper is being readied for submission (Jones, Forman, and Vikhlinin in preparation).

  15. Scoring clustering solutions by their biological relevance.

    PubMed

    Gat-Viks, I; Sharan, R; Shamir, R

    2003-12-12

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.

  16. Using data mining to segment healthcare markets from patients' preference perspectives.

    PubMed

    Liu, Sandra S; Chen, Jie

    2009-01-01

    This paper aims to provide an example of how to use data mining techniques to identify patient segments regarding preferences for healthcare attributes and their demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with average linkage and Pearson correlation procedures are employed and compared to show how each procedure best determines segmentation variables. Data mining tools identified three differentiable segments by means of cluster analysis. These three clusters have significantly different demographic profiles. The study reveals, when compared with traditional statistical methods, that data mining provides an efficient and effective tool for market segmentation. When there are numerous cluster variables involved, researchers and practitioners need to incorporate factor analysis for reducing variables to clearly and meaningfully understand clusters. Interests and applications in data mining are increasing in many businesses. However, this technology is seldom applied to healthcare customer experience management. The paper shows that efficient and effective application of data mining methods can aid the understanding of patient healthcare preferences.

  17. Preliminary Comparisons of the Information Content and Utility of TM Versus MSS Data

    NASA Technical Reports Server (NTRS)

    Markham, B. L.

    1984-01-01

    Comparisons were made between subscenes from the first TM scene acquired of the Washington, D.C. area and a MSS scene acquired approximately one year earlier. Three types of analyses were conducted to compare TM and MSS data: a water body analysis, a principal components analysis and a spectral clustering analysis. The water body analysis compared the capability of the TM to the MSS for detecting small uniform targets. Of the 59 ponds located on aerial photographs 34 (58%) were detected by the TM with six commission errors (15%) and 13 (22%) were detected by the MSS with three commission errors (19%). The smallest water body detected by the TM was 16 meters; the smallest detected by the MSS was 40 meters. For the principal components analysis, means and covariance matrices were calculated for each subscene, and principal components images generated and characterized. In the spectral clustering comparison each scene was independently clustered and the clusters were assigned to informational classes. The preliminary comparison indicated that TM data provides enhancements over MSS in terms of (1) small target detection and (2) data dimensionality (even with 4-band data). The extra dimension, partially resultant from TM band 1, appears useful for built-up/non-built-up area separation.

  18. Cluster analysis: a new approach for identification of underlying risk factors for coronary artery disease in essential hypertensive patients.

    PubMed

    Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang

    2017-03-07

    Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information of plaque morphology are necessary to further compare the predicative power between BP dipping pattern and carotid plaque.

  19. Cluster analysis: a new approach for identification of underlying risk factors for coronary artery disease in essential hypertensive patients

    PubMed Central

    Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang

    2017-01-01

    Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information of plaque morphology are necessary to further compare the predicative power between BP dipping pattern and carotid plaque. PMID:28266630

  20. Global detection approach for clustered microcalcifications in mammograms using a deep learning network.

    PubMed

    Wang, Juan; Nishikawa, Robert M; Yang, Yongyi

    2017-04-01

    In computerized detection of clustered microcalcifications (MCs) from mammograms, the traditional approach is to apply a pattern detector to locate the presence of individual MCs, which are subsequently grouped into clusters. Such an approach is often susceptible to the occurrence of false positives (FPs) caused by local image patterns that resemble MCs. We investigate the feasibility of a direct detection approach to determining whether an image region contains clustered MCs or not. Toward this goal, we develop a deep convolutional neural network (CNN) as the classifier model to which the input consists of a large image window ([Formula: see text] in size). The multiple layers in the CNN classifier are trained to automatically extract image features relevant to MCs at different spatial scales. In the experiments, we demonstrated this approach on a dataset consisting of both screen-film mammograms and full-field digital mammograms. We evaluated the detection performance both on classifying image regions of clustered MCs using a receiver operating characteristic (ROC) analysis and on detecting clustered MCs from full mammograms by a free-response receiver operating characteristic analysis. For comparison, we also considered a recently developed MC detector with FP suppression. In classifying image regions of clustered MCs, the CNN classifier achieved 0.971 in the area under the ROC curve, compared to 0.944 for the MC detector. In detecting clustered MCs from full mammograms, at 90% sensitivity, the CNN classifier obtained an FP rate of 0.69 clusters/image, compared to 1.17 clusters/image by the MC detector. These results indicate that using global image features can be more effective in discriminating clustered MCs from FPs caused by various sources, such as linear structures, thereby providing a more accurate detection of clustered MCs on mammograms.

  1. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.

  2. Functional clustering of time series gene expression data by Granger causality

    PubMed Central

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425

  3. Comparative genomic analysis of secondary metabolite biosynthetic gene clusters in 207 isolates of Fusarium

    USDA-ARS?s Scientific Manuscript database

    Fusarium species are known for their ability to produce secondary metabolites (SMs), including plant hormones, pigments, mycotoxins, and other compounds with potential agricultural, pharmaceutical, and biotechnological impact. Understanding the distribution of SM biosynthetic gene clusters across th...

  4. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

    PubMed Central

    Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

    2016-01-01

    Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796

  5. Spatial cluster analysis of nanoscopically mapped serotonin receptors for classification of fixed brain tissue

    NASA Astrophysics Data System (ADS)

    Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw

    2014-01-01

    We present a cluster spatial analysis method using nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable to investigate human brain tissue and will help to achieve a deeper understanding of brain disease along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain have been compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.

  6. Spatial distribution and cluster analysis of retail drug shop characteristics and antimalarial behaviors as reported by private medicine retailers in western Kenya: informing future interventions.

    PubMed

    Rusk, Andria; Highfield, Linda; Wilkerson, J Michael; Harrell, Melissa; Obala, Andrew; Amick, Benjamin

    2016-02-19

    Efforts to improve malaria case management in sub-Saharan Africa have shifted focus to private antimalarial retailers to increase access to appropriate treatment. Demands to decrease intervention cost while increasing efficacy requires interventions tailored to geographic regions with demonstrated need. Cluster analysis presents an opportunity to meet this demand, but has not been applied to the retail sector or antimalarial retailer behaviors. This research conducted cluster analysis on medicine retailer behaviors in Kenya, to improve malaria case management and inform future interventions. Ninety-seven surveys were collected from medicine retailers working in the Webuye Health and Demographic Surveillance Site. Survey items included retailer training, education, antimalarial drug knowledge, recommending behavior, sales, and shop characteristics, and were analyzed using Kulldorff's spatial scan statistic. The Bernoulli purely spatial model for binomial data was used, comparing cases to controls. Statistical significance of found clusters was tested with a likelihood ratio test, using the null hypothesis of no clustering, and a p value based on 999 Monte Carlo simulations. The null hypothesis was rejected with p values of 0.05 or less. A statistically significant cluster of fewer than expected pharmacy-trained retailers was found (RR = .09, p = .001) when compared to the expected random distribution. Drug recommending behavior also yielded a statistically significant cluster, with fewer than expected retailers recommending the correct antimalarial medication to adults (RR = .018, p = .01), and fewer than expected shops selling that medication more often than outdated antimalarials when compared to random distribution (RR = 0.23, p = .007). All three of these clusters were co-located, overlapping in the northwest of the study area. Spatial clustering was found in the data. A concerning amount of correlation was found in one specific region in the study area where multiple behaviors converged in space, highlighting a prime target for interventions. These results also demonstrate the utility of applying geospatial methods in the study of medicine retailer behaviors, making the case for expanding this approach to other regions.

  7. Preliminary analysis of one year long space climate simulation

    NASA Astrophysics Data System (ADS)

    Facsko, G.; Honkonen, I. J.; Juusola, L.; Viljanen, A.; Vanhamäki, H.; Janhunen, P.; Palmroth, M.; Milan, S. E.

    2013-12-01

    One full year (155 Cluster orbits, from January 29, 2002 to February 2, 2003) is simulated using the Grand Unified Magnetosphere Ionosphere Coupling simulation (GUMICS) in the European Cluster Assimilation Technology project (ECLAT). This enables us to study the performance of a global magnetospheric model in an unprecedented scale both in terms of the amount of available observations and the length of the timeseries that can be compared. The solar wind for the simulated period, obtained from OMNIWeb, is used as input to GUMICS. We present an overview of various comparisons of GUMICS results to observations for the simulated year. Results along the Cluster reference spacecraft orbit to are compared to Cluster measurements. The Cross Polar Cap Potential (CPCP) results are compared to SuperDARN measurements. The IMAGE electrojet indicators (IU, IL) calculated from the ionospheric currents of GUMICS are compared to observations. Finally, Geomagnetically Induced Currents (GIC) calculated from GUMICS results along the Finnish mineral gas pipeline at Mätsälä are also compared to measurements.

  8. Relation between financial market structure and the real economy: comparison between clustering methods.

    PubMed

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover,we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].

  9. Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

    PubMed Central

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T.

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging. PMID:25786703

  10. FAST TRACK COMMUNICATION: Poly(methyl methacrylate)-palladium clusters nanocomposite formation by supersonic cluster beam deposition: a method for microstructured metallization of polymer surfaces

    NASA Astrophysics Data System (ADS)

    Ravagnan, Luca; Divitini, Giorgio; Rebasti, Sara; Marelli, Mattia; Piseri, Paolo; Milani, Paolo

    2009-04-01

    Nanocomposite films were fabricated by supersonic cluster beam deposition (SCBD) of palladium clusters on poly(methyl methacrylate) (PMMA) surfaces. The evolution of the electrical conductance with cluster coverage and microscopy analysis show that Pd clusters are implanted in the polymer and form a continuous layer extending for several tens of nanometres beneath the polymer surface. This allows the deposition, using stencil masks, of cluster-assembled Pd microstructures on PMMA showing a remarkably high adhesion compared with metallic films obtained by thermal evaporation. These results suggest that SCBD is a promising tool for the fabrication of metallic microstructures on flexible polymeric substrates.

  11. Structural and electronic properties Te62+ and Te82+: A DFT study

    NASA Astrophysics Data System (ADS)

    Sharma, Tamanna; Tamboli, Rohit; Kanhere, D. G.; Sharma, Raman

    2018-05-01

    Structural and electronic properties of Tellurium cluster (Ten) and their cations (Ten2+) (n = 6, 8) have been studied theoretically using VASP within generalized gradient approximation. Ground state geometries and higher energy isomers of these clusters have been examined on the basis of total free energy calculations. Lowest energy isomers of neutral clusters are ring like structures whereas the lowest energy isomers of cations are polyhedral cages. HOMO-LUMO gap in cationic clusters is small compared to its neutral clusters. Removal of two electrons from the neutral cluster raises the free energy. Analysis of free energy, HOMO-LUMO gap and density of states (DOS) show that neutral cluster are more stable than their cations.

  12. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.

    PubMed

    Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra

    2016-11-20

    The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Hierarchical clusters of phytoplankton variables in dammed water bodies

    NASA Astrophysics Data System (ADS)

    Silva, Eliana Costa e.; Lopes, Isabel Cristina; Correia, Aldina; Gonçalves, A. Manuela

    2017-06-01

    In this paper a dataset containing biological variables of the water column of several Portuguese reservoirs is analyzed. Hierarchical cluster analysis is used to obtain clusters of phytoplankton variables of the phylum Cyanophyta, with the objective of validating the classification of Portuguese reservoirs previewly presented in [1] which were divided into three clusters: (1) Interior Tagus and Aguieira; (2) Douro; and (3) Other rivers. Now three new clusters of Cyanophyta variables were found. Kruskal-Wallis and Mann-Whitney tests are used to compare the now obtained Cyanophyta clusters and the previous Reservoirs clusters, in order to validate the classification of the water quality of reservoirs. The amount of Cyanophyta algae present in the reservoirs from the three clusters is significantly different, which validates the previous classification.

  14. TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.

    PubMed

    Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun

    2017-12-01

    Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  15. Bayesian network meta-analysis for cluster randomized trials with binary outcomes.

    PubMed

    Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard

    2017-06-01

    Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Effect of functionalization of boron nitride flakes by main group metal clusters on their optoelectronic properties

    NASA Astrophysics Data System (ADS)

    Chakraborty, Debdutta; Chattaraj, Pratim Kumar

    2017-10-01

    The possibility of functionalizing boron nitride flakes (BNFs) with some selected main group metal clusters, viz. OLi4, NLi5, CLi6, BLI7 and Al12Be, has been analyzed with the aid of density functional theory (DFT) based computations. Thermochemical as well as energetic considerations suggest that all the metal clusters interact with the BNF moiety in a favorable fashion. As a result of functionalization, the static (first) hyperpolarizability (β ) values of the metal cluster supported BNF moieties increase quite significantly as compared to that in the case of pristine BNF. Time dependent DFT analysis reveals that the metal clusters can lower the transition energies associated with the dominant electronic transitions quite significantly thereby enabling the metal cluster supported BNF moieties to exhibit significant non-linear optical activity. Moreover, the studied systems demonstrate broad band absorption capability spanning the UV-visible as well as infra-red domains. Energy decomposition analysis reveals that the electrostatic interactions principally stabilize the metal cluster supported BNF moieties.

  17. Effect of functionalization of boron nitride flakes by main group metal clusters on their optoelectronic properties.

    PubMed

    Chakraborty, Debdutta; Chattaraj, Pratim Kumar

    2017-10-25

    The possibility of functionalizing boron nitride flakes (BNFs) with some selected main group metal clusters, viz. OLi 4 , NLi 5 , CLi 6 , BLI 7 and Al 12 Be, has been analyzed with the aid of density functional theory (DFT) based computations. Thermochemical as well as energetic considerations suggest that all the metal clusters interact with the BNF moiety in a favorable fashion. As a result of functionalization, the static (first) hyperpolarizability ([Formula: see text]) values of the metal cluster supported BNF moieties increase quite significantly as compared to that in the case of pristine BNF. Time dependent DFT analysis reveals that the metal clusters can lower the transition energies associated with the dominant electronic transitions quite significantly thereby enabling the metal cluster supported BNF moieties to exhibit significant non-linear optical activity. Moreover, the studied systems demonstrate broad band absorption capability spanning the UV-visible as well as infra-red domains. Energy decomposition analysis reveals that the electrostatic interactions principally stabilize the metal cluster supported BNF moieties.

  18. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review

    PubMed Central

    Morris, Tom; Gray, Laura

    2017-01-01

    Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637

  19. Analysis of Rainfall and PM2.5 Data Using Clustered Trajectory Analysis for National Park Sites in the Western U.S.

    NASA Astrophysics Data System (ADS)

    Solorzano, N. N.; Hafner, W.; Jaffe, D.

    2005-12-01

    We calculated daily kinematic back-trajectories using the NOAA-HYSPLIT model to analyze 7 years of PM2.5 data from National Park sites in the Western U.S. (Glacier N.P., Mount Rainier N.P., Sequoia N.P., Rocky Mountain N.P. and Denali N.P.) The back-trajectories were clustered using a k-means clustering algorithm to segregate the trajectories into 6 main transport patterns. We calculated trajectory clusters for 1, 5 and 10 days to represent short, medium and long-range flow patterns. Some trajectory types and clusters show marked seasonality. Generally faster flow patterns are more prevalent in winter and slower/stagnant patterns are more prevalent in summer. In addition, we found significant inter-annual variability that may be important for explaining variations in rainfall and/or pollutant concentrations. The 5 and 10-day analyses revealed that, for the 4 non-Alaskan sites, trajectories from Asia tend to be less frequent in the summer, compared to the rest of the year. The clusters of different duration show very different predictive power for rainfall and PM2.5. We found that the 1-day clusters are a better predictor for precipitation and PM2.5 concentrations, as compared to the 5 and 10-day clusters. At each of the sites, there is at least one cluster with an average PM2.5 concentration that is different than the average for the site, indicating distinctive transport patterns. The same is true for 5 and 10-day clusters. Interestingly, only one site, Mount Rainier N.P., shows seasonal differences in PM2.5 concentrations between the clusters that differ from the average.

  20. Diversity and evolution analysis of glycoprotein GP85 from avian leukosis virus subgroup J isolates from chickens of different genetic backgrounds during 1989-2016: Coexistence of five extremely different clusters.

    PubMed

    Wang, Peikun; Lin, Lulu; Li, Haijuan; Yang, Yongli; Huang, Teng; Wei, Ping

    2018-02-01

    ALV-J has caused the most serious losses to the poultry industry in China. The gp85-coding sequence of ALV-J is known to be prone to mutation, but any association between the gp85 gene and breed of chicken remains unclear. A comprehensive and systematic study of the evolutionary process of ALV-J in China is needed. In this study, we compared and analyzed gp85 gene sequences from 198 ALV-J isolates, originating from China, USA, UK and France during 1989-2016. These were sorted into five clusters. Cluster 1, 2, 3, 4 and 5 included isolates from chicken types of different genetic backgrounds, e.g. white-feather broiler, Guangxi indigenous chicken breeds, Yellow chickens and layer chickens respectively. A correlation comparison of amino acid sequence similarities in the gp85 protein among the five clusters showed significant differences (P < 0.01) with the exception being when the third and fifth cluster were compared (P > 0.05). Results of entropy analysis of the gp85 sequences revealed that cluster 3 had the largest variation and cluster 1 had the least variation. The N-glycosylation sites in the majority of isolates numbered 14, 16, 17, 16 and 16, respectively, with regards to clusters 1-5. In addition, 5 isolates from cluster 3 had one more glycosylation site than the other isolates from cluster 3. Our study provides evidence that there were five extremely different ALV-J clusters during 1989-2016 and that the gp85 genes isolated from indigenous chicken breed isolates had the largest variation.

  1. A time-series approach for clustering farms based on slaughterhouse health aberration data.

    PubMed

    Hulsegge, B; de Greef, K H

    2018-05-01

    A large amount of data is collected routinely in meat inspection in pig slaughterhouses. A time series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three step characteristic-based clustering approach was used from the idea that the data contain more info than the incidence figures. A stratified subset containing 511,645 pigs was derived as a study set from 3.5 years of meat inspection data. The monthly averages of incidence of pleuritis and of pneumonia of 44 Dutch farms (delivering 5149 batches to 2 pig slaughterhouses) were subjected to 1) derivation of farm level data characteristics 2) factor analysis and 3) clustering into groups of farms. The characteristic-based clustering was able to cluster farms for both lung aberrations. Three groups of data characteristics were informative, describing incidence, time pattern and degree of autocorrelation. The consistency of clustering similar farms was confirmed by repetition of the analysis in a larger dataset. The robustness of the clustering was tested on a substantially extended dataset. This confirmed the earlier results, three data distribution aspects make up the majority of distinction between groups of farms and in these groups (clusters) the majority of the farms was allocated comparable to the earlier allocation (75% and 62% for pleuritis and pneumonia, respectively). The difference between pleuritis and pneumonia in their seasonal dependency was confirmed, supporting the biological relevance of the clustering. Comparison of the identified clusters of statistically comparable farms can be used to detect farm level risk factors causing the health aberrations beyond comparison on disease incidence and trend alone. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. COVARIATE-ADAPTIVE CLUSTERING OF EXPOSURES FOR AIR POLLUTION EPIDEMIOLOGY COHORTS*

    PubMed Central

    Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A.

    2017-01-01

    Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 μg/m3 difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP. PMID:28572869

  3. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    PubMed

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

  4. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    PubMed

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards' method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. © The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.

  5. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda†

    PubMed Central

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-01-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards’ method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. PMID:26024882

  6. A Cross-Cultural Comparison of Symptom Reporting and Symptom Clusters in Heart Failure.

    PubMed

    Park, Jumin; Johantgen, Mary E

    2017-07-01

    An understanding of symptoms in heart failure (HF) among different cultural groups has become increasingly important. The purpose of this study was to compare symptom reporting and symptom clusters in HF patients between a Western (the United States) and an Eastern Asian sample (China and Taiwan). A secondary analysis of a cross-sectional observational study was conducted. The data were obtained from a matched HF patient sample from the United States and China/Taiwan ( N = 240 in each). Eight selective items related to HF symptoms from the Minnesota Living with Heart Failure Questionnaire were analyzed. Compared with the U.S. sample, HF patients from China/Taiwan reported a lower level of symptom distress. Analysis of two different regional groups did not result in the same number of clusters using latent class approach: the United States (four classes) and China/Taiwan (three classes). The study demonstrated that symptom reporting and identification of symptom clusters might be influenced by cultural factors.

  7. Unsupervised spike sorting based on discriminative subspace learning.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2014-01-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.

  8. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.

    PubMed

    Gorzalczany, Marian B; Rudzinski, Filip

    2017-06-07

    This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.

  9. Disease clusters, exact distributions of maxima, and P-values.

    PubMed

    Grimson, R C

    1993-10-01

    This paper presents combinatorial (exact) methods that are useful in the analysis of disease cluster data obtained from small environments, such as buildings and neighbourhoods. Maxwell-Boltzmann and Fermi-Dirac occupancy models are compared in terms of appropriateness of representation of disease incidence patterns (space and/or time) in these environments. The methods are illustrated by a statistical analysis of the incidence pattern of bone fractures in a setting wherein fracture clustering was alleged to be occurring. One of the methodological results derived in this paper is the exact distribution of the maximum cell frequency in occupancy models.

  10. Ortholog-based screening and identification of genes related to intracellular survival.

    PubMed

    Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

    2018-04-20

    Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. The total 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of intracellular pathogens characteristics and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.

  11. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral intervals selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining clustering concept and partial least square (PLS) methods to improve models’ performance by synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained and k-means and kohonen-self organizing map clustering algorithms were carried out to cluster full spectra into several clusters, and sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.

  12. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra, combined with multivariate analysis were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral intervals selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining clustering concept and partial least square (PLS) methods to improve models' performance by synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained and k-means and kohonen-self organizing map clustering algorithms were carried out to cluster full spectra into several clusters, and sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.

  13. Description and typology of intensive Chios dairy sheep farms in Greece.

    PubMed

    Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G

    2012-06-01

    The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece, establishing a typology that may properly describe and characterize them. The study included the total of the 66 farms of the Chios sheep breeders' cooperative Macedonia. Data were collected using a structured direct questionnaire for in-depth interviews, including questions properly selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was used on the data to obtain the most appropriate typology. Initially, principal component analysis was used to produce uncorrelated variables (principal components), which would be used for the consecutive cluster analysis. The number of clusters was decided using hierarchical cluster analysis, whereas, the farms were allocated in 4 clusters using k-means cluster analysis. The identified clusters were described and afterward compared using one-way ANOVA or a chi-squared test. The main differences were evident on land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms and cluster 2 included well-established farms with balanced sheep and feed/crop production. In cluster 3 were assigned small flock farms focusing more on arable crops than on sheep farming with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs and choosing not to focus on feed/crop production. In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). Conducting a profitability analysis among different clusters, exploring and discovering the most beneficial levels of intensified management and capital investment should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  14. Personalized Medicine in Veterans with Traumatic Brain Injuries

    DTIC Science & Technology

    2013-05-01

    Pair-Group Method using Arithmetic averages ( UPGMA ) based on cosine correlation of row mean centered log2 signal values; this was the top 50%-tile...cluster- ing was performed by the UPGMA method using Cosine correlation as the similarity metric. For comparative purposes, clustered heat maps included...non-mTBI cases were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as the similarity

  15. Comparative Genomic Hybridization Analysis of Two Predominant Nordic Group I (Proteolytic) Clostridium botulinum Type B Clusters▿ †

    PubMed Central

    Lindström, Miia; Hinderink, Katja; Somervuo, Panu; Kiviniemi, Katri; Nevas, Mari; Chen, Ying; Auvinen, Petri; Carter, Andrew T.; Mason, David R.; Peck, Michael W.; Korkeala, Hannu

    2009-01-01

    Comparative genomic hybridization analysis of 32 Nordic group I Clostridium botulinum type B strains isolated from various sources revealed two homogeneous clusters, clusters BI and BII. The type B strains differed from reference strain ATCC 3502 by 413 coding sequence (CDS) probes, sharing 88% of all the ATCC 3502 genes represented on the microarray. The two Nordic type B clusters differed from each other by their response to 145 CDS probes related mainly to transport and binding, adaptive mechanisms, fatty acid biosynthesis, the cell membranes, bacteriophages, and transposon-related elements. The most prominent differences between the two clusters were related to resistance to toxic compounds frequently found in the environment, such as arsenic and cadmium, reflecting different adaptive responses in the evolution of the two clusters. Other relatively variable CDS groups were related to surface structures and the gram-positive cell wall, suggesting that the two clusters possess different antigenic properties. All the type B strains carried CDSs putatively related to capsule formation, which may play a role in adaptation to different environmental and clinical niches. Sequencing showed that representative strains of the two type B clusters both carried subtype B2 neurotoxin genes. As many of the type B strains studied have been isolated from foods or associated with botulism, it is expected that the two group I C. botulinum type B clusters present a public health hazard in Nordic countries. Knowing the genetic and physiological markers of these clusters will assist in targeting control measures against these pathogens. PMID:19270141

  16. Computational identification of developmental enhancers:conservation and function of transcription factor binding-site clustersin drosophila melanogaster and drosophila psedoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayedmore » embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.« less

  17. Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.

    PubMed

    Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah

    2014-06-01

    Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ(2) analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, which is related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights reserved.

  18. Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis.

    PubMed

    El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane

    2018-01-01

    Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed (BRFs), namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious) with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance on eating healthy. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.

  19. Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

    PubMed

    Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

    2017-01-01

    Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4 increased in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.

  20. Clustering of Health Behaviors and Cardiorespiratory Fitness Among U.S. Adolescents.

    PubMed

    Hartz, Jacob; Yingling, Leah; Ayers, Colby; Adu-Brimpong, Joel; Rivers, Joshua; Ahuja, Chaarushi; Powell-Wiley, Tiffany M

    2018-05-01

    Decreased cardiorespiratory fitness (CRF) is associated with an increased risk of cardiovascular disease. However, little is known how the interaction of diet, physical activity (PA), and sedentary time (ST) affects CRF among adolescents. By using a nationally representative sample of U.S. adolescents, we used cluster analysis to investigate the interactions of these behaviors with CRF. We hypothesized that distinct clustering patterns exist and that less healthy clusters are associated with lower CRF. We used 2003-2004 National Health and Nutrition Examination Survey data for persons aged 12-19 years (N = 1,225). PA and ST were measured objectively by an accelerometer, and the American Heart Association Healthy Diet Score quantified diet quality. Maximal oxygen consumption (V˙O 2 ​max) was measured by submaximal treadmill exercise test. We performed cluster analysis to identify sex-specific clustering of diet, PA, and ST. Adjusting for accelerometer wear time, age, body mass index, race/ethnicity, and the poverty-to-income ratio, we performed sex-stratified linear regression analysis to evaluate the association of cluster with V˙O 2 ​max. Three clusters were identified for girls and boys. For girls, there was no difference across clusters for age (p = .1), weight (p = .3), and BMI (p = .5), and no relationship between clusters and V˙O 2 ​max. For boys, the youngest cluster (p < .01) had three healthy behaviors, weighed less, and was associated with a higher V˙O 2 ​max compared with the two older clusters. We observed clustering of diet, PA, and ST in U.S. adolescents. Specific patterns were associated with lower V˙O 2 ​max for boys, suggesting that our clusters may help identify adolescent boys most in need of interventions. Published by Elsevier Inc.

  1. A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.

    PubMed

    Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve

    2015-01-01

    Symptom clusters are gaining importance given HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, P<0.001). CD4 count group below 200 was associated with the all high occurrence rate symptom cluster (Fisher's exact=41, P<0.001). Symptom clusters have a differential, affect HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high-symptom occurrence rates having a higher risk of poorer outcomes. Identification of symptom clusters could provide insights into commonly co-occurring symptoms that should be jointly targeted for management in patients with multiple complaints.

  2. Differences in Pedaling Technique in Cycling: A Cluster Analysis.

    PubMed

    Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

    2016-10-01

    To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO VT2 ) compared with cycling at their maximal power output (PO MAX ). Twenty athletes performed an incremental cycling test to determine their power output (PO MAX and PO VT2 ; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in PO MAX and PO VT2 . Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO VT2 and PO MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO VT2 vs PO MAX , cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.

  3. Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.

    PubMed

    Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D; Ober, Carole; Nicolae, Dan L; Barnes, Kathleen C; London, Stephanie J; Gilliland, Frank; Weiss, Scott T; Raby, Benjamin A; Cohn, Lauren; Chupp, Geoffrey L

    2015-05-15

    The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10(-6)) and hospitalization (P = 0.01), respectively. There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.

  4. A stellar census in globular clusters with MUSE: The contribution of rotation to cluster dynamics studied with 200 000 stars

    NASA Astrophysics Data System (ADS)

    Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.

    2018-02-01

    This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.

  5. Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma

    PubMed Central

    Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F.; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D.; Ober, Carole; Nicolae, Dan L.; Barnes, Kathleen C.; London, Stephanie J.; Gilliland, Frank; Weiss, Scott T.; Raby, Benjamin A.; Cohn, Lauren

    2015-01-01

    Rationale: The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. Objectives: We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Methods: Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Measurements and Main Results: Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10−6) and hospitalization (P = 0.01), respectively. Conclusions: There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma. PMID:25763605

  6. Characterization of bafilomycin biosynthesis in Kitasatospora setae KM-6054 and comparative analysis of gene clusters in Actinomycetales microorganisms.

    PubMed

    Nara, Ayako; Hashimoto, Takuya; Komatsu, Mamoru; Nishiyama, Makoto; Kuzuyama, Tomohisa; Ikeda, Haruo

    2017-05-01

    Bafilomycins A 1 , C 1 and B 1 (setamycin) produced by Kitasatospora setae KM-6054 belong to the plecomacrolide family, which exhibit antibacterial, antifungal, antineoplastic and immunosuppressive activities. An analysis of gene clusters from K. setae KM-6054 governing the biosynthesis of bafilomycins revealed that it contains five large open reading frames (ORFs) encoding the multifunctional polypeptides of bafilomycin polyketide synthases (PKSs). These clustered PKS genes, which are responsible for bafilomycin biosynthesis, together encode 11 homologous sets of enzyme activities, each catalyzing a specific round of polyketide chain elongation. The region contains an additional 13 ORFs spanning a distance of 73 287 bp, some of which encode polypeptides governing other key steps in bafilomycin biosynthesis. Five ORFs, BfmB, BfmC, BfmD, BfmE and BfmF, were involved in the formation of methoxymalonyl-acyl carrier protein (ACP). Two possible regulatory genes, bfmR and bfmH, were found downstream of the above genes. A gene-knockout analysis revealed that BfmR was only a transcriptional regulator for the transcription of bafilomycin biosynthetic genes. Two genes, bfmI and bfmJ, were found downstream of bfmH. An analysis of these gene-disruption mutants in addition to an enzymatic analysis of BfmI and BfmJ revealed that BfmJ activated fumarate and BfmI functioned as a catalyst to form a fumaryl ester at the C21 hydroxyl residue of bafilomycin A 1 . A comparative analysis of bafilomycin gene clusters in K. setae KM-6054, Streptomyces lohii JCM 14114 and Streptomyces griseus DSM 2608 revealed that each ORF of both gene clusters in two Streptomyces strains were quite similar to each other. However, each ORF of gene cluster in K. setae KM-6054 was of lower similarity to that of corresponding ORF in the two Streptomyces species.

  7. Improved Ant Colony Clustering Algorithm and Its Performance Study

    PubMed Central

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533

  8. Comparative study of cluster Ag17Cu2 by instantaneous normal mode analysis and by isothermal Brownian-type molecular dynamics simulation.

    PubMed

    Tang, Ping-Han; Wu, Ten-Ming; Yen, Tsung-Wen; Lai, S K; Hsu, P J

    2011-09-07

    We perform isothermal Brownian-type molecular dynamics simulations to obtain the velocity autocorrelation function and its time Fourier-transformed power spectral density for the metallic cluster Ag(17)Cu(2). The temperature dependences of these dynamical quantities from T = 0 to 1500 K were examined and across this temperature range the cluster melting temperature T(m), which we define to be the principal maximum position of the specific heat is determined. The instantaneous normal mode analysis is then used to dissect the cluster dynamics by calculating the vibrational instantaneous normal mode density of states and hence its frequency integrated value I(j) which is an ensemble average of all vibrational projection operators for the jth atom in the cluster. In addition to comparing the results with simulation data, we look more closely at the entities I(j) of all atoms using the point group symmetry and diagnose their temperature variations. We find that I(j) exhibit features that may be used to deduce T(m), which turns out to agree very well with those inferred from the power spectral density and specific heat. © 2011 American Institute of Physics

  9. Cost/Performance Ratio Achieved by Using a Commodity-Based Cluster

    NASA Technical Reports Server (NTRS)

    Lopez, Isaac

    2001-01-01

    Researchers at the NASA Glenn Research Center acquired a commodity cluster based on Intel Corporation processors to compare its performance with a traditional UNIX cluster in the execution of aeropropulsion applications. Since the cost differential of the clusters was significant, a cost/performance ratio was calculated. After executing a propulsion application on both clusters, the researchers demonstrated a 9.4 cost/performance ratio in favor of the Intel-based cluster. These researchers utilize the Aeroshark cluster as one of the primary testbeds for developing NPSS parallel application codes and system software. The Aero-shark cluster provides 64 Intel Pentium II 400-MHz processors, housed in 32 nodes. Recently, APNASA - a code developed by a Government/industry team for the design and analysis of turbomachinery systems was used for a simulation on Glenn's Aeroshark cluster.

  10. Segmenting Student Markets with a Student Satisfaction and Priorities Survey.

    ERIC Educational Resources Information Center

    Borden, Victor M. H.

    1995-01-01

    A market segmentation analysis of 872 university students compared 2 hierarchical clustering procedures for deriving market segments: 1 using matching-type measures and an agglomerative clustering algorithm, and 1 using the chi-square based automatic interaction detection. Results and implications for planning, evaluating, and improving academic…

  11. Is antibody clustering predictive of clinical subsets and damage in systemic lupus erythematosus?

    PubMed

    To, C H; Petri, M

    2005-12-01

    To examine autoantibody clusters and their associations with clinical features and organ damage accrual in patients with systemic lupus erythematosus (SLE). The study group comprised 1,357 consecutive patients with SLE who were recruited to participate in a prospective longitudinal cohort study. In the cohort, 92.6% of the patients were women, the mean +/- SD age of the patients was 41.3 +/- 12.7 years, 55.9% were Caucasian, 39.1% were African American, and 5% were Asian. Seven autoantibodies (anti-double-stranded DNA [anti-dsDNA], anti-Sm, anti-Ro, anti-La, anti-RNP, lupus anticoagulant (LAC), and anticardiolipin antibody [aCL]) were selected for cluster analysis using the K-means cluster analysis procedure. Three distinct autoantibody clusters were identified: cluster 1 (anti-Sm and anti-RNP), cluster 2 (anti-dsDNA, anti-Ro, and anti-La), and cluster 3 (anti-dsDNA, LAC, and aCL). Patients in cluster 1 (n = 451), when compared with patients in clusters 2 (n = 470) and 3 (n = 436), had the lowest incidence of proteinuria (39.7%), anemia (52.8%), lymphopenia (33.9%), and thrombocytopenia (13.7%). The incidence of nephrotic syndrome and leukopenia was also lower in cluster 1 than in cluster 2. Cluster 2 had the highest female-to-male ratio (22:1) and the greatest proportion of Asian patients. Among the 3 clusters, cluster 2 had significantly more patients presenting with secondary Sjögren's syndrome (15.7%). Cluster 3, when compared with the other 2 clusters, consisted of more Caucasian and fewer African American patients and was characterized by the highest incidence of arterial thrombosis (17.4%), venous thrombosis (25.7%), and livedo reticularis (31.4%). By using the Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index, the greatest frequency of nephrotic syndrome (8.9%) was observed in patients in cluster 2, whereas cluster 3 patients had the highest percentage of damage due to cerebrovascular accident (12.8%) and venous thrombosis (7.8%). Osteoporotic fracture (11.9%) was also more common in cluster 3 than in cluster 2. Autoantibody clustering is a valuable tool to differentiate between various subsets of SLE, allowing prediction of subsequent clinical course and organ damage.

  12. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species.

    PubMed

    Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q

    2015-07-01

    Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Generating a Magellanic star cluster catalog with ASteCA

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Piatti, A. E.; Vázquez, R. A.

    2016-08-01

    An increasing number of software tools have been employed in the recent years for the automated or semi-automated processing of astronomical data. The main advantages of using these tools over a standard by-eye analysis include: speed (particularly for large databases), homogeneity, reproducibility, and precision. At the same time, they enable a statistically correct study of the uncertainties associated with the analysis, in contrast with manually set errors, or the still widespread practice of simply not assigning errors. We present a catalog comprising 210 star clusters located in the Large and Small Magellanic Clouds, observed with Washington photometry. Their fundamental parameters were estimated through an homogeneous, automatized and completely unassisted process, via the Automated Stellar Cluster Analysis package ( ASteCA). Our results are compared with two types of studies on these clusters: one where the photometry is the same, and another where the photometric system is different than that employed by ASteCA.

  14. Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

    PubMed

    Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

    2012-01-01

    Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.

  15. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    NASA Astrophysics Data System (ADS)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.

  16. Analysis of infant isolates of Bifidobacterium breve by comparative genome hybridization indicates the existence of new subspecies with marked infant specificity.

    PubMed

    Boesten, Rolf; Schuren, Frank; Wind, Richèle D; Knol, Jan; de Vos, Willem M

    2011-09-01

    A total of 20 Bifidobacterium strains were isolated from fecal samples of 4 breast- and bottle-fed infants and all were characterized as Bifidobacterium breve based on 16S rRNA gene sequence and metabolic analysis. These isolates were further characterized and compared to the type strains of B. breve and 7 other Bifidobacterium spp. by comparative genome hybridization. For this purpose, we constructed and used a DNA-based microarray containing over 2000 randomly cloned DNA fragments from B. breve type strain LMG13208. This molecular analysis revealed a high degree of genomic variation between the isolated strains and allowed the vast majority to be grouped into 4 clusters. One cluster contained a single isolate that was virtually indistinguishable from the B. breve type strain. The 3 other clusters included 19 B. breve strains that differed considerably from all type strains. Remarkably, each of the 4 clusters included strains that were isolated from a single infant, indicating that a niche adaptation may contribute to variation within the B. breve species. Based on genomic hybridization data, the new B. breve isolates were estimated to contain approximately 60-90% of the genes of the B. breve type strain, attesting to the existence of various subspecies within the species B. breve. Further bioinformatic analysis identified several hundred diagnostic clones specific to the genomic clustering of the B. breve isolates. Molecular analysis of representatives of these revealed that annotated genes from the conserved B. breve core encoded mainly housekeeping functions, while the strain-specific genes were predicted to code for functions related to life style, such as carbohydrate metabolism and transport. This is compatible with genetic adaptation of the strains to their niche, a combination of infants and diet. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  17. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    PubMed

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  18. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches

    PubMed Central

    Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683

  19. Development of small scale cluster computer for numerical analysis

    NASA Astrophysics Data System (ADS)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.

  20. Variability in body size and shape of UK offshore workers: A cluster analysis approach.

    PubMed

    Stewart, Arthur; Ledingham, Robert; Williams, Hector

    2017-01-01

    Male UK offshore workers have enlarged dimensions compared with UK norms and knowledge of specific sizes and shapes typifying their physiques will assist a range of functions related to health and ergonomics. A representative sample of the UK offshore workforce (n = 588) underwent 3D photonic scanning, from which 19 extracted dimensional measures were used in k-means cluster analysis to characterise physique groups. Of the 11 resulting clusters four somatotype groups were expressed: one cluster was muscular and lean, four had greater muscularity than adiposity, three had equal adiposity and muscularity and three had greater adiposity than muscularity. Some clusters appeared constitutionally similar to others, differing only in absolute size. These cluster centroids represent an evidence-base for future designs in apparel and other applications where body size and proportions affect functional performance. They also constitute phenotypic evidence providing insight into the 'offshore culture' which may underpin the enlarged dimensions of offshore workers. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Evolution of coding and non-coding genes in HOX clusters of a marsupial.

    PubMed

    Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

    2012-06-18

    The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.

  2. Evolution of coding and non-coding genes in HOX clusters of a marsupial

    PubMed Central

    2012-01-01

    Background The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672

  3. Advanced analysis of forest fire clustering

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index. Pattern Recognition, 48, 4070-4081.

  4. A comparison of IQ and memory cluster solutions in moderate and severe pediatric traumatic brain injury.

    PubMed

    Thaler, Nicholas S; Terranova, Jennifer; Turner, Alisa; Mayfield, Joan; Allen, Daniel N

    2015-01-01

    Recent studies have examined heterogeneous neuropsychological outcomes in childhood traumatic brain injury (TBI) using cluster analysis. These studies have identified homogeneous subgroups based on tests of IQ, memory, and other cognitive abilities that show some degree of association with specific cognitive, emotional, and behavioral outcomes, and have demonstrated that the clusters derived for children with TBI are different from those observed in normal populations. However, the extent to which these subgroups are stable across abilities has not been examined, and this has significant implications for the generalizability and clinical utility of TBI clusters. The current study addressed this by comparing IQ and memory profiles of 137 children who sustained moderate-to-severe TBI. Cluster analysis of IQ and memory scores indicated that a four-cluster solution was optimal for the IQ scores and a five-cluster solution was optimal for the memory scores. Three clusters on each battery differed primarily by level of performance, while the others had pattern variations. Cross-plotting the clusters across respective IQ and memory test scores indicated that clusters defined by level were generally stable, while clusters defined by pattern differed. Notably, children with slower processing speed exhibited low-average to below-average performance on memory indexes. These results provide some support for the stability of previously identified memory and IQ clusters and provide information about the relationship between IQ and memory in children with TBI.

  5. Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

    PubMed

    Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

    2017-12-01

    Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the limitations, trade-offs, and considerations for the sensitivities of variables and the biological interpretations of results. The Cluster app is available as a free download for Apple computers at iTunes, with a link to a user guide website.

  6. Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques

    NASA Astrophysics Data System (ADS)

    Gulgundi, Mohammad Shahid; Shetty, Amba

    2018-03-01

    Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of the Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre and post monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poor quality in drinking purpose compared to pre-monsoon. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock water interactions in the aquifer, effect of anthropogenic activities and ion exchange processes in water.

  7. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    PubMed

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-05

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. When these two groups were compared to PFP runners, one cluster exhibited greater while the other exhibited reduced peak knee abduction angles (P<0.05). The variability observed in running patterns across this sample could be the result of different gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Does objective cluster analysis serve as a useful precursor to seasonal precipitation prediction at local scale? Application to western Ethiopia

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Moges, Semu; Block, Paul

    2018-01-01

    Prediction of seasonal precipitation can provide actionable information to guide management of various sectoral activities. For instance, it is often translated into hydrological forecasts for better water resources management. However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability. This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia. To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution, as compared with previous studies. The statistical model improves versus the non-clustered case or dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33 %, respectively. The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at a lower resolution and the raw predictions require bias correction to guarantee comparable skills.

  9. Manifold Learning in MR spectroscopy using nonlinear dimensionality reduction and unsupervised clustering.

    PubMed

    Yang, Guang; Raschke, Felix; Barrick, Thomas R; Howe, Franklyn A

    2015-09-01

    To investigate whether nonlinear dimensionality reduction improves unsupervised classification of (1) H MRS brain tumor data compared with a linear method. In vivo single-voxel (1) H magnetic resonance spectroscopy (55 patients) and (1) H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With (1) H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of (1) H MRSI data after cluster analysis. © 2014 Wiley Periodicals, Inc.

  10. Identifying clusters of falls-related hospital admissions to inform population targets for prioritising falls prevention programmes

    PubMed Central

    Finch, Caroline F; Stephan, Karen; Shee, Anna Wong; Hill, Keith; Haines, Terry P; Clemson, Lindy; Day, Lesley

    2015-01-01

    Background There has been limited research investigating the relationship between injurious falls and hospital resource use. The aims of this study were to identify clusters of community-dwelling older people in the general population who are at increased risk of being admitted to hospital following a fall and how those clusters differed in their use of hospital resources. Methods Analysis of routinely collected hospital admissions data relating to 45 374 fall-related admissions in Victorian community-dwelling older adults aged ≥65 years that occurred during 2008/2009 to 2010/2011. Fall-related admission episodes were identified based on being admitted from a private residence to hospital with a principal diagnosis of injury (International Classification of Diseases (ICD)-10-AM codes S00 to T75) and having a first external cause of a fall (ICD-10-AM codes W00 to W19). A cluster analysis was performed to identify homogeneous groups using demographic details of patients and information on the presence of comorbidities. Hospital length of stay (LOS) was compared across clusters using competing risks regression. Results Clusters based on area of residence, demographic factors (age, gender, marital status, country of birth) and the presence of comorbidities were identified. Clusters representing hospitalised fallers with comorbidities were associated with longer LOS compared with other cluster groups. Clusters delineated by demographic factors were also associated with increased LOS. Conclusions All patients with comorbidity, and older women without comorbidities, stay in hospital longer following a fall and hence consume a disproportionate share of hospital resources. These findings have important implications for the targeting of falls prevention interventions for community-dwelling older people. PMID:25618735

  11. Computational identification of developmental enhancers:conservation and function of transcription factor binding-site clustersin drosophila melanogaster and drosophila psedoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    Background The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. Results We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene,more » and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Conclusions Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.« less

  12. Linear clusters of galaxies - A999 and A1016

    NASA Astrophysics Data System (ADS)

    Chapman, G. N. F.; Geller, M. J.; Huchra, J. P.

    1987-09-01

    The authors have measured 44 new redshifts in A 999 and 40 in A 1016: these clusters are both "linear" according to Rood and Sastry (1971) and Struble and Rood (1982, 1984). With 20 cluster members in A 999 and 22 in A 1016, the authors can estimate the probability that these clusters are actually drawn from spherically symmetric distributions. By comparing the clusters with Monte Carlo King models, they find that A 999 is probably intrinsically spherically symmetric, but A 1016 is probably linear. The authors estimate that ⪆2% of a catalog of spherically symmetric clusters might be erroneously classified as linear. They use the data to estimate the virial masses for these systems. The authors reassess the cluster-galaxy alignment analysis of Adams, Strom, and Strom (1980) and examine the relationship between the luminosity and morphological type of the cluster members and the cluster itself.

  13. Study on transport infrastructure as mechanism of long-term urban planning strategies

    NASA Astrophysics Data System (ADS)

    Popova, Olga; Martynov, Kirill; Khusnutdinov, Rinat

    2017-10-01

    In this article, the authors carry out the research of the transport infrastructure. The authors have developed an algorithm for quality assessment of transport networks and connectivity of urban development areas. The results of the research are presented on the example of several central city quarters of Arkhangelsk city. The analysis was carried out by clustering objects (separate quarters of the Arkhangelsk city) using of SOM in comparable groups with a high level of similarity of characteristics inside each group. The result of clustering was 5 clusters with different levels of transport infrastructure. The novelty of the study is to justification for advantages of applying structural analysis for qualitative ranking of areas. The advantage of the proposed methodology is that it gives the opportunity both to compare the transport infrastructure quality of different city quarters and to determine the strategy for its development with a list of specific activities.

  14. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    PubMed

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  15. A segmentation/clustering model for the analysis of array CGH data.

    PubMed

    Picard, F; Robin, S; Lebarbier, E; Daudin, J-J

    2007-09-01

    Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.

  16. Novel approach to characterising individuals with low back-related leg pain: cluster identification with latent class analysis and 12-month follow-up.

    PubMed

    Stynes, Siobhán; Konstantinou, Kika; Ogollah, Reuben; Hay, Elaine M; Dunn, Kate M

    2018-04-01

    Traditionally, low back-related leg pain (LBLP) is diagnosed clinically as referred leg pain or sciatica (nerve root involvement). However, within the spectrum of LBLP, we hypothesised that there may be other unrecognised patient subgroups. This study aimed to identify clusters of patients with LBLP using latent class analysis and describe their clinical course. The study population was 609 LBLP primary care consulters. Variables from clinical assessment were included in the latent class analysis. Characteristics of the statistically identified clusters were compared, and their clinical course over 1 year was described. A 5 cluster solution was optimal. Cluster 1 (n = 104) had mild leg pain severity and was considered to represent a referred leg pain group with no clinical signs, suggesting nerve root involvement (sciatica). Cluster 2 (n = 122), cluster 3 (n = 188), and cluster 4 (n = 69) had mild, moderate, and severe pain and disability, respectively, and response to clinical assessment items suggested categories of mild, moderate, and severe sciatica. Cluster 5 (n = 126) had high pain and disability, longer pain duration, and more comorbidities and was difficult to map to a clinical diagnosis. Most improvement for pain and disability was seen in the first 4 months for all clusters. At 12 months, the proportion of patients reporting recovery ranged from 27% for cluster 5 to 45% for cluster 2 (mild sciatica). This is the first study that empirically shows the variability in profile and clinical course of patients with LBLP including sciatica. More homogenous groups were identified, which could be considered in future clinical and research settings.

  17. Type 2 diabetes mellitus: distribution of genetic markers in Kazakh population.

    PubMed

    Sikhayeva, Nurgul; Talzhanov, Yerkebulan; Iskakova, Aisha; Dzharmukhanov, Jarkyn; Nugmanova, Raushan; Zholdybaeva, Elena; Ramanculov, Erlan

    2018-01-01

    Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. A total of 966 individuals belonging to the Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups and 31 of these were in Hardy-Weinberg equilibrium. The obtained allele frequencies were further compared to publicly available data from other ethnic populations. Allele frequencies for other (compared) populations were pooled from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used for the analysis of genetic relationship between the populations. Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, results of the current study show that the Kazakh population holds an intermediate position between Caucasian and Asian populations. A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool.

  18. Water clustering in glassy polymers.

    PubMed

    Davis, Eric M; Elabd, Yossef A

    2013-09-12

    In this study, water solubility and water clustering in several glassy polymers, including poly(methyl methacrylate) (PMMA), poly(styrene) (PS), and poly(vinylpyrrolidone) (PVP), were measured using both quartz spring microbalance (QSM) and Fourier transform infrared-attenuated total reflectance (FTIR-ATR) spectroscopy. Specifically, QSM was used to determine water solubility, while FTIR-ATR spectroscopy provided a direct, molecular-level measurement of water clustering. The Flory-Huggins theory was employed to obtain a measure of water-polymer interaction and water solubility, through both prediction and regression, where the theory failed to predict water solubility in both PMMA and PVP. Furthermore, a comparison of water clustering between direct FTIR-ATR spectroscopy measurements and predictions from the Zimm-Lundberg clustering analysis produced contradictory results. The failure of the Flory-Huggins theory and Zimm-Lundberg clustering analysis to describe water solubility and water clustering, respectively, in these glassy polymers is in part due to the equilibrium constraints under which these models are derived in contrast to the nonequilibrium state of glassy polymers. Additionally, FTIR-ATR spectroscopy results were compared to temperature-dependent diffusivity data, where a correlation between the activation energy for diffusion and the measured water clustering was observed.

  19. Cluster Masses Derived from X-ray and Sunyaev-Zeldovich Effect Measurements

    NASA Technical Reports Server (NTRS)

    Laroque, S.; Joy, Marshall; Bonamente, M.; Carlstrom, J.; Dawson, K.

    2003-01-01

    We infer the gas mass and total gravitational mass of 11 clusters using two different methods; analysis of X-ray data from the Chandra X-ray Observatory and analysis of centimeter-wave Sunyaev-Zel'dovich Effect (SZE) data from the BIMA and OVRO interferometers. This flux-limited sample of clusters from the BCS cluster catalogue was chosen so as to be well above the surface brightness limit of the ROSAT All Sky Survey; this is therefore an orientation unbiased sample. The gas mass fraction, f_g, is calculated for each cluster using both X-ray and SZE data, and the results are compared at a fiducial radius of r_500. Comparison of the X-ray and SZE results for this orientation unbiased sample allows us to constrain cluster systematics, such as clumping of the intracluster medium. We derive an upper limit on Omega_M assuming that the mass composition of clusters within r_500 reflects the universal mass composition Omega_M h_100 is greater than Omega _B / f-g. We also demonstrate how the mean f_g derived from the sample can be used to estimate the masses of clusters discovered by upcoming deep SZE surveys.

  20. Low Back Pain Subgroups using Fear-Avoidance Model Measures: Results of a Cluster Analysis

    PubMed Central

    Beneciuk, Jason M.; Robinson, Michael E.; George, Steven Z.

    2012-01-01

    Objectives The purpose of this secondary analysis was to test the hypothesis that an empirically derived psychological subgrouping scheme based on multiple Fear-Avoidance Model (FAM) constructs would provide additional capabilities for clinical outcomes in comparison to a single FAM construct. Methods Patients (n = 108) with acute or sub-acute low back pain (LBP) enrolled in a clinical trial comparing behavioral physical therapy interventions to classification based physical therapy completed baseline questionnaires for pain catastrophizing (PCS), fear-avoidance beliefs (FABQ-PA, FABQ-W), and patient-specific fear (FDAQ). Clinical outcomes were pain intensity and disability measured at baseline, 4-weeks, and 6-months. A hierarchical agglomerative cluster analysis was used to create distinct cluster profiles among FAM measures and discriminant analysis was used to interpret clusters. Changes in clinical outcomes were investigated with repeated measures ANOVA and differences in results based on cluster membership were compared to FABQ-PA subgrouping used in the original trial. Results Three distinct FAM subgroups (Low Risk, High Specific Fear, and High Fear & Catastrophizing) emerged from cluster analysis. Subgroups differed on baseline pain and disability (p’s<.01) with the High Fear & Catastrophizing subgroup associated with greater pain than the Low Risk subgroup (p<.01) and the greatest disability (p’s<.05). Subgroup × time interactions were detected for both pain and disability (p’s<.05) with the High Fear & Catastrophizing subgroup reporting greater changes in pain and disability than other subgroups (p’s<.05). In contrast, FABQ-PA subgroups used in the original trial were not associated with interactions for clinical outcomes. Discussion These data suggest that subgrouping based on multiple FAM measures may provide additional information on clinical outcomes in comparison to determining subgroup status by FABQ-PA alone. Subgrouping methods for patients with LBP should include multiple psychological factors to further explore if patients can be matched with appropriate interventions. PMID:22510537

  1. Network module detection: Affinity search technique with the multi-node topological overlap measure

    PubMed Central

    Li, Ai; Horvath, Steve

    2009-01-01

    Background Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: PMID:19619323

  2. Network module detection: Affinity search technique with the multi-node topological overlap measure.

    PubMed

    Li, Ai; Horvath, Steve

    2009-07-20

    Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/

  3. Combinations of elevated tissue miRNA-17-92 cluster expression and serum prostate-specific antigen as potential diagnostic biomarkers for prostate cancer.

    PubMed

    Feng, Sujuan; Qian, Xiaosong; Li, Han; Zhang, Xiaodong

    2017-12-01

    The aim of the present study was to investigate the effectiveness of the miR-17-92 cluster as a disease progression marker in prostate cancer (PCa). Reverse transcription-quantitative polymerase chain reaction analysis was used to detect the microRNA (miR)-17-92 cluster expression levels in tissues from patients with PCa or benign prostatic hyperplasia (BPH), in addition to in PCa and BPH cell lines. Spearman correlation was used for comparison and estimation of correlations between miRNA expression levels and clinicopathological characteristics such as the Gleason score and prostate-specific antigen (PSA). Receiver operating curve (ROC) analysis was performed for evaluation of specificity and sensitivity of miR-17-92 cluster expression levels for discriminating patients with PCa from patients with BPH. Kaplan-Meier analysis was plotted to investigate the predictive potential of miR-17-92 cluster for PCa biochemical recurrence. Expression of the majority of miRNAs in the miR-17-92 cluster was identified to be significantly increased in PCa tissues and cell lines. Bivariate correlation analysis indicated that the high expression of unregulated miRNAs was positively correlated with Gleason grade, but had no significant association with PSA. ROC curves demonstrated that high expression of miR-17-92 cluster predicted a higher diagnostic accuracy compared with PSA. Improved discriminating quotients were observed when combinations of unregulated miRNAs with PSA were used. Survival analysis confirmed a high combined miRNA score of miR-17-92 cluster was associated with shorter biochemical recurrence interval. miR-17-92 cluster could be a potential diagnostic and prognostic biomarker for PCa, and the combination of the miR-17-92 cluster and serum PSA may enhance the accuracy for diagnosis of PCa.

  4. Clustering of samples and variables with mixed-type data

    PubMed Central

    Edelmann, Dominic; Kopp-Schneider, Annette

    2017-01-01

    Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix. PMID:29182671

  5. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    NASA Astrophysics Data System (ADS)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting, which performs well in prediction and holds the ability of giving explanations for the results. In business failure prediction (BFP), the number of failed enterprises is relatively small, compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises and performs accurately on total accuracy meanwhile. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are firstly generated through hierarchical clustering inside stored experienced cases, and class centres are calculated out by integrating cases information in the same clustered class. When predicting the label of a target case, its nearest clustered case class is firstly retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multi-variant discriminate analysis. The results show that compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the minority samples and generated high total accuracy meanwhile. The proposed approach makes CBR useful in imbalanced forecasting.

  6. Psychological profiles derived by cluster analysis of Minnesota Multiphasic Personality Inventory and long term clinical outcome after coronary artery by pass grafting.

    PubMed

    Modica, Maddalena; Carabalona, Roberta; Spezzaferri, Rosa; Tavanelli, Monica; Torri, A; Ripamonti, Vittorino; Castiglioni, Paolo; De Maria, Renata; Ferratini, Maurizio

    2012-03-01

    To evaluate the psychological characteristics of coronary heart disease (CHD) patients after coronary artery bypass grafting (CABG) by cluster analysis of Minnesota Multiphasic Personality Inventory (MMPI-2) questionnaires and to assess the impact of the profiles obtained on long-term outcome. 229 CHD patients admitted to cardiac rehabilitation filled in self-administered MMPI-2 questionnaires early after CABG. We assessed the relation between MMPI-2 profiles derived by cluster analysis, clinical characteristics and outcome at 3-year follow-up. Among the 215 patients (76% men, median age 66 years) with valid criteria in control scales, we identified 3 clusters (G) with homogenous psychological characteristics: G1 patients (N = 75) presented somatoform complaints but overall minimal psychological distress. G2 patients (N=72) presented type D personality traits. G3 subjects (N=68) showed a trend to cynicism, mild increases in anger, social introversion and hostility. Clusters overlapped for clinical characteristics such as smoking (G1 21%, G2 24%, G3 24%, p ns), previous myocardial infarction (G1 43%, G2 47%, G3 49% p ns), LV ejection fraction (G1 60 [51-60]; G2 58 [49-60]; G3 60 [55-60], p ns), 3-vessel-disease prevalence (G1 69%, G2 65%, G3 71%, p ns). Three-year event rates were comparable (G1 15%; G2 18%; G3 15%) and Kaplan-Meier curves overlapped among clusters (p ns). After CABG, the interpretation of MMPI-2 by cluster analysis is useful for the psychological and personological diagnosis to direct psychological assistance. Conversely, results from cluster analysis of MMPI-2 do not seem helpful to the clinician to predict long term outcome.

  7. A pyrosequencing assay for the quantitative methylation analysis of the PCDHB gene cluster, the major factor in neuroblastoma methylator phenotype.

    PubMed

    Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo

    2012-03-01

    Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B cluster (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay herein described is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in perspective, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.

  8. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of data, type of variables, and purpose of the analysis. Different measurement scales are studied in details and statistical comparison, modeling, and data mining methods are studied based upon using several medical examples. We have presented two ordinal–variables clustering examples, as more challenging variable in analysis, using Wisconsin Breast Cancer Data (WBCD). Ordinal-to-Interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is granted. Moreover, descriptive and inferential statistics in addition to modeling approach must be selected based on the scale of the variables. PMID:24672565

  9. Structure and Stability of GeAu{sub n}, n = 1-10 clusters: A Density Functional Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Priyanka,; Dharamvir, Keya; Sharma, Hitesh

    2011-12-12

    The structures of Germanium doped gold clusters GeAu{sub n} (n = 1-10) have been investigated using ab initio calculations based on density functional theory (DFT). We have obtained ground state geometries of GeAu{sub n} clusters and have it compared with Silicon doped gold clusters and pure gold clusters. The ground state geometries of the GeAu{sub n} clusters show patterns similar to silicon doped gold clusters except for n = 5, 6 and 9. The introduction of germanium atom increases the binding energy of gold clusters. The binding energy per atom of germanium doped cluster is smaller than the corresponding siliconmore » doped gold cluster. The HUMO-LOMO gap for Au{sub n}Ge clusters have been found to vary between 0.46 eV-2.09 eV. The mullikan charge analysis indicates that charge of order of 0.1e always transfers from germanium atom to gold atom.« less

  10. The Wilcoxon signed rank test for paired comparisons of clustered data.

    PubMed

    Rosner, Bernard; Glynn, Robert J; Lee, Mei-Ling T

    2006-03-01

    The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with > or =20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.

  11. The peculiar velocities of rich clusters in the hot and cold dark matter scenarios

    NASA Technical Reports Server (NTRS)

    Rhee, George F.; West, Michael J.; Villumsen, Jens V.

    1993-01-01

    We present the results of a study of the peculiar velocities of rich clusters of galaxies. The peculiar motion of rich clusters in various cosmological scenarios is of interest for a number of reasons. Observationally, one can measure the peculiar motion of clusters to greater distances than galaxies because cluster peculiar motions can be determined to greater accuracy. One can also test the slope of distance indicator relations using clusters to see if galaxy properties vary with environment. We have used N-body simulations to measure the amplitude and rms cluster peculiar velocity as a function of bias parameter in the hot and cold dark matter scenarios. In addition to measuring the mean and rms peculiar velocity of clusters in the two models, we determined whether the peculiar velocity vector of a given cluster is well aligned with the gravity vector due to all the particles in the simulation and the gravity vector due to the particles present only in the clusters. We have investigated the peculiar velocities of rich clusters of galaxies in the cold dark matter and hot dark matter galaxy formation scenarios. We have derived peculiar velocities and associated errors for the scenarios using four values of the bias parameter ranging from b = 1 to b = 2.5. The growth of the mean peculiar velocity with scale factor has been determined and compared to that predicted by linear theory. In addition, we have compared the orientation of force and velocity in these simulations to see if a program such as that proposed by Bertschinger and Dekel (1989) for elliptical galaxy peculiar motions can be applied to clusters. The method they describe enables one to recover the density field from large scale redshift distance samples. The method makes it possible to do this when only radial velocities are known by assuming that the velocity field is curl free. Our analysis suggests that this program if applied to clusters is only realizable for models with a low value of the bias parameter, i.e., models in which the peculiar velocities of clusters are large enough that the errors do not render the analysis impracticable.

  12. From the Superatom Model to a Diverse Array of Super-Elements: A Systematic Study of Dopant Influence on the Electronic Structure of Thiolate-Protected Gold Clusters.

    PubMed

    Schacht, Julia; Gaston, Nicola

    2016-10-18

    The electronic properties of doped thiolate-protected gold clusters are often referred to as tunable, but their study to date, conducted at different levels of theory, does not allow a systematic evaluation of this claim. Here, using density functional theory, the applicability of the superatomic model to these clusters is critically evaluated, and related to the degree of structural distortion and electronic inhomogeneity in the differently doped clusters, with dopant atoms Pd, Pt, Cu, and Ag. The effect of electron number is systematically evaluated by varying the charge on the overall cluster, and the nominal number of delocalized electrons, employed in the superatomic model, is compared to the numbers obtained from Bader analysis of individual atomic charges. We find that the superatomic model is highly applicable to all of these clusters, and is able to predict and explain the changing electronic structure as a function of charge. However, significant perturbations of the model arise due to doping, due to distortions of the core structure of the Au 13 [RS(AuSR) 2 ] 6 - cluster. In addition, analysis of the electronic structure indicates that the superatomic character is distributed further across the ligand shell in the case of the doped clusters, which may have implications for the self-assembly of these clusters into materials. The prediction of appropriate clusters for such superatomic solids relies critically on such quantitative analysis of the tunability of the electronic structure. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Phylogenetic diversity in the genus Bacillus as seen by 16S rRNA sequencing studies

    NASA Technical Reports Server (NTRS)

    Rossler, D.; Ludwig, W.; Schleifer, K. H.; Lin, C.; McGill, T. J.; Wisotzkey, J. D.; Jurtshuk, P. Jr; Fox, G. E.

    1991-01-01

    Comparative sequence analysis of 16S ribosomal (r)RNAs or DNAs of Bacillus alvei, B. laterosporus, B. macerans, B. macquariensis, B. polymyxa and B. stearothermophilus revealed the phylogenetic diversity of the genus Bacillus. Based on the presently available data set of 16S rRNA sequences from bacilli and relatives at least four major "Bacillus clusters" can be defined: a "Bacillus subtilis cluster" including B. stearothermophilus, a "B. brevis cluster" including B. laterosporus, a "B. alvei cluster" including B. macerans, B. maquariensis and B. polymyxa and a "B. cycloheptanicus branch".

  14. Galaxies in X-ray Selected Clusters and Groups in Dark Energy Survey Data: Stellar Mass Growth of Bright Central Galaxies Since z~1.2

    DOE PAGES

    Zhang, Y.; Miller, C.; McKay, T.; ...

    2016-01-10

    Using the science verification data of the Dark Energy Survey for a new sample of 106 X-ray selected clusters and groups, we study the stellar mass growth of bright central galaxies (BCGs) since redshift z ~ 1.2. Compared with the expectation in a semi-analytical model applied to the Millennium Simulation, the observed BCGs become under-massive/under-luminous with decreasing redshift. We incorporate the uncertainties associated with cluster mass, redshift, and BCG stellar mass measurements into analysis of a redshift-dependent BCG-cluster mass relation.

  15. Identification and Characterization of Unique Subgroups of Chronic Pain Individuals with Dispositional Personality Traits.

    PubMed

    Mehta, S; Rice, D; McIntyre, A; Getty, H; Speechley, M; Sequeira, K; Shapiro, A P; Morley-Forster, P; Teasell, R W

    2016-01-01

    Objective. The current study attempted to identify and characterize distinct CP subgroups based on their level of dispositional personality traits. The secondary objective was to compare the difference among the subgroups in mood, coping, and disability. Methods. Individuals with chronic pain were assessed for demographic, psychosocial, and personality measures. A two-step cluster analysis was conducted in order to identify distinct subgroups of patients based on their level of personality traits. Differences in clinical outcomes were compared using the multivariate analysis of variance based on cluster membership. Results. In 229 participants, three clusters were formed. No significant difference was seen among the clusters on patient demographic factors including age, sex, relationship status, duration of pain, and pain intensity. Those with high levels of dispositional personality traits had greater levels of mood impairment compared to the other two groups (p < 0.05). Significant difference in disability was seen between the subgroups. Conclusions. The study identified a high risk group of CP individuals whose level of personality traits significantly correlated with impaired mood and coping. Use of pharmacological treatment alone may not be successful in improving clinical outcomes among these individuals. Instead, a more comprehensive treatment involving psychological treatments may be important in managing the personality traits that interfere with recovery.

  16. Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

    PubMed Central

    Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

    2012-01-01

    Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450

  17. Latent profile and cluster analysis of infant temperament: Comparisons across person-centered approaches.

    PubMed

    Gartstein, Maria A; Prokasky, Amanda; Bell, Martha Ann; Calkins, Susan; Bridgett, David J; Braungart-Rieker, Julia; Leerkes, Esther; Cheatham, Carol L; Eiden, Rina D; Mize, Krystal D; Jones, Nancy Aaron; Mireault, Gina; Seamon, Erich

    2017-10-01

    There is renewed interest in person-centered approaches to understanding the structure of temperament. However, questions concerning temperament types are not frequently framed in a developmental context, especially during infancy. In addition, the most common person-centered techniques, cluster analysis (CA) and latent profile analysis (LPA), have not been compared with respect to derived temperament types. To address these gaps, we set out to identify temperament types for younger and older infants, comparing LPA and CA techniques. Multiple data sets (N = 1,356; 672 girls, 677 boys) with maternal ratings of infant temperament obtained using the Infant Behavior Questionnaire-Revised (Gartstein & Rothbart, 2003) were combined. All infants were between 3 and 12 months of age (M = 7.85; SD = 3.00). Due to rapid development in the first year of life, LPA and CA were performed separately for younger (n = 731; 3 to 8 months of age) and older (n = 625; 9 to 12 months of age) infants. Results supported 3-profile/cluster solutions as optimal for younger infants, and 5-profile/cluster solutions for the older subsample, indicating considerable differences between early/mid and late infancy. LPA and CA solutions produced relatively comparable types for younger and older infants. Results are discussed in the context of developmental changes unique to the end of the first year of life, which likely account for the present findings. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  18. Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

    NASA Astrophysics Data System (ADS)

    Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Data clustering can be executed through partition or hierarchical method for many types of data including DNA sequences. Both clustering methods can be combined by processing partition algorithm in the first level and hierarchical in the second level, called hybrid clustering. In the partition phase some popular methods such as PAM, K-means, or Fuzzy c-means methods could be applied. In this study we selected partitioning around medoids (PAM) in our partition stage. Furthermore, following the partition algorithm, in hierarchical stage we applied divisive analysis algorithm (DIANA) in order to have more specific clusters and sub clusters structures. The number of main clusters is determined using Davies Bouldin Index (DBI) value. We choose the optimal number of clusters if the results minimize the DBI value. In this work, we conduct the clustering on 1252 HPV DNA sequences data from GenBank. The characteristic extraction is initially performed, followed by normalizing and genetic distance calculation using Euclidean distance. In our implementation, we used the hybrid PAM and DIANA using the R open source programming tool. In our results, we obtained 3 main clusters with average DBI value is 0.979, using PAM in the first stage. After executing DIANA in the second stage, we obtained 4 sub clusters for Cluster-1, 9 sub clusters for Cluster-2 and 2 sub clusters in Cluster-3, with the BDI value 0.972, 0.771, and 0.768 for each main cluster respectively. Since the second stage produce lower DBI value compare to the DBI value in the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.

  19. Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials

    PubMed Central

    Andridge, Rebecca. R.

    2011-01-01

    In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309

  20. Small traveling clusters in attractive and repulsive Hamiltonian mean-field models.

    PubMed

    Barré, Julien; Yamaguchi, Yoshiyuki Y

    2009-03-01

    Long-lasting small traveling clusters are studied in the Hamiltonian mean-field model by comparing between attractive and repulsive interactions. Nonlinear Landau damping theory predicts that a Gaussian momentum distribution on a spatially homogeneous background permits the existence of traveling clusters in the repulsive case, as in plasma systems, but not in the attractive case. Nevertheless, extending the analysis to a two-parameter family of momentum distributions of Fermi-Dirac type, we theoretically predict the existence of traveling clusters in the attractive case; these findings are confirmed by direct N -body numerical simulations. The parameter region with the traveling clusters is much reduced in the attractive case with respect to the repulsive case.

  1. Electronic structure and optical properties of the thiolate-protected Au28(SMe)20 cluster.

    PubMed

    Knoppe, Stefan; Malola, Sami; Lehtovaara, Lauri; Bürgi, Thomas; Häkkinen, Hannu

    2013-10-10

    The recently reported crystal structure of the Au28(TBBT)20 cluster (TBBT: p-tert-butylbenzenethiolate) is analyzed with (time-dependent) density functional theory (TD-DFT). Bader charge analysis reveals a novel trimeric Au3(SR)4 binding motif. The cluster can be formulated as Au14(Au2(SR)3)4(Au3(SR)4)2. The electronic structure of the Au14(6+) core and the ligand-protected cluster were analyzed, and their stability can be explained by formation of distorted eight-electron superatoms. Optical absorption and circular dichroism (CD) spectra were calculated and compared to the experiment. Assignment of handedness of the intrinsically chiral cluster is possible.

  2. Clinical evaluation of a novel population-based regression analysis for detecting glaucomatous visual field progression.

    PubMed

    Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C

    2011-04-01

    The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.

  3. Using Cluster Bootstrapping to Analyze Nested Data With a Few Clusters.

    PubMed

    Huang, Francis L

    2018-04-01

    Cluster randomized trials involving participants nested within intact treatment and control groups are commonly performed in various educational, psychological, and biomedical studies. However, recruiting and retaining intact groups present various practical, financial, and logistical challenges to evaluators and often, cluster randomized trials are performed with a low number of clusters (~20 groups). Although multilevel models are often used to analyze nested data, researchers may be concerned of potentially biased results due to having only a few groups under study. Cluster bootstrapping has been suggested as an alternative procedure when analyzing clustered data though it has seen very little use in educational and psychological studies. Using a Monte Carlo simulation that varied the number of clusters, average cluster size, and intraclass correlations, we compared standard errors using cluster bootstrapping with those derived using ordinary least squares regression and multilevel models. Results indicate that cluster bootstrapping, though more computationally demanding, can be used as an alternative procedure for the analysis of clustered data when treatment effects at the group level are of primary interest. Supplementary material showing how to perform cluster bootstrapped regressions using R is also provided.

  4. Constraining the mass–richness relationship of redMaPPer clusters with angular clustering

    DOE PAGES

    Baxter, Eric J.; Rozo, Eduardo; Jain, Bhuvnesh; ...

    2016-08-04

    The potential of using cluster clustering for calibrating the mass–richness relation of galaxy clusters has been recognized theoretically for over a decade. In this paper, we demonstrate the feasibility of this technique to achieve high-precision mass calibration using redMaPPer clusters in the Sloan Digital Sky Survey North Galactic Cap. By including cross-correlations between several richness bins in our analysis, we significantly improve the statistical precision of our mass constraints. The amplitude of the mass–richness relation is constrained to 7 per cent statistical precision by our analysis. However, the error budget is systematics dominated, reaching a 19 per cent total errormore » that is dominated by theoretical uncertainty in the bias–mass relation for dark matter haloes. We confirm the result from Miyatake et al. that the clustering amplitude of redMaPPer clusters depends on galaxy concentration as defined therein, and we provide additional evidence that this dependence cannot be sourced by mass dependences: some other effect must account for the observed variation in clustering amplitude with galaxy concentration. Assuming that the observed dependence of redMaPPer clustering on galaxy concentration is a form of assembly bias, we find that such effects introduce a systematic error on the amplitude of the mass–richness relation that is comparable to the error bar from statistical noise. Finally, the results presented here demonstrate the power of cluster clustering for mass calibration and cosmology provided the current theoretical systematics can be ameliorated.« less

  5. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    PubMed

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.

  6. Percolation analyses of observed and simulated galaxy clustering

    NASA Astrophysics Data System (ADS)

    Bhavsar, S. P.; Barrow, J. D.

    1983-11-01

    A percolation cluster analysis is performed on equivalent regions of the CFA redshift survey of galaxies and the 4000 body simulations of gravitational clustering made by Aarseth, Gott and Turner (1979). The observed and simulated percolation properties are compared and, unlike correlation and multiplicity function analyses, favour high density (Omega = 1) models with n = - 1 initial data. The present results show that the three-dimensional data are consistent with the degree of filamentary structure present in isothermal models of galaxy formation at the level of percolation analysis. It is also found that the percolation structure of the CFA data is a function of depth. Percolation structure does not appear to be a sensitive probe of intrinsic filamentary structure.

  7. First principles study of vibrational dynamics of ceria-titania hybrid clusters

    NASA Astrophysics Data System (ADS)

    Majid, Abdul; Bibi, Maryam

    2017-04-01

    Density functional theory based calculations were performed to study vibrational properties of ceria, titania, and ceria-titania hybrid clusters. The findings revealed the dominance of vibrations related to oxygen when compared to those of metallic atoms in the clusters. In case of hybrid cluster, the softening of normal modes related to exterior oxygen atoms in ceria and softening/hardening of high/low frequency modes related to titania dimmers are observed. The results calculated for monomers conform to symmetry predictions according to which three IR and three Raman active modes were detected for TiO2, whereas two IR active and one Raman active modes were observed for CeO2. The comparative analysis indicates that the hybrid cluster CeTiO4 contains simultaneous vibrational fingerprints of the component dimmers. The symmetry, nature of vibrations, IR and Raman activity, intensities, and atomic involvement in different modes of the clusters are described in detail. The study points to engineering of CeTiO4 to tailor its properties for technological visible region applications in photocatalytic and electrochemical devices.

  8. Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

    PubMed

    Yokoyama, Eiji; Uchimura, Masako

    2007-11-01

    Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.

  9. Emergy-based comparative analysis on industrial clusters: economic and technological development zone of Shenyang area, China.

    PubMed

    Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi

    2014-09-01

    In China, local governments of many areas prefer to give priority to the development of heavy industrial clusters in pursuit of high value of gross domestic production (GDP) growth to get political achievements, which usually results in higher costs from ecological degradation and environmental pollution. Therefore, effective methods and reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. Emergy methods links economic and ecological systems together, which can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many case studies of ecosystem but seldom in industrial clusters. This study applied the methodology of emergy analysis to perform the efficiency of industrial clusters through a series of emergy-based indices as well as the proposed indicators. A case study of Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters to inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than the other industrial clusters.

  10. Vaccines for preventing anthrax.

    PubMed

    Donegan, Sarah; Bellamy, Richard; Gamble, Carrol L

    2009-04-15

    Anthrax is a bacterial zoonosis that occasionally causes human disease and is potentially fatal. Anthrax vaccines include a live-attenuated vaccine, an alum-precipitated cell-free filtrate vaccine, and a recombinant protein vaccine. To evaluate the effectiveness, immunogenicity, and safety of vaccines for preventing anthrax. We searched the following databases (November 2008): Cochrane Infectious Diseases Group Specialized Register; CENTRAL (The Cochrane Library 2008, Issue 4); MEDLINE; EMBASE; LILACS; and mRCT. We also searched reference lists. We included randomized controlled trials (RCTs) of individuals and cluster-RCTs comparing anthrax vaccine with placebo, other (non-anthrax) vaccines, or no intervention; or comparing administration routes or treatment regimens of anthrax vaccine. Two authors independently considered trial eligibility, assessed risk of bias, and extracted data. We presented cases of anthrax and seroconversion rates using risk ratios (RR) and 95% confidence intervals (CI). We summarized immunoglobulin G (IgG) concentrations using geometric means. We carried out a sensitivity analysis to investigate the effect of clustering on the results from one cluster-RCT. No meta-analysis was undertaken. One cluster-RCT (with 157,259 participants) and four RCTs of individuals (1917 participants) met the inclusion criteria. The cluster-RCT from the former USSR showed that, compared with no vaccine, a live-attenuated vaccine (called STI) protected against clinical anthrax whether given by a needleless device (RR 0.16; 102,737 participants, 154 clusters) or the scarification method (RR 0.25; 104,496 participants, 151 clusters). Confidence intervals were statistically significant in unadjusted calculations, but when a small amount of association within clusters was assumed, the differences were not statistically significant. The four RCTs (of individuals) of inactivated vaccines (anthrax vaccine absorbed and recombinant protective antigen) showed a dose response relationship for the anti-protective antigen IgG antibody titre. Intramuscular administration was associated with fewer injection site reactions than subcutaneous injection, and injection site reaction rates were lower when the dosage interval was longer. One cluster-RCT provides limited evidence that a live-attenuated vaccine is effective in preventing cutaneous anthrax. Vaccines based on anthrax antigens are immunogenic in most vaccinees with few adverse events or reactions. Ongoing randomized controlled trials are investigating the immunogenicity and safety of anthrax vaccines.

  11. Combining Multiobjective Optimization and Cluster Analysis to Study Vocal Fold Functional Morphology

    PubMed Central

    Palaparthi, Anil; Riede, Tobias

    2017-01-01

    Morphological design and the relationship between form and function have great influence on the functionality of a biological organ. However, the simultaneous investigation of morphological diversity and function is difficult in complex natural systems. We have developed a multiobjective optimization (MOO) approach in association with cluster analysis to study the form-function relation in vocal folds. An evolutionary algorithm (NSGA-II) was used to integrate MOO with an existing finite element model of the laryngeal sound source. Vocal fold morphology parameters served as decision variables and acoustic requirements (fundamental frequency, sound pressure level) as objective functions. A two-layer and a three-layer vocal fold configuration were explored to produce the targeted acoustic requirements. The mutation and crossover parameters of the NSGA-II algorithm were chosen to maximize a hypervolume indicator. The results were expressed using cluster analysis and were validated against a brute force method. Results from the MOO and the brute force approaches were comparable. The MOO approach demonstrated greater resolution in the exploration of the morphological space. In association with cluster analysis, MOO can efficiently explore vocal fold functional morphology. PMID:24771563

  12. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

    PubMed Central

    Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369

  13. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    PubMed

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The later three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.

  14. [Predicting Incidence of Hepatitis E in Chinausing Fuzzy Time Series Based on Fuzzy C-Means Clustering Analysis].

    PubMed

    Luo, Yi; Zhang, Tao; Li, Xiao-song

    2016-05-01

    To explore the application of fuzzy time series model based on fuzzy c-means clustering in forecasting monthly incidence of Hepatitis E in mainland China. Apredictive model (fuzzy time series method based on fuzzy c-means clustering) was developed using Hepatitis E incidence data in mainland China between January 2004 and July 2014. The incidence datafrom August 2014 to November 2014 were used to test the fitness of the predictive model. The forecasting results were compared with those resulted from traditional fuzzy time series models. The fuzzy time series model based on fuzzy c-means clustering had 0.001 1 mean squared error (MSE) of fitting and 6.977 5 x 10⁻⁴ MSE of forecasting, compared with 0.0017 and 0.0014 from the traditional forecasting model. The results indicate that the fuzzy time series model based on fuzzy c-means clustering has a better performance in forecasting incidence of Hepatitis E.

  15. Hebbian self-organizing integrate-and-fire networks for data clustering.

    PubMed

    Landis, Florian; Ott, Thomas; Stoop, Ruedi

    2010-01-01

    We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm is capable of distinguishing between clusters and noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties render the approach particularly useful for visual scene segmentation into arbitrarily shaped homogeneous regions. We present several application examples, and in order to highlight the advantages and the weaknesses of our method, we systematically compare the results with those from standard methods such as the k-means and Ward's linkage clustering. The analysis demonstrates that not only the clustering ability of the proposed algorithm is more powerful than those of the two concurrent methods, the time complexity of the method is also more modest than that of its generally used strongest competitor.

  16. Cluster analysis of novel isometric strength measures produces a valid and evidence-based classification structure for wheelchair track racing.

    PubMed

    Connick, Mark J; Beckman, Emma; Vanlandewijck, Yves; Malone, Laurie A; Blomqvist, Sven; Tweedy, Sean M

    2017-11-25

    The Para athletics wheelchair-racing classification system employs best practice to ensure that classes comprise athletes whose impairments cause a comparable degree of activity limitation. However, decision-making is largely subjective and scientific evidence which reduces this subjectivity is required. To evaluate whether isometric strength tests were valid for the purposes of classifying wheelchair racers and whether cluster analysis of the strength measures produced a valid classification structure. Thirty-two international level, male wheelchair racers from classes T51-54 completed six isometric strength tests evaluating elbow extensors, shoulder flexors, trunk flexors and forearm pronators and two wheelchair performance tests-Top-Speed (0-15 m) and Top-Speed (absolute). Strength tests significantly correlated with wheelchair performance were included in a cluster analysis and the validity of the resulting clusters was assessed. All six strength tests correlated with performance (r=0.54-0.88). Cluster analysis yielded four clusters with reasonable overall structure (mean silhouette coefficient=0.58) and large intercluster strength differences. Six athletes (19%) were allocated to clusters that did not align with their current class. While the mean wheelchair racing performance of the resulting clusters was unequivocally hierarchical, the mean performance of current classes was not, with no difference between current classes T53 and T54. Cluster analysis of isometric strength tests produced classes comprising athletes who experienced a similar degree of activity limitation. The strength tests reported can provide the basis for a new, more transparent, less subjective wheelchair racing classification system, pending replication of these findings in a larger, representative sample. This paper also provides guidance for development of evidence-based systems in other Para sports. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  17. Ambiguity and judgments of obese individuals: no news could be bad news.

    PubMed

    Ross, Kathryn M; Shivy, Victoria A; Mazzeo, Suzanne E

    2009-08-01

    Stigmatization towards obese individuals has not decreased despite the increasing prevalence of obesity. Nonetheless, stigmatization remains difficult to study, given concerns about social desirability. To address this issue, this study used paired comparisons and cluster analysis to examine how undergraduates (n=189) categorized scenarios describing the health-related behaviors of obese individuals. The cluster analysis found that the scenarios were categorized into two distinct clusters. The first cluster included all scenarios with health behaviors indicating high responsibility for body weight. These individuals were perceived as unattractive, lazy, less likeable, less disciplined, and more deserving of their condition compared to individuals in the second cluster, which included all scenarios with health behaviors indicating low responsibility for body weight. Four scenarios depicted obese individuals with ambiguous information regarding health behaviors; three out of these four individuals were categorized in the high-responsibility cluster. These findings suggested that participants viewed these individuals as negatively as those who were responsible for their condition. These results have practical implications for reducing obesity bias, as the etiology of obesity is typically not known in real-life situations.

  18. Statistical analysis of atom probe data: detecting the early stages of solute clustering and/or co-segregation.

    PubMed

    Hyde, J M; Cerezo, A; Williams, T J

    2009-04-01

    Statistical analysis of atom probe data has improved dramatically in the last decade and it is now possible to determine the size, the number density and the composition of individual clusters or precipitates such as those formed in reactor pressure vessel (RPV) steels during irradiation. However, the characterisation of the onset of clustering or co-segregation is more difficult and has traditionally focused on the use of composition frequency distributions (for detecting clustering) and contingency tables (for detecting co-segregation). In this work, the authors investigate the possibility of directly examining the neighbourhood of each individual solute atom as a means of identifying the onset of solute clustering and/or co-segregation. The methodology involves comparing the mean observed composition around a particular type of solute with that expected from the overall composition of the material. The methodology has been applied to atom probe data obtained from several irradiated RPV steels. The results show that the new approach is more sensitive to fine scale clustering and co-segregation than that achievable using composition frequency distribution and contingency table analyses.

  19. Electrical Load Profile Analysis Using Clustering Techniques

    NASA Astrophysics Data System (ADS)

    Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.

    2017-03-01

    Data mining is one of the data processing techniques to collect information from a set of stored data. Every day the consumption of electricity load is recorded by Electrical Company, usually at intervals of 15 or 30 minutes. This paper uses a clustering technique, which is one of data mining techniques to analyse the electrical load profiles during 2014. The three methods of clustering techniques were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Means Harmonics (KHM). The result shows that KHM is the most appropriate method to classify the electrical load profile. The optimum number of clusters is determined using the Davies-Bouldin Index. By grouping the load profile, the demand of variation analysis and estimation of energy loss from the group of load profile with similar pattern can be done. From the group of electric load profile, it can be known cluster load factor and a range of cluster loss factor that can help to find the range of values of coefficients for the estimated loss of energy without performing load flow studies.

  20. Identification of the Main Regulator Responsible for Synthesis of the Typical Yellow Pigment Produced by Trichoderma reesei

    PubMed Central

    Derntl, Christian; Rassinger, Alice; Srebotnik, Ewald; Mach, Robert L.

    2016-01-01

    ABSTRACT The industrially used ascomycete Trichoderma reesei secretes a typical yellow pigment during cultivation, while other Trichoderma species do not. A comparative genomic analysis suggested that a putative secondary metabolism cluster, containing two polyketide-synthase encoding genes, is responsible for the yellow pigment synthesis. This cluster is conserved in a set of rather distantly related fungi, including Acremonium chrysogenum and Penicillium chrysogenum. In an attempt to silence the cluster in T. reesei, two genes of the cluster encoding transcription factors were individually deleted. For a complete genetic proof-of-function, the genes were reinserted into the genomes of the respective deletion strains. The deletion of the first transcription factor (termed yellow pigment regulator 1 [Ypr1]) resulted in the full abolishment of the yellow pigment formation and the expression of most genes of this cluster. A comparative high-pressure liquid chromatography (HPLC) analysis of supernatants of the ypr1 deletion and its parent strain suggested the presence of several yellow compounds in T. reesei that are all derived from the same cluster. A subsequent gas chromatography/mass spectrometry analysis strongly indicated the presence of sorbicillin in the major HPLC peak. The presence of the second transcription factor, termed yellow pigment regulator 2 (Ypr2), reduces the yellow pigment formation and the expression of most cluster genes, including the gene encoding the activator Ypr1. IMPORTANCE Trichoderma reesei is used for industry-scale production of carbohydrate-active enzymes. During growth, it secretes a typical yellow pigment. This is not favorable for industrial enzyme production because it makes the downstream process more complicated and thus increases operating costs. In this study, we demonstrate which regulators influence the synthesis of the yellow pigment. Based on these data, we also provide indication as to which genes are under the control of these regulators and are finally responsible for the biosynthesis of the yellow pigment. These genes are organized in a cluster that is also found in other industrially relevant fungi, such as the two antibiotic producers Penicillium chrysogenum and Acremonium chrysogenum. The targeted manipulation of a secondary metabolism cluster is an important option for any biotechnologically applied microorganism. PMID:27520818

  1. On selecting a prior for the precision parameter of Dirichlet process mixture models

    USGS Publications Warehouse

    Dorazio, R.M.

    2009-01-01

    In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter ?? and a base probability measure G0. In problems where ?? is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for ??. In this paper an approach is developed for computing a prior for the precision parameter ?? that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.

  2. Comparison of the DiversiLab Repetitive Element PCR System with spa Typing and Pulsed-Field Gel Electrophoresis for Clonal Characterization of Methicillin-Resistant Staphylococcus aureus▿

    PubMed Central

    Babouee, B.; Frei, R.; Schultheiss, E.; Widmer, A. F.; Goldenberger, D.

    2011-01-01

    The emergence of methicillin-resistant Staphylococcus aureus (MRSA) has become an increasing problem worldwide in recent decades. Molecular typing methods have been developed to identify clonality of strains and monitor spread of MRSA. We compared a new commercially available DiversiLab (DL) repetitive element PCR system with spa typing, spa clonal cluster analysis, and pulsed-field gel electrophoresis (PFGE) in terms of discriminatory power and concordance. A collection of 106 well-defined MRSA strains from our hospital was analyzed, isolated between 1994 and 2006. In addition, we analyzed 6 USA300 strains collected in our institution. DL typing separated the 106 MRSA isolates in 10 distinct clusters and 8 singleton patterns. Clustering analysis into spa clonal complexes resulted in 3 clusters: spa-CC 067/548, spa-CC 008, and spa-CC 012. The discriminatory powers (Simpson's index of diversity) were 0.982, 0.950, 0.846, and 0.757 for PFGE, spa typing, DL typing, and spa clonal clustering, respectively. DL typing and spa clonal clustering showed the highest concordance, calculated by adjusted Rand's coefficients. The 6 USA300 isolates grouped homogeneously into distinct PFGE and DL clusters, and all belonged to spa type t008 and spa-CC 008. Among the three methods, DL proved to be rapid and easy to perform. DL typing qualifies for initial screening during outbreak investigation. However, compared to PFGE and spa typing, DL typing has limited discriminatory power and therefore should be complemented by more discriminative methods in isolates that share identical DL patterns. PMID:21307215

  3. Efficacy of community-based physiotherapy networks for patients with Parkinson's disease: a cluster-randomised trial.

    PubMed

    Munneke, Marten; Nijkrake, Maarten J; Keus, Samyra Hj; Kwakkel, Gert; Berendse, Henk W; Roos, Raymund Ac; Borm, George F; Adang, Eddy M; Overeem, Sebastiaan; Bloem, Bastiaan R

    2010-01-01

    Many patients with Parkinson's disease are treated with physiotherapy. We have developed a community-based professional network (ParkinsonNet) that involves training of a selected number of expert physiotherapists to work according to evidence-based recommendations, and structured referrals to these trained physiotherapists to increase the numbers of patients they treat. We aimed to assess the efficacy of this approach for improving health-care outcomes. Between February, 2005, and August, 2007, we did a cluster-randomised trial with 16 clusters (defined as community hospitals and their catchment area). Clusters were randomly allocated by use of a variance minimisation algorithm to ParkinsonNet care (n=8) or usual care (n=8). Patients were assessed at baseline and at 8, 16, and 24 weeks of follow-up. The primary outcome was a patient preference disability score, the patient-specific index score, at 16 weeks. Health secondary outcomes were functional mobility, mobility-related quality of life, and total societal costs over 24 weeks. Analysis was by intention to treat. This trial is registered, number NCT00330694. We included 699 patients. Baseline characteristics of the patients were comparable between the ParkinsonNet clusters (n=358) and usual-care clusters (n=341). The primary endpoint was similar for patients within the ParkinsonNet clusters (mean 47.7, SD 21.9) and control clusters (48.3, 22.4). Health secondary endpoints were also similar for patients in both study groups. Total costs over 24 weeks were lower in ParkinsonNet clusters compared with usual-care clusters (difference euro727; 95% CI 56-1399). Implementation of ParkinsonNet networks did not change health outcomes for patients living in ParkinsonNet clusters. However, health-care costs were reduced in ParkinsonNet clusters compared with usual-care clusters. ZonMw; Netherlands Organisation for Scientific Research; Dutch Parkinson's Disease Society; National Parkinson Foundation; Stichting Robuust. Copyright 2010 Elsevier Ltd. All rights reserved.

  4. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  5. Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    PubMed Central

    Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy

    2011-01-01

    Background We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. Conclusions PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291

  6. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    PubMed

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  7. The study of structures and properties of PdnHm(n=1-10, m=1,2) clusters by density functional theory

    NASA Astrophysics Data System (ADS)

    Wen, Jun-Qing; Chen, Guo-Xiang; Zhang, Jian-Min; Wu, Hua

    2018-04-01

    The geometrical evolution, local relative stability, magnetism and charge transfer characteristics of PdnHm(n = 1-10, m = 1,2) have been systematically calculated by using density functional theory. The studied results show that the most stable geometries of PdnH and PdnH2 (n = 1-10) can be got by doping one or two H atoms on the sides of Pdn clusters except Pd6H and Pd6H2. It is found that doping one or two H atoms on Pdn clusters cannot change the basic framework of Pdn. The analysis of stability shows that Pd2H, Pd4H, Pd7H, Pd2H2, Pd4H2 and Pd7H2 clusters have higher local relative stability than neighboring clusters. The analysis of magnetic properties demonstrates that absorption of hydrogen atoms decreases the average atomic magnetic moments compared with pure Pdn clusters. More charges transfer from H atoms to Pd atoms for Pd6H and Pd6H2 clusters, demonstrating the adsorption of hydrogen atoms change from side adsorption to surface adsorption.

  8. Photometric Calibrations of Gemini Images of NGC 6253

    NASA Astrophysics Data System (ADS)

    Pearce, Sean; Jeffery, Elizabeth

    2017-01-01

    We present preliminary results of our analysis of the metal-rich open cluster NGC 6253 using imaging data from GMOS on the Gemini-South Observatory. These data are part of a larger project to observe the effects of high metallicity on white dwarf cooling processes, especially the white dwarf cooling age, which have important implications on the processes of stellar evolution. To standardize the Gemini photometry, we have also secured imaging data of both the cluster and standard star fields using the 0.6-m SARA Observatory at CTIO. By analyzing and comparing the standard star fields of both the SARA data and the published Gemini zero-points of the standard star fields, we will calibrate the data obtained for the cluster. These calibrations are an important part of the project to obtain a standardized deep color-magnitude diagram to analyze the cluster. We present the process of verifying our standardization process. With a standardized CMD, we also present an analysis of the cluster's main sequence turn off age.

  9. Unraveling the efficiency of RAPD and SSR markers in diversity analysis and population structure estimation in common bean.

    PubMed

    Zargar, Sajad Majeed; Farhat, Sufia; Mahajan, Reetika; Bhakhri, Ayushi; Sharma, Arjun

    2016-01-01

    Increase in food production viz-a-viz quality of food is important to feed the growing human population to attain food as well as nutritional security. The availability of diverse germplasm of any crop is an important genetic resource to mine the genes that may assist in attaining food as well as nutritional security. Here we used 15 RAPD and 23 SSR markers to elucidate diversity among 51 common bean genotypes mostly landraces collected from the Himalayan region of Jammu and Kashmir, India. We observed that both the markers are highly polymorphic. The discriminatory power of these markers was determined using various parameters like; percent polymorphism, PIC, resolving power and marker index. 15 RAPDs produced 171 polymorphic bands, while 23 SSRs produced 268 polymorphic bands. SSRs showed a higher PIC value (0.300) compared to RAPDs (0.243). Further the resolving power of SSRs was 5.241 compared to 3.86 for RAPDs. However, RAPDs showed a higher marker index (2.69) compared to SSRs (1.279) that may be attributed to their higher multiplex ratio. The dendrograms generated with hierarchical UPGMA cluster analysis grouped genotypes into two main clusters with various degrees of sub clustering within the cluster. Here we observed that both the marker systems showed comparable accuracy in grouping genotypes of common bean according to their area of cultivation. The model based STRUCTURE analysis using 15 RAPD and 23 SSR markers identified a population with 3 sub-populations which corresponds to distance based groupings. High level of genetic diversity was observed within the population. These findings have further implications in common bean breeding as well as conservation programs.

  10. Neuro- and social-cognitive clustering highlights distinct profiles in adults with anorexia nervosa.

    PubMed

    Renwick, Beth; Musiat, Peter; Lose, Anna; DeJong, Hannah; Broadbent, Hannah; Kenyon, Martha; Loomes, Rachel; Watson, Charlotte; Ghelani, Shreena; Serpell, Lucy; Richards, Lorna; Johnson-Sabine, Eric; Boughton, Nicky; Treasure, Janet; Schmidt, Ulrike

    2015-01-01

    This study aimed to explore the neuro- and social-cognitive profile of a consecutive series of adult outpatients with anorexia nervosa (AN) when compared with widely available age and gender matched historical control data. The relationship between performance profiles, clinical characteristics, service utilization, and treatment adherence was also investigated. Consecutively recruited outpatients with a broad diagnosis of AN (restricting subtype AN-R: n = 44, binge-purge subtype AN-BP: n = 33 or Eating Disorder Not Otherwise Specified-AN subtype EDNOS-AN: n = 23) completed a comprehensive set of neurocognitive (set-shifting, central coherence) and social-cognitive measures (Emotional Theory of Mind). Data were subjected to hierarchical cluster analysis and a discriminant function analysis. Three separate, meaningful clusters emerged. Cluster 1 (n = 45) showed overall average to high average neuro- and social- cognitive performance, Cluster 2 (n = 38) showed mixed performance characterized by distinct strengths and weaknesses, and Cluster 3 (n = 17) showed poor overall performance (Autism Spectrum disorder (ASD) like cluster). The three clusters did not differ in terms of eating disorder symptoms, comorbid features or service utilization and treatment adherence. A discriminant function analysis confirmed that the clusters were best characterized by performance in perseveration and set-shifting measures. The findings suggest that considerable neuro- and social-cognitive heterogeneity exists in patients with AN, with a subset showing ASD-like features. The value of this method of profiling in predicting longer term patient outcomes and in guiding development of etiologically targeted treatments remains to be seen. © 2014 Wiley Periodicals, Inc.

  11. Clusters of Midlife Women by Physical Activity and Their Racial/Ethnic Differences

    PubMed Central

    Im, Eun-Ok; Ko, Young; Chee, Eunice; Chee, Wonshik; Mao, Jun James

    2016-01-01

    Objective The purpose of this study was to identify clusters of midlife women by physical activity and to determine racial/ethnic differences in physical activities in each cluster. Methods This was a secondary analysis of the data from 542 women (157 Non-Hispanic [NH] Whites, 127 Hispanics, 135 NH African Americans, and 123 NH Asian) in a larger Internet study on midlife women’s attitudes toward physical activity. The instruments included the Barriers to Health Activities Scale, the Physical Activity Assessment Inventory, the Questions on Attitudes toward Physical Activity, Subjective Norm, Perceived Behavioral Control, and Behavioral Intention, and the Kaiser Physical Activity Survey. The data were analyzed using hierarchical cluster analyses, ANOVA, and multinominal logistic analyses. Results A three cluster solution was adopted: Cluster 1 (high active living and sports/exercise activity group; 48%), Cluster 2 (high household/caregiving and occupational activity group; 27%), and Cluster 3 (low active living and sports/exercise activity group; 26%). There were significant racial/ethnic differences in occupational activities of Clusters 1 and 3 (all p<.01). Compared with Cluster 1, Cluster 2 tended to have lower family income, less access to health care, higher unemployment, higher perceived barriers scores, and lower social influences scores (all p<.01). Compared with Cluster 1, Cluster 3 tended to have greater obesity, less access to health care, higher perceived barriers scores, more negative attutides toward physical activity, and lower self-efficacy scores (all p<.01). Conclusions Midlife women’s unique patterns of physical activity and their associated factors need to be considered in future intervention development. PMID:27846052

  12. Characterizing cognitive heterogeneity on the schizophrenia-bipolar disorder spectrum.

    PubMed

    Van Rheenen, T E; Lewandowski, K E; Tan, E J; Ospina, L H; Ongur, D; Neill, E; Gurvich, C; Pantelis, C; Malhotra, A K; Rossell, S L; Burdick, K E

    2017-07-01

    Current group-average analysis suggests quantitative but not qualitative cognitive differences between schizophrenia (SZ) and bipolar disorder (BD). There is increasing recognition that cognitive within-group heterogeneity exists in both disorders, but it remains unclear as to whether between-group comparisons of performance in cognitive subgroups emerging from within each of these nosological categories uphold group-average findings. We addressed this by identifying cognitive subgroups in large samples of SZ and BD patients independently, and comparing their cognitive profiles. The utility of a cross-diagnostic clustering approach to understanding cognitive heterogeneity in these patients was also explored. Hierarchical clustering analyses were conducted using cognitive data from 1541 participants (SZ n = 564, BD n = 402, healthy control n = 575). Three qualitatively and quantitatively similar clusters emerged within each clinical group: a severely impaired cluster, a mild-moderately impaired cluster and a relatively intact cognitive cluster. A cross-diagnostic clustering solution also resulted in three subgroups and was superior in reducing cognitive heterogeneity compared with disorder clustering independently. Quantitative SZ-BD cognitive differences commonly seen using group averages did not hold when cognitive heterogeneity was factored into our sample. Members of each corresponding subgroup, irrespective of diagnosis, might be manifesting the outcome of differences in shared cognitive risk factors.

  13. Identification of homogeneous regions for regionalization of watersheds by two-level self-organizing feature maps

    NASA Astrophysics Data System (ADS)

    Farsadnia, F.; Rostami Kamrood, M.; Moghaddam Nia, A.; Modarres, R.; Bray, M. T.; Han, D.; Sadatinejad, J.

    2014-02-01

    One of the several methods in estimating flood quantiles in ungauged or data-scarce watersheds is regional frequency analysis. Amongst the approaches to regional frequency analysis, different clustering techniques have been proposed to determine hydrologically homogeneous regions in the literature. Recently, Self-Organization feature Map (SOM), a modern hydroinformatic tool, has been applied in several studies for clustering watersheds. However, further studies are still needed with SOM on the interpretation of SOM output map for identifying hydrologically homogeneous regions. In this study, two-level SOM and three clustering methods (fuzzy c-mean, K-mean, and Ward's Agglomerative hierarchical clustering) are applied in an effort to identify hydrologically homogeneous regions in Mazandaran province watersheds in the north of Iran, and their results are compared with each other. Firstly the SOM is used to form a two-dimensional feature map. Next, the output nodes of the SOM are clustered by using unified distance matrix algorithm and three clustering methods to form regions for flood frequency analysis. The heterogeneity test indicates the four regions achieved by the two-level SOM and Ward approach after adjustments are sufficiently homogeneous. The results suggest that the combination of SOM and Ward is much better than the combination of either SOM and FCM or SOM and K-mean.

  14. Vitamin and mineral supplement users. Do they have healthy or unhealthy dietary behaviours?

    PubMed

    van der Horst, Klazine; Siegrist, Michael

    2011-12-01

    It is unknown whether people use vitamin and mineral supplements (VMS) to compensate for unhealthy diets, or people whom already have a healthy diet use VMS. Therefore, this study aimed to examine correlates of VMS use and whether VMS users can be categorised into specific clusters based on dietary lifestyle variables. The data used came from the Swiss Food Panel questionnaire for 2010. The sample consisted of 6189 respondents, mean age was 54 years and 47.6% were males. Data was analysed with logistic regression analysis and hierarchical cluster analysis. The results revealed that for VMS use, gender, age, education, chronic illness, health consciousness, benefits of fortification, convenience food and sugar-sweetened beverage consumption were of importance. Cluster analysis revealed three clusters (1) healthy diet, (2) unhealthy diet and (3) modest diet. Compared to non-users a higher percentage of VMS users was categorised in the healthy cluster and a lower percentage in the unhealthy cluster. More VMS-users were categorised as having an unhealthy diet (31.4%) than having a healthy diet (20.6%). The results suggest that both hypotheses-VMS are used by people with unhealthy diets and by people who least need them-hold true meaning. Copyright © 2011. Published by Elsevier Ltd.

  15. Minimum number of clusters and comparison of analysis methods for cross sectional stepped wedge cluster randomised trials with binary outcomes: A simulation study.

    PubMed

    Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick

    2017-03-09

    Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.

  16. Exploring the individual patterns of spiritual well-being in people newly diagnosed with advanced cancer: a cluster analysis.

    PubMed

    Bai, Mei; Dixon, Jane; Williams, Anna-Leila; Jeon, Sangchoon; Lazenby, Mark; McCorkle, Ruth

    2016-11-01

    Research shows that spiritual well-being correlates positively with quality of life (QOL) for people with cancer, whereas contradictory findings are frequently reported with respect to the differentiated associations between dimensions of spiritual well-being, namely peace, meaning and faith, and QOL. This study aimed to examine individual patterns of spiritual well-being among patients newly diagnosed with advanced cancer. Cluster analysis was based on the twelve items of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale at Time 1. A combination of hierarchical and k-means (non-hierarchical) clustering methods was employed to jointly determine the number of clusters. Self-rated health, depressive symptoms, peace, meaning and faith, and overall QOL were compared at Time 1 and Time 2. Hierarchical and k-means clustering methods both suggested four clusters. Comparison of the four clusters supported statistically significant and clinically meaningful differences in QOL outcomes among clusters while revealing contrasting relations of faith with QOL. Cluster 1, Cluster 3, and Cluster 4 represented high, medium, and low levels of overall QOL, respectively, with correspondingly high, medium, and low levels of peace, meaning, and faith. Cluster 2 was distinguished from other clusters by its medium levels of overall QOL, peace, and meaning and low level of faith. This study provides empirical support for individual difference in response to a newly diagnosed cancer and brings into focus conceptual and methodological challenges associated with the measure of spiritual well-being, which may partly contribute to the attenuated relation between faith and QOL.

  17. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality

    PubMed Central

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M

    2008-01-01

    Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of SaTScan results. Conclusion The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. Method We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit. PMID:18992163

  18. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality.

    PubMed

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M

    2008-11-07

    Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of SaTScan results. The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit.

  19. Comparing Dark Energy Survey and HST-CLASH observations of the galaxy cluster RXC J2248.7-4431: implications for stellar mass versus dark matter

    NASA Astrophysics Data System (ADS)

    Palmese, A.; Lahav, O.; Banerji, M.; Gruen, D.; Jouvel, S.; Melchior, P.; Aleksić, J.; Annis, J.; Diehl, H. T.; Hartley, W. G.; Jeltema, T.; Romer, A. K.; Rozo, E.; Rykoff, E. S.; Seitz, S.; Suchyta, E.; Zhang, Y.; Abbott, T. M. C.; Abdalla, F. B.; Allam, S.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Buckley-Geer, E.; Burke, D. L.; Capozzi, D.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Crocce, M.; Cunha, C. E.; D'Andrea, C. B.; da Costa, L. N.; Desai, S.; Dietrich, J. P.; Doel, P.; Estrada, J.; Evrard, A. E.; Flaugher, B.; Frieman, J.; Gerdes, D. W.; Goldstein, D. A.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; James, D. J.; Kuehn, K.; Kuropatkin, N.; Li, T. S.; Lima, M.; Maia, M. A. G.; Marshall, J. L.; Miller, C. J.; Miquel, R.; Nord, B.; Ogando, R.; Plazas, A. A.; Roodman, A.; Sanchez, E.; Scarpine, V.; Sevilla-Noarbe, I.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Tucker, D.; Vikram, V.

    2016-12-01

    We derive the stellar mass fraction in the galaxy cluster RXC J2248.7-4431 observed with the Dark Energy Survey (DES) during the Science Verification period. We compare the stellar mass results from DES (five filters) with those from the Hubble Space Telescope Cluster Lensing And Supernova Survey (CLASH; 17 filters). When the cluster spectroscopic redshift is assumed, we show that stellar masses from DES can be estimated within 25 per cent of CLASH values. We compute the stellar mass contribution coming from red and blue galaxies, and study the relation between stellar mass and the underlying dark matter using weak lensing studies with DES and CLASH. An analysis of the radial profiles of the DES total and stellar mass yields a stellar-to-total fraction of f⋆ = (6.8 ± 1.7) × 10-3 within a radius of r200c ≃ 2 Mpc. Our analysis also includes a comparison of photometric redshifts and star/galaxy separation efficiency for both data sets. We conclude that space-based small field imaging can be used to calibrate the galaxy properties in DES for the much wider field of view. The technique developed to derive the stellar mass fraction in galaxy clusters can be applied to the ˜100 000 clusters that will be observed within this survey and yield important information about galaxy evolution.

  20. Comparing Dark Energy Survey and HST –CLASH observations of the galaxy cluster RXC J2248.7-4431: implications for stellar mass versus dark matter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palmese, A.; Lahav, O.; Banerji, M.

    We derive the stellar mass fraction in the galaxy cluster RXC J2248.7-4431 observed with the Dark Energy Survey (DES) during the Science Verification period. We compare the stellar mass results from DES (five filters) with those from the Hubble Space Telescope Cluster Lensing And Supernova Survey (CLASH; 17 filters). When the cluster spectroscopic redshift is assumed, we show that stellar masses from DES can be estimated within 25 per cent of CLASH values. We compute the stellar mass contribution coming from red and blue galaxies, and study the relation between stellar mass and the underlying dark matter using weak lensingmore » studies with DES and CLASH. An analysis of the radial profiles of the DES total and stellar mass yields a stellar-to-total fraction of f(star) = (6.8 +/- 1.7) x 10(-3) within a radius of r(200c) similar or equal to 2 Mpc. Our analysis also includes a comparison of photometric redshifts and star/galaxy separation efficiency for both data sets. We conclude that space-based small field imaging can be used to calibrate the galaxy properties in DES for the much wider field of view. The technique developed to derive the stellar mass fraction in galaxy clusters can be applied to the similar to 100 000 clusters that will be observed within this survey and yield important information about galaxy evolution.« less

  1. Cluster analysis of fasciolosis in dairy cow herds in Munster province of Ireland and detection of major climatic and environmental predictors of the exposure risk.

    PubMed

    Selemetas, Nikolaos; Phelan, Paul; O'Kiely, Padraig; de Waal, Theo

    2015-03-19

    Fasciolosis caused by Fasciola hepatica is a widespread parasitic disease in cattle farms. The aim of this study was to detect clusters of fasciolosis in dairy cow herds in Munster Province, Ireland and to identify significant climatic and environmental predictors of the exposure risk. In total, 1,292 dairy herds across Munster was sampled in September 2012 providing a single bulk tank milk (BTM) sample. The analysis of samples by an in-house antibody-detection enzyme-linked immunosorbent assay (ELISA), showed that 65% of the dairy herds (n = 842) had been exposed to F. hepatica. Using the Getis-Ord Gi* statistic, 16 high-risk and 24 low-risk (P <0.01) clusters of fasciolosis were identified. The spatial distribution of high-risk clusters was more dispersed and mainly located in the northern and western regions of Munster compared to the low-risk clusters that were mostly concentrated in the southern and eastern regions. The most significant classes of variables that could reflect the difference between high-risk and low-risk clusters were the total number of wet-days and rain-days, rainfall, the normalized difference vegetation index (NDVI), temperature and soil type. There was a bigger proportion of well-drained soils among the low-risk clusters, whereas poorly drained soils were more common among the high-risk clusters. These results stress the role of precipitation, grazing, temperature and drainage on the life cycle of F. hepatica in the temperate Irish climate. The findings of this study highlight the importance of cluster analysis for identifying significant differences in climatic and environmental variables between high-risk and low-risk clusters of fasciolosis in Irish dairy herds.

  2. Fatality rate of pedestrians and fatal crash involvement rate of drivers in pedestrian crashes: a case study of Iran.

    PubMed

    Kashani, Ali Tavakoli; Besharati, Mohammad Mehdi

    2017-06-01

    The aim of this study was to uncover patterns of pedestrian crashes. In the first stage, 34,178 pedestrian-involved crashes occurred in Iran during a four-year period were grouped into homogeneous clusters using a clustering analysis. Next, some in-cluster and inter-cluster crash patterns were analysed. The clustering analysis yielded six pedestrian crash groups. Car/van/pickup crashes on rural roads as well as heavy vehicle crashes were found to be less frequent but more likely to be fatal compared to other crash clusters. In addition, after controlling for crash frequency in each cluster, it was found that the fatality rate of each pedestrian age group as well as the fatal crash involvement rate of each driver age group varies across the six clusters. Results of present study has some policy implications including, promoting pedestrian safety training sessions for heavy vehicle drivers, imposing limitations over elderly heavy vehicle drivers, reinforcing penalties toward under 19 drivers and motorcyclists. In addition, road safety campaigns in rural areas may be promoted to inform people about the higher fatality rate of pedestrians on rural roads. The crash patterns uncovered in this study might also be useful for prioritizing future pedestrian safety research areas.

  3. Cluster-guided imaging-based CFD analysis of airflow and particle deposition in asthmatic human lungs

    NASA Astrophysics Data System (ADS)

    Choi, Jiwoong; Leblanc, Lawrence; Choi, Sanghun; Haghighi, Babak; Hoffman, Eric; Lin, Ching-Long

    2017-11-01

    The goal of this study is to assess inter-subject variability in delivery of orally inhaled drug products to small airways in asthmatic lungs. A recent multiscale imaging-based cluster analysis (MICA) of computed tomography (CT) lung images in an asthmatic cohort identified four clusters with statistically distinct structural and functional phenotypes associating with unique clinical biomarkers. Thus, we aimed to address inter-subject variability via inter-cluster variability. We selected a representative subject from each of the 4 asthma clusters as well as 1 male and 1 female healthy controls, and performed computational fluid and particle simulations on CT-based airway models of these subjects. The results from one severe and one non-severe asthmatic cluster subjects characterized by segmental airway constriction had increased particle deposition efficiency, as compared with the other two cluster subjects (one non-severe and one severe asthmatics) without airway constriction. Constriction-induced jets impinging on distal bifurcations led to excessive particle deposition. The results emphasize the impact of airway constriction on regional particle deposition rather than disease severity, demonstrating the potential of using cluster membership to tailor drug delivery. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837. XSEDE.

  4. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    PubMed

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  5. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

  6. Type 2 diabetes mellitus: distribution of genetic markers in Kazakh population

    PubMed Central

    Sikhayeva, Nurgul; Talzhanov, Yerkebulan; Iskakova, Aisha; Dzharmukhanov, Jarkyn; Nugmanova, Raushan; Zholdybaeva, Elena; Ramanculov, Erlan

    2018-01-01

    Background Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. Methods A total of 966 individuals belonging to the Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups and 31 of these were in Hardy–Weinberg equilibrium. The obtained allele frequencies were further compared to publicly available data from other ethnic populations. Allele frequencies for other (compared) populations were pooled from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used for the analysis of genetic relationship between the populations. Results Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, results of the current study show that the Kazakh population holds an intermediate position between Caucasian and Asian populations. Conclusion A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool. PMID:29551892

  7. A novel artificial immune algorithm for spatial clustering with obstacle constraint and its applications.

    PubMed

    Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

    2014-01-01

    An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.

  8. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  9. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    NASA Astrophysics Data System (ADS)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  10. Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.

    PubMed

    Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves

    2011-08-01

    The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005 to identify a possible statistical significant trend in both populations. Separate descriptive statistics and univariate analysis were carried out and the parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared and the monthly distribution of the most common PT, isolated in both populations, was evaluated. The time cluster analysis revealed significant clusters during the months May and June for layers and May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible of these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found for the monthly trend evolution of both PT in both populations based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during the year 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer time and slightly delayed in time (humans after layers). These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.

  11. Methane Production in Dairy Cows Correlates with Rumen Methanogenic and Bacterial Community Structure.

    PubMed

    Danielsson, Rebecca; Dicksved, Johan; Sun, Li; Gonda, Horacio; Müller, Bettina; Schnürer, Anna; Bertilsson, Jan

    2017-01-01

    Methane (CH 4 ) is produced as an end product from feed fermentation in the rumen. Yield of CH 4 varies between individuals despite identical feeding conditions. To get a better understanding of factors behind the individual variation, 73 dairy cows given the same feed but differing in CH 4 emissions were investigated with focus on fiber digestion, fermentation end products and bacterial and archaeal composition. In total 21 cows (12 Holstein, 9 Swedish Red) identified as persistent low, medium or high CH 4 emitters over a 3 month period were furthermore chosen for analysis of microbial community structure in rumen fluid. This was assessed by sequencing the V4 region of 16S rRNA gene and by quantitative qPCR of targeted Methanobrevibacter groups. The results showed a positive correlation between low CH 4 emitters and higher abundance of Methanobrevibacter ruminantium clade. Principal coordinate analysis (PCoA) on operational taxonomic unit (OTU) level of bacteria showed two distinct clusters ( P < 0.01) that were related to CH 4 production. One cluster was associated with low CH 4 production (referred to as cluster L) whereas the other cluster was associated with high CH 4 production (cluster H) and the medium emitters occurred in both clusters. The differences between clusters were primarily linked to differential abundances of certain OTUs belonging to Prevotella . Moreover, several OTUs belonging to the family Succinivibrionaceae were dominant in samples belonging to cluster L. Fermentation pattern of volatile fatty acids showed that proportion of propionate was higher in cluster L, while proportion of butyrate was higher in cluster H. No difference was found in milk production or organic matter digestibility between cows. Cows in cluster L had lower CH 4 /kg energy corrected milk (ECM) compared to cows in cluster H, 8.3 compared to 9.7 g CH 4 /kg ECM, showing that low CH 4 cows utilized the feed more efficient for milk production which might indicate a more efficient microbial population or host genetic differences that is reflected in bacterial and archaeal (or methanogens) populations.

  12. Methane Production in Dairy Cows Correlates with Rumen Methanogenic and Bacterial Community Structure

    PubMed Central

    Danielsson, Rebecca; Dicksved, Johan; Sun, Li; Gonda, Horacio; Müller, Bettina; Schnürer, Anna; Bertilsson, Jan

    2017-01-01

    Methane (CH4) is produced as an end product from feed fermentation in the rumen. Yield of CH4 varies between individuals despite identical feeding conditions. To get a better understanding of factors behind the individual variation, 73 dairy cows given the same feed but differing in CH4 emissions were investigated with focus on fiber digestion, fermentation end products and bacterial and archaeal composition. In total 21 cows (12 Holstein, 9 Swedish Red) identified as persistent low, medium or high CH4 emitters over a 3 month period were furthermore chosen for analysis of microbial community structure in rumen fluid. This was assessed by sequencing the V4 region of 16S rRNA gene and by quantitative qPCR of targeted Methanobrevibacter groups. The results showed a positive correlation between low CH4 emitters and higher abundance of Methanobrevibacter ruminantium clade. Principal coordinate analysis (PCoA) on operational taxonomic unit (OTU) level of bacteria showed two distinct clusters (P < 0.01) that were related to CH4 production. One cluster was associated with low CH4 production (referred to as cluster L) whereas the other cluster was associated with high CH4 production (cluster H) and the medium emitters occurred in both clusters. The differences between clusters were primarily linked to differential abundances of certain OTUs belonging to Prevotella. Moreover, several OTUs belonging to the family Succinivibrionaceae were dominant in samples belonging to cluster L. Fermentation pattern of volatile fatty acids showed that proportion of propionate was higher in cluster L, while proportion of butyrate was higher in cluster H. No difference was found in milk production or organic matter digestibility between cows. Cows in cluster L had lower CH4/kg energy corrected milk (ECM) compared to cows in cluster H, 8.3 compared to 9.7 g CH4/kg ECM, showing that low CH4 cows utilized the feed more efficient for milk production which might indicate a more efficient microbial population or host genetic differences that is reflected in bacterial and archaeal (or methanogens) populations. PMID:28261182

  13. Different mechanisms of cluster explosion within a unified smooth particle hydrodynamics Thomas-Fermi approach: Optical and short-wavelength regimes compared

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rusek, Marian; Orlowski, Arkadiusz

    2005-04-01

    The dynamics of small ({<=}55 atoms) argon clusters ionized by an intense femtosecond laser pulse is studied using a time-dependent Thomas-Fermi model. The resulting Bloch-like hydrodynamic equations are solved numerically using the smooth particle hydrodynamics method without the necessity of grid simulations. As follows from recent experiments, absorption of radiation and subsequent ionization of clusters observed in the short-wavelength laser frequency regime (98 nm) differs considerably from that in the optical spectral range (800 nm). Our theoretical approach provides a unified framework for treating these very different frequency regimes and allows for a deeper understanding of the underlying cluster explosionmore » mechanisms. The results of our analysis following from extensive numerical simulations presented in this paper are compared both with experimental findings and with predictions of other theoretical models.« less

  14. Spatial cluster for clustering the influence factor of birth and death child in Bogor Regency, West Java

    NASA Astrophysics Data System (ADS)

    Bekti, Rokhana Dwi; Rachmawati, Ro'fah

    2014-03-01

    The number of birth and death child is the benchmarks to determine and monitor the health and welfare in Indonesia. It can be used to identify groups of people who have a high mortality risk. Identifying group is important to compare the characteristics of human that have high and low risk. These characteristics can be seen from the factors that influenced it. Furthermore, there are factors which influence of birth and death child, such us economic, health facility, education, and others. The influence factors of every individual are different, but there are similarities some individuals which live close together or in the close locations. It means there was spatial effect. To identify group in this research, clustering is done by spatial cluster method, which is view to considering the influence of the location or the relationship between locations. One of spatial cluster method is Spatial 'K'luster Analysis by Tree Edge Removal (SKATER). The research was conducted in Bogor Regency, West Java. The goal was to get a cluster of districts based on the factors that influence birth and death child. SKATER build four number of cluster respectively consists of 26, 7, 2, and 5 districts. SKATER has good performance for clustering which include spatial effect. If it compare by other cluster method, Kmeans has good performance by MANOVA test.

  15. Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.

    PubMed

    Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J

    2006-01-01

    One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.

  16. Implementation of the force decomposition machine for molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2012-09-01

    We present the design and implementation of the force decomposition machine (FDM), a cluster of personal computers (PCs) that is tailored to running molecular dynamics (MD) simulations using the distributed diagonal force decomposition (DDFD) parallelization method. The cluster interconnect architecture is optimized for the communication pattern of the DDFD method. Our implementation of the FDM relies on standard commodity components even for networking. Although the cluster is meant for DDFD MD simulations, it remains general enough for other parallel computations. An analysis of several MD simulation runs on both the FDM and a standard PC cluster demonstrates that the FDM's interconnect architecture provides a greater performance compared to a more general cluster interconnect. Copyright © 2012 Elsevier Inc. All rights reserved.

  17. Genotypic diversity of oscillatoriacean strains belonging to the genera Geitlerinema and Spirulina determined by 16S rDNA restriction analysis.

    PubMed

    Margheri, Maria C; Piccardi, Raffaella; Ventura, Stefano; Viti, Carlo; Giovannetti, Luciana

    2003-05-01

    Genotypic diversity of several cyanobacterial strains mostly isolated from marine or brackish waters, belonging to the genera Geitlerinema and Spirulina, was investigated by amplified 16S ribosomal DNA restriction analysis and compared with morphological features and response to salinity. Cluster analysis was performed on amplified 16S rDNA restriction profiles of these strains along with profiles obtained from sequence data of five Spirulina-like strains, including three representatives of the new genus Halospirulina. Our strains with tightly coiled trichomes from hypersaline waters could be assigned to the Halospirulina genus. Among the uncoiled strains, the two strains of hypersaline origin clustered together and were found to be distant from their counterparts of marine and freshwater habitat. Moreover, another cluster, formed by alkali-tolerant strains with tightly coiled trichomes, was well delineated.

  18. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    NASA Astrophysics Data System (ADS)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor gives the liver indicators and suggests the patient's condition according to the description of ultrasound report. With the rapid increase in the amount of data of ultrasound report, the workload of professional physician to manually distinguish ultrasound results significantly increases. In this paper, we use the spectral clustering method to cluster analysis of the description of the ultrasound report, and automatically generate the ultrasonic diagnostic diagnosis by machine learning. 110 groups ultrasound examination report of chronic liver disease were selected as test samples in this experiment, and the results were validated by spectral clustering and compared with k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of k-means clustering algorithm, which provides a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.

  19. A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.

    PubMed

    Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang

    2016-12-01

    This paper is aimed to study the clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm had shortcomings such as dependence of results on the selection of initial value, trapping in local optimum when processing prescriptions form CM medical cases. Therefore, a new clustering method based on the collaboration of firefly algorithm and simulated annealing algorithm was proposed. This algorithm dynamically determined the iteration of firefly algorithm and simulates sampling of annealing algorithm by fitness changes, and increased the diversity of swarm through expansion of the scope of the sudden jump, thereby effectively avoiding premature problem. The results from confirmatory experiments for CM medical cases suggested that, comparing with traditional K-means clustering algorithms, this method was greatly improved in the individual diversity and the obtained clustering results, the computing results from this method had a certain reference value for cluster analysis on CM prescriptions.

  20. A Typology of Teacher-Rated Child Behavior: Revisiting Subgroups over 10 Years Later

    ERIC Educational Resources Information Center

    DiStefano, Christine A.; Kamphaus, Randy W.; Mindrila, Diana L.

    2010-01-01

    The purpose of this article was to examine a typology of child behavior using the Behavioral Assessment System for Children, Teacher Rating Scale (BASC TRS-C, 2nd edition; Reynolds & Kamphaus, 2004). The typology was compared with the solution identified from the 1992 BASC TRS-C norm dataset. Using cluster analysis, a seven-cluster solution…

  1. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

    PubMed

    Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu

    2018-04-17

    Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks before a cytometry lab can adopt the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. Design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists through mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters which are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experiment results showed that the proportions of the cell populations identified by DAFi, while being consistent with those by expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and the nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided the abrupt cut-offs on the boundaries. DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and the ClusterR package. For cell population identification, DAFi supports multiple options including clustering, bisecting, slope-based gating, and reversed filtering to meet various autogating needs from different scientific use cases. © 2018 International Society for Advancement of Cytometry. © 2018 International Society for Advancement of Cytometry.

  2. Text mining to decipher free-response consumer complaints: insights from the NHTSA vehicle owner's complaint database.

    PubMed

    Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D

    2014-09-01

    This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.

  3. Peeking Network States with Clustered Patterns

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jinoh; Sim, Alex

    2015-10-20

    Network traffic monitoring has long been a core element for effec- tive network management and security. However, it is still a chal- lenging task with a high degree of complexity for comprehensive analysis when considering multiple variables and ever-increasing traffic volumes to monitor. For example, one of the widely con- sidered approaches is to scrutinize probabilistic distributions, but it poses a scalability concern and multivariate analysis is not gen- erally supported due to the exponential increase of the complexity. In this work, we propose a novel method for network traffic moni- toring based on clustering, one of the powerful deep-learningmore » tech- niques. We show that the new approach enables us to recognize clustered results as patterns representing the network states, which can then be utilized to evaluate “similarity” of network states over time. In addition, we define a new quantitative measure for the similarity between two compared network states observed in dif- ferent time windows, as a supportive means for intuitive analysis. Finally, we demonstrate the clustering-based network monitoring with public traffic traces, and show that the proposed approach us- ing the clustering method has a great opportunity for feasible, cost- effective network monitoring.« less

  4. Machine-learned cluster identification in high-dimensional data.

    PubMed

    Ultsch, Alfred; Lötsch, Jörn

    2017-02-01

    High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  5. Finding approximate gene clusters with Gecko 3.

    PubMed

    Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

    2016-11-16

    Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes, that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Identification of clusters of individuals relevant to temporomandibular disorders and other chronic pain conditions: the OPPERA study

    PubMed Central

    Bair, Eric; Gaynor, Sheila; Slade, Gary D.; Ohrbach, Richard; Fillingim, Roger B.; Greenspan, Joel D.; Dubner, Ronald; Smith, Shad B.; Diatchenko, Luda; Maixner, William

    2016-01-01

    The classification of most chronic pain disorders gives emphasis to anatomical location of the pain to distinguish one disorder from the other (eg, back pain vs temporomandibular disorder [TMD]) or to define subtypes (eg, TMD myalgia vs arthralgia). However, anatomical criteria overlook etiology, potentially hampering treatment decisions. This study identified clusters of individuals using a comprehensive array of biopsychosocial measures. Data were collected from a case–control study of 1031 chronic TMD cases and 3247 TMD-free controls. Three subgroups were identified using supervised cluster analysis (referred to as the adaptive, pain-sensitive, and global symptoms clusters). Compared with the adaptive cluster, participants in the pain-sensitive cluster showed heightened sensitivity to experimental pain, and participants in the global symptoms cluster showed both greater pain sensitivity and greater psychological distress. Cluster membership was strongly associated with chronic TMD: 91.5% of TMD cases belonged to the pain-sensitive and global symptoms clusters, whereas 41.2% of controls belonged to the adaptive cluster. Temporomandibular disorder cases in the pain-sensitive and global symptoms clusters also showed greater pain intensity, jaw functional limitation, and more comorbid pain conditions. Similar results were obtained when the same methodology was applied to a smaller case–control study consisting of 199 chronic TMD cases and 201 TMD-free controls. During a median 3-year follow-up period of TMD-free individuals, participants in the global symptoms cluster had greater risk of developing first-onset TMD (hazard ratio = 2.8) compared with participants in the other 2 clusters. Cross-cohort predictive modeling was used to demonstrate the reliability of the clusters. PMID:26928952

  7. Generalization of Clustering Coefficients to Signed Correlation Networks

    PubMed Central

    Costantini, Giulio; Perugini, Marco

    2014-01-01

    The recent interest in network analysis applications in personality psychology and psychopathology has put forward new methodological challenges. Personality and psychopathology networks are typically based on correlation matrices and therefore include both positive and negative edge signs. However, some applications of network analysis disregard negative edges, such as computing clustering coefficients. In this contribution, we illustrate the importance of the distinction between positive and negative edges in networks based on correlation matrices. The clustering coefficient is generalized to signed correlation networks: three new indices are introduced that take edge signs into account, each derived from an existing and widely used formula. The performances of the new indices are illustrated and compared with the performances of the unsigned indices, both on a signed simulated network and on a signed network based on actual personality psychology data. The results show that the new indices are more resistant to sample variations in correlation networks and therefore have higher convergence compared with the unsigned indices both in simulated networks and with real data. PMID:24586367

  8. Bayesian multivariate hierarchical transformation models for ROC analysis.

    PubMed

    O'Malley, A James; Zou, Kelly H

    2006-02-15

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.

  9. Bayesian multivariate hierarchical transformation models for ROC analysis

    PubMed Central

    O'Malley, A. James; Zou, Kelly H.

    2006-01-01

    SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836

  10. Coronal Mass Ejection Data Clustering and Visualization of Decision Trees

    NASA Astrophysics Data System (ADS)

    Ma, Ruizhe; Angryk, Rafal A.; Riley, Pete; Filali Boubrahimi, Soukaina

    2018-05-01

    Coronal mass ejections (CMEs) can be categorized as either “magnetic clouds” (MCs) or non-MCs. Features such as a large magnetic field, low plasma-beta, and low proton temperature suggest that a CME event is also an MC event; however, so far there is neither a definitive method nor an automatic process to distinguish the two. Human labeling is time-consuming, and results can fluctuate owing to the imprecise definition of such events. In this study, we approach the problem of MC and non-MC distinction from a time series data analysis perspective and show how clustering can shed some light on this problem. Although many algorithms exist for traditional data clustering in the Euclidean space, they are not well suited for time series data. Problems such as inadequate distance measure, inaccurate cluster center description, and lack of intuitive cluster representations need to be addressed for effective time series clustering. Our data analysis in this work is twofold: clustering and visualization. For clustering we compared the results from the popular hierarchical agglomerative clustering technique to a distance density clustering heuristic we developed previously for time series data clustering. In both cases, dynamic time warping will be used for similarity measure. For classification as well as visualization, we use decision trees to aggregate single-dimensional clustering results to form a multidimensional time series decision tree, with averaged time series to present each decision. In this study, we achieved modest accuracy and, more importantly, an intuitive interpretation of how different parameters contribute to an MC event.

  11. A user credit assessment model based on clustering ensemble for broadband network new media service supervision

    NASA Astrophysics Data System (ADS)

    Liu, Fang; Cao, San-xing; Lu, Rui

    2012-04-01

    This paper proposes a user credit assessment model based on clustering ensemble aiming to solve the problem that users illegally spread pirated and pornographic media contents within the user self-service oriented broadband network new media platforms. Its idea is to do the new media user credit assessment by establishing indices system based on user credit behaviors, and the illegal users could be found according to the credit assessment results, thus to curb the bad videos and audios transmitted on the network. The user credit assessment model based on clustering ensemble proposed by this paper which integrates the advantages that swarm intelligence clustering is suitable for user credit behavior analysis and K-means clustering could eliminate the scattered users existed in the result of swarm intelligence clustering, thus to realize all the users' credit classification automatically. The model's effective verification experiments are accomplished which are based on standard credit application dataset in UCI machine learning repository, and the statistical results of a comparative experiment with a single model of swarm intelligence clustering indicates this clustering ensemble model has a stronger creditworthiness distinguishing ability, especially in the aspect of predicting to find user clusters with the best credit and worst credit, which will facilitate the operators to take incentive measures or punitive measures accurately. Besides, compared with the experimental results of Logistic regression based model under the same conditions, this clustering ensemble model is robustness and has better prediction accuracy.

  12. Testing feedback message framing and comparators to address prescribing of high-risk medications in nursing homes: protocol for a pragmatic, factorial, cluster-randomized trial.

    PubMed

    Ivers, Noah M; Desveaux, Laura; Presseau, Justin; Reis, Catherine; Witteman, Holly O; Taljaard, Monica K; McCleary, Nicola; Thavorn, Kednapa; Grimshaw, Jeremy M

    2017-07-14

    Audit and feedback (AF) interventions that leverage routine administrative data offer a scalable and relatively low-cost method to improve processes of care. AF interventions are usually designed to highlight discrepancies between desired and actual performance and to encourage recipients to act to address such discrepancies. Comparing to a regional average is a common approach, but more recipients would have a discrepancy if compared to a higher-than-average level of performance. In addition, how recipients perceive and respond to discrepancies may depend on how the feedback itself is framed. We aim to evaluate the effectiveness of different comparators and framing in feedback on high-risk prescribing in nursing homes. This is a pragmatic, 2 × 2 factorial, cluster-randomized controlled trial testing variations in the comparator and framing on the effectiveness of quarterly AF in changing high-risk prescribing in nursing homes in Ontario, Canada. We grouped homes that share physicians into clusters and randomized these clusters into the four experimental conditions. Outcomes will be assessed after 6 months; all primary analyses will be by intention-to-treat. The primary outcome (monthly number of high-risk medications received by each patient) will be analysed using a general linear mixed effects regression model. We will present both four-arm and factorial analyses. With 160 clusters and an average of 350 beds per cluster, assuming no interaction and similar effects for each intervention, we anticipate 90% power to detect an absolute mean difference of 0.3 high-risk medications prescribed. A mixed-methods process evaluation will explore potential mechanisms underlying the observed effects, exploring targeted constructs including intention, self-efficacy, outcome expectations, descriptive norms, and goal prioritization. An economic analysis will examine cost-effectiveness analysis from the perspective of the publicly funded health care system. This protocol describes the rationale and methodology of a trial testing manipulations of theory-informed components of an audit and feedback intervention to determine how to improve an existing intervention and provide generalizable insights for implementation science. NCT02979964.

  13. Chaotic map clustering algorithm for EEG analysis

    NASA Astrophysics Data System (ADS)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.

    2004-03-01

    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.

  14. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis

    PubMed Central

    Ji, Zhicheng; Ji, Hongkai

    2016-01-01

    When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. PMID:27179027

  15. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.

    PubMed

    Ji, Zhicheng; Ji, Hongkai

    2016-07-27

    When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. [Visual field progression in glaucoma: cluster analysis].

    PubMed

    Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

    2012-11-01

    Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.

  17. Diffusion of oxygen interstitials in UO2+x using kinetic Monte Carlo simulations: Role of O/M ratio and sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Behera, Rakesh K.; Watanabe, Taku; Andersson, David A.; Uberuaga, Blas P.; Deo, Chaitanya S.

    2016-04-01

    Oxygen interstitials in UO2+x significantly affect the thermophysical properties and microstructural evolution of the oxide nuclear fuel. In hyperstoichiometric Urania (UO2+x), these oxygen interstitials form different types of defect clusters, which have different migration behavior. In this study we have used kinetic Monte Carlo (kMC) to evaluate diffusivities of oxygen interstitials accounting for mono- and di-interstitial clusters. Our results indicate that the predicted diffusivities increase significantly at higher non-stoichiometry (x > 0.01) for di-interstitial clusters compared to a mono-interstitial only model. The diffusivities calculated at higher temperatures compare better with experimental values than at lower temperatures (< 973 K). We have discussed the resulting activation energies achieved for diffusion with all the mono- and di-interstitial models. We have carefully performed sensitivity analysis to estimate the effect of input di-interstitial binding energies on the predicted diffusivities and activation energies. While this article only discusses mono- and di-interstitials in evaluating oxygen diffusion response in UO2+x, future improvements to the model will primarily focus on including energetic definitions of larger stable interstitial clusters reported in the literature. The addition of larger clusters to the kMC model is expected to improve the comparison of oxygen transport in UO2+x with experiment.

  18. Social phobia subtypes in the general population revealed by cluster analysis.

    PubMed

    Furmark, T; Tillfors, M; Stattin, H; Ekselius, L; Fredrikson, M

    2000-11-01

    Epidemiological data on subtypes of social phobia are scarce and their defining features are debated. Hence, the present study explored the prevalence and descriptive characteristics of empirically derived social phobia subgroups in the general population. To reveal subtypes, data on social distress, functional impairment, number of social fears and criteria fulfilled for avoidant personality disorder were extracted from a previously published epidemiological study of 188 social phobics and entered into an hierarchical cluster analysis. Criterion validity was evaluated by comparing clusters on the Social Phobia Scale (SPS) and the Social Interaction Anxiety Scale (SIAS). Finally, profile analyses were performed in which clusters were compared on a set of sociodemographic and descriptive characteristics. Three clusters emerged, consisting of phobics scoring either high (generalized subtype), intermediate (non-generalized subtype) or low (discrete subtype) on all variables. Point prevalence rates were 2.0%, 5.9% and 7.7% respectively. All subtypes were distinguished on both SPS and SIAS. Generalized or severe social phobia tended to be over-represented among individuals with low levels of educational attainment and social support. Overall, public-speaking was the most common fear. Although categorical distinctions may be used, the present data suggest that social phobia subtypes in the general population mainly differ dimensionally along a mild moderate-severe continuum, and that the number of cases declines with increasing severity.

  19. Coresets vs clustering: comparison of methods for redundancy reduction in very large white matter fiber sets

    NASA Astrophysics Data System (ADS)

    Alexandroni, Guy; Zimmerman Moreno, Gali; Sochen, Nir; Greenspan, Hayit

    2016-03-01

    Recent advances in Diffusion Weighted Magnetic Resonance Imaging (DW-MRI) of white matter in conjunction with improved tractography produce impressive reconstructions of White Matter (WM) pathways. These pathways (fiber sets) often contain hundreds of thousands of fibers, or more. In order to make fiber based analysis more practical, the fiber set needs to be preprocessed to eliminate redundancies and to keep only essential representative fibers. In this paper we demonstrate and compare two distinctive frameworks for selecting this reduced set of fibers. The first framework entails pre-clustering the fibers using k-means, followed by Hierarchical Clustering and replacing each cluster with one representative. For the second clustering stage seven distance metrics were evaluated. The second framework is based on an efficient geometric approximation paradigm named coresets. Coresets present a new approach to optimization and have huge success especially in tasks requiring large computation time and/or memory. We propose a modified version of the coresets algorithm, Density Coreset. It is used for extracting the main fibers from dense datasets, leaving a small set that represents the main structures and connectivity of the brain. A novel approach, based on a 3D indicator structure, is used for comparing the frameworks. This comparison was applied to High Angular Resolution Diffusion Imaging (HARDI) scans of 4 healthy individuals. We show that among the clustering based methods, that cosine distance gives the best performance. In comparing the clustering schemes with coresets, Density Coreset method achieves the best performance.

  20. A weak lensing analysis of the PLCK G100.2-30.4 cluster

    NASA Astrophysics Data System (ADS)

    Radovich, M.; Formicola, I.; Meneghetti, M.; Bartalucci, I.; Bourdin, H.; Mazzotta, P.; Moscardini, L.; Ettori, S.; Arnaud, M.; Pratt, G. W.; Aghanim, N.; Dahle, H.; Douspis, M.; Pointecouteau, E.; Grado, A.

    2015-07-01

    We present a mass estimate of the Planck-discovered cluster PLCK G100.2-30.4, derived from a weak lensing analysis of deep Subaru griz images. We perform a careful selection of the background galaxies using the multi-band imaging data, and undertake the weak lensing analysis on the deep (1 h) r -band image. The shape measurement is based on the Kaiser-Squires-Broadhurst algorithm; we adopt the PSFex software to model the point spread function (PSF) across the field and correct for this in the shape measurement. The weak lensing analysis is validated through extensive image simulations. We compare the resulting weak lensing mass profile and total mass estimate to those obtained from our re-analysis of XMM-Newton observations, derived under the hypothesis of hydrostatic equilibrium. The total integrated mass profiles agree remarkably well, within 1σ across their common radial range. A mass M500 ~ 7 × 1014M⊙ is derived for the cluster from our weak lensing analysis. Comparing this value to that obtained from our reanalysis of XMM-Newton data, we obtain a bias factor of (1-b) = 0.8 ± 0.1. This is compatible within 1σ with the value of (1-b) obtained in Planck 2015 from the calibration of the bias factor using newly available weak lensing reconstructed masses. Based on data collected at Subaru Telescope (University of Tokyo).

  1. Extended phenotype and clinical subgroups in unilateral Meniere disease: A cross-sectional study with cluster analysis.

    PubMed

    Frejo, L; Martin-Sanz, E; Teggi, R; Trinidad, G; Soto-Varela, A; Santos-Perez, S; Manrique, R; Perez, N; Aran, I; Almeida-Branco, M S; Batuecas-Caletrio, A; Fraile, J; Espinosa-Sanchez, J M; Perez-Guillen, V; Perez-Garrigues, H; Oliva-Dominguez, M; Aleman, O; Benitez, J; Perez, P; Lopez-Escamez, J A

    2017-12-01

    To define clinical subgroups by cluster analysis in patients with unilateral Meniere disease (MD) and to compare them with the clinical subgroups found in bilateral MD. A cross-sectional study with a two-step cluster analysis. A tertiary referral multicenter study. Nine hundred and eighty-eight adult patients with unilateral MD. best predictors to define clinical subgroups with potential different aetiologies. We established five clusters in unilateral MD. Group 1 is the most frequently found, includes 53% of patients, and it is defined as the sporadic, classic MD without migraine and without autoimmune disorder (AD). Group 2 is found in 8% of patients, and it is defined by hearing loss, which antedates the vertigo episodes by months or years (delayed MD), without migraine or AD in most of cases. Group 3 involves 13% of patients, and it is considered familial MD, while group 4, which includes 15% of patients, is linked to the presence of migraine in all cases. Group 5 is found in 11% of patients and is defined by a comorbid AD. We found significant differences in the distribution of AD in clusters 3, 4 and 5 between patients with uni- and bilateral MD. Cluster analysis defines clinical subgroups in MD, and it extends the phenotype beyond audiovestibular symptoms. This classification will help to improve the phenotyping in MD and facilitate the selection of patients for randomised clinical trials. © 2017 John Wiley & Sons Ltd.

  2. Defining objective clusters for rabies virus sequences using affinity propagation clustering

    PubMed Central

    Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

    2018-01-01

    Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses somewhat experience technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees inviting for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorythm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP has the advantage that it is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for analysis of microarray and gene expression data, however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361

  3. SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.

    PubMed

    Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto

    2010-03-09

    An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.

  4. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    PubMed

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.

  5. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value depends on the threshold which helps to understand the time pattern of the studied events. Our findings detected the presence of overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered as a multifractal process with a degree of time-clustering of the events. Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process. Acknowledgements This work was partly supported by the SNFS Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions". References - Kanevski M. (Editor). 2008. Advanced Mapping of Environmental Data: Geostatistics, Machine Learning and Bayesian Maximum Entropy. London / Hoboken: iSTE / Wiley. - Telesca L. and Pereira M.G. 2010. Time-clustering investigation of fire temporal fluctuations in Portugal, Nat. Hazards Earth Syst. Sci., vol. 10(4): 661-666. - Vega Orozco C., Tonini M., Conedera M., Kanevski M. (2012) Cluster recognition in spatial-temporal sequences: the case of forest fires, Geoinformatica, vol. 16(4): 653-673.

  6. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic catalog into background seismicity and individual sequences of earthquake clusters, also in areas characterized by moderate seismic activity, where the standard declustering techniques may turn out rather gross approximations. With these results acquired, the main statistical features of seismic clusters are explored, including complex interdependence of related events, with the aim to characterize the space-time patterns of earthquakes occurrence in North-Eastern Italy and capture their basic differences with Central Italy sequences.

  7. Track structure in radiation biology: theory and applications.

    PubMed

    Nikjoo, H; Uehara, S; Wilson, W E; Hoshi, M; Goodhead, D T

    1998-04-01

    A brief review is presented of the basic concepts in track structure and the relative merit of various theoretical approaches adopted in Monte-Carlo track-structure codes are examined. In the second part of the paper, a formal cluster analysis is introduced to calculate cluster-distance distributions. Total experimental ionization cross-sections were least-square fitted and compared with the calculation by various theoretical methods. Monte-Carlo track-structure code Kurbuc was used to examine and compare the spectrum of the secondary electrons generated by using functions given by Born-Bethe, Jain-Khare, Gryzinsky, Kim-Rudd, Mott and Vriens' theories. The cluster analysis in track structure was carried out using the k-means method and Hartigan algorithm. Data are presented on experimental and calculated total ionization cross-sections: inverse mean free path (IMFP) as a function of electron energy used in Monte-Carlo track-structure codes; the spectrum of secondary electrons generated by different functions for 500 eV primary electrons; cluster analysis for 4 MeV and 20 MeV alpha-particles in terms of the frequency of total cluster energy to the root-mean-square (rms) radius of the cluster and differential distance distributions for a pair of clusters; and finally relative frequency distribution for energy deposited in DNA, single-strand break and double-strand breaks for 10MeV/u protons, alpha-particles and carbon ions. There are a number of Monte-Carlo track-structure codes that have been developed independently and the bench-marking presented in this paper allows a better choice of the theoretical method adopted in a track-structure code to be made. A systematic bench-marking of cross-sections and spectra of the secondary electrons shows differences between the codes at atomic level, but such differences are not significant in biophysical modelling at the macromolecular level. Clustered-damage evaluation shows: that a substantial proportion of dose ( 30%) is deposited by low-energy electrons; the majority of DNA damage lesions are of simple type; the complexity of damage increases with increased LET, while the total yield of strand breaks remains constant; and at high LET values nearly 70% of all double-strand breaks are of complex type.

  8. Identification of Hard X-ray Sources in Galactic Globular Clusters: Simbol-X Simulations

    NASA Astrophysics Data System (ADS)

    Servillat, M.

    2009-05-01

    Globular clusters harbour an excess of X-ray sources compared to the number of X-ray sources in the Galactic plane. It has been proposed that many of these X-ray sources are cataclysmic variables that have an intermediate magnetic field, i.e. intermediate polars, which remains to be confirmed and understood. We present here several methods to identify intermediate polars in globular clusters from multiwavelength analysis. First, we report on XMM-Newton, Chandra and HST observations of the very dense Galactic globular cluster NGC 2808. By comparing UV and X-ray properties of the cataclysmic variable candidates, the fraction of intermediate polars in this cluster can be estimated. We also present the optical spectra of two cataclysmic variables in the globular cluster M 22. The HeII (4868 Å) emission line in these spectra could be related to the presence of a magnetic field in these objects. Simulations of Simbol-X observations indicate that the angular resolution is sufficient to study X-ray sources in the core of close, less dense globular clusters, such as M 22. The sensitivity of Simbol-X in an extended energy band up to 80 keV will allow us to discriminate between hard X-ray sources (such as magnetic cataclysmic variables) and soft X-ray sources (such as chromospherically active binaries).

  9. Cognitive profiles in euthymic patients with bipolar disorders: results from the FACE-BD cohort.

    PubMed

    Roux, Paul; Raust, Aurélie; Cannavo, Anne Sophie; Aubin, Valérie; Aouizerate, Bruno; Azorin, Jean-Michel; Bellivier, Frank; Belzeaux, Raoul; Bougerol, Thierry; Cussac, Iréna; Courtet, Philippe; Etain, Bruno; Gard, Sébastien; Job, Sophie; Kahn, Jean-Pierre; Leboyer, Marion; Olié, Emilie; Henry, Chantal; Passerieux, Christine

    2017-03-01

    Although cognitive deficits are a well-established feature of bipolar disorders (BD), even during periods of euthymia, little is known about cognitive phenotype heterogeneity among patients with BD. We investigated neuropsychological performance in 258 euthymic patients with BD recruited via the French network of expert centers for BD. We used a test battery assessing six domains of cognition. Hierarchical cluster analysis of the cross-sectional data was used to determine the optimal number of subgroups and to assign each patient to a specific cognitive cluster. Subsequently, subjects from each cluster were compared on demographic, clinical functioning, and pharmacological variables. A four-cluster solution was identified. The global cognitive performance was above normal in one cluster and below normal in another. The other two clusters had a near-normal cognitive performance, with above and below average verbal memory, respectively. Among the four clusters, significant differences were observed in estimated intelligence quotient and social functioning, which were lower for the low cognitive performers compared to the high cognitive performers. These results confirm the existence of several distinct cognitive profiles in BD. Identification of these profiles may help to develop profile-specific cognitive remediation programs, which might improve functioning in BD. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  10. Choosing appropriate analysis methods for cluster randomised cross-over trials with a binary outcome.

    PubMed

    Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C

    2017-01-30

    In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is usual correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster only had correct Type I errors when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% for all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  11. A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

    PubMed Central

    Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

    2017-01-01

    Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313

  12. Untangling Magmatic Processes and Hydrothermal Alteration of in situ Superfast Spreading Ocean Crust at ODP/IODP Site 1256 with Fuzzy c-means Cluster Analysis of Rock Magnetic Properties

    NASA Astrophysics Data System (ADS)

    Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.

    2014-12-01

    Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6.44.1' N, 91.56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.

  13. Geometric, electronic, and bonding properties of AuNM (N = 1-7, M = Ni, Pd, Pt) clusters.

    PubMed

    Yuan, D W; Wang, Yang; Zeng, Zhi

    2005-03-15

    Employing first-principles methods, based on density functional theory, we report the ground state geometric and electronic structures of gold clusters doped with platinum group atoms, Au(N)M (N = 1-7, M = Ni, Pd, Pt). The stability and electronic properties of Ni-doped gold clusters are similar to that of pure gold clusters with an enhancement of bond strength. Due to the strong d-d or s-d interplay between impurities and gold atoms originating in the relativistic effects and unique properties of dopant delocalized s-electrons in Pd- and Pt-doped gold clusters, the dopant atoms markedly change the geometric and electronic properties of gold clusters, and stronger bond energies are found in Pt-doped clusters. The Mulliken populations analysis of impurities and detailed decompositions of bond energies as well as a variety of density of states of the most stable dopant gold clusters are given to understand the different effects of individual dopant atom on bonding and electronic properties of dopant gold clusters. From the electronic properties of dopant gold clusters, the different chemical reactivity toward O(2), CO, or NO molecule is predicted in transition metal-doped gold clusters compared to pure gold clusters.

  14. A critical analysis of high-redshift, massive, galaxy clusters. Part I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoyle, Ben; Jimenez, Raul; Verde, Licia

    2012-02-01

    We critically investigate current statistical tests applied to high redshift clusters of galaxies in order to test the standard cosmological model and describe their range of validity. We carefully compare a sample of high-redshift, massive, galaxy clusters with realistic Poisson sample simulations of the theoretical mass function, which include the effect of Eddington bias. We compare the observations and simulations using the following statistical tests: the distributions of ensemble and individual existence probabilities (in the > M, > z sense), the redshift distributions, and the 2d Kolmogorov-Smirnov test. Using seemingly rare clusters from Hoyle et al. (2011), and Jee etmore » al. (2011) and assuming the same survey geometry as in Jee et al. (2011, which is less conservative than Hoyle et al. 2011), we find that the ( > M, > z) existence probabilities of all clusters are fully consistent with ΛCDM. However assuming the same survey geometry, we use the 2d K-S test probability to show that the observed clusters are not consistent with being the least probable clusters from simulations at > 95% confidence, and are also not consistent with being a random selection of clusters, which may be caused by the non-trivial selection function and survey geometry. Tension can be removed if we examine only a X-ray selected sub sample, with simulations performed assuming a modified survey geometry.« less

  15. Tidal radii and destruction rates of globular clusters in the Milky Way due to bulge-bar and disk shocking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreno, Edmundo; Pichardo, Bárbara; Velázquez, Héctor

    2014-10-01

    We calculate orbits, tidal radii, and bulge-bar and disk shocking destruction rates for 63 globular clusters in our Galaxy. Orbits are integrated in both an axisymmetric and a nonaxisymmetric Galactic potential that includes a bar and a three-dimensional model for the spiral arms. With the use of a Monte Carlo scheme, we consider in our simulations observational uncertainties in the kinematical data of the clusters. In the analysis of destruction rates due to the bulge-bar, we consider the rigorous treatment of using the real Galactic cluster orbit instead of the usual linear trajectory employed in previous studies. We compare resultsmore » in both treatments. We find that the theoretical tidal radius computed in the nonaxisymmetric Galactic potential compares better with the observed tidal radius than that obtained in the axisymmetric potential. In both Galactic potentials, bulge-shocking destruction rates computed with a linear trajectory of a cluster at its perigalacticons give a good approximation of the result obtained with the real trajectory of the cluster. Bulge-shocking destruction rates for clusters with perigalacticons in the inner Galactic region are smaller in the nonaxisymmetric potential than those in the axisymmetric potential. For the majority of clusters with high orbital eccentricities (e > 0.5), their total bulge+disk destruction rates are smaller in the nonaxisymmetric potential.« less

  16. Towards Development of Clustering Applications for Large-Scale Comparative Genotyping and Kinship Analysis Using Y-Short Tandem Repeats.

    PubMed

    Seman, Ali; Sapawi, Azizian Mohd; Salleh, Mohd Zaki

    2015-06-01

    Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.

  17. Rapid identification of Enterobacter hormaechei and Enterobacter cloacae genetic cluster III.

    PubMed

    Ohad, S; Block, C; Kravitz, V; Farber, A; Pilo, S; Breuer, R; Rorman, E

    2014-05-01

    Enterobacter cloacae complex bacteria are of both clinical and environmental importance. Phenotypic methods are unable to distinguish between some of the species in this complex, which often renders their identification incomplete. The goal of this study was to develop molecular assays to identify Enterobacter hormaechei and Ent. cloacae genetic cluster III which are relatively frequently encountered in clinical material. The molecular assays developed in this study are qPCR technology based and served to identify both Ent. hormaechei and Ent. cloacae genetic cluster III. qPCR results were compared to hsp60 sequence analysis. Most clinical isolates were assigned to Ent. hormaechei subsp. steigerwaltii and Ent. cloacae genetic cluster III. The latter was proportionately more frequently isolated from bloodstream infections than from other material (P < 0·05). The qPCR assays detecting Ent. hormaechei and Ent. cloacae genetic cluster III demonstrated high sensitivity and specificity. The presented qPCR assays allow accurate and rapid identification of clinical isolates of the Ent. cloacae complex. The improved identifications obtained can specifically assist analysis of Ent. hormaechei and Ent. cloacae genetic cluster III in nosocomial outbreaks and can promote rapid environmental monitoring. An association was observed between Ent. cloacae cluster III and systemic infection that deserves further attention. © 2014 The Society for Applied Microbiology.

  18. Clustering analysis of water distribution systems: identifying critical components and community impacts.

    PubMed

    Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

    2014-01-01

    Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. Thereby, it is usually difficult to identify the key features of the properties of the system, and subsequently all the critical components within the system for a given purpose of design or control. One way is, however, to more explicitly visualize the network structure and interactions between components by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them in a more computationally efficient time.

  19. Space-Time Cluster Analysis to Detect Innovative Clinical Practices: A Case Study of Aripiprazole in the Department of Veterans Affairs.

    PubMed

    Penfold, Robert B; Burgess, James F; Lee, Austin F; Li, Mingfei; Miller, Christopher J; Nealon Seibert, Marjorie; Semla, Todd P; Mohr, David C; Kazis, Lewis E; Bauer, Mark S

    2018-02-01

    To identify space-time clusters of changes in prescribing aripiprazole for bipolar disorder among providers in the VA. VA administrative data from 2002 to 2010 were used to identify prescriptions of aripiprazole for bipolar disorder. Prescriber characteristics were obtained using the Personnel and Accounting Integrated Database. We conducted a retrospective space-time cluster analysis using the space-time permutation statistic. All VA service users with a diagnosis of bipolar disorder were included in the patient population. Individuals with any schizophrenia spectrum diagnoses were excluded. We also identified all clinicians who wrote a prescription for any bipolar disorder medication. The study population included 32,630 prescribers. Of these, 8,643 wrote qualifying prescriptions. We identified three clusters of aripiprazole prescribing centered in Massachusetts, Ohio, and the Pacific Northwest. Clusters were associated with prescribing by VA-employed (vs. contracted) prescribers. Nurses with prescribing privileges were more likely to make a prescription for aripiprazole in cluster locations compared with psychiatrists. Primary care physicians were less likely. Early prescribing of aripiprazole for bipolar disorder clustered geographically and was associated with prescriber subgroups. These methods support prospective surveillance of practice changes and identification of associated health system characteristics. © Health Research and Educational Trust.

  20. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The speciesmore » P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.« less

  1. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The speciesmore » P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.« less

  2. Comparative 1H NMR Metabolomic Urinalysis of People Diagnosed with Balkan Endemic Nephropathy, and Healthy Subjects, in Romania and Bulgaria: A Pilot Study

    PubMed Central

    Mantle, Peter; Modalca, Mirela; Nicholls, Andrew; Tatu, Calin; Tatu, Diana; Toncheva, Draga

    2011-01-01

    1H NMR spectroscopy of urine has been applied to exploring metabolomic differences between people diagnosed with Balkan endemic nephropathy (BEN), and treated by haemodialysis, and those without overt renal disease in Romania and Bulgaria. Convenience sampling was made from patients receiving haemodialysis in hospital and healthy controls in their village. Principal component analysis clustered healthy controls from both countries together. Bulgarian BEN patients clustered separately from controls, though in the same space. However, Romanian BEN patients not only also clustered away from controls but also clustered separately from the BEN patients in Bulgaria. Notably, the urinary metabolomic data of two people sampled as Romanian controls clustered within the Romanian BEN group. One of these had been suspected of incipient symptoms of BEN at the time of selection as a ‘healthy’ control. This implies, at first sight, that metabolomic analysis can be predictive of impending morbidity before conventional criteria can diagnose BEN. Separate clustering of BEN patients from Romania and Bulgaria could indicate difference in aetiology of this particular silent renal atrophy in different geographic foci across the Balkans. PMID:22069742

  3. Metal mixtures in urban and rural populations in the US: The Multi-Ethnic Study of Atherosclerosis and the Strong Heart Study.

    PubMed

    Pang, Yuanjie; Peng, Roger D; Jones, Miranda R; Francesconi, Kevin A; Goessler, Walter; Howard, Barbara V; Umans, Jason G; Best, Lyle G; Guallar, Eliseo; Post, Wendy S; Kaufman, Joel D; Vaidya, Dhananjay; Navas-Acien, Ana

    2016-05-01

    Natural and anthropogenic sources of metal exposure differ for urban and rural residents. We searched to identify patterns of metal mixtures which could suggest common environmental sources and/or metabolic pathways of different urinary metals, and compared metal-mixtures in two population-based studies from urban/sub-urban and rural/town areas in the US: the Multi-Ethnic Study of Atherosclerosis (MESA) and the Strong Heart Study (SHS). We studied a random sample of 308 White, Black, Chinese-American, and Hispanic participants in MESA (2000-2002) and 277 American Indian participants in SHS (1998-2003). We used principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA) to evaluate nine urinary metals (antimony [Sb], arsenic [As], cadmium [Cd], lead [Pb], molybdenum [Mo], selenium [Se], tungsten [W], uranium [U] and zinc [Zn]). For arsenic, we used the sum of inorganic and methylated species (∑As). All nine urinary metals were higher in SHS compared to MESA participants. PCA and CA revealed the same patterns in SHS, suggesting 4 distinct principal components (PC) or clusters (∑As-U-W, Pb-Sb, Cd-Zn, Mo-Se). In MESA, CA showed 2 large clusters (∑As-Mo-Sb-U-W, Cd-Pb-Se-Zn), while PCA showed 4 PCs (Sb-U-W, Pb-Se-Zn, Cd-Mo, ∑As). LDA indicated that ∑As, U, W, and Zn were the most discriminant variables distinguishing MESA and SHS participants. In SHS, the ∑As-U-W cluster and PC might reflect groundwater contamination in rural areas, and the Cd-Zn cluster and PC could reflect common sources from meat products or metabolic interactions. Among the metals assayed, ∑As, U, W and Zn differed the most between MESA and SHS, possibly reflecting disproportionate exposure from drinking water and perhaps food in rural Native communities compared to urban communities around the US. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. The `TTIME' Package: Performance Evaluation in a Cluster Computing Environment

    NASA Astrophysics Data System (ADS)

    Howe, Marico; Berleant, Daniel; Everett, Albert

    2011-06-01

    The objective of translating developmental event time across mammalian species is to gain an understanding of the timing of human developmental events based on known time of those events in animals. The potential benefits include improvements to diagnostic and intervention capabilities. The CRAN `ttime' package provides the functionality to infer unknown event timings and investigate phylogenetic proximity utilizing hierarchical clustering of both known and predicted event timings. The original generic mammalian model included nine eutherian mammals: Felis domestica (cat), Mustela putorius furo (ferret), Mesocricetus auratus (hamster), Macaca mulatta (monkey), Homo sapiens (humans), Mus musculus (mouse), Oryctolagus cuniculus (rabbit), Rattus norvegicus (rat), and Acomys cahirinus (spiny mouse). However, the data for this model is expected to grow as more data about developmental events is identified and incorporated into the analysis. Performance evaluation of the `ttime' package across a cluster computing environment versus a comparative analysis in a serial computing environment provides an important computational performance assessment. A theoretical analysis is the first stage of a process in which the second stage, if justified by the theoretical analysis, is to investigate an actual implementation of the `ttime' package in a cluster computing environment and to understand the parallelization process that underlies implementation.

  5. Asian Students' Conceptions of Future Civic Engagement: Comparing Clusters Using Person-Centered Analysis

    ERIC Educational Resources Information Center

    Chow, Joseph Kui Foon; Kennedy, Kerry J.

    2015-01-01

    Researchers in comparative education have suggested different ways in which their field of study can be enhanced by considering units of analysis at different levels rather than focusing on a single level such as the nation-state (Bray and Thomas, 1995; Torney-Purta and Barber, 2011). The study reported here seeks to contribute to this area of…

  6. Clustering of financial time series with application to index and enhanced index tracking portfolio

    NASA Astrophysics Data System (ADS)

    Dose, Christian; Cincotti, Silvano

    2005-09-01

    A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). Present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, whilst not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.

  7. BioTextQuest: a web-based biomedical text mining suite for concept discovery.

    PubMed

    Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J

    2011-12-01

    BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. A tag-cloud-based illustration of terms labeling each document cluster are semantically annotated according to the biological entity, and a list of document titles enable users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. http://biotextquest.biol.ucy.ac.cy vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary data are available at Bioinformatics online.

  8. Cancer Transcriptome Dataset Analysis: Comparing Methods of Pathway and Gene Regulatory Network-Based Cluster Identification.

    PubMed

    Nam, Seungyoon

    2017-04-01

    Cancer transcriptome analysis is one of the leading areas of Big Data science, biomarker, and pharmaceutical discovery, not to forget personalized medicine. Yet, cancer transcriptomics and postgenomic medicine require innovation in bioinformatics as well as comparison of the performance of available algorithms. In this data analytics context, the value of network generation and algorithms has been widely underscored for addressing the salient questions in cancer pathogenesis. Analysis of cancer trancriptome often results in complicated networks where identification of network modularity remains critical, for example, in delineating the "druggable" molecular targets. Network clustering is useful, but depends on the network topology in and of itself. Notably, the performance of different network-generating tools for network cluster (NC) identification has been little investigated to date. Hence, using gastric cancer (GC) transcriptomic datasets, we compared two algorithms for generating pathway versus gene regulatory network-based NCs, showing that the pathway-based approach better agrees with a reference set of cancer-functional contexts. Finally, by applying pathway-based NC identification to GC transcriptome datasets, we describe cancer NCs that associate with candidate therapeutic targets and biomarkers in GC. These observations collectively inform future research on cancer transcriptomics, drug discovery, and rational development of new analysis tools for optimal harnessing of omics data.

  9. Multiple receptor conformation docking, dock pose clustering and 3D QSAR studies on human poly(ADP-ribose) polymerase-1 (PARP-1) inhibitors.

    PubMed

    Fatima, Sabiha; Jatavath, Mohan Babu; Bathini, Raju; Sivan, Sree Kanth; Manga, Vijjulatha

    2014-10-01

    Poly(ADP-ribose) polymerase-1 (PARP-1) functions as a DNA damage sensor and signaling molecule. It plays a vital role in the repair of DNA strand breaks induced by radiation and chemotherapeutic drugs; inhibitors of this enzyme have the potential to improve cancer chemotherapy or radiotherapy. Three-dimensional quantitative structure activity relationship (3D QSAR) models were developed using comparative molecular field analysis, comparative molecular similarity indices analysis and docking studies. A set of 88 molecules were docked into the active site of six X-ray crystal structures of poly(ADP-ribose)polymerase-1 (PARP-1), by a procedure called multiple receptor conformation docking (MRCD), in order to improve the 3D QSAR models through the analysis of binding conformations. The docked poses were clustered to obtain the best receptor binding conformation. These dock poses from clustering were used for 3D QSAR analysis. Based on MRCD and QSAR information, some key features have been identified that explain the observed variance in the activity. Two receptor-based QSAR models were generated; these models showed good internal and external statistical reliability that is evident from the [Formula: see text], [Formula: see text] and [Formula: see text]. The identified key features enabled us to design new PARP-1 inhibitors.

  10. Computer-aided detection of clustered microcalcifications in multiscale bilateral filtering regularized reconstructed digital breast tomosynthesis volume

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samala, Ravi K., E-mail: rsamala@umich.edu; Chan, Heang-Ping; Lu, Yao

    Purpose: Develop a computer-aided detection (CADe) system for clustered microcalcifications in digital breast tomosynthesis (DBT) volume enhanced with multiscale bilateral filtering (MSBF) regularization. Methods: With Institutional Review Board approval and written informed consent, two-view DBT of 154 breasts, of which 116 had biopsy-proven microcalcification (MC) clusters and 38 were free of MCs, was imaged with a General Electric GEN2 prototype DBT system. The DBT volumes were reconstructed with MSBF-regularized simultaneous algebraic reconstruction technique (SART) that was designed to enhance MCs and reduce background noise while preserving the quality of other tissue structures. The contrast-to-noise ratio (CNR) of MCs was furthermore » improved with enhancement-modulated calcification response (EMCR) preprocessing, which combined multiscale Hessian response to enhance MCs by shape and bandpass filtering to remove the low-frequency structured background. MC candidates were then located in the EMCR volume using iterative thresholding and segmented by adaptive region growing. Two sets of potential MC objects, cluster centroid objects and MC seed objects, were generated and the CNR of each object was calculated. The number of candidates in each set was controlled based on the breast volume. Dynamic clustering around the centroid objects grouped the MC candidates to form clusters. Adaptive criteria were designed to reduce false positive (FP) clusters based on the size, CNR values and the number of MCs in the cluster, cluster shape, and cluster based maximum intensity projection. Free-response receiver operating characteristic (FROC) and jackknife alternative FROC (JAFROC) analyses were used to assess the performance and compare with that of a previous study. Results: Unpaired two-tailedt-test showed a significant increase (p < 0.0001) in the ratio of CNRs for MCs with and without MSBF regularization compared to similar ratios for FPs. For view-based detection, a sensitivity of 85% was achieved at an FP rate of 2.16 per DBT volume. For case-based detection, a sensitivity of 85% was achieved at an FP rate of 0.85 per DBT volume. JAFROC analysis showed a significant improvement in the performance of the current CADe system compared to that of our previous system (p = 0.003). Conclusions: MBSF regularized SART reconstruction enhances MCs. The enhancement in the signals, in combination with properly designed adaptive threshold criteria, effective MC feature analysis, and false positive reduction techniques, leads to a significant improvement in the detection of clustered MCs in DBT.« less

  11. Characterization of Erwinia chrysanthemi by pectinolytic isozyme polymorphism and restriction fragment length polymorphism analysis of PCR-amplified fragments of pel genes.

    PubMed Central

    Nassar, A; Darrasse, A; Lemattre, M; Kotoujansky, A; Dervin, C; Vedel, R; Bertheau, Y

    1996-01-01

    Conserved regions about 420 bp long of the pelADE cluster specific to Erwinia chrysanthemi were amplified by PCR and used to differentiate 78 strains of E. chrysanthemi that were obtained from different hosts and geographical areas. No PCR products were obtained from DNA samples extracted from other pectinolytic and nonpectinolytic species and genera. The pel fragments amplified from the E. chrysanthemi strains studied were compared by performing a restriction fragment length polymorphism (RFLP) analysis. On the basis of similarity coefficients derived from the RFLP analysis, the strains were separated into 16 PCR RFLP patterns grouped in six clusters, These clusters appeared to be correlated with other infraspecific levels of E. chrysanthemi classification, such as pathovar and biovar, and occasionally with geographical origin. Moreover, the clusters correlated well with the polymorphism of pectate lyase and pectin methylesterase isoenzymes. While the pectin methylesterase profiles correlated with host monocot-dicot classification, the pectate lyase polymorphism might reflect the cell wall microdomains of the plants belonging to these classes. PMID:8779560

  12. Retinal ganglion cells in the eastern newt Notophthalmus viridescens: topography, morphology, and diversity.

    PubMed

    Pushchin, Igor I; Karetin, Yuriy A

    2009-10-20

    The topography and morphology of retinal ganglion cells (RGCs) in the eastern newt were studied. Cells were retrogradely labeled with tetramethylrhodamine-conjugated dextran amines or horseradish peroxidase and examined in retinal wholemounts. Their total number was 18,025 +/- 3,602 (mean +/- SEM). The spatial density of RGCs varied from 2,100 cells/mm(2) in the retinal periphery to 4,500 cells/mm(2) in the dorsotemporal retina. No prominent retinal specializations were found. The spatial resolution estimated from the spatial density of RGCs varied from 1.4 cycles per degree in the periphery to 1.95 cycles per degree in the region of the peak RGC density. A sample of 68 cells was camera lucida drawn and subjected to quantitative analysis. A total of 21 parameters related to RGC morphology and stratification in the retina were estimated. Partitionings obtained by using different clustering algorithms combined with automatic variable weighting and dimensionality reduction techniques were compared, and an effective solution was found by using silhouette analysis. A total of seven clusters were identified and associated with potential cell types. Kruskal-Wallis ANOVA-on-Ranks with post hoc Mann-Whitney U tests showed significant pairwise between-cluster differences in one or more of the clustering variables. The average silhouette values of the clusters were reasonably high, ranging from 0.52 to 0.79. Cells assigned to the same cluster displayed similar morphology and stratification in the retina. The advantages and limitations of the methodology adopted are discussed. The present classification is compared with known morphological and physiological RGC classifications in other salamanders.

  13. Sensitivity and specificity of univariate MRI analysis of experimentally degraded cartilage

    PubMed Central

    Lin, Ping-Chang; Reiter, David A.; Spencer, Richard G.

    2010-01-01

    MRI is increasingly used to evaluate cartilage in tissue constructs, explants, and animal and patient studies. However, while mean values of MR parameters, including T1, T2, magnetization transfer rate km, apparent diffusion coefficient ADC, and the dGEMRIC-derived fixed charge density, correlate with tissue status, the ability to classify tissue according to these parameters has not been explored. Therefore, the sensitivity and specificity with which each of these parameters was able to distinguish between normal and trypsin- degraded, and between normal and collagenase-degraded, cartilage explants were determined. Initial analysis was performed using a training set to determine simple group means to which parameters obtained from a validation set were compared. T1 and ADC showed the greatest ability to discriminate between normal and degraded cartilage. Further analysis with k-means clustering, which eliminates the need for a priori identification of sample status, generally performed comparably. Use of fuzzy c-means (FCM) clustering to define centroids likewise did not result in improvement in discrimination. Finally, a FCM clustering approach in which validation samples were assigned in a probabilistic fashion to control and degraded groups was implemented, reflecting the range of tissue characteristics seen with cartilage degradation. PMID:19705467

  14. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Interrupted time-series analysis yielded an effect estimate concordant with the cluster-randomized controlled trial result.

    PubMed

    Fretheim, Atle; Soumerai, Stephen B; Zhang, Fang; Oxman, Andrew D; Ross-Degnan, Dennis

    2013-08-01

    We reanalyzed the data from a cluster-randomized controlled trial (C-RCT) of a quality improvement intervention for prescribing antihypertensive medication. Our objective was to estimate the effectiveness of the intervention using both interrupted time-series (ITS) and RCT methods, and to compare the findings. We first conducted an ITS analysis using data only from the intervention arm of the trial because our main objective was to compare the findings from an ITS analysis with the findings from the C-RCT. We used segmented regression methods to estimate changes in level or slope coincident with the intervention, controlling for baseline trend. We analyzed the C-RCT data using generalized estimating equations. Last, we estimated the intervention effect by including data from both study groups and by conducting a controlled ITS analysis of the difference between the slope and level changes in the intervention and control groups. The estimates of absolute change resulting from the intervention were ITS analysis, 11.5% (95% confidence interval [CI]: 9.5, 13.5); C-RCT, 9.0% (95% CI: 4.9, 13.1); and the controlled ITS analysis, 14.0% (95% CI: 8.6, 19.4). ITS analysis can provide an effect estimate that is concordant with the results of a cluster-randomized trial. A broader range of comparisons from other RCTs would help to determine whether these are generalizable results. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Copy number gain at 8q12.1-q22.1 is associated with a malignant tumor phenotype in salivary gland myoepitheliomas.

    PubMed

    Vékony, Hedy; Röser, Kerstin; Löning, Thomas; Ylstra, Bauke; Meijer, Gerrit A; van Wieringen, Wessel N; van de Wiel, Mark A; Carvalho, Beatriz; Kok, Klaas; Leemans, C René; van der Waal, Isaäc; Bloemena, Elisabeth

    2009-02-01

    Salivary gland myoepithelial tumors are relatively uncommon tumors with an unpredictable clinical course. More knowledge about their genetic profiles is necessary to identify novel predictors of disease. In this study, we subjected 27 primary tumors (15 myoepitheliomas and 12 myoepithelial carcinomas) to genome-wide microarray-based comparative genomic hybridization (array CGH). We set out to delineate known chromosomal aberrations in more detail and to unravel chromosomal differences between benign myoepitheliomas and myoepithelial carcinomas. Patterns of DNA copy number aberrations were analyzed by unsupervised hierarchical cluster analysis. Both benign and malignant tumors revealed a limited amount of chromosomal alterations (median of 5 and 7.5, respectively). In both tumor groups, high frequency gains (> or =20%) were found mainly at loci of growth factors and growth factor receptors (e.g., PDGF, FGF(R)s, and EGFR). In myoepitheliomas, high frequency losses (> or =20%) were detected at regions of proto-cadherins. Cluster analysis of the array CGH data identified three clusters. Differential copy numbers on chromosome arm 8q and chromosome 17 set the clusters apart. Cluster 1 contained a mixture of the two phenotypes (n = 10), cluster 2 included mostly benign tumors (n = 10), and cluster 3 only contained carcinomas (n = 7). Supervised analysis between malignant and benign tumors revealed a 36 Mbp-region at 8q being more frequently gained in malignant tumors (P = 0.007, FDR = 0.05). This is the first study investigating genomic differences between benign and malignant myoepithelial tumors of the salivary glands at a genomic level. Both unsupervised and supervised analysis of the genomic profiles revealed chromosome arm 8q to be involved in the malignant phenotype of salivary gland myoepitheliomas.

  17. Swarm Intelligence in Text Document Clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cui, Xiaohui; Potok, Thomas E

    2008-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role inmore » helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.« less

  18. Using cluster ensemble and validation to identify subtypes of pervasive developmental disorders.

    PubMed

    Shen, Jess J; Lee, Phil-Hyoun; Holden, Jeanette J A; Shatkay, Hagit

    2007-10-11

    Pervasive Developmental Disorders (PDD) are neurodevelopmental disorders characterized by impairments in social interaction, communication and behavior. Given the diversity and varying severity of PDD, diagnostic tools attempt to identify homogeneous subtypes within PDD. Identifying subtypes can lead to targeted etiology studies and to effective type-specific intervention. Cluster analysis can suggest coherent subsets in data; however, different methods and assumptions lead to different results. Several previous studies applied clustering to PDD data, varying in number and characteristics of the produced subtypes. Most studies used a relatively small dataset (fewer than 150 subjects), and all applied only a single clustering method. Here we study a relatively large dataset (358 PDD patients), using an ensemble of three clustering methods. The results are evaluated using several validation methods, and consolidated through an integration step. Four clusters are identified, analyzed and compared to subtypes previously defined by the widely used diagnostic tool DSM-IV.

  19. Using Cluster Ensemble and Validation to Identify Subtypes of Pervasive Developmental Disorders

    PubMed Central

    Shen, Jess J.; Lee, Phil Hyoun; Holden, Jeanette J.A.; Shatkay, Hagit

    2007-01-01

    Pervasive Developmental Disorders (PDD) are neurodevelopmental disorders characterized by impairments in social interaction, communication and behavior.1 Given the diversity and varying severity of PDD, diagnostic tools attempt to identify homogeneous subtypes within PDD. Identifying subtypes can lead to targeted etiology studies and to effective type-specific intervention. Cluster analysis can suggest coherent subsets in data; however, different methods and assumptions lead to different results. Several previous studies applied clustering to PDD data, varying in number and characteristics of the produced subtypes19. Most studies used a relatively small dataset (fewer than 150 subjects), and all applied only a single clustering method. Here we study a relatively large dataset (358 PDD patients), using an ensemble of three clustering methods. The results are evaluated using several validation methods, and consolidated through an integration step. Four clusters are identified, analyzed and compared to subtypes previously defined by the widely used diagnostic tool DSM-IV.2 PMID:18693920

  20. Deep Brain Stimulation of the Subthalamic Nucleus Improves Lexical Switching in Parkinsons Disease Patients.

    PubMed

    Vonberg, Isabelle; Ehlen, Felicitas; Fromm, Ortwin; Kühn, Andrea A; Klostermann, Fabian

    2016-01-01

    Reduced verbal fluency (VF) has been reported in patients with Parkinson's disease (PD), especially those treated by Deep Brain Stimulation of the subthalamic nucleus (STN DBS). To delineate the nature of this dysfunction we aimed at identifying the particular VF-related operations modified by STN DBS. Eleven PD patients performed VF tasks in their STN DBS ON and OFF condition. To differentiate VF-components modulated by the stimulation, a temporal cluster analysis was performed, separating production spurts (i.e., 'clusters' as correlates of automatic activation spread within lexical fields) from slower cluster transitions (i.e., 'switches' reflecting set-shifting towards new lexical fields). The results were compared to those of eleven healthy control subjects. PD patients produced significantly more switches accompanied by shorter switch times in the STN DBS ON compared to the STN DBS OFF condition. The number of clusters and time intervals between words within clusters were not affected by the treatment state. Although switch behavior in patients with DBS ON improved, their task performance was still lower compared to that of healthy controls. Beyond impacting on motor symptoms, STN DBS seems to influence the dynamics of cognitive procedures. Specifically, the results are in line with basal ganglia roles for cognitive switching, in the particular case of VF, from prevailing lexical concepts to new ones.

  1. Cluster mass profile reconstruction with size and flux magnification on the HST STAGES survey.

    PubMed

    Duncan, Christopher A J; Heymans, Catherine; Heavens, Alan F; Joachimi, Benjamin

    2016-03-21

    We present the first measurement of individual cluster mass estimates using weak lensing size and flux magnification. Using data from the HST STAGES (Space Telescope A901/902 Galaxy Evolution Survey) survey of the A901/902 supercluster we detect the four known groups in the supercluster at high significance using magnification alone. We discuss the application of a fully Bayesian inference analysis, and investigate a broad range of potential systematics in the application of the method. We compare our results to a previous weak lensing shear analysis of the same field finding the recovered signal-to-noise of our magnification-only analysis to range from 45 to 110 per cent of the signal-to-noise in the shear-only analysis. On a case-by-case basis we find consistent magnification and shear constraints on cluster virial radius, and finding that for the full sample, magnification constraints to be a factor 0.77 ± 0.18 lower than the shear measurements.

  2. Spectroscopic determination of fundamental parameters of small angular diameter galactic open clusters

    NASA Astrophysics Data System (ADS)

    Ahumada, A. V.; Claria, J. J.; Bica, E.; Parisi, M. C.; Torres, M. C.; Pavani, D. B.

    We present integrated spectra obtained at CASLEO (Argentina) for 9 galactic open clusters of small angular diameter. Two of them (BH 55 and Rup 159) have not been the target of previous research. The flux-calibrated spectra cover the spectral range approx. 3600-6900 A. Using the equivalent widths (EWs) of the Balmer lines and comparing the cluster spectra with template spectra, we determined E(B-V) colour excesses and ages for the present cluster sample. The parameters obtained for 6 of the clusters show good agreement with previous determinations based mainly on photometric methods. This is not the case, however, for BH 90, a scarcely reddened cluster, for which Moffat and Vogt (1975, Astron. and Astroph. SS, 20, 125) derived E(B-V) = 0.51. We explain and justify the strong discrepancy found for this object. According to the present analysis, 3 clusters are very young (Bo 14, Tr 15 and Tr 27), 2 are moderately young (NGC 6268 and BH 205), 3 are Hyades-like clusters (Rup 164, BH 90 and BH 55) and only one is an intermediate-age cluster (Rup 159).

  3. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected Via the Sunyaev-Zel'dovich Effect

    NASA Technical Reports Server (NTRS)

    Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard; hide

    2010-01-01

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zel'dovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives (sigma)8 = 0.851 +/- 0.115 and w = -1.14 +/- 0.35 for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find (sigma)8 + 0.821 +/- 0.044 and w = -1.05 +/- 0.20. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernova which give (sigma)8 = 0.802 +/- 0.038 and w = -0.98 +/- 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  4. Validating clustering of molecular dynamics simulations using polymer models.

    PubMed

    Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn

    2011-11-14

    Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.

  5. Validating clustering of molecular dynamics simulations using polymer models

    PubMed Central

    2011-01-01

    Background Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218

  6. Principal components derived from CSF inflammatory profiles predict outcome in survivors after severe traumatic brain injury.

    PubMed

    Kumar, Raj G; Rubin, Jonathan E; Berger, Rachel P; Kochanek, Patrick M; Wagner, Amy K

    2016-03-01

    Studies have characterized absolute levels of multiple inflammatory markers as significant risk factors for poor outcomes after traumatic brain injury (TBI). However, inflammatory marker concentrations are highly inter-related, and production of one may result in the production or regulation of another. Therefore, a more comprehensive characterization of the inflammatory response post-TBI should consider relative levels of markers in the inflammatory pathway. We used principal component analysis (PCA) as a dimension-reduction technique to characterize the sets of markers that contribute independently to variability in cerebrospinal (CSF) inflammatory profiles after TBI. Using PCA results, we defined groups (or clusters) of individuals (n=111) with similar patterns of acute CSF inflammation that were then evaluated in the context of outcome and other relevant CSF and serum biomarkers collected days 0-3 and 4-5 post-injury. We identified four significant principal components (PC1-PC4) for CSF inflammation from days 0-3, and PC1 accounted for the greatest (31%) percentage of variance. PC1 was characterized by relatively higher CSF sICAM-1, sFAS, IL-10, IL-6, sVCAM-1, IL-5, and IL-8 levels. Cluster analysis then defined two distinct clusters, such that individuals in cluster 1 had highly positive PC1 scores and relatively higher levels of CSF cortisol, progesterone, estradiol, testosterone, brain derived neurotrophic factor (BDNF), and S100b; this group also had higher serum cortisol and lower serum BDNF. Multinomial logistic regression analyses showed that individuals in cluster 1 had a 10.9 times increased likelihood of GOS scores of 2/3 vs. 4/5 at 6 months compared to cluster 2, after controlling for covariates. Cluster group did not discriminate between mortality compared to GOS scores of 4/5 after controlling for age and other covariates. Cluster groupings also did not discriminate mortality or 12 month outcomes in multivariate models. PCA and cluster analysis establish that a subset of CSF inflammatory markers measured in days 0-3 post-TBI may distinguish individuals with poor 6-month outcome, and future studies should prospectively validate these findings. PCA of inflammatory mediators after TBI could aid in prognostication and in identifying patient subgroups for therapeutic interventions. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Creating peer groups for assessing and comparing nursing home performance.

    PubMed

    Byrne, Margaret M; Daw, Christina; Pietz, Ken; Reis, Brian; Petersen, Laura A

    2013-11-01

    Publicly reported performance data for hospitals and nursing homes are becoming ubiquitous. For such comparisons to be fair, facilities must be compared with their peers. To adapt a previously published methodology for developing hospital peer groupings so that it is applicable to nursing homes and to explore the characteristics of "nearest-neighbor" peer groupings. Analysis of Department of Veterans Affairs administrative databases and nursing home facility characteristics. The nearest-neighbor methodology for developing peer groupings involves calculating the Euclidean distance between facilities based on facility characteristics. We describe our steps in selection of facility characteristics, describe the characteristics of nearest-neighbor peer groups, and compare them with peer groups derived through classical cluster analysis. The facility characteristics most pertinent to nursing home groupings were found to be different from those that were most relevant for hospitals. Unlike classical cluster groups, nearest neighbor groups are not mutually exclusive, and the nearest-neighbor methodology resulted in nursing home peer groupings that were substantially less diffuse than nursing home peer groups created using traditional cluster analysis. It is essential that healthcare policy makers and administrators have a means of fairly grouping facilities for the purposes of quality, cost, or efficiency comparisons. In this research, we show that a previously published methodology can be successfully applied to a nursing home setting. The same approach could be applied in other clinical settings such as primary care.

  8. Species richness and composition of epiphytic bryophytes in flooded forests of Caxiuanã National Forest, Eastern Amazon, Brazil.

    PubMed

    Cerqueira, Gabriela R; Ilkiu-Borges, Anna Luiza; Ferreira, Leandro V

    2017-01-01

    This study aimed to compare the richness and composition of the epiphytic bryoflora between várzea and igapó forests in Caxiuanã National Forest, Brazilian Amazon. Bryophytes were collected on 502 phorophytes of Virola surinamensis. Average richness per phorophyte and composition between forests and between dry and rainy periods was tested by two-way analysis and by cluster analysis, respectively. In total, 54 species of 13 families were identified. Richness was greater in igapó forest (44 species) compared to várzea forest (38 species). There was no significant difference in the number of species between the studied periods. Cluster analysis showed the bryoflora composition was different between várzea and igapó, but not between dry and rainy periods. Results did not corroborate the hypothesis that várzea forests harbor higher species richness than igapó forests.

  9. Comparing exposure metrics for classifying ‘dangerous heat’ in heat wave and health warning systems

    PubMed Central

    Zhang, Kai; Rood, Richard B.; Michailidis, George; Oswald, Evan M.; Schwartz, Joel D.; Zanobetti, Antonella; Ebi, Kristie L.; O’Neill, Marie S.

    2012-01-01

    Heat waves have been linked to excess mortality and morbidity, and are projected to increase in frequency and intensity with a warming climate. This study compares exposure metrics to trigger heat wave and health warning systems (HHWS), and introduces a novel multi-level hybrid clustering method to identify potential dangerously hot days. Two-level and three-level hybrid clustering analysis as well as common indices used to trigger HHWS, including spatial synoptic classification (SSC); and 90th, 95th, and 99th percentiles of minimum and relative minimum temperature (using a 10 day reference period), were calculated using a summertime weather dataset in Detroit from 1976 to 2006. The days classified as ‘hot’ with hybrid clustering analysis, SSC, minimum and relative minimum temperature methods differed by method type. SSC tended to include the days with, on average, 2.6 °C lower daily minimum temperature and 5.3 °C lower dew point than days identified by other methods. These metrics were evaluated by comparing their performance in predicting excess daily mortality. The 99th percentile of minimum temperature was generally the most predictive, followed by the three-level hybrid clustering method, the 95th percentile of minimum temperature, SSC and others. Our proposed clustering framework has more flexibility and requires less substantial meteorological prior information than the synoptic classification methods. Comparison of these metrics in predicting excess daily mortality suggests that metrics thought to better characterize physiological heat stress by considering several weather conditions simultaneously may not be the same metrics that are better at predicting heat-related mortality, which has significant implications in HHWSs. PMID:22673187

  10. Comparative Study of Broadband Photometry Relations for Ultra-Diffuse and Normal Galaxies in the Coma Cluster

    NASA Astrophysics Data System (ADS)

    Stone, Maria Babakhanyan

    Ultra-diffuse galaxies are a novel type of galaxies discovered first in the Coma cluster. These objects are characterized simultaneously by large sizes and by very low counts of constituent stars. Conflicting theories have been proposed to explain how these large diffuse galaxies could have survived in the harsh environment of clusters. To date, thousands of these new galaxies have been identified in cluster environments. However, further studies are required to understand their relationship to the known giant and dwarf classes of galaxies. The purpose of this study is to compare the trends of inner and outer populations of normal members of the Coma cluster and ultra-diffuse galaxies in color-magnitude space. The present work used several astronomical catalogs to identify the member galaxies based on the coordinates of their positions and to extract available colors and magnitudes. We obtained correlations to convert colors and magnitudes from different systems into the common Sloan Digital Sky Survey system to facilitate the comparative analysis. We showed the quantitative relations describing the color-magnitude trends of galaxies in the core and the outskirts of the cluster. We confirmed that the inner and outer populations of ultra-diffuse galaxies exhibit an offset similar to the normal red sequence galaxies. We presented an initial assessment of stellar population ages and metallicities which correspond to the obtained color offsets. We surveyed the available images of the cluster for outliers, merger candidates, and candidate ultra-diffuse galaxies. We conclude that ultra-diffuse galaxies are an important part of the Coma cluster evolutionary history and future work is needed especially in obtaining spectroscopic data of a larger number of these dim galaxies.

  11. Classification of different types of beer according to their colour characteristics

    NASA Astrophysics Data System (ADS)

    Nikolova, Kr T.; Gabrova, R.; Boyadzhiev, D.; Pisanova, E. S.; Ruseva, J.; Yanakiev, D.

    2017-01-01

    Twenty-two samples from different beers have been investigated in two colour systems - XYZ and SIELab - and have been characterised according to their colour parameters. The goals of the current study were to conduct correlation and discriminant analysis and to find the inner relation between the studied indices. K-means cluster has been used to compare and group the tested types of beer based on their similarity. To apply the K-Cluster analysis it is required that the number of clusters be determined in advance. The variant K = 4 was worked out. The first cluster unified all bright beers, the second one contained samples with fruits, the third one contained samples with addition of lemon, the fourth unified the samples of dark beers. By applying the discriminant analysis it is possible to help selections in the establishment of the type of beer. The proposed model correctly describes the types of beer on the Bulgarian market and it can be used for determining the affiliation of the beer which is not used in obtained model. One sample has been chosen from each cluster and the digital image has been obtained. It confirms the color parameters in the color system XYZ and SIELab. These facts can be used for elaboration for express estimation of beer by color.

  12. Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy

    NASA Astrophysics Data System (ADS)

    Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.

    2012-11-01

    Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.

  13. Hypersexuality and high sexual desire: exploring the structure of problematic sexuality.

    PubMed

    Carvalho, Joana; Štulhofer, Aleksandar; Vieira, Armando L; Jurin, Tanja

    2015-06-01

    The concept of hypersexuality has been accompanied by fierce debates and conflicting conclusions about its nature. One of the central questions under the discussion is a potential overlap between hypersexuality and high sexual desire. With the relevant research in its early phase, the structure of hypersexuality remains largely unknown. The aim of the present study was to systematically explore the overlap between problematic sexuality and high sexual desire. A community online survey was carried out in Croatia in 2014. The data were first cluster analyzed (by gender) based on sexual desire, sexual activity, perceived lack of control over one's sexuality, and negative behavioral consequences. Participants in the meaningful clusters were then compared for psychosocial characteristics. To complement cluster analysis (CA), multigroup confirmatory factor analysis (CFA) of the same four constructs was carried out. Indicators representing the proposed structure of hypersexuality were included: sexual desire, frequency of sexual activity, lack of control over one's sexuality, and negative behavioral outcomes. Psychosocial characteristics such as religiosity, attitudes toward pornography, and general psychopathology were also evaluated. CA pointed to the existence of two meaningful clusters, one representing problematic sexuality, that is, lack of control over one's sexuality and negative outcomes (control/consequences cluster), and the other reflecting high sexual desire and frequent sexual activity (desire/activity cluster). Compared with the desire/activity cluster, individuals from the control/consequences cluster reported more psychopathology and were characterized by more traditional attitudes. Complementing the CA findings, CFA pointed to two distinct latent dimensions-problematic sexuality and high sexual desire/activity. Our study supports the distinctiveness of hypersexuality and high sexual desire/activity, suggesting that problematic sexuality might be more associated with the perceived lack of personal control over sexuality and moralistic attitudes than with high levels of sexual desire and activity. © 2015 International Society for Sexual Medicine.

  14. Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

    NASA Astrophysics Data System (ADS)

    Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.

    2014-06-01

    Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have been recently employed to analyse PNSD data, however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectra to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.

  15. Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

    NASA Astrophysics Data System (ADS)

    Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.

    2014-11-01

    Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.

  16. Cluster-distinguishing genotypic and phenotypic diversity of carbapenem-resistant Gram-negative bacteria in solid-organ transplantation patients: a comparative study.

    PubMed

    Karampatakis, Theodoros; Geladari, Anastasia; Politi, Lida; Antachopoulos, Charalampos; Iosifidis, Elias; Tsiatsiou, Olga; Karyoti, Aggeliki; Papanikolaou, Vasileios; Tsakris, Athanassios; Roilides, Emmanuel

    2017-07-31

    Solid-organ transplant recipients may display high rates of colonization and/or infection by multidrug-resistant bacteria. We analysed and compared the phenotypic and genotypic diversity of carbapenem-resistant (CR) strains of Klebsiella pneumoniae, Pseudomonas aeruginosa and Acinetobacter baumannii isolated from patients in the Solid Organ Transplantation department of our hospital. Between March 2012 and August 2013, 56 CR strains from various biological fluids underwent antimicrobial susceptibility testing with VITEK 2, molecular analysis by PCR amplification and genotypic analysis with pulsed-field gel electrophoresis (PFGE). They were clustered according to antimicrobial drug susceptibility and genotypic profiles. Diversity analyses were performed by calculating Simpson's diversity index and applying computed rarefaction curves.Results/Key findings. Among K. pneumoniae, KP-producers predominated (57.1 %). VIM and OXA-23 carbapenemases prevailed among P. aeruginosa and A. baumannii (89.4 and 88.9 %, respectively). KPC-producing K. pneumoniae and OXA-23 A. baumannii were assigned in single PFGE pulsotypes. VIM-producing P. aeruginosa generated multiple pulsotypes. CR K. pneumoniae strains displayed phenotypic diversity in tigecycline, colistin (CS), amikacin (AMK), gentamicin (GEN) and co-trimoxazole (SXT) (16 clusters); P. aeruginosa displayed phenotypic diversity in cefepime (FEP), ceftazidime, aztreonam, piperacillin, piperacillin-tazobactam, AMK, GEN and CS (9 clusters); and A. baumannii displayed phenotypic diversity in AMK, GEN, SXT, FEP, tobramycin and rifampicin (8 clusters). The Simpson diversity indices for the interpretative phenotype and PFGE analysis were 0.89 and 0.6, respectively, for K. pneumoniae strains (P<0.001); 0.77 and 0.6 for P. aeruginosa (P=0.22); and 0.86 and 0.19 for A. baumannii (P=0.004). The presence of different antimicrobial susceptibility profiles does not preclude the possibility that two CR K. pneumoniae or A. baumannii isolates are clonally related.

  17. Suzaku observations of low surface brightness cluster Abell 1631

    NASA Astrophysics Data System (ADS)

    Babazaki, Yasunori; Mitsuishi, Ikuyuki; Ota, Naomi; Sasaki, Shin; Böhringer, Hans; Chon, Gayoung; Pratt, Gabriel W.; Matsumoto, Hironori

    2018-04-01

    We present analysis results for a nearby galaxy cluster Abell 1631 at z = 0.046 using the X-ray observatory Suzaku. This cluster is categorized as a low X-ray surface brightness cluster. To study the dynamical state of the cluster, we conduct four-pointed Suzaku observations and investigate physical properties of the Mpc-scale hot gas associated with the A 1631 cluster for the first time. Unlike relaxed clusters, the X-ray image shows no strong peak at the center and an irregular morphology. We perform spectral analysis and investigate the radial profiles of the gas temperature, density, and entropy out to approximately 1.5 Mpc in the east, north, west, and south directions by combining with the XMM-Newton data archive. The measured gas density in the central region is relatively low (a few ×10-4 cm-3) at the given temperature (˜2.9 keV) compared with X-ray-selected clusters. The entropy profile and value within the central region (r < 0.1 r200) are found to be flatter and higher (≳400 keV cm2). The observed bolometric luminosity is approximately three times lower than that expected from the luminosity-temperature relation in previous studies of relaxed clusters. These features are also observed in another low surface brightness cluster, Abell 76. The spatial distributions of galaxies and the hot gas appear to be different. The X-ray luminosity is relatively lower than that expected from the velocity dispersion. A post-merger scenario may explain the observed results.

  18. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    PubMed Central

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  19. Suzaku observations of low surface brightness cluster Abell 1631

    NASA Astrophysics Data System (ADS)

    Babazaki, Yasunori; Mitsuishi, Ikuyuki; Ota, Naomi; Sasaki, Shin; Böhringer, Hans; Chon, Gayoung; Pratt, Gabriel W.; Matsumoto, Hironori

    2018-06-01

    We present analysis results for a nearby galaxy cluster Abell 1631 at z = 0.046 using the X-ray observatory Suzaku. This cluster is categorized as a low X-ray surface brightness cluster. To study the dynamical state of the cluster, we conduct four-pointed Suzaku observations and investigate physical properties of the Mpc-scale hot gas associated with the A 1631 cluster for the first time. Unlike relaxed clusters, the X-ray image shows no strong peak at the center and an irregular morphology. We perform spectral analysis and investigate the radial profiles of the gas temperature, density, and entropy out to approximately 1.5 Mpc in the east, north, west, and south directions by combining with the XMM-Newton data archive. The measured gas density in the central region is relatively low (a few ×10-4 cm-3) at the given temperature (˜2.9 keV) compared with X-ray-selected clusters. The entropy profile and value within the central region (r < 0.1 r200) are found to be flatter and higher (≳400 keV cm2). The observed bolometric luminosity is approximately three times lower than that expected from the luminosity-temperature relation in previous studies of relaxed clusters. These features are also observed in another low surface brightness cluster, Abell 76. The spatial distributions of galaxies and the hot gas appear to be different. The X-ray luminosity is relatively lower than that expected from the velocity dispersion. A post-merger scenario may explain the observed results.

  20. Assessment of the climatic potential for tourism in Iran through biometeorology clustering.

    PubMed

    Roshan, Gholamreza; Yousefi, Robabe; Błażejczyk, Krzysztof

    2018-04-01

    This study presents a spatiotemporal analysis of bioclimatic comfort conditions for Iran using mean daily meteorological data from 1995 to 2014, analyzed through Physiological Equivalent Temperature (PET) index and Universal Thermal Climate Index (UTCI) indices, and bioclimatic clustering. The results of this study demonstrate that due to the climate variability across Iran during the year, there is at any point in time a location with climatic condition suitable for tourism. Mean values demonstrate maxima in bioclimatic comfort indices for the country in late winter and spring and minima for summer. Seven statistically significant clusters in bioclimatic indices were identified. Comparing these with clustering performed on PET and UTCI, the maximum overlaps between the two indices. In the following, the outputs of this research showed that most appropriate bioclimatic clustering for Iran includes seven clusters. These clustering locations according to climatic suitability for tourism provide a valuable contribution to tourism management in the country, particularly through marketing destinations to maximize tourist flow.

  1. HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T. H.

    2017-08-01

    The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ωm, or the amplitude of initial density fluctuations, σ8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects, that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters, where good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias to our hydrostatic masses; possible biases in this mass-mass comparison are discussed including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).

  2. Cluster mass calibration at high redshift: HST weak lensing analysis of 13 distant galaxy clusters from the South Pole Telescope Sunyaev-Zel'dovich Survey

    NASA Astrophysics Data System (ADS)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.; Hoekstra, H.; Bocquet, S.; Gonzalez, A. H.; von der Linden, A.; McDonald, M.; Morrison, C. B.; Raihan, S. F.; Allen, S. W.; Bayliss, M.; Benson, B. A.; Bleem, L. E.; Chiu, I.; Desai, S.; Foley, R. J.; de Haan, T.; High, F. W.; Hilbert, S.; Mantz, A. B.; Massey, R.; Mohr, J.; Reichardt, C. L.; Saro, A.; Simon, P.; Stern, C.; Stubbs, C. W.; Zenteno, A.

    2018-02-01

    We present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (zmedian = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V - I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration-mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass-temperature scaling relation ln (E(z)M500c/1014 M⊙) = A + 1.5ln (kT/7.2 keV) to A=1.81^{+0.24}_{-0.14}(stat.) {± } 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c=5.6^{+3.7}_{-1.8}.

  3. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel'dovich Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schrabback, T.; et al.

    We present an HST/ACS weak gravitational lensing analysis of 13 massive high-redshift (z_median=0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the sourcemore » redshift distribution is based on CANDELS data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the mass-concentration relation using simulations. In combination with temperature estimates from Chandra we constrain the normalisation of the mass-temperature scaling relation ln(E(z) M_500c/10^14 M_sun)=A+1.5 ln(kT/7.2keV) to A=1.81^{+0.24}_{-0.14}(stat.) +/- 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c=5.6^{+3.7}_{-1.8}.« less

  4. Associations between a Genetic Risk Score for Clinical CAD and Early Stage Lesions in the Coronary Artery and the Aorta.

    PubMed

    Salfati, Elias L; Herrington, David M; Assimes, Themistocles L

    2016-01-01

    The correlation between the extent of fatty streaks, more advanced atherosclerotic lesions, and community rates of coronary artery disease (CAD) is substantially higher for the coronary artery compared to the aorta. We sought to determine whether a genetic basis contributes to these differences. We conducted a cluster analysis of 6 subclinical atherosclerosis phenotypes documented in 564 white participants of the Pathobiological Determinants of Atherosclerosis in Youth study including the extent of fatty streaks and raised lesions in the coronary artery (CF and CR), thoracic aorta (TF and TR), and abdominal aorta (AF and AR) followed by a genetic association analysis of the same phenotypes. Our cluster analysis grouped all raised lesions and fatty streaks in the coronary into one cluster (CF, CR, TR, and AR) and the fatty streaks in the aorta into a second cluster (TF and AF). We found a genetic risk score of high-risk alleles at 57 susceptibility loci for CAD to be variably associated with the phenotypes in the first cluster (OR: 1.30 p = 0.009 for being in top quartile of degree of involvement of CF, 1.34 p = 0.005 for CR, 1.25: p = 0.11 for TR, and 1.19 p = 0.08 for AR) but not at all with the phenotypes in the second cluster (OR: 1.01, p = 0.95 for TF and 0.98, p = 0.82 for AF). The genetic determinants of fatty streaks in the aorta do not appear to overlap substantially with the genetic determinants of fatty streaks in the coronary as well as raised lesions in both the coronary and the aorta. These findings may explain why a larger fraction of fatty streaks in the aorta are less likely to progress to raised lesions compared to the coronary artery.

  5. Associations between a Genetic Risk Score for Clinical CAD and Early Stage Lesions in the Coronary Artery and the Aorta

    PubMed Central

    Herrington, David M.

    2016-01-01

    Objective The correlation between the extent of fatty streaks, more advanced atherosclerotic lesions, and community rates of coronary artery disease (CAD) is substantially higher for the coronary artery compared to the aorta. We sought to determine whether a genetic basis contributes to these differences. Approach and Results We conducted a cluster analysis of 6 subclinical atherosclerosis phenotypes documented in 564 white participants of the Pathobiological Determinants of Atherosclerosis in Youth study including the extent of fatty streaks and raised lesions in the coronary artery (CF and CR), thoracic aorta (TF and TR), and abdominal aorta (AF and AR) followed by a genetic association analysis of the same phenotypes. Our cluster analysis grouped all raised lesions and fatty streaks in the coronary into one cluster (CF, CR, TR, and AR) and the fatty streaks in the aorta into a second cluster (TF and AF). We found a genetic risk score of high-risk alleles at 57 susceptibility loci for CAD to be variably associated with the phenotypes in the first cluster (OR: 1.30 p = 0.009 for being in top quartile of degree of involvement of CF, 1.34 p = 0.005 for CR, 1.25: p = 0.11 for TR, and 1.19 p = 0.08 for AR) but not at all with the phenotypes in the second cluster (OR: 1.01, p = 0.95 for TF and 0.98, p = 0.82 for AF). Conclusions The genetic determinants of fatty streaks in the aorta do not appear to overlap substantially with the genetic determinants of fatty streaks in the coronary as well as raised lesions in both the coronary and the aorta. These findings may explain why a larger fraction of fatty streaks in the aorta are less likely to progress to raised lesions compared to the coronary artery. PMID:27861582

  6. Novel Insights into the Diversity of Catabolic Metabolism from Ten Haloarchaeal Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain; Scheuner, Carmen; Goker, Markus

    2011-05-03

    The extremely halophilic archaea are present worldwide in saline environments and have important biotechnological applications. Ten complete genomes of haloarchaea are now available, providing an opportunity for comparative analysis. We report here the comparative analysis of five newly sequenced haloarchaeal genomes with five previously published ones. Whole genome trees based on protein sequences provide strong support for deep relationships between the ten organisms. Using a soft clustering approach, we identified 887 protein clusters present in all halophiles. Of these core clusters, 112 are not found in any other archaea and therefore constitute the haloarchaeal signature. Four of the halophiles weremore » isolated from water, and four were isolated from soil or sediment. Although there are few habitat-specific clusters, the soil/sediment halophiles tend to have greater capacity for polysaccharide degradation, siderophore synthesis, and cell wall modification. Halorhabdus utahensis and Haloterrigena turkmenica encode over forty glycosyl hydrolases each, and may be capable of breaking down naturally occurring complex carbohydrates. H. utahensis is specialized for growth on carbohydrates and has few amino acid degradation pathways. It uses the non-oxidative pentose phosphate pathway instead of the oxidative pathway, giving it more flexibility in the metabolism of pentoses. These new genomes expand our understanding of haloarchaeal catabolic pathways, providing a basis for further experimental analysis, especially with regard to carbohydrate metabolism. Halophilic glycosyl hydrolases for use in biofuel production are more likely to be found in halophiles isolated from soil or sediment.« less

  7. Cortical atrophy patterns in early Parkinson's disease patients using hierarchical cluster analysis.

    PubMed

    Uribe, Carme; Segura, Barbara; Baggio, Hugo Cesar; Abos, Alexandra; Garcia-Diaz, Anna Isabel; Campabadal, Anna; Marti, Maria Jose; Valldeoriola, Francesc; Compta, Yaroslau; Tolosa, Eduard; Junque, Carme

    2018-05-01

    Cortical brain atrophy detectable with MRI in non-demented advanced Parkinson's disease (PD) is well characterized, but its presence in early disease stages is still under debate. We aimed to investigate cortical atrophy patterns in a large sample of early untreated PD patients using a hypothesis-free data-driven approach. Seventy-seven de novo PD patients and 50 controls from the Parkinson's Progression Marker Initiative database with T1-weighted images in a 3-tesla Siemens scanner were included in this study. Mean cortical thickness was extracted from 360 cortical areas defined by the Human Connectome Project Multi-Modal Parcellation version 1.0, and a hierarchical cluster analysis was performed using Ward's linkage method. A general linear model with cortical thickness data was then used to compare clustering groups using FreeSurfer software. We identified two patterns of cortical atrophy. Compared with controls, patients grouped in pattern 1 (n = 33) were characterized by cortical thinning in bilateral orbitofrontal, anterior cingulate, and lateral and medial anterior temporal gyri. Patients in pattern 2 (n = 44) showed cortical thinning in bilateral occipital gyrus, cuneus, superior parietal gyrus, and left postcentral gyrus, and they showed neuropsychological impairment in memory and other cognitive domains. Even in the early stages of PD, there is evidence of cortical brain atrophy. Neuroimaging clustering analysis is able to detect two subgroups of cortical thinning, one with mainly anterior atrophy, and the other with posterior predominance and worse cognitive performance. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Severe or life-threatening asthma exacerbation: patient heterogeneity identified by cluster analysis.

    PubMed

    Sekiya, K; Nakatani, E; Fukutomi, Y; Kaneda, H; Iikura, M; Yoshida, M; Takahashi, K; Tomii, K; Nishikawa, M; Kaneko, N; Sugino, Y; Shinkai, M; Ueda, T; Tanikawa, Y; Shirai, T; Hirabayashi, M; Aoki, T; Kato, T; Iizuka, K; Homma, S; Taniguchi, M; Tanaka, H

    2016-08-01

    Severe or life-threatening asthma exacerbation is one of the worst outcomes of asthma because of the risk of death. To date, few studies have explored the potential heterogeneity of this condition. To examine the clinical characteristics and heterogeneity of patients with severe or life-threatening asthma exacerbation. This was a multicentre, prospective study of patients with severe or life-threatening asthma exacerbation and pulse oxygen saturation < 90% who were admitted to 17 institutions across Japan. Cluster analysis was performed using variables from patient- and physician-orientated structured questionnaires. Analysis of data from 175 patients with severe or life-threatening asthma exacerbation revealed five distinct clusters. Cluster 1 (n = 27) was younger-onset asthma with severe symptoms at baseline, including limitation of activities, a higher frequency of treatment with oral corticosteroids and short-acting beta-agonists, and a higher frequency of asthma hospitalizations in the past year. Cluster 2 (n = 35) was predominantly composed of elderly females, with the highest frequency of comorbid, chronic hyperplastic rhinosinusitis/nasal polyposis, and a long disease duration. Cluster 3 (n = 40) was allergic asthma without inhaled corticosteroid use at baseline. Patients in this cluster had a higher frequency of atopy, including allergic rhinitis and furred pet hypersensitivity, and a better prognosis during hospitalization compared with the other clusters. Cluster 4 (n = 34) was characterized by elderly males with concomitant chronic obstructive pulmonary disease (COPD). Although cluster 5 (n = 39) had very mild symptoms at baseline according to the patient questionnaires, 41% had previously been hospitalized for asthma. This study demonstrated that significant heterogeneity exists among patients with severe or life-threatening asthma exacerbation. Differences were observed in the severity of asthma symptoms and use of inhaled corticosteroids at baseline, and the presence of comorbid COPD. These findings may contribute to a deeper understanding and better management of this patient population. © 2016 The Authors. Clinical & Experimental Allergy Published by John Wiley & Sons Ltd.

  9. NoSOCS in SDSS - VI. The environmental dependence of AGN in clusters and field in the local Universe

    NASA Astrophysics Data System (ADS)

    Lopes, P. A. A.; Ribeiro, A. L. B.; Rembold, S. B.

    2017-11-01

    We investigated the variation in the fraction of optical active galactic nuclei (AGNs) hosts with stellar mass, as well as their local and global environments. Our sample is composed of cluster members and field galaxies at z ≤ 0.1 and we consider only strong AGN. We find a strong variation in the AGN fraction (FAGN) with stellar mass. The field population comprises a higher AGN fraction compared to the global cluster population, especially for objects with log M* > 10.6. Hence, we restricted our analysis to more massive objects. We detected a smooth variation in the FAGN with local stellar mass density for cluster objects, reaching a plateau in the field environment. As a function of cluster-centric distance we verify that FAGN is roughly constant for R > R200, but show a steep decline inwards. We have also verified the dependence of the AGN population on cluster velocity dispersion, finding a constant behaviour for low mass systems (σP ≲ 650-700 km s-1). However, there is a strong decline in FAGN for higher mass clusters (>700 km s-1). When comparing the FAGN in clusters with or without substructure, we only find different results for objects at large radii (R > R200), in the sense that clusters with substructure present some excess in the AGN fraction. Finally, we have found that the phase-space distribution of AGN cluster members is significantly different than other populations. Due to the environmental dependence of FAGN and their phase-space distribution, we interpret AGN to be the result of galaxy interactions, favoured in environments where the relative velocities are low, typical of the field, low mass groups or cluster outskirts.

  10. A comparative meta-analysis of QTL between intraspecific Gossypium hirsutum interspecific populations and Gossypium hirsutum x Gossypium barbadense populations

    USDA-ARS?s Scientific Manuscript database

    Recent Meta-analysis of quantitative trait loci (QTL) in tetraploid cotton (Gossypium spp.) has identified regions of the genome with high concentrations of various trait QTL called clusters, and specific trait QTL called hotspots. The Meta-analysis included all population types of Gossypium mixing ...

  11. Rotation Period of Blanco 1 Members from KELT Light Curves: Comparing Rotation-Ages to Various Stellar Chronometers at 100 Myr

    NASA Astrophysics Data System (ADS)

    Cargile, Phillip; James, D. J.; Pepper, J.; Kuhn, R.; Siverd, R. J.; Stassun, K. G.

    2012-01-01

    The age of a star is one of its most fundamental properties, and yet tragically it is also the one property that is not directly measurable in observations. We must therefore rely on age estimates based on mostly model-dependent or empirical methods. Moreover, there remains a critical need for direct comparison of different age-dating techniques using the same stars analyzed in a consistent fashion. One chronometer commonly being employed is using stellar rotation rates to measure stellar ages, i.e., gyrochronology. Although this technique is one of the better-understood chronometers, its calibration relies heavily on the solar datum, as well as benchmark open clusters with reliable ages, and also lacks a comprehensive comparative analysis to other stellar chronometers. The age of the nearby (? pc) open cluster Blanco 1 has been estimated using various techniques, including being one of only 7 clusters with an LDB age measurement, making it a unique and powerful comparative laboratory for stellar chronometry, including gyrochronology. Here, we present preliminary results from our light-curve analysis of solar-type stars in Blanco 1 in order to identify and measure rotation periods of cluster members. The light-curve data were obtained during the engineering and calibration phase of the KELT-South survey. The large area on the sky and low number of contaminating field stars makes Blanco 1 an ideal target for the extremely wide field and large pixel scale of the KELT telescope. We apply a period-finding technique using the Lomb-Scargle periodogram and FAP statistics to measure significant rotation periods in the KELT-South light curves for confirmed Blanco 1 members. These new rotation periods allow us to test and inform rotation evolution models for stellar ages at ? Myr, determining a rotation-age for Blanco 1 using gyrochronology, and compare this rotation-age to other age measurements for this cluster.

  12. Performance analysis of unsupervised optimal fuzzy clustering algorithm for MRI brain tumor segmentation.

    PubMed

    Blessy, S A Praylin Selva; Sulochana, C Helen

    2015-01-01

    Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of human brain and the presence of intensity inhomogeneities. To propose a method that effectively segments brain tumor from MR images and to evaluate the performance of unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with UOFC algorithm effectively segments brain tumor from MR images.

  13. Network visualization of conformational sampling during molecular dynamics simulation.

    PubMed

    Ahlstrom, Logan S; Baker, Joseph Lee; Ehrlich, Kent; Campbell, Zachary T; Patel, Sunita; Vorontsov, Ivan I; Tama, Florence; Miyashita, Osamu

    2013-11-01

    Effective data reduction methods are necessary for uncovering the inherent conformational relationships present in large molecular dynamics (MD) trajectories. Clustering algorithms provide a means to interpret the conformational sampling of molecules during simulation by grouping trajectory snapshots into a few subgroups, or clusters, but the relationships between the individual clusters may not be readily understood. Here we show that network analysis can be used to visualize the dominant conformational states explored during simulation as well as the connectivity between them, providing a more coherent description of conformational space than traditional clustering techniques alone. We compare the results of network visualization against 11 clustering algorithms and principal component conformer plots. Several MD simulations of proteins undergoing different conformational changes demonstrate the effectiveness of networks in reaching functional conclusions. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Patterns of Self-care in Adults With Heart Failure and Their Associations With Sociodemographic and Clinical Characteristics, Quality of Life, and Hospitalizations: A Cluster Analysis.

    PubMed

    Vellone, Ercole; Fida, Roberta; Ghezzi, Valerio; D'Agostino, Fabio; Biagioli, Valentina; Paturzo, Marco; Strömberg, Anna; Alvaro, Rosaria; Jaarsma, Tiny

    Self-care is important in heart failure (HF) treatment, but patients may have difficulties and be inconsistent in its performance. Inconsistencies in self-care behaviors may mirror patterns of self-care in HF patients that are worth identifying to provide interventions tailored to patients. The aims of this study are to identify clusters of HF patients in relation to self-care behaviors and to examine and compare the profile of each HF patient cluster considering the patient's sociodemographics, clinical variables, quality of life, and hospitalizations. This was a secondary analysis of data from a cross-sectional study in which we enrolled 1192 HF patients across Italy. A cluster analysis was used to identify clusters of patients based on the European Heart Failure Self-care Behaviour Scale factor scores. Analysis of variance and χ test were used to examine the characteristics of each cluster. Patients were 72.4 years old on average, and 58% were men. Four clusters of patients were identified: (1) high consistent adherence with high consulting behaviors, characterized by younger patients, with higher formal education and higher income, less clinically compromised, with the best physical and mental quality of life (QOL) and lowest hospitalization rates; (2) low consistent adherence with low consulting behaviors, characterized mainly by male patients, with lower formal education and lowest income, more clinically compromised, and worse mental QOL; (3) inconsistent adherence with low consulting behaviors, characterized by patients who were less likely to have a caregiver, with the longest illness duration, the highest number of prescribed medications, and the best mental QOL; (4) and inconsistent adherence with high consulting behaviors, characterized by patients who were mostly female, with lower formal education, worst cognitive impairment, worst physical and mental QOL, and higher hospitalization rates. The 4 clusters identified in this study and their associated characteristics could be used to tailor interventions aimed at improving self-care behaviors in HF patients.

  15. Functional connectivity analysis of the neural bases of emotion regulation: A comparison of independent component method with density-based k-means clustering method.

    PubMed

    Zou, Ling; Guo, Qian; Xu, Yi; Yang, Biao; Jiao, Zhuqing; Xiang, Jianbo

    2016-04-29

    Functional magnetic resonance imaging (fMRI) is an important tool in neuroscience for assessing connectivity and interactions between distant areas of the brain. To find and characterize the coherent patterns of brain activity as a means of identifying brain systems for the cognitive reappraisal of the emotion task, both density-based k-means clustering and independent component analysis (ICA) methods can be applied to characterize the interactions between brain regions involved in cognitive reappraisal of emotion. Our results reveal that compared with the ICA method, the density-based k-means clustering method provides a higher sensitivity of polymerization. In addition, it is more sensitive to those relatively weak functional connection regions. Thus, the study concludes that in the process of receiving emotional stimuli, the relatively obvious activation areas are mainly distributed in the frontal lobe, cingulum and near the hypothalamus. Furthermore, density-based k-means clustering method creates a more reliable method for follow-up studies of brain functional connectivity.

  16. Fast gene ontology based clustering for microarray experiments.

    PubMed

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  17. Cluster analysis in systems of magnetic spheres and cubes

    NASA Astrophysics Data System (ADS)

    Pyanzina, E. S.; Gudkova, A. V.; Donaldson, J. G.; Kantorovich, S. S.

    2017-06-01

    In the present work we use molecular dynamics simulations and graph-theory based cluster analysis to compare self-assembly in systems of magnetic spheres, and cubes where the dipole moment is oriented along the side of the cube in the [001] crystallographic direction. We show that under the same conditions cubes aggregate far less than their spherical counterparts. This difference can be explained in terms of the volume of phase space in which the formation of the bond is thermodynamically advantageous. It follows that this volume is much larger for a dipolar sphere than for a dipolar cube.

  18. X-Ray Morphological Analysis of the Planck ESZ Clusters

    NASA Astrophysics Data System (ADS)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev-Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev-Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  19. X-Ray Morphological Analysis of the Planck ESZ Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper wemore » determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton . We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.« less

  20. Obesity-promoting food environments and the spatial clustering of food outlets around schools.

    PubMed

    Day, Peter L; Pearce, Jamie

    2011-02-01

    The increasing prevalence of overweight and obesity in school-aged children is potentially linked to contextual influences such as the food environment around schools. The proximity of fast-food and convenience stores to schools may enhance access to unhealthy foods and have a negative impact on diet. This study used spatial cluster analysis to determine whether food outlets are clustered around schools and evaluated the extent of food outlet clustering by school and school neighborhood sociodemographic characteristics. The locations in 2008 of all schools, fast-food outlets, and convenience stores in five urban regions across New Zealand were geocoded. Using GIS analysis conducted in 2009, the number and proportion of outlets within 400-m and 800-m road distance around each school was calculated. The spatial clustering of food outlets within 1.5 km of schools was determined using a multi-type K-function. Food outlet type, school level, SES, the degree of population density, and commercial land use zoning around each school were compared. Primary/intermediate schools had a total proportion of 19.3 outlets per 1000 students within 800 m compared to 6.6 for secondary schools. The most socially deprived quintile of schools had three times the number and proportion of food outlets compared to the least-deprived quintile. There was a high degree of clustering of food outlets around schools, with up to 5.5 times more outlets than might be expected. Outlets were most clustered up to 800 m from schools and around secondary schools, socially deprived schools, and schools in densely populated and commercially zoned areas. Food environments in New Zealand within walking proximity to schools are characterized by a high density of fast-food outlets and convenience stores, particularly in more-socially deprived settings. These obesogenic environments provide ready access to obesity-promoting foods that may have a negative impact on student diet and contribute to inequalities in health. Copyright © 2011 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  1. Comparative Genomic Analysis Reveals Organization, Function and Evolution of ars Genes in Pantoea spp.

    PubMed

    Wang, Liying; Wang, Jin; Jing, Chuanyong

    2017-01-01

    Numerous genes are involved in various strategies to resist toxic arsenic (As). However, the As resistance strategy in genus Pantoea is poorly understood. In this study, a comparative genome analysis of 23 Pantoea genomes was conducted. Two vertical genetic arsC -like genes without any contribution to As resistance were found to exist in the 23 Pantoea strains. Besides the two arsC -like genes, As resistance gene clusters arsRBC or arsRBCH were found in 15 Pantoea genomes. These ars clusters were found to be acquired by horizontal gene transfer (HGT) from sources related to Franconibacter helveticus, Serratia marcescens , and Citrobacter freundii . During the history of evolution, the ars clusters were acquired more than once in some species, and were lost in some strains, producing strains without As resistance capability. This study revealed the organization, distribution and the complex evolutionary history of As resistance genes in Pantoea spp.. The insights gained in this study improved our understanding on the As resistance strategy of Pantoea spp. and its roles in the biogeochemical cycling of As.

  2. Comparative Genomic Analysis Reveals Organization, Function and Evolution of ars Genes in Pantoea spp.

    PubMed Central

    Wang, Liying; Wang, Jin; Jing, Chuanyong

    2017-01-01

    Numerous genes are involved in various strategies to resist toxic arsenic (As). However, the As resistance strategy in genus Pantoea is poorly understood. In this study, a comparative genome analysis of 23 Pantoea genomes was conducted. Two vertical genetic arsC-like genes without any contribution to As resistance were found to exist in the 23 Pantoea strains. Besides the two arsC-like genes, As resistance gene clusters arsRBC or arsRBCH were found in 15 Pantoea genomes. These ars clusters were found to be acquired by horizontal gene transfer (HGT) from sources related to Franconibacter helveticus, Serratia marcescens, and Citrobacter freundii. During the history of evolution, the ars clusters were acquired more than once in some species, and were lost in some strains, producing strains without As resistance capability. This study revealed the organization, distribution and the complex evolutionary history of As resistance genes in Pantoea spp.. The insights gained in this study improved our understanding on the As resistance strategy of Pantoea spp. and its roles in the biogeochemical cycling of As. PMID:28377759

  3. Comparative study of endophytic and endophytic diazotrophic bacterial communities across rice landraces grown in the highlands of northern Thailand.

    PubMed

    Rangjaroen, Chakrapong; Rerkasem, Benjavan; Teaumroong, Neung; Sungthong, Rungroch; Lumyong, Saisamorn

    2014-01-01

    Communities of bacterial endophytes within the rice landraces cultivated in the highlands of northern Thailand were studied using fingerprinting data of 16S rRNA and nifH genes profiling by polymerase chain reaction-denaturing gradient gel electrophoresis. The bacterial communities' richness, diversity index, evenness, and stability were varied depending on the plant tissues, stages of growth, and rice cultivars. These indices for the endophytic diazotrophic bacteria within the landrace rice Bue Wah Bo were significantly the lowest. The endophytic bacteria revealed greater diversity by cluster analysis with seven clusters compared to the endophytic diazotrophic bacteria (three clusters). Principal component analysis suggested that the endophytic bacteria showed that the community structures across the rice landraces had a higher stability than those of the endophytic diazotrophic bacteria. Uncultured bacteria were found dominantly in both bacterial communities, while higher generic varieties were observed in the endophytic diazotrophic bacterial community. These differences in bacterial communities might be influenced either by genetic variation in the rice landraces or the rice cultivation system, where the nitrogen input affects the endophytic diazotrophic bacterial community.

  4. Cluster folding analysis of 20Ne+16O elastic transfer

    NASA Astrophysics Data System (ADS)

    Hamada, Sh.; Keeley, N.; Kemper, K. W.; Rusek, K.

    2018-05-01

    The available experimental data for the 20Ne+16O system in the energy range where the effect of α -cluster transfer is well observed are reanalyzed using the cluster folding model. The cluster folding potential, which includes both real and imaginary terms, reproduces the data at forward angles and the inclusion of the 16O(20Ne,16O)20Ne elastic transfer process provides a satisfactory description of the backward angles. The spectroscopic factor for the 20Ne→16O+α overlap was extracted and compared with other values from the literature. The present results suggest that the (20Ne,16O ) reaction might be an alternative means of exploring the α -particle structure of nuclei.

  5. Combinatoric analysis of heterogeneous stochastic self-assembly.

    PubMed

    D'Orsogna, Maria R; Zhao, Bingyu; Berenji, Bijan; Chou, Tom

    2013-09-28

    We analyze a fully stochastic model of heterogeneous nucleation and self-assembly in a closed system with a fixed total particle number M, and a fixed number of seeds Ns. Each seed can bind a maximum of N particles. A discrete master equation for the probability distribution of the cluster sizes is derived and the corresponding cluster concentrations are found using kinetic Monte-Carlo simulations in terms of the density of seeds, the total mass, and the maximum cluster size. In the limit of slow detachment, we also find new analytic expressions and recursion relations for the cluster densities at intermediate times and at equilibrium. Our analytic and numerical findings are compared with those obtained from classical mass-action equations and the discrepancies between the two approaches analyzed.

  6. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

    PubMed Central

    2009-01-01

    Background Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996

  7. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

    PubMed

    Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

    2009-10-12

    Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.

  8. Mass profile and dynamical status of the z ~ 0.8 galaxy cluster LCDCS 0504

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Biviano, A.; Adami, C.; Limousin, M.; Lima Neto, G. B.; Mamon, G. A.; Ulmer, M. P.; Gavazzi, R.; Cypriano, E. S.; Durret, F.; Clowe, D.; LeBrun, V.; Allam, S.; Basa, S.; Benoist, C.; Cappi, A.; Halliday, C.; Ilbert, O.; Johnston, D.; Jullo, E.; Just, D.; Kubo, J. M.; Márquez, I.; Marshall, P.; Martinet, N.; Maurogordato, S.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Schrabback, T.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-06-01

    Context. Constraints on the mass distribution in high-redshift clusters of galaxies are currently not very strong. Aims: We aim to constrain the mass profile, M(r), and dynamical status of the z ~ 0.8 LCDCS 0504 cluster of galaxies that is characterized by prominent giant gravitational arcs near its center. Methods: Our analysis is based on deep X-ray, optical, and infrared imaging as well as optical spectroscopy, collected with various instruments, which we complemented with archival data. We modeled the mass distribution of the cluster with three different mass density profiles, whose parameters were constrained by the strong lensing features of the inner cluster region, by the X-ray emission from the intracluster medium, and by the kinematics of 71 cluster members. Results: We obtain consistent M(r) determinations from three methods based on kinematics (dispersion-kurtosis, caustics, and MAMPOSSt), out to the cluster virial radius, ≃1.3 Mpc and beyond. The mass profile inferred by the strong lensing analysis in the central cluster region is slightly higher than, but still consistent with, the kinematics estimate. On the other hand, the X-ray based M(r) is significantly lower than the kinematics and strong lensing estimates. Theoretical predictions from ΛCDM cosmology for the concentration-mass relation agree with our observational results, when taking into account the uncertainties in the observational and theoretical estimates. There appears to be a central deficit in the intracluster gas mass fraction compared with nearby clusters. Conclusions: Despite the relaxed appearance of this cluster, the determinations of its mass profile by different probes show substantial discrepancies, the origin of which remains to be determined. The extension of a dynamical analysis similar to that of other clusters of the DAFT/FADA survey with multiwavelength data of sufficient quality will allow shedding light on the possible systematics that affect the determination of mass profiles of high-z clusters, which is possibly related to our incomplete understanding of intracluster baryon physics. Table 2 is available in electronic form at http://www.aanda.org

  9. Diversity in Older Adults' Use of the Internet: Identifying Subgroups Through Latent Class Analysis.

    PubMed

    van Boekel, Leonieke C; Peek, Sebastiaan Tm; Luijkx, Katrien G

    2017-05-24

    As for all individuals, the Internet is important in the everyday life of older adults. Research on older adults' use of the Internet has merely focused on users versus nonusers and consequences of Internet use and nonuse. Older adults are a heterogeneous group, which may implicate that their use of the Internet is diverse as well. Older adults can use the Internet for different activities, and this usage can be of influence on benefits the Internet can have for them. The aim of this paper was to describe the diversity or heterogeneity in the activities for which older adults use the Internet and determine whether diversity is related to social or health-related variables. We used data of a national representative Internet panel in the Netherlands. Panel members aged 65 years and older and who have access to and use the Internet were selected (N=1418). We conducted a latent class analysis based on the Internet activities that panel members reported to spend time on. Second, we described the identified clusters with descriptive statistics and compared the clusters using analysis of variance (ANOVA) and chi-square tests. Four clusters were distinguished. Cluster 1 was labeled as the "practical users" (36.88%, n=523). These respondents mainly used the Internet for practical and financial purposes such as searching for information, comparing products, and banking. Respondents in Cluster 2, the "minimizers" (32.23%, n=457), reported lowest frequency on most Internet activities, are older (mean age 73 years), and spent the smallest time on the Internet. Cluster 3 was labeled as the "maximizers" (17.77%, n=252); these respondents used the Internet for various activities, spent most time on the Internet, and were relatively younger (mean age below 70 years). Respondents in Cluster 4, the "social users," mainly used the Internet for social and leisure-related activities such as gaming and social network sites. The identified clusters significantly differed in age (P<.001, ω 2 =0.07), time spent on the Internet (P<.001, ω 2 =0.12), and frequency of downloading apps (P<.001, ω 2 =0.14), with medium to large effect sizes. Social and health-related variables were significantly different between the clusters, except social and emotional loneliness. However, effect sizes were small. The minimizers scored significantly lower on psychological well-being, instrumental activities of daily living (iADL), and experienced health compared with the practical users and maximizers. Older adults are a diverse group in terms of their activities on the Internet. This underlines the importance to look beyond use versus nonuse when studying older adults' Internet use. The clusters we have identified in this study can help tailor the development and deployment of eHealth intervention to specific segments of the older population. ©Leonieke C van Boekel, Sebastiaan TM Peek, Katrien G Luijkx. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 24.05.2017.

  10. Diversity in Older Adults’ Use of the Internet: Identifying Subgroups Through Latent Class Analysis

    PubMed Central

    van Boekel, Leonieke C; Peek, Sebastiaan TM; Luijkx, Katrien G

    2017-01-01

    Background As for all individuals, the Internet is important in the everyday life of older adults. Research on older adults’ use of the Internet has merely focused on users versus nonusers and consequences of Internet use and nonuse. Older adults are a heterogeneous group, which may implicate that their use of the Internet is diverse as well. Older adults can use the Internet for different activities, and this usage can be of influence on benefits the Internet can have for them. Objective The aim of this paper was to describe the diversity or heterogeneity in the activities for which older adults use the Internet and determine whether diversity is related to social or health-related variables. Methods We used data of a national representative Internet panel in the Netherlands. Panel members aged 65 years and older and who have access to and use the Internet were selected (N=1418). We conducted a latent class analysis based on the Internet activities that panel members reported to spend time on. Second, we described the identified clusters with descriptive statistics and compared the clusters using analysis of variance (ANOVA) and chi-square tests. Results Four clusters were distinguished. Cluster 1 was labeled as the “practical users” (36.88%, n=523). These respondents mainly used the Internet for practical and financial purposes such as searching for information, comparing products, and banking. Respondents in Cluster 2, the “minimizers” (32.23%, n=457), reported lowest frequency on most Internet activities, are older (mean age 73 years), and spent the smallest time on the Internet. Cluster 3 was labeled as the “maximizers” (17.77%, n=252); these respondents used the Internet for various activities, spent most time on the Internet, and were relatively younger (mean age below 70 years). Respondents in Cluster 4, the “social users,” mainly used the Internet for social and leisure-related activities such as gaming and social network sites. The identified clusters significantly differed in age (P<.001, ω2=0.07), time spent on the Internet (P<.001, ω2=0.12), and frequency of downloading apps (P<.001, ω2=0.14), with medium to large effect sizes. Social and health-related variables were significantly different between the clusters, except social and emotional loneliness. However, effect sizes were small. The minimizers scored significantly lower on psychological well-being, instrumental activities of daily living (iADL), and experienced health compared with the practical users and maximizers. Conclusions Older adults are a diverse group in terms of their activities on the Internet. This underlines the importance to look beyond use versus nonuse when studying older adults’ Internet use. The clusters we have identified in this study can help tailor the development and deployment of eHealth intervention to specific segments of the older population. PMID:28539302

  11. Probing the History of Galaxy Clusters with Metallicity and Entropy Measurements

    NASA Astrophysics Data System (ADS)

    Elkholy, Tamer Yohanna

    Galaxy clusters are the largest gravitationally bound objects found today in our Universe. The gas they contain, the intra-cluster medium (ICM), is heated to temperatures in the approximate range of 1 to 10 keV, and thus emits X-ray radiation. Studying the ICM through the spatial and spectral analysis of its emission returns the richest information about both the overall cosmological context which governs the formation of clusters, as well as the physical processes occurring within. The aim of this thesis is to learn about the history of the physical processes that drive the evolution of galaxy clusters, through careful, spatially resolved measurements of their metallicity and entropy content. A sample of 45 nearby clusters observed with Chandra is analyzed to produce radial density, temperature, entropy and metallicity profiles. The entropy profiles are computed to larger radial extents than in previous Chandra analyses. The results of this analysis are made available to the scientific community in an electronic database. Comparing metallicity and entropy in the outskirts of clusters, we find no signature on the entropy profiles of the ensemble of supernovae that produced the observed metals. In the centers of clusters, we find that the metallicities of high-mass clusters are much less dispersed than those of low-mass clusters. A comparison of metallicity with the regularity of the X-ray emission morphology suggests that metallicities in low-mass clusters are more susceptible to increase from violent events such as mergers. We also find that the variation in the stellar-to-gas mass ratio as a function of cluster mass can explain the variation of central metallicity with cluster mass, only if we assume that there is a constant level of metallicity for clusters of all masses, above which the observed galaxies add more metals in proportion to their mass. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)

  12. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore accounts a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E<10(-5)) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Using Cluster Analysis to Examine Husband-Wife Decision Making

    ERIC Educational Resources Information Center

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  14. Comparative Investigation of Shared Filesystems for the LHCb Online Cluster

    NASA Astrophysics Data System (ADS)

    Vijay Kartik, S.; Neufeld, Niko

    2012-12-01

    This paper describes the investigative study undertaken to evaluate shared filesystem performance and suitability in the LHCb Online environment. Particular focus is given to the measurements and field tests designed and performed on an in-house OpenAFS setup; related comparisons with NFSv4 and GPFS (a clustered filesystem from IBM) are presented. The motivation for the investigation and the test setup arises from the need to serve common user-space like home directories, experiment software and control areas, and clustered log areas. Since the operational requirements on such user-space are stringent in terms of read-write operations (in frequency and access speed) and unobtrusive data relocation, test results are presented with emphasis on file-level performance, stability and “high-availability” of the shared filesystems. Use cases specific to the experiment operation in LHCb, including the specific handling of shared filesystems served to a cluster of 1500 diskless nodes, are described. Issues of prematurely expiring authenticated sessions are explicitly addressed, keeping in mind long-running analysis jobs on the Online cluster. In addition, quantitative test results are also presented with alternatives including NFSv4. Comparative measurements of filesystem performance benchmarks are presented, which are seen to be used as reference for decisions on potential migration of the current storage solution deployed in the LHCb online cluster.

  15. Quantum chemical calculations in the structural analysis of phloretin

    NASA Astrophysics Data System (ADS)

    Gómez-Zavaglia, Andrea

    2009-07-01

    In this work, a conformational search on the molecule of phloretin [2',4',6'-Trihydroxy-3-(4-hydroxyphenyl)-propiophenone] has been performed. The molecule of phloretin has eight dihedral angles, four of them taking part in the carbon backbone and the other four, related with the orientation of the hydroxyl groups. A systematic search involving a random variation of the dihedral angles has been used to generate input structures for the quantum chemical calculations. Calculations at the DFT(B3LYP)/6-311++G(d,p) level of theory permitted the identification of 58 local minima belonging to the C 1 symmetry point group. The molecular structures of the conformers have been analyzed using hierarchical cluster analysis. This method allowed us to group conformers according to their similarities, and thus, to correlate the conformers' stability with structural parameters. The dendrogram obtained from the hierarchical cluster analysis depicted two main clusters. Cluster I included all the conformers with relative energies lower than 25 kJ mol -1 and cluster II, the remaining conformers. The possibility of forming intramolecular hydrogen bonds resulted the main factor contributing for the stability. Accordingly, all conformers depicting intramolecular H-bonds belong to cluster I. These conformations are clearly favored when the carbon backbone is as planar as possible. The values of the νC dbnd O and νOH vibrational modes were compared among all the conformers of phloretin. The redshifts associated with intramolecular H-bonds were correlated with the H-bonds distances and energies.

  16. Clustering of lifestyle risk behaviours among residents of forty deprived neighbourhoods in London: lessons for targeting public health interventions.

    PubMed

    Watts, P; Buck, D; Netuveli, G; Renton, A

    2016-06-01

    Clustering of lifestyle risk behaviours is very important in predicting premature mortality. Understanding the extent to which risk behaviours are clustered in deprived communities is vital to most effectively target public health interventions. We examined co-occurrence and associations between risk behaviours (smoking, alcohol consumption, poor diet, low physical activity and high sedentary time) reported by adults living in deprived London neighbourhoods. Associations between sociodemographic characteristics and clustered risk behaviours were examined. Latent class analysis was used to identify underlying clustering of behaviours. Over 90% of respondents reported at least one risk behaviour. Reporting specific risk behaviours predicted reporting of further risk behaviours. Latent class analyses revealed four underlying classes. Membership of a maximal risk behaviour class was more likely for young, white males who were unable to work. Compared with recent national level analysis, there was a weaker relationship between education and clustering of behaviours and a very high prevalence of clustering of risk behaviours in those unable to work. Young, white men who report difficulty managing on income were at high risk of reporting multiple risk behaviours. These groups may be an important target for interventions to reduce premature mortality caused by multiple risk behaviours. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Optical spectroscopy and velocity dispersions of galaxy clusters from the SPT-SZ survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruel, J.; Bayliss, M.; Bazin, G.

    2014-09-01

    We present optical spectroscopy of galaxies in clusters detected through the Sunyaev-Zel'dovich (SZ) effect with the South Pole Telescope (SPT). We report our own measurements of 61 spectroscopic cluster redshifts, and 48 velocity dispersions each calculated with more than 15 member galaxies. This catalog also includes 19 dispersions of SPT-observed clusters previously reported in the literature. The majority of the clusters in this paper are SPT-discovered; of these, most have been previously reported in other SPT cluster catalogs, and five are reported here as SPT discoveries for the first time. By performing a resampling analysis of galaxy velocities, we findmore » that unbiased velocity dispersions can be obtained from a relatively small number of member galaxies (≲ 30), but with increased systematic scatter. We use this analysis to determine statistical confidence intervals that include the effect of membership selection. We fit scaling relations between the observed cluster velocity dispersions and mass estimates from SZ and X-ray observables. In both cases, the results are consistent with the scaling relation between velocity dispersion and mass expected from dark-matter simulations. We measure a ∼30% log-normal scatter in dispersion at fixed mass, and a ∼10% offset in the normalization of the dispersion-mass relation when compared to the expectation from simulations, which is within the expected level of systematic uncertainty.« less

  18. Role and Regulation of the Flp/Tad Pilus in the Virulence of Pectobacterium atrosepticum SCRI1043 and Pectobacterium wasabiae SCC3193

    PubMed Central

    Nykyri, Johanna; Mattinen, Laura; Niemi, Outi; Adhikari, Satish; Kõiv, Viia; Somervuo, Panu; Fang, Xin; Auvinen, Petri; Mäe, Andres; Palva, E. Tapio; Pirhonen, Minna

    2013-01-01

    In this study, we characterized a putative Flp/Tad pilus-encoding gene cluster, and we examined its regulation at the transcriptional level and its role in the virulence of potato pathogenic enterobacteria of the genus Pectobacterium. The Flp/Tad pilus-encoding gene clusters in Pectobacterium atrosepticum, Pectobacterium wasabiae and Pectobacterium aroidearum were compared to previously characterized flp/tad gene clusters, including that of the well-studied Flp/Tad pilus model organism Aggregatibacter actinomycetemcomitans, in which this pilus is a major virulence determinant. Comparative analyses revealed substantial protein sequence similarity and open reading frame synteny between the previously characterized flp/tad gene clusters and the cluster in Pectobacterium, suggesting that the predicted flp/tad gene cluster in Pectobacterium encodes a Flp/Tad pilus-like structure. We detected genes for a novel two-component system adjacent to the flp/tad gene cluster in Pectobacterium, and mutant analysis demonstrated that this system has a positive effect on the transcription of selected Flp/Tad pilus biogenesis genes, suggesting that this response regulator regulate the flp/tad gene cluster. Mutagenesis of either the predicted regulator gene or selected Flp/Tad pilus biogenesis genes had a significant impact on the maceration ability of the bacterial strains in potato tubers, indicating that the Flp/Tad pilus-encoding gene cluster represents a novel virulence determinant in Pectobacterium. Soft-rot enterobacteria in the genera Pectobacterium and Dickeya are of great agricultural importance, and an investigation of the virulence of these pathogens could facilitate improvements in agricultural practices, thus benefiting farmers, the potato industry and consumers. PMID:24040039

  19. Role and regulation of the Flp/Tad pilus in the virulence of Pectobacterium atrosepticum SCRI1043 and Pectobacterium wasabiae SCC3193.

    PubMed

    Nykyri, Johanna; Mattinen, Laura; Niemi, Outi; Adhikari, Satish; Kõiv, Viia; Somervuo, Panu; Fang, Xin; Auvinen, Petri; Mäe, Andres; Palva, E Tapio; Pirhonen, Minna

    2013-01-01

    In this study, we characterized a putative Flp/Tad pilus-encoding gene cluster, and we examined its regulation at the transcriptional level and its role in the virulence of potato pathogenic enterobacteria of the genus Pectobacterium. The Flp/Tad pilus-encoding gene clusters in Pectobacterium atrosepticum, Pectobacterium wasabiae and Pectobacterium aroidearum were compared to previously characterized flp/tad gene clusters, including that of the well-studied Flp/Tad pilus model organism Aggregatibacter actinomycetemcomitans, in which this pilus is a major virulence determinant. Comparative analyses revealed substantial protein sequence similarity and open reading frame synteny between the previously characterized flp/tad gene clusters and the cluster in Pectobacterium, suggesting that the predicted flp/tad gene cluster in Pectobacterium encodes a Flp/Tad pilus-like structure. We detected genes for a novel two-component system adjacent to the flp/tad gene cluster in Pectobacterium, and mutant analysis demonstrated that this system has a positive effect on the transcription of selected Flp/Tad pilus biogenesis genes, suggesting that this response regulator regulate the flp/tad gene cluster. Mutagenesis of either the predicted regulator gene or selected Flp/Tad pilus biogenesis genes had a significant impact on the maceration ability of the bacterial strains in potato tubers, indicating that the Flp/Tad pilus-encoding gene cluster represents a novel virulence determinant in Pectobacterium. Soft-rot enterobacteria in the genera Pectobacterium and Dickeya are of great agricultural importance, and an investigation of the virulence of these pathogens could facilitate improvements in agricultural practices, thus benefiting farmers, the potato industry and consumers.

  20. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations and numerical simulations

    NASA Technical Reports Server (NTRS)

    Darouzet, Fabien; DeKeyser, Johan; Decreau, Pierrette; Gallagher, Dennis; Pierrard, Viviane; Lemaire, Joseph; Dandouras, Iannis; Matsui, Hiroshi; Dunlop, Malcolm; Andre, Mats

    2005-01-01

    Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles can be derived from the plasma frequency and/or from the spacecraft potential (note that the electron spectrometer is usually not operating inside the plasmasphere); ion velocity is also measured onboard these satellites (but ion density is not reliable because of instrumental limitations). The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 minutes; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations for 3 plume events and compare CLUSTER in-situ data (panel A) with global images of the plasmasphere obtained from IMAGE (panel B), and with numerical simulations for the formation of plumes based on a model that includes the interchange instability mechanism (panel C). In particular, we study the geometry and the orientation of plasmaspheric plumes by using a four-point analysis method, the spatial gradient. We also compare several aspects of their motion as determined by different methods: (i) inner and outer plume boundary velocity calculated from time delays of this boundary observed by the wave experiment WHISPER on the four spacecraft, (ii) ion velocity derived from the ion spectrometer CIS onboard CLUSTER, (iii) drift velocity measured by the electron drift instrument ED1 onboard CLUSTER and (iv) global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  1. Using cluster analysis and a classification and regression tree model to developed cover types in the Sky Islands of southeastern Arizona

    Treesearch

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daughtery; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...

  2. Using cluster analysis and a classification and regression tree model to developed cover types in the Sky Islands of southeastern Arizona [Abstract

    Treesearch

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...

  3. Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks.

    PubMed

    Dinov, Martin; Leech, Robert

    2017-01-01

    Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.

  4. Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks

    PubMed Central

    Dinov, Martin; Leech, Robert

    2017-01-01

    Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110

  5. Discrimination of multilocus sequence typing-based Campylobacter jejuni subgroups by MALDI-TOF mass spectrometry.

    PubMed

    Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver

    2013-11-07

    Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetic related isolates in distinct clusters. Especially the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates into a different order as compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.

  6. Benzoate-Induced High-Nuclearity Silver Thiolate Clusters.

    PubMed

    Su, Yan-Min; Liu, Wei; Wang, Zhi; Wang, Shu-Ao; Li, Yan-An; Yu, Fei; Zhao, Quan-Qin; Wang, Xing-Po; Tung, Chen-Ho; Sun, Di

    2018-04-03

    Compared with the well-known anion-templated effects in shaping silver thiolate clusters, the influence from the organic ligands in the outer shell is still poorly understood. Herein, three new benzoate-functionalized high-nuclearity silver(I) thiolate clusters are isolated and characterized for the first time in the presence of diverse anion templates such as S 2- , α-[Mo 5 O 18 ] 6- , and MoO 4 2- . Single-crystal X-ray analysis reveals that the nuclearities of the three silver clusters (SD/Ag28, SD/Ag29, SD/Ag30) vary from 32 to 38 to 78 with co-capped tBuS - and benzoate ligands on the surface. SD/Ag28 is a turtle-like cluster comprising a Ag 29 shell caging a Ag 3 S 3 trigon in the center, whereas SD/Ag29 is a prolate Ag 38 sphere templated by the α-[Mo 5 O 18 ] 6- anion. Upon changing from benzoate to methoxyl-substituted benzoate, SD/Ag30 is isolated as a very complicated core-shell spherical cluster composed of a Ag 57 shell and a vase-like Ag 21 S 13 core. Four MoO 4 2- anions are arranged in a supertetrahedron and located in the interstice between the core and shell. Introduction of the bulky benzoate changes elaborately the nuclearity and arrangements of silver polygons on the shell of silver clusters, which is exemplified by comparing SD/Ag28 and a known similar silver thiolate cluster. The three new clusters emit luminescence in the near-infrared (NIR) region and show different thermochromic luminescence properties. This work presents a flexible approach to synthetic studies of high-nuclearity silver clusters decorated by different benzoates, and structural modulations are also achieved. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Statistical design and analysis plan for an impact evaluation of an HIV treatment and prevention intervention for female sex workers in Zimbabwe: a study protocol for a cluster randomised controlled trial.

    PubMed

    Hargreaves, James R; Fearon, Elizabeth; Davey, Calum; Phillips, Andrew; Cambiano, Valentina; Cowan, Frances M

    2016-01-05

    Pragmatic cluster-randomised trials should seek to make unbiased estimates of effect and be reported according to CONSORT principles, and the study population should be representative of the target population. This is challenging when conducting trials amongst 'hidden' populations without a sample frame. We describe a pair-matched cluster-randomised trial of a combination HIV-prevention intervention to reduce the proportion of female sex workers (FSW) with a detectable HIV viral load in Zimbabwe, recruiting via respondent driven sampling (RDS). We will cross-sectionally survey approximately 200 FSW at baseline and at endline to characterise each of 14 sites. RDS is a variant of chain referral sampling and has been adapted to approximate random sampling. Primary analysis will use the 'RDS-2' method to estimate cluster summaries and will adapt Hayes and Moulton's '2-step' method to adjust effect estimates for individual-level confounders and further adjust for cluster baseline prevalence. We will adapt CONSORT to accommodate RDS. In the absence of observable refusal rates, we will compare the recruitment process between matched pairs. We will need to investigate whether cluster-specific recruitment or the intervention itself affects the accuracy of the RDS estimation process, potentially causing differential biases. To do this, we will calculate RDS-diagnostic statistics for each cluster at each time point and compare these statistics within matched pairs and time points. Sensitivity analyses will assess the impact of potential biases arising from assumptions made by the RDS-2 estimation. We are not aware of any other completed pragmatic cluster RCTs that are recruiting participants using RDS. Our statistical design and analysis approach seeks to transparently document participant recruitment and allow an assessment of the representativeness of the study to the target population, a key aspect of pragmatic trials. The challenges we have faced in the design of this trial are likely to be shared in other contexts aiming to serve the needs of legally and/or socially marginalised populations for which no sampling frame exists and especially when the social networks of participants are both the target of intervention and the means of recruitment. The trial was registered at Pan African Clinical Trials Registry (PACTR201312000722390) on 9 December 2013.

  8. Systematic Association of Genes to Phenotypes by Genome and Literature Mining

    PubMed Central

    Jensen, Lars J; Perez-Iratxeta, Carolina; Kaczanowski, Szymon; Hooper, Sean D; Andrade, Miguel A

    2005-01-01

    One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases. PMID:15799710

  9. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    NASA Astrophysics Data System (ADS)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In previous literatures, we find node clustering coefficient appears frequently in many link prediction methods. However, node clustering coefficient is limited to describe the role of a common-neighbor in different local networks, because it cannot distinguish different clustering abilities of a node to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, comparing with other methods, the performance of ALC-based methods are very stable in both globalized and personalized top-L link prediction tasks.

  10. Geographic atrophy phenotype identification by cluster analysis.

    PubMed

    Monés, Jordi; Biarnés, Marc

    2018-03-01

    To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm 2 /year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  11. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters.

    PubMed

    Berenguer, Roberto; Pastor-Juan, María Del Rosario; Canales-Vázquez, Jesús; Castro-García, Miguel; Villas, María Victoria; Legorburo, Francisco Mansilla; Sabater, Sebastià

    2018-04-24

    Purpose To identify the reproducible and nonredundant radiomics features (RFs) for computed tomography (CT). Materials and Methods Two phantoms were used to test RF reproducibility by using test-retest analysis, by changing the CT acquisition parameters (hereafter, intra-CT analysis), and by comparing five different scanners with the same CT parameters (hereafter, inter-CT analysis). Reproducible RFs were selected by using the concordance correlation coefficient (as a measure of the agreement between variables) and the coefficient of variation (defined as the ratio of the standard deviation to the mean). Redundant features were grouped by using hierarchical cluster analysis. Results A total of 177 RFs including intensity, shape, and texture features were evaluated. The test-retest analysis showed that 91% (161 of 177) of the RFs were reproducible according to concordance correlation coefficient. Reproducibility of intra-CT RFs, based on coefficient of variation, ranged from 89.3% (151 of 177) to 43.1% (76 of 177) where the pitch factor and the reconstruction kernel were modified, respectively. Reproducibility of inter-CT RFs, based on coefficient of variation, also showed large material differences, from 85.3% (151 of 177; wood) to only 15.8% (28 of 177; polyurethane). Ten clusters were identified after the hierarchical cluster analysis and one RF per cluster was chosen as representative. Conclusion Many RFs were redundant and nonreproducible. If all the CT parameters are fixed except field of view, tube voltage, and milliamperage, then the information provided by the analyzed RFs can be summarized in only 10 RFs (each representing a cluster) because of redundancy. © RSNA, 2018 Online supplemental material is available for this article.

  12. miR-17-92 Cluster Promotes Cholangiocarcinoma Growth

    PubMed Central

    Zhu, Hanqing; Han, Chang; Lu, Dongdong; Wu, Tong

    2015-01-01

    miR-17-92 is an oncogenic miRNA cluster implicated in the development of several cancers; however, it remains unknown whether the miR-17-92 cluster is able to regulate cholangiocarcinogenesis. This study was designed to investigate the biological functions and molecular mechanisms of the miR-17-92 cluster in cholangiocarcinoma. In situ hybridization and quantitative RT-PCR analysis showed that the miR-17-92 cluster is highly expressed in human cholangiocarcinoma cells compared with the nonneoplastic biliary epithelial cells. Forced overexpression of the miR-17-92 cluster or its members, miR-92a and miR-19a, in cultured human cholangiocarcinoma cells enhanced tumor cell proliferation, colony formation, and invasiveness, in vitro. Overexpression of the miR-17-92 cluster or miR-92a also enhanced cholangiocarcinoma growth in vivo in hairless outbred mice with severe combined immunodeficiency (SHO-PrkdcscidHrhr). The tumor-suppressor, phosphatase and tensin homolog deleted on chromosome 10 (PTEN), was identified as a bona fide target of both miR-92a and miR-19a in cholangiocarcinoma cells via sequence prediction, 3′ untranslated region luciferase activity assay, and Western blot analysis. Accordingly, overexpression of the PTEN open reading frame protein (devoid of 3′ untranslated region) prevented miR-92a– or miR-19a–induced cholangiocarcinoma cell growth. Microarray analysis revealed additional targets of the miR-17-92 cluster in human cholangiocarcinoma cells, including APAF-1 and PRDM2. Moreover, we observed that the expression of the miR-17-92 cluster is regulated by IL-6/Stat3, a key oncogenic signaling pathway pivotal in cholangiocarcinogenesis. Taken together, our findings disclose a novel IL-6/Stat3–miR-17-92 cluster–PTEN signaling axis that is crucial for cholangiocarcinogenesis and tumor progression. PMID:25239565

  13. Prediction models for clustered data: comparison of a random intercept and standard regression model

    PubMed Central

    2013-01-01

    Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436

  14. Prediction models for clustered data: comparison of a random intercept and standard regression model.

    PubMed

    Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne

    2013-02-15

    When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.

  15. Comparison of Molecular Typing Methods Useful for Detecting Clusters of Campylobacter jejuni and C. coli Isolates through Routine Surveillance

    PubMed Central

    Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King

    2012-01-01

    Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We have compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources during sentinel site surveillance during a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for the detection of clusters of cases, and it could be supplemented by the sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, the use of several different molecular typing or analysis methods for comparing individuals within a population reveals much more about that population than a single method. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562

  16. Subspace K-means clustering.

    PubMed

    Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

    2013-12-01

    To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

  17. Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects

    PubMed Central

    Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

    2017-01-01

    An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called “cluster randomization”). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies. PMID:28584874

  18. Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects.

    PubMed

    Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

    2017-01-01

    An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called "cluster randomization"). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.

  19. Preference Mapping of Soymilk with Different U.S. Consumers.

    PubMed

    Lawrence, S E; Lopetcharat, K; Drake, M A

    2016-02-01

    This study determined and compared drivers of liking for unflavored soymilk with different U.S. consumer groups. A highly trained panel documented appearance, mouthfeel and flavor attributes of 26 commercial soymilks. Twelve representative soymilks were then selected for evaluation by consumers from 3 age/cultural categories (n = 75 each category; Caucasian/African American females aged 18 to 30 y; Asian females aged 18 to 30 y; Caucasian/African American females aged 40 to 64 y). Consumers evaluated overall liking and liking and intensity of specific attributes. Results were evaluated by analysis of variance, followed by internal and external preference mapping. Age had no effect on overall liking, while ethnicity did (Caucasian/African American compared with Asian; P < 0.05). Caucasians/African Americans differentiated soymilks more than Asians and assigned a wider range of liking scores than Asians (2.1 to 7.2 compared with 4.0 to 6.1). Three consumer clusters were identified. Sweet taste with vanilla/vanillin and sweet aromatic flavors and higher viscosity were preferred by most consumers and differences between consumer clusters were primarily in drivers of dislike. Drivers of dislike were not identified for Cluster 1 consumers while Clusters 2 and 3 consumers (n = 84, n = 80) disliked beany, green/grassy and meaty/brothy flavors and astringency. Cluster 3 (n = 80) consumers scored all soymilks higher in liking (P < 0.05) than Cluster 2 consumers, and were willing to overlook disliked attributes with the addition of sweet taste, whereas the Cluster 2 consumers were not. These findings can be utilized to produce soymilks with attributes that are well liked by target consumers and to tailor attributes for segments of the population that have not yet been accommodated. © 2015 Institute of Food Technologists®

  20. [Difficulties in emotion regulation and personal distress in young adults with social anxiety].

    PubMed

    Contardi, Anna; Farina, Benedetto; Fabbricatore, Mariantonietta; Tamburello, Stella; Scapellato, Paolo; Penzo, Ilaria; Tamburello, Antonino; Innamorati, Marco

    2013-01-01

    The aim of this study was to assess the association between social anxiety and difficulties in emotion regulation in a sample of Italian young adults. Our convenience sample was composed of 298 Italian young adults (184 women and 114 men) aged 18-34 years. Participants were administered the Interaction Anxiousness Scale (IAS), the Audience Anxiousness Scale (AAS), the Difficulties in Emotion Regulation Scale (DERS), and the Interpersonal Reactivity Index (IRI). A Two Step cluster analysis was used to group subjects according to their level of social anxiety. The cluster analysis indicated a two-cluster solution. The first cluster included 163 young adults with higher scores on the AAS and the IAS than those included in cluster 2 (n=135). A generalized linear model with groups as dependent variable indicated that people with higher social anxiety (compared to those with lower social anxiety) have higher scores on the dimension personal distress of the IRI (p<0.01), and on the DERS non acceptance of negative emotions (p<0.001) and lack of emotional clarity (p<0.05). The results are consistent with models of psychopathology, which hypothesize that people who cannot deal effectively with their emotions may develop depressive and anxious disorders.

  1. Collaborative filtering recommendation model based on fuzzy clustering algorithm

    NASA Astrophysics Data System (ADS)

    Yang, Ye; Zhang, Yunhua

    2018-05-01

    As one of the most widely used algorithms in recommender systems, collaborative filtering algorithm faces two serious problems, which are the sparsity of data and poor recommendation effect in big data environment. In traditional clustering analysis, the object is strictly divided into several classes and the boundary of this division is very clear. However, for most objects in real life, there is no strict definition of their forms and attributes of their class. Concerning the problems above, this paper proposes to improve the traditional collaborative filtering model through the hybrid optimization of implicit semantic algorithm and fuzzy clustering algorithm, meanwhile, cooperating with collaborative filtering algorithm. In this paper, the fuzzy clustering algorithm is introduced to fuzzy clustering the information of project attribute, which makes the project belong to different project categories with different membership degrees, and increases the density of data, effectively reduces the sparsity of data, and solves the problem of low accuracy which is resulted from the inaccuracy of similarity calculation. Finally, this paper carries out empirical analysis on the MovieLens dataset, and compares it with the traditional user-based collaborative filtering algorithm. The proposed algorithm has greatly improved the recommendation accuracy.

  2. Differentiation of Recurrent Glioblastoma from Delayed Radiation Necrosis by Using Voxel-based Multiparametric Analysis of MR Imaging Data.

    PubMed

    Yoon, Ra Gyoung; Kim, Ho Sung; Koh, Myeong Ju; Shim, Woo Hyun; Jung, Seung Chai; Kim, Sang Joon; Kim, Jeong Hoon

    2017-10-01

    Purpose To assess a volume-weighted voxel-based multiparametric (MP) clustering method as an imaging biomarker to differentiate recurrent glioblastoma from delayed radiation necrosis. Materials and Methods The institutional review board approved this retrospective study and waived the informed consent requirement. Seventy-five patients with pathologic analysis-confirmed recurrent glioblastoma (n = 42) or radiation necrosis (n = 33) who presented with enlarged contrast material-enhanced lesions at magnetic resonance (MR) imaging after they completed concurrent chemotherapy and radiation therapy were enrolled. The diagnostic performance of the total MP cluster score was determined by using the area under the receiver operating characteristic curve (AUC) with cross-validation and compared with those of single parameter measurements (10% histogram cutoffs of apparent diffusion coefficient [ADC10] or 90% histogram cutoffs of normalized cerebral blood volume and initial time-signal intensity AUC). Results Receiver operating characteristic curve analysis showed that an AUC for differentiating recurrent glioblastoma from delayed radiation necrosis was highest in the total MP cluster score and lowest for ADC10 for both readers. The total MP cluster score had significantly better diagnostic accuracy than any single parameter (corrected P = .001-.039 for reader 1; corrected P = .005-.041 for reader 2). The total MP cluster score was the best predictor of recurrent glioblastoma (cross-validated AUCs, 0.942-0.946 for both readers), with a sensitivity of 95.2% for reader 1 and 97.6% for reader 2. Conclusion Quantitative analysis with volume-weighted voxel-based MP clustering appears to be superior to the use of single imaging parameters to differentiate recurrent glioblastoma from delayed radiation necrosis. © RSNA, 2017 Online supplemental material is available for this article.

  3. Delineation of Stenotrophomonas maltophilia isolates from cystic fibrosis patients by fatty acid methyl ester profiles and matrix-assisted laser desorption/ionization time-of-flight mass spectra using hierarchical cluster analysis and principal component analysis.

    PubMed

    Vidigal, Pedrina Gonçalves; Mosel, Frank; Koehling, Hedda Luise; Mueller, Karl Dieter; Buer, Jan; Rath, Peter Michael; Steinmann, Joerg

    2014-12-01

    Stenotrophomonas maltophilia is an opportunist multidrug-resistant pathogen that causes a wide range of nosocomial infections. Various cystic fibrosis (CF) centres have reported an increasing prevalence of S. maltophilia colonization/infection among patients with this disease. The purpose of this study was to assess specific fingerprints of S. maltophilia isolates from CF patients (n = 71) by investigating fatty acid methyl esters (FAMEs) through gas chromatography (GC) and highly abundant proteins by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and to compare them with isolates obtained from intensive care unit (ICU) patients (n = 20) and the environment (n = 11). Principal component analysis (PCA) of GC-FAME patterns did not reveal a clustering corresponding to distinct CF, ICU or environmental types. Based on the peak area index, it was observed that S. maltophilia isolates from CF patients produced significantly higher amounts of fatty acids in comparison with ICU patients and the environmental isolates. Hierarchical cluster analysis (HCA) based on the MALDI-TOF MS peak profiles of S. maltophilia revealed the presence of five large clusters, suggesting a high phenotypic diversity. Although HCA of MALDI-TOF mass spectra did not result in distinct clusters predominantly composed of CF isolates, PCA revealed the presence of a distinct cluster composed of S. maltophilia isolates from CF patients. Our data suggest that S. maltophilia colonizing CF patients tend to modify not only their fatty acid patterns but also their protein patterns as a response to adaptation in the unfavourable environment of the CF lung. © 2014 The Authors.

  4. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder.

    PubMed

    Bauer, M; Glenn, T; Alda, M; Andreassen, O A; Angelopoulos, E; Ardau, R; Baethge, C; Bauer, R; Bellivier, F; Belmaker, R H; Berk, M; Bjella, T D; Bossini, L; Bersudsky, Y; Cheung, E Y W; Conell, J; Del Zompo, M; Dodd, S; Etain, B; Fagiolini, A; Frye, M A; Fountoulakis, K N; Garneau-Fournier, J; Gonzalez-Pinto, A; Harima, H; Hassel, S; Henry, C; Iacovides, A; Isometsä, E T; Kapczinski, F; Kliwicki, S; König, B; Krogh, R; Kunz, M; Lafer, B; Larsen, E R; Lewitzka, U; Lopez-Jaramillo, C; MacQueen, G; Manchia, M; Marsh, W; Martinez-Cengotitabengoa, M; Melle, I; Monteith, S; Morken, G; Munoz, R; Nery, F G; O'Donovan, C; Osher, Y; Pfennig, A; Quiroz, D; Ramesar, R; Rasgon, N; Reif, A; Ritter, P; Rybakowski, J K; Sagduyu, K; Scippa, A M; Severus, E; Simhandl, C; Stein, D J; Strejilevich, S; Hatim Sulaiman, A; Suominen, K; Tagata, H; Tatebayashi, Y; Torrent, C; Vieta, E; Viswanath, B; Wanchoo, M J; Zetin, M; Whybrow, P C

    2015-01-01

    Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research. Copyright © 2014 Elsevier Masson SAS. All rights reserved.

  5. Insights into magmatic processes and hydrothermal alteration of in situ superfast spreading ocean crust at ODP/IODP site 1256 from a cluster analysis of rock magnetic properties

    NASA Astrophysics Data System (ADS)

    Dekkers, Mark J.; Heslop, David; Herrero-Bervera, Emilio; Acton, Gary; Krasa, David

    2014-08-01

    We analyze magnetic properties from Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate in ˜15.2 Ma oceanic crust generated by superfast seafloor spreading, the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Fuzzy c-means cluster analysis and nonlinear mapping are utilized to study down-hole trends in the ratio of the saturation remanent magnetization and the saturation magnetization, the coercive force, the ratio of the remanent coercive force and coercive force, the low-field magnetic susceptibility, and the Curie temperature, to evaluate the effects of magmatic and hydrothermal processes on magnetic properties. A statistically robust five cluster solution separates the data predominantly into three clusters that express increasing hydrothermal alteration of the lavas, which differ from two distinct clusters mainly representing the dikes and gabbros. Extensive alteration can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. Thus, the analysis complements interpretation based on electrofacies analysis. All clusters display rock magnetic characteristics compatible with an ability to retain a stable natural remanent magnetization suggesting that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Paleointensity determination is difficult because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.

  6. Gaussian mixture clustering and imputation of microarray data.

    PubMed

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.

  7. A Systematic Analysis of Candidate Genes Associated with Nicotine Addiction

    PubMed Central

    Liu, Meng; Li, Xia; Fan, Rui; Liu, Xinhua; Wang, Ju

    2015-01-01

    Nicotine, as the major psychoactive component of tobacco, has broad physiological effects within the central nervous system, but our understanding of the molecular mechanism underlying its neuronal effects remains incomplete. In this study, we performed a systematic analysis on a set of nicotine addiction-related genes to explore their characteristics at network levels. We found that NAGenes tended to have a more moderate degree and weaker clustering coefficient and to be less central in the network compared to alcohol addiction-related genes or cancer genes. Further, clustering of these genes resulted in six clusters with themes in synaptic transmission, signal transduction, metabolic process, and apoptosis, which provided an intuitional view on the major molecular functions of the genes. Moreover, functional enrichment analysis revealed that neurodevelopment, neurotransmission activity, and metabolism related biological processes were involved in nicotine addiction. In summary, by analyzing the overall characteristics of the nicotine addiction related genes, this study provided valuable information for understanding the molecular mechanisms underlying nicotine addiction. PMID:26097843

  8. Pollen morphology and plant taxonomy of white oaks in eastern North America

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solomon, A.M.

    An evaluation of possible approaches to fossil oak pollen identification utilized scanning electron microscopy to examine exine-surface features of 171 collections, representing 16 Quercus subgenus Lepidobalanus species and varieties of eastern North America. Twenty qualitative pollen morphological characters were defined and tabulated for each of 217 pollen grains. The data were subjected to cluster analysis and cluster diagrams were compared with published white oak taxonomy. Pollen morphology and plant taxonomy compared well in series of the subgenus Lepidobalanus due primarily to consistency of character presence and absence within species and varieties. Pollen morphology of white oaks appears to reflect plantmore » systematics above the species level. Use of routine SEM analysis to identify series of white oaks among fossil pollen grains likely will yield valid results. 38 references.« less

  9. Signatures of cytoplasmic proteins in the exoproteome distinguish community- and hospital-associated methicillin-resistant Staphylococcus aureus USA300 lineages.

    PubMed

    Mekonnen, Solomon A; Palma Medina, Laura M; Glasner, Corinna; Tsompanidou, Eleni; de Jong, Anne; Grasso, Stefano; Schaffer, Marc; Mäder, Ulrike; Larsen, Anders R; Gumpert, Heidi; Westh, Henrik; Völker, Uwe; Otto, Andreas; Becher, Dörte; van Dijl, Jan Maarten

    2017-08-18

    Methicillin-resistant Staphylococcus aureus (MRSA) is the common name for a heterogeneous group of highly drug-resistant staphylococci. Two major MRSA classes are distinguished based on epidemiology, namely community-associated (CA) and hospital-associated (HA) MRSA. Notably, the distinction of CA- and HA-MRSA based on molecular traits remains difficult due to the high genomic plasticity of S. aureus. Here we sought to pinpoint global distinguishing features of CA- and HA-MRSA through a comparative genome and proteome analysis of the notorious MRSA lineage USA300. We show for the first time that CA- and HA-MRSA isolates can be distinguished by 2 distinct extracellular protein abundance clusters that are predictive not only for epidemiologic behavior, but also for their growth and survival within epithelial cells. This 'exoproteome profiling' also groups more distantly related HA-MRSA isolates into the HA exoproteome cluster. Comparative genome analysis suggests that these distinctive features of CA- and HA-MRSA isolates relate predominantly to the accessory genome. Intriguingly, the identified exoproteome clusters differ in the relative abundance of typical cytoplasmic proteins, suggesting that signatures of cytoplasmic proteins in the exoproteome represent a new distinguishing feature of CA- and HA-MRSA. Our comparative genome and proteome analysis focuses attention on potentially distinctive roles of 'liberated' cytoplasmic proteins in the epidemiology and intracellular survival of CA- and HA-MRSA isolates. Such extracellular cytoplasmic proteins were recently invoked in staphylococcal virulence, but their implication in the epidemiology of MRSA is unprecedented.

  10. Calculating the Motion and Direction of Flux Transfer Events with Cluster

    NASA Technical Reports Server (NTRS)

    Collado-Vega, Yaireska M.; Sibeck, David Gary

    2011-01-01

    We use multi-point timing analysis to determine the orientation and motion of flux transfer events (FTEs) detected by the four Cluster spacecraft on the high-latitude dayside and flank magnetopause during 2002 and 2003. During these years, the distances between the Cluster spacecraft were greater than 1000 km, providing the tetrahedral configuration needed to select events and determine velocities. Each velocity and location will be examined in detail and compared to the velocities and locations determined by the predictions of the component and antiparallel reconnection models for event formation, orientation, motion, and acceleration for a wide range of spacecraft locations and solar wind conditions.

  11. Effects of single atom doping on the ultrafast electron dynamics of M1Au24(SR)18 (M = Pd, Pt) nanoclusters

    NASA Astrophysics Data System (ADS)

    Zhou, Meng; Qian, Huifeng; Sfeir, Matthew Y.; Nobusada, Katsuyuki; Jin, Rongchao

    2016-03-01

    Atomically precise, doped metal clusters are receiving wide research interest due to their synergistic properties dependent on the metal composition. To understand the electronic properties of doped clusters, it is highly desirable to probe the excited state behavior. Here, we report the ultrafast relaxation dynamics of doped M1@Au24(SR)18 (M = Pd, Pt; R = CH2CH2Ph) clusters using femtosecond visible and near infrared transient absorption spectroscopy. Three relaxation components are identified for both mono-doped clusters: (1) sub-picosecond relaxation within the M1Au12 core states; (2) core to shell relaxation in a few picoseconds; and (3) relaxation back to the ground state in more than one nanosecond. Despite similar relaxation pathways for the two doped nanoclusters, the coupling between the metal core and surface ligands is accelerated by over 30% in the case of the Pt dopant compared with the Pd dopant. Compared to Pd doping, the case of Pt doping leads to much more drastic changes in the steady state and transient absorption of the clusters, which indicates that the 5d orbitals of the Pt atom are more strongly mixed with Au 5d and 6s orbitals than the 4d orbitals of the Pd dopant. These results demonstrate that a single foreign atom can lead to entirely different excited state spectral features of the whole cluster compared to the parent Au25(SR)18 cluster. The detailed excited state dynamics of atomically precise Pd/Pt doped gold clusters help further understand their properties and benefit the development of energy-related applications.Atomically precise, doped metal clusters are receiving wide research interest due to their synergistic properties dependent on the metal composition. To understand the electronic properties of doped clusters, it is highly desirable to probe the excited state behavior. Here, we report the ultrafast relaxation dynamics of doped M1@Au24(SR)18 (M = Pd, Pt; R = CH2CH2Ph) clusters using femtosecond visible and near infrared transient absorption spectroscopy. Three relaxation components are identified for both mono-doped clusters: (1) sub-picosecond relaxation within the M1Au12 core states; (2) core to shell relaxation in a few picoseconds; and (3) relaxation back to the ground state in more than one nanosecond. Despite similar relaxation pathways for the two doped nanoclusters, the coupling between the metal core and surface ligands is accelerated by over 30% in the case of the Pt dopant compared with the Pd dopant. Compared to Pd doping, the case of Pt doping leads to much more drastic changes in the steady state and transient absorption of the clusters, which indicates that the 5d orbitals of the Pt atom are more strongly mixed with Au 5d and 6s orbitals than the 4d orbitals of the Pd dopant. These results demonstrate that a single foreign atom can lead to entirely different excited state spectral features of the whole cluster compared to the parent Au25(SR)18 cluster. The detailed excited state dynamics of atomically precise Pd/Pt doped gold clusters help further understand their properties and benefit the development of energy-related applications. Electronic supplementary information (ESI) available: The pump dependent transient absorption spectra and the corresponding global analysis results. See DOI: 10.1039/c6nr01008c

  12. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

    PubMed

    Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

    2014-12-01

    In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.

  13. Identifying Two Groups of Entitled Individuals: Cluster Analysis Reveals Emotional Stability and Self-Esteem Distinction.

    PubMed

    Crowe, Michael L; LoPilato, Alexander C; Campbell, W Keith; Miller, Joshua D

    2016-12-01

    The present study hypothesized that there exist two distinct groups of entitled individuals: grandiose-entitled, and vulnerable-entitled. Self-report scores of entitlement were collected for 916 individuals using an online platform. Model-based cluster analyses were conducted on the individuals with scores one standard deviation above mean (n = 159) using the five-factor model dimensions as clustering variables. The results support the existence of two groups of entitled individuals categorized as emotionally stable and emotionally vulnerable. The emotionally stable cluster reported emotional stability, high self-esteem, more positive affect, and antisocial behavior. The emotionally vulnerable cluster reported low self-esteem and high levels of neuroticism, disinhibition, conventionality, psychopathy, negative affect, childhood abuse, intrusive parenting, and attachment difficulties. Compared to the control group, both clusters reported being more antagonistic, extraverted, Machiavellian, and narcissistic. These results suggest important differences are missed when simply examining the linear relationships between entitlement and various aspects of its nomological network.

  14. Automated atlas-based clustering of white matter fiber tracts from DTMRI.

    PubMed

    Maddah, Mahnaz; Mewes, Andrea U J; Haker, Steven; Grimson, W Eric L; Warfield, Simon K

    2005-01-01

    A new framework is presented for clustering fiber tracts into anatomically known bundles. This work is motivated by medical applications in which variation analysis of known bundles of fiber tracts in the human brain is desired. To include the anatomical knowledge in the clustering, we invoke an atlas of fiber tracts, labeled by the number of bundles of interest. In this work, we construct such an atlas and use it to cluster all fiber tracts in the white matter. To build the atlas, we start with a set of labeled ROIs specified by an expert and extract the fiber tracts initiating from each ROI. Affine registration is used to project the extracted fiber tracts of each subject to the atlas, whereas their B-spline representation is used to efficiently compare them to the fiber tracts in the atlas and assign cluster labels. Expert visual inspection of the result confirms that the proposed method is very promising and efficient in clustering of the known bundles of fiber tracts.

  15. Ankle plantarflexion strength in rearfoot and forefoot runners: a novel clusteranalytic approach.

    PubMed

    Liebl, Dominik; Willwacher, Steffen; Hamill, Joseph; Brüggemann, Gert-Peter

    2014-06-01

    The purpose of the present study was to test for differences in ankle plantarflexion strengths of habitually rearfoot and forefoot runners. In order to approach this issue, we revisit the problem of classifying different footfall patterns in human runners. A dataset of 119 subjects running shod and barefoot (speed 3.5m/s) was analyzed. The footfall patterns were clustered by a novel statistical approach, which is motivated by advances in the statistical literature on functional data analysis. We explain the novel statistical approach in detail and compare it to the classically used strike index of Cavanagh and Lafortune (1980). The two groups found by the new cluster approach are well interpretable as a forefoot and a rearfoot footfall groups. The subsequent comparison study of the clustered subjects reveals that runners with a forefoot footfall pattern are capable of producing significantly higher joint moments in a maximum voluntary contraction (MVC) of their ankle plantarflexor muscles tendon units; difference in means: 0.28Nm/kg. This effect remains significant after controlling for an additional gender effect and for differences in training levels. Our analysis confirms the hypothesis that forefoot runners have a higher mean MVC plantarflexion strength than rearfoot runners. Furthermore, we demonstrate that our proposed stochastic cluster analysis provides a robust and useful framework for clustering foot strikes. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Comparison of five cluster validity indices performance in brain [18 F]FET-PET image segmentation using k-means.

    PubMed

    Abualhaj, Bedor; Weng, Guoyang; Ong, Melissa; Attarwala, Ali Asgar; Molina, Flavia; Büsing, Karen; Glatting, Gerhard

    2017-01-01

    Dynamic [ 18 F]fluoro-ethyl-L-tyrosine positron emission tomography ([ 18 F]FET-PET) is used to identify tumor lesions for radiotherapy treatment planning, to differentiate glioma recurrence from radiation necrosis and to classify gliomas grading. To segment different regions in the brain k-means cluster analysis can be used. The main disadvantage of k-means is that the number of clusters must be pre-defined. In this study, we therefore compared different cluster validity indices for automated and reproducible determination of the optimal number of clusters based on the dynamic PET data. The k-means algorithm was applied to dynamic [ 18 F]FET-PET images of 8 patients. Akaike information criterion (AIC), WB, I, modified Dunn's and Silhouette indices were compared on their ability to determine the optimal number of clusters based on requirements for an adequate cluster validity index. To check the reproducibility of k-means, the coefficients of variation CVs of the objective function values OFVs (sum of squared Euclidean distances within each cluster) were calculated using 100 random centroid initialization replications RCI 100 for 2 to 50 clusters. k-means was performed independently on three neighboring slices containing tumor for each patient to investigate the stability of the optimal number of clusters within them. To check the independence of the validity indices on the number of voxels, cluster analysis was applied after duplication of a slice selected from each patient. CVs of index values were calculated at the optimal number of clusters using RCI 100 to investigate the reproducibility of the validity indices. To check if the indices have a single extremum, visual inspection was performed on the replication with minimum OFV from RCI 100 . The maximum CV of OFVs was 2.7 × 10 -2 from all patients. The optimal number of clusters given by modified Dunn's and Silhouette indices was 2 or 3 leading to a very poor segmentation. WB and I indices suggested in median 5, [range 4-6] and 4, [range 3-6] clusters, respectively. For WB, I, modified Dunn's and Silhouette validity indices the suggested optimal number of clusters was not affected by the number of the voxels. The maximum coefficient of variation of WB, I, modified Dunn's, and Silhouette validity indices were 3 × 10 -2 , 1, 2 × 10 -1 and 3 × 10 -3 , respectively. WB-index showed a single global maximum, whereas the other indices showed also local extrema. From the investigated cluster validity indices, the WB-index is best suited for automated determination of the optimal number of clusters for [ 18 F]FET-PET brain images for the investigated image reconstruction algorithm and the used scanner: it yields meaningful results allowing better differentiation of tissues with higher number of clusters, it is simple, reproducible and has an unique global minimum. © 2016 American Association of Physicists in Medicine.

  17. Validation of gait analysis with dynamic radiostereometric analysis (RSA) in patients operated with total hip arthroplasty.

    PubMed

    Zügner, Roland; Tranberg, Roy; Lisovskaja, Vera; Shareghi, Bita; Kärrholm, Johan

    2017-07-01

    We simultaneously examined 14 patients with OTS and dynamic radiostereometric analysis (RSA) to evaluate the accuracy of both skin- and a cluster-marker models. The mean differences between the OTS and RSA system in hip flexion, abduction, and rotation varied up to 9.5° for the skin-marker and up to 11.3° for the cluster-marker models, respectively. Both models tended to underestimate the amount of flexion and abduction, but a significant systematic difference between the marker and RSA evaluations could only be established for recordings of hip abduction using cluster markers (p = 0.04). The intra-class correlation coefficient (ICC) was 0.7 or higher during flexion for both models and during abduction using skin markers, but decreased to 0.5-0.6 when abduction motion was studied with cluster markers. During active hip rotation, the two marker models tended to deviate from the RSA recordings in different ways with poor correlations at the end of the motion (ICC ≤0.4). During active hip motions soft tissue displacements occasionally induced considerable differences when compared to skeletal motions. The best correlation between RSA recordings and the skin- and cluster-marker model was found for studies of hip flexion and abduction with the skin-marker model. Studies of hip abduction with use of cluster markers were associated with a constant underestimation of the motion. Recordings of skeletal motions with use of skin or cluster markers during hip rotation were associated with high mean errors amounting up to about 10° at certain positions. © 2016 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 35:1515-1522, 2017. © 2016 Orthopaedic Research Society. Published by Wiley Periodicals, Inc.

  18. Transcriptional profiles of Arabidopsis stomataless mutants reveal developmental and physiological features of life in the absence of stomata

    PubMed Central

    de Marcos, Alberto; Triviño, Magdalena; Pérez-Bueno, María Luisa; Ballesteros, Isabel; Barón, Matilde; Mena, Montaña; Fenoll, Carmen

    2015-01-01

    Loss of function of the positive stomata development regulators SPCH or MUTE in Arabidopsis thaliana renders stomataless plants; spch-3 and mute-3 mutants are extreme dwarfs, but produce cotyledons and tiny leaves, providing a system to interrogate plant life in the absence of stomata. To this end, we compared their cotyledon transcriptomes with that of wild-type plants. K-means clustering of differentially expressed genes generated four clusters: clusters 1 and 2 grouped genes commonly regulated in the mutants, while clusters 3 and 4 contained genes distinctively regulated in mute-3. Classification in functional categories and metabolic pathways of genes in clusters 1 and 2 suggested that both mutants had depressed secondary, nitrogen and sulfur metabolisms, while only a few photosynthesis-related genes were down-regulated. In situ quenching analysis of chlorophyll fluorescence revealed limited inhibition of photosynthesis. This and other fluorescence measurements matched the mutant transcriptomic features. Differential transcriptomes of both mutants were enriched in growth-related genes, including known stomata development regulators, which paralleled their epidermal phenotypes. Analysis of cluster 3 was not informative for developmental aspects of mute-3. Cluster 4 comprised genes differentially up−regulated in mute−3, 35% of which were direct targets for SPCH and may relate to the unique cell types of mute−3. A screen of T-DNA insertion lines in genes differentially expressed in the mutants identified a gene putatively involved in stomata development. A collection of lines for conditional overexpression of transcription factors differentially expressed in the mutants rendered distinct epidermal phenotypes, suggesting that these proteins may be novel stomatal development regulators. Thus, our transcriptome analysis represents a useful source of new genes for the study of stomata development and for characterizing physiology and growth in the absence of stomata. PMID:26157447

  19. Whole Genome Sequence and Phylogenetic Analysis Show Helicobacter pylori Strains from Latin America Have Followed a Unique Evolution Pathway

    PubMed Central

    Muñoz-Ramírez, Zilia Y.; Mendez-Tenorio, Alfonso; Kato, Ikuko; Bravo, Maria M.; Rizzato, Cosmeri; Thorell, Kaisa; Torres, Roberto; Aviles-Jimenez, Francisco; Camorlinga, Margarita; Canzian, Federico; Torres, Javier

    2017-01-01

    Helicobacter pylori (HP) genetics may determine its clinical outcomes. Despite high prevalence of HP infection in Latin America (LA), there have been no phylogenetic studies in the region. We aimed to understand the structure of HP populations in LA mestizo individuals, where gastric cancer incidence remains high. The genome of 107 HP strains from Mexico, Nicaragua and Colombia were analyzed with 59 publicly available worldwide genomes. To study bacterial relationship on whole genome level we propose a virtual hybridization technique using thousands of high-entropy 13 bp DNA probes to generate fingerprints. Phylogenetic virtual genome fingerprint (VGF) was compared with Multi Locus Sequence Analysis (MLST) and with phylogenetic analyses of cagPAI virulence island sequences. With MLST some Nicaraguan and Mexican strains clustered close to Africa isolates, whereas European isolates were spread without clustering and intermingled with LA isolates. VGF analysis resulted in increased resolution of populations, separating European from LA strains. Furthermore, clusters with exclusively Colombian, Mexican, or Nicaraguan strains were observed, where the Colombian cluster separated from Europe, Asia, and Africa, while Nicaraguan and Mexican clades grouped close to Africa. In addition, a mixed large LA cluster including Mexican, Colombian, Nicaraguan, Peruvian, and Salvadorian strains was observed; all LA clusters separated from the Amerind clade. With cagPAI sequence analyses LA clades clearly separated from Europe, Asia and Amerind, and Colombian strains formed a single cluster. A NeighborNet analyses suggested frequent and recent recombination events particularly among LA strains. Results suggests that in the new world, H. pylori has evolved to fit mestizo LA populations, already 500 years after the Spanish colonization. This co-adaption may account for regional variability in gastric cancer risk. PMID:28293542

  20. [Cluster analysis in biomedical researches].

    PubMed

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.

  1. Observational and Numerical Diagnostics of Galaxy Cluster Outer Regions

    NASA Technical Reports Server (NTRS)

    Eckert, D.; Vazza, F.; Ettori, S.; Molendi, S.; Nagai, D.; Lau, E.; Roncarelli, M.; Rossetti, M.; Snowden, S. L.; Gastaldello, F.

    2011-01-01

    Aims. We present the analysis of a local (z = 0.04 - 0.2) sample of 31 galaxy clusters with the aim of measuring the density of the X-ray emitting gas in cluster outskirts. We compare our results with numerical simulations to set constraints on the azimuthal symmetry and gas clumping in the outer regions of galaxy clusters. Methods. We exploit the large field-of-view and low instrumental background of ROSAT/PSPC to trace the density of the intracluster gas out to the virial radius. We perform a stacking of the density profiles to detect a signal beyond r(sub 200) and measure the typical density and scatter in cluster outskirts. We also compute the azimuthal scatter of the profiles with respect to the mean value to look for deviations from spherical symmetry. Finally, we compare our average density and scatter profiles with the results of numerical simulations. Results. As opposed to several recent results, we observe a steepening of the density profiles beyond approximately 0.3r(sub 500). Comparing our density profiles with simulations, we find that non-radiative runs predict too steep density profiles, whereas runs including additional physics and/or gas clumping are in better agreement with the observed gas distribution. We note a systematic difference between cool-core and non-cool core clusters beyond approximately 0.3r(sub 200), which we explain by a different distribution of the gas in the two classes. Beyond approximately r(sub 500), galaxy clusters deviate significantly from spherical symmetry, with only little differences between relaxed and disturbed systems. We find good agreement between the observed and predicted scatter profiles, but only when the 1% densest clumps are filtered out in the simulations. Conclusions. The general trend of steepening density around the virial radius indicates that the shallow density profiles found in several recent works were probably obtained along particular directions (e.g., filaments) and are not representative of the typical behavior of clusters. Comparing our results with numerical simulations, we find that non-radiative simulations fail to reproduce the gas distribution, even well outside cluster cores. Therefore, a detailed treatment of gas cooling, star formation, clumping, and AGN feedback is required to construct realistic models of cluster outer regions.

  2. Clustering algorithm for determining community structure in large networks

    NASA Astrophysics Data System (ADS)

    Pujol, Josep M.; Béjar, Javier; Delgado, Jordi

    2006-07-01

    We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms on the literature of modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousand vertices.

  3. Optimization of b-value distribution for biexponential diffusion-weighted MR imaging of normal prostate.

    PubMed

    Jambor, Ivan; Merisaari, Harri; Aronen, Hannu J; Järvinen, Jukka; Saunavaara, Jani; Kauko, Tommi; Borra, Ronald; Pesola, Marko

    2014-05-01

    To determine the optimal b-value distribution for biexponential diffusion-weighted imaging (DWI) of normal prostate using both a computer modeling approach and in vivo measurements. Optimal b-value distributions for the fit of three parameters (fast diffusion Df, slow diffusion Ds, and fraction of fast diffusion f) were determined using Monte-Carlo simulations. The optimal b-value distribution was calculated using four individual optimization methods. Eight healthy volunteers underwent four repeated 3 Tesla prostate DWI scans using both 16 equally distributed b-values and an optimized b-value distribution obtained from the simulations. The b-value distributions were compared in terms of measurement reliability and repeatability using Shrout-Fleiss analysis. Using low noise levels, the optimal b-value distribution formed three separate clusters at low (0-400 s/mm2), mid-range (650-1200 s/mm2), and high b-values (1700-2000 s/mm2). Higher noise levels resulted into less pronounced clustering of b-values. The clustered optimized b-value distribution demonstrated better measurement reliability and repeatability in Shrout-Fleiss analysis compared with 16 equally distributed b-values. The optimal b-value distribution was found to be a clustered distribution with b-values concentrated in the low, mid, and high ranges and was shown to improve the estimation quality of biexponential DWI parameters of in vivo experiments. Copyright © 2013 Wiley Periodicals, Inc.

  4. High-Resolution Metabolic Phenotyping of Genetically and Environmentally Diverse Potato Tuber Systems. Identification of Phenocopies

    PubMed Central

    Roessner, Ute; Willmitzer, Lothar; Fernie, Alisdair R.

    2001-01-01

    We conducted a comprehensive metabolic phenotyping of potato (Solanum tuberosum L. cv Desiree) tuber tissue that had been modified either by transgenesis or exposure to different environmental conditions using a recently developed gas chromatography-mass spectrometry profiling protocol. Applying this technique, we were able to identify and quantify the major constituent metabolites of the potato tuber within a single chromatographic run. The plant systems that we selected to profile were tuber discs incubated in varying concentrations of fructose, sucrose, and mannitol and transgenic plants impaired in their starch biosynthesis. The resultant profiles were then compared, first at the level of individual metabolites and then using the statistical tools hierarchical cluster analysis and principal component analysis. These tools allowed us to assign clusters to the individual plant systems and to determine relative distances between these clusters; furthermore, analyzing the loadings of these analyses enabled identification of the most important metabolites in the definition of these clusters. The metabolic profiles of the sugar-fed discs were dramatically different from the wild-type steady-state values. When these profiles were compared with one another and also with those we assessed in previous studies, however, we were able to evaluate potential phenocopies. These comparisons highlight the importance of such an approach in the functional and qualitative assessment of diverse systems to gain insights into important mediators of metabolism. PMID:11706160

  5. Three subgroups of pain profiles identified in 227 women with arthritis: a latent class analysis.

    PubMed

    de Luca, Katie; Parkinson, Lynne; Downie, Aron; Blyth, Fiona; Byles, Julie

    2017-03-01

    The objectives were to identify subgroups of women with arthritis based upon the multi-dimensional nature of their pain experience and to compare health and socio-demographic variables between subgroups. A latent class analysis of 227 women with self-reported arthritis was used to identify clusters of women based upon the sensory, affective, and cognitive dimensions of the pain experience. Multivariate multinomial logistic regression analysis was used to determine the relationship between cluster membership and health and sociodemographic characteristics. A three-class cluster model was most parsimonious. 39.5 % of women had a unidimensional pain profile; 38.6 % of women had moderate multidimensional pain profile that included additional pain symptomatology such as sensory qualities and pain catastrophizing; and 21.9 % of women had severe multidimensional pain profile that included prominent pain symptomatology such as sensory and affective qualities of pain, pain catastrophizing, and neuropathic pain. Women with severe multidimensional pain profile have a 30.5 % higher risk of poorer quality of life and a 7.3 % higher risk of suffering depression, and women with moderate multidimensional pain profile have a 6.4 % higher risk of poorer quality of life when compared to women with unidimensional pain. This study identified three distinct subgroups of pain profiles in older women with arthritis. Women had very different experiences of pain, and cluster membership impacted significantly on health-related quality of life. These preliminary findings provide a stronger understanding of profiles of pain and may contribute to the development of tailored treatment options in arthritis.

  6. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method.

    PubMed

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-07-01

    The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous studies. We also compared pairwise distances (between geographically separated samples) with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysis of the homogeneity in two different ways and calculating marginal STR distributions. We found that the Y-STR haplotypes from e.g. Finland were relatively homogeneous as opposed to the relatively heterogeneous Y-STR haplotypes from e.g. Lublin, Eastern Poland and Berlin, Germany. We demonstrated that the observed distributions of alleles at each locus were similar to the expected ones. We also compared pairwise distances between geographically separated samples from Africa with those obtained using the AMOVA method and found good agreement. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  7. Genome-based exploration of the specialized metabolic capacities of the genus Rhodococcus.

    PubMed

    Ceniceros, Ana; Dijkhuizen, Lubbert; Petrusma, Mirjan; Medema, Marnix H

    2017-08-09

    Bacteria of the genus Rhodococcus are well known for their ability to degrade a large range of organic compounds. Some rhodococci are free-living, saprophytic bacteria; others are animal and plant pathogens. Recently, several studies have shown that their genomes encode putative pathways for the synthesis of a large number of specialized metabolites that are likely to be involved in microbe-microbe and host-microbe interactions. To systematically explore the specialized metabolic potential of this genus, we here performed a comprehensive analysis of the biosynthetic coding capacity across publicly available rhododoccal genomes, and compared these with those of several Mycobacterium strains as well as that of their mutual close relative Amycolicicoccus subflavus. Comparative genomic analysis shows that most predicted biosynthetic gene cluster families in these strains are clade-specific and lack any homology with gene clusters encoding the production of known natural products. Interestingly, many of these clusters appear to encode the biosynthesis of lipopeptides, which may play key roles in the diverse environments were rhodococci thrive, by acting as biosurfactants, pathogenicity factors or antimicrobials. We also identified several gene cluster families that are universally shared among all three genera, which therefore may have a more 'primary' role in their physiology. Inactivation of these clusters by mutagenesis might help to generate weaker strains that can be used as live vaccines. The genus Rhodococcus thus provides an interesting target for natural product discovery, in view of its large and mostly uncharacterized biosynthetic repertoire, its relatively fast growth and the availability of effective genetic tools for its genomic modification.

  8. Magnetic properties of Co-doped Nb clusters

    NASA Astrophysics Data System (ADS)

    Diaz-Bachs, A.; Peters, L.; Logemann, R.; Chernyy, V.; Bakker, J. M.; Katsnelson, M. I.; Kirilyuk, A.

    2018-04-01

    Magnetic deflection experiments on isolated Co-doped Nb clusters demonstrate a strong size dependence of magnetic properties, with large magnetic moments in certain cluster sizes and fully nonmagnetic behavior of others. There are in principle two explanations for this behavior. Either the local moment at the Co site is absent or it is screened by the delocalized electrons of the cluster, i.e., the Kondo effect. In order to reveal the physical origin, first, we established the ground state geometry of the clusters by experimentally obtaining their vibrational spectra and comparing them with a density functional theory study. Then, we performed an analysis based on the Anderson impurity model. It appears that the nonmagnetic clusters are due to the absence of the local Co moment and not due to the Kondo effect. In addition, the magnetic behavior of the clusters can be understood from an inspection of their electronic structure. Here magnetism is favored when the effective hybridization around the chemical potential is small, while the absence of magnetism is signaled by a large effective hybridization around the chemical potential.

  9. Identification of suitable genes contributes to lung adenocarcinoma clustering by multiple meta-analysis methods.

    PubMed

    Yang, Ze-Hui; Zheng, Rui; Gao, Yuan; Zhang, Qiang

    2016-09-01

    With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies. We identified the suitable differentially expressed (DE) genes that contributed to lung adenocarcinoma (ADC) clustering based on seven popular multiple meta-analysis methods. Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. The Bioconductor was used to perform the data preliminary preprocessing. Then, DE genes across multiple studies were identified. Hierarchical clustering was applied to compare the classification performance for microarray data samples. The classification efficiency was compared based on accuracy, sensitivity and specificity. Across seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes were remained for further analysis. The classification efficiency analysis showed that DE genes identified by sum of ranks method separated ADC from normal controls with the best accuracy, sensitivity and specificity of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01). Evaluation of DE genes identified by different meta-analysis methods in classification efficiency provided a new perspective to the choice of the suitable method in a given application. Varying meta-analysis methods always present varying abilities, so synthetic consideration should be taken when providing meta-analysis methods for particular research. © 2015 John Wiley & Sons Ltd.

  10. Expressed sequence tags from poplar wood tissues--a comparative analysis from multiple libraries.

    PubMed

    Déjardin, A; Leplé, J-C; Lesage-Descauses, M-C; Costa, G; Pilate, G

    2004-01-01

    Xylogenesis involves successive developmental processes--cambial division, cell expansion and differentiation, cell death--each occurring along a gradient from the cambium to the pith of the stem. Taking advantage of the high level of organisation of wood tissues, we isolated cambial zone (CZ), differentiating xylem (DX) and mature xylem (MX) from both tension wood (TW) and opposite wood (OW) of bent poplars. Four different cDNA libraries were then constructed and used to generate 10,062 EST, reflecting the genes expressed in the different wood tissues. For the most abundant clusters, the EST distributions were compared between libraries in order to identify genes specific or over-represented at some specific developmental stages. They clearly showed a developmental shift between CZ and DX, whereas there is a continuity of development between DX and MX. CZ was mainly characterized by clusters of genes involved in cell cycle, protein synthesis and fate. Interestingly, two clusters with no assigned function were found specific to the cambial zone. In DX and MX, clusters were mostly involved in methylation of lignin precursors and microtubule cytoskeleton. In addition, in DX, EST from TW and OW were compared: five clusters of arabinogalactan proteins, one for sucrose synthase and one for fructokinase were specific or over-represented in TW. Moreover, a putative transcription factor and a cluster of unknown function were also identified in DX-TW. The informative comparison of multiple libraries prepared from wood tissues led to the identification of genes--some with still unknown functions--putatively involved in xylogenesis and tension wood formation.

  11. Cluster analysis as a tool for evaluating the exploration potential of Known Geothermal Resource Areas

    DOE PAGES

    Lindsey, Cary R.; Neupane, Ghanashym; Spycher, Nicolas; ...

    2018-01-03

    Although many Known Geothermal Resource Areas in Oregon and Idaho were identified during the 1970s and 1980s, few were subsequently developed commercially. Because of advances in power plant design and energy conversion efficiency since the 1980s, some previously identified KGRAs may now be economically viable prospects. Unfortunately, available characterization data vary widely in accuracy, precision, and granularity, making assessments problematic. In this paper, we suggest a procedure for comparing test areas against proven resources using Principal Component Analysis and cluster identification. The result is a low-cost tool for evaluating potential exploration targets using uncertain or incomplete data.

  12. Cluster analysis as a tool for evaluating the exploration potential of Known Geothermal Resource Areas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lindsey, Cary R.; Neupane, Ghanashym; Spycher, Nicolas

    Although many Known Geothermal Resource Areas in Oregon and Idaho were identified during the 1970s and 1980s, few were subsequently developed commercially. Because of advances in power plant design and energy conversion efficiency since the 1980s, some previously identified KGRAs may now be economically viable prospects. Unfortunately, available characterization data vary widely in accuracy, precision, and granularity, making assessments problematic. In this paper, we suggest a procedure for comparing test areas against proven resources using Principal Component Analysis and cluster identification. The result is a low-cost tool for evaluating potential exploration targets using uncertain or incomplete data.

  13. Cluster ensemble based on Random Forests for genetic data.

    PubMed

    Alhusain, Luluah; Hafez, Alaaeldin M

    2017-01-01

    Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.

  14. Comparison of mineral dust and droplet residuals measured with two single particle aerosol mass spectrometers

    NASA Astrophysics Data System (ADS)

    Wonaschütz, Anna; Ludwig, Wolfgang; Zawadowicz, Maria; Hiranuma, Naruki; Hitzenberger, Regina; Cziczo, Daniel; DeMott, Paul; Möhler, Ottmar

    2017-04-01

    Single Particle mass spectrometers are used to gain information on the chemical composition of individual aerosol particles, aerosol mixing state, and other valuable aerosol characteristics. During the Mass Spectrometry Intercomparison at the Fifth Ice Nucleation (FIN-01) Workshop, the new LAAPTOF single particle aerosol mass spectrometer (AeroMegt GmbH) was conducting simultaneous measurements together with the PALMS (Particle Analysis by Laser Mass Spectrometry) instrument. The aerosol particles were sampled from the AIDA chamber during ice cloud expansion experiments. Samples of mineral dust and ice droplet residuals were measured simultaneously. In this work, three expansion experiments are chosen for a comparison between the two mass spectrometers. A fuzzy clustering routine is used to group the spectra. Cluster centers describing the ensemble of particles are compared. First results show that while differences in the peak heights are likely due to the use of an amplifier in PALMS, cluster centers are comparable.

  15. Distributed computing for membrane-based modeling of action potential propagation.

    PubMed

    Porras, D; Rogers, J M; Smith, W M; Pollard, A E

    2000-08-01

    Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.

  16. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

    PubMed

    Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay

    2015-09-01

    The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.

  17. The Gaia-ESO Survey. Mg-Al anti-correlation in iDR4 globular clusters

    NASA Astrophysics Data System (ADS)

    Pancino, E.; Romano, D.; Tang, B.; Tautvaišienė, G.; Casey, A. R.; Gruyters, P.; Geisler, D.; San Roman, I.; Randich, S.; Alfaro, E. J.; Bragaglia, A.; Flaccomio, E.; Korn, A. J.; Recio-Blanco, A.; Smiljanic, R.; Carraro, G.; Bayo, A.; Costado, M. T.; Damiani, F.; Jofré, P.; Lardo, C.; de Laverny, P.; Monaco, L.; Morbidelli, L.; Sbordone, L.; Sousa, S. G.; Villanova, S.

    2017-05-01

    We use Gaia-ESO (GES) Survey iDR4 data to explore the Mg-Al anti-correlation in globular clusters that were observed as calibrators, as a demonstration of the quality of Gaia-ESO Survey data and analysis. The results compare well with the available literature, within 0.1 dex or less, after a small (compared to the internal spreads) offset between the UVES and GIRAFFE data of 0.10-0.15 dex was taken into account. In particular, for the first time we present data for NGC 5927, which is one of the most metal-rich globular clusters studied in the literature so far with [ Fe / H ] = - 0.39 ± 0.04 dex; this cluster was included to connect with the open cluster regime in the Gaia-ESO Survey internal calibration. The extent and shape of the Mg-Al anti-correlation provide strong constraints on the multiple population phenomenon in globular clusters. In particular, we studied the dependency of the Mg-Al anti-correlation extension with metallicity, present-day mass,and age of the clusters, using GES data in combination with a large set of homogenized literature measurements.We find a dependency with both metallicity and mass, which is evident when fitting for the two parameters simultaneously, but we do not find significant dependency with age. We confirm that the Mg-Al anti-correlation is not seen in all clusters, but disappears for the less massive or most metal-rich clusters. We also use our data set to see whether a normal anti-correlation would explain the low [Mg/α] observed in some extragalactic globular clusters, but find that none of the clusters in our sample can reproduce it; a more extreme chemical composition, such as that of NGC 2419, would be required. We conclude that GES iDR4 data already meet the requirements set by the main survey goals and can be used to study globular clusters in detail, even if the analysis procedures were not specifically designed for them. Based on data products from observations made with ESO Telescopes at the La Silla Paranal Observatory under programme ID 188.B-3002.Full Table 2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/601/A112

  18. Detection and Characterization of Galaxy Systems at Intermediate Redshift.

    NASA Astrophysics Data System (ADS)

    Barrena, Rafael

    2004-11-01

    This thesis is divided into two very related parts. In the first part we implement and apply a galaxy cluster detection method, based on multiband observations in visible. For this purpose, we use a new algorithm, the Voronoi Galaxy Cluster Finder, which identifies overdensities over a Poissonian field of objects. By applying this algorithm over four photometric bands (B, V, R and I) we reduce the possibility of detecting galaxy projection effects and spurious detections instead of real galaxy clusters. The B, V, R and I photometry allows a good characterization of galaxy systems. Therefore, we analyze the colour and early-type sequences in the colour-magnitude diagrams of the detected clusters. This analysis helps us to confirm the selected candidates as actual galaxy systems. In addition, by comparing observational early-type sequences with a semiempirical model we can estimate a photometric redshift for the detected clusters. We will apply this detection method on four 0.5x0.5 square degrees areas, that partially overlap the Postman Distant Cluster Survey (PDCS). The observations were performed as part of the International Time Programme 1999-B using the Wide Field Camera mounted at Isaac Newton Telescope (Roque de los Muchachos Observatory, La Palma island, Spain). The B and R data obtained were completed with V and I photometry performed by Marc Postman. The comparison of our cluster catalogue with that of PDCS reveals that our work is a clear improvement in the cluster detection techniques. Our method efficiently selects galaxy clusters, in particular low mass galaxy systems, even at relative high redshift, and estimate a precise photometric redshift. The validation of our method comes by observing spectroscopically several selected candidates. By comparing photometric and spectroscopic redshifts we conclude: 1) our photometric estimation method gives an precision lower than 0.1; 2) our detection technique is even able to detect galaxy systems at z~0.7 using visible photometric bands. In the second part of this thesis we analyze in detail the dynamical state of 1E0657-56 (z=0.296), a hot galaxy cluster with strong X-ray and radio emissions. Using spectroscopic and photometric observations in visible (obtained with the New Technology Telescope and the Very Large Telescope, both located at La Silla Observatory, Chile) we analyze the velocity field, morphology, colour and star formation in the galaxy population of this cluster. 1E0657-56 is involved in a collision event. We identify the substructure involved in this collision and we propose a dynamical model that allows us to investigate the origins of X-ray and radio emissions and the relation between them. The analysis of 1E0657-56 presented in this thesis constitutes a good example of what kind of properties could be studied in some of the clusters catalogued in first part of this thesis. In addition, the detailed analysis of this cluster represents an improvement in the study of the origin of X-ray and radio emissions and merging processes in galaxy clusters.

  19. MERGING GALAXY CLUSTERS: OFFSET BETWEEN THE SUNYAEV-ZEL'DOVICH EFFECT AND X-RAY PEAKS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Molnar, Sandor M.; Hearn, Nathan C.; Stadel, Joachim G., E-mail: sandor@phys.ntu.edu.tw

    2012-03-20

    Galaxy clusters, the most massive collapsed structures, have been routinely used to determine cosmological parameters. When using clusters for cosmology, the crucial assumption is that they are relaxed. However, subarcminute resolution Sunyaev-Zel'dovich (SZ) effect images compared with high-resolution X-ray images of some clusters show significant offsets between the two peaks. We have carried out self-consistent N-body/hydrodynamical simulations of merging galaxy clusters using FLASH to study these offsets quantitatively. We have found that significant displacements result between the SZ and X-ray peaks for large relative velocities for all masses used in our simulations as long as the impact parameters were aboutmore » 100-250 kpc. Our results suggest that the SZ peak coincides with the peak in the pressure times the line-of-sight characteristic length and not the pressure maximum (as it would for clusters in equilibrium). The peak in the X-ray emission, as expected, coincides with the density maximum of the main cluster. As a consequence, the morphology of the SZ signal, and therefore the offset between the SZ and X-ray peaks, change with viewing angle. As an application, we compare the morphologies of our simulated images to observed SZ and X-ray images and mass surface densities derived from weak-lensing observations of the merging galaxy cluster CL0152-1357, we find that a large relative velocity of 4800 km s{sup -1} is necessary to explain the observations. We conclude that an analysis of the morphologies of multi-frequency observations of merging clusters can be used to put meaningful constraints on the initial parameters of the progenitors.« less

  20. Vocal Affect Recognition and Psychopathy: Converging Findings Across Traditional and Cluster Analytic Approaches to Assessing the Construct

    PubMed Central

    Bagley, Amy D.; Abramowitz, Carolyn S.; Kosson, David S.

    2010-01-01

    Deficits in emotion processing have been widely reported to be central to psychopathy. However, few prior studies have examined vocal affect recognition in psychopaths, and these studies suffer from significant methodological limitations. Moreover, prior studies have yielded conflicting findings regarding the specificity of psychopaths’ affect recognition deficits. This study examined vocal affect recognition in 107 male inmates under conditions requiring isolated prosodic vs. semantic analysis of affective cues and compared subgroups of offenders identified via cluster analysis on vocal affect recognition. Psychopaths demonstrated deficits in vocal affect recognition under conditions requiring use of semantic cues and conditions requiring use of prosodic cues. Moreover, both primary and secondary psychopaths exhibited relatively similar emotional deficits in the semantic analysis condition compared to nonpsychopathic control participants. This study demonstrates that psychopaths’ vocal affect recognition deficits are not due to methodological limitations of previous studies and provides preliminary evidence that primary and secondary psychopaths exhibit generally similar deficits in vocal affect recognition. PMID:19413412

  1. Correlation and network analysis of global financial indices

    NASA Astrophysics Data System (ADS)

    Kumar, Sunil; Deo, Nivedita

    2012-08-01

    Random matrix theory (RMT) and network methods are applied to investigate the correlation and network properties of 20 financial indices. The results are compared before and during the financial crisis of 2008. In the RMT method, the components of eigenvectors corresponding to the second largest eigenvalue form two clusters of indices in the positive and negative directions. The components of these two clusters switch in opposite directions during the crisis. The network analysis uses the Fruchterman-Reingold layout to find clusters in the network of indices at different thresholds. At a threshold of 0.6, before the crisis, financial indices corresponding to the Americas, Europe, and Asia-Pacific form separate clusters. On the other hand, during the crisis at the same threshold, the American and European indices combine together to form a strongly linked cluster while the Asia-Pacific indices form a separate weakly linked cluster. If the value of the threshold is further increased to 0.9 then the European indices (France, Germany, and the United Kingdom) are found to be the most tightly linked indices. The structure of the minimum spanning tree of financial indices is more starlike before the crisis and it changes to become more chainlike during the crisis. The average linkage hierarchical clustering algorithm is used to find a clearer cluster structure in the network of financial indices. The cophenetic correlation coefficients are calculated and found to increase significantly, which indicates that the hierarchy increases during the financial crisis. These results show that there is substantial change in the structure of the organization of financial indices during a financial crisis.

  2. Correlation and network analysis of global financial indices.

    PubMed

    Kumar, Sunil; Deo, Nivedita

    2012-08-01

    Random matrix theory (RMT) and network methods are applied to investigate the correlation and network properties of 20 financial indices. The results are compared before and during the financial crisis of 2008. In the RMT method, the components of eigenvectors corresponding to the second largest eigenvalue form two clusters of indices in the positive and negative directions. The components of these two clusters switch in opposite directions during the crisis. The network analysis uses the Fruchterman-Reingold layout to find clusters in the network of indices at different thresholds. At a threshold of 0.6, before the crisis, financial indices corresponding to the Americas, Europe, and Asia-Pacific form separate clusters. On the other hand, during the crisis at the same threshold, the American and European indices combine together to form a strongly linked cluster while the Asia-Pacific indices form a separate weakly linked cluster. If the value of the threshold is further increased to 0.9 then the European indices (France, Germany, and the United Kingdom) are found to be the most tightly linked indices. The structure of the minimum spanning tree of financial indices is more starlike before the crisis and it changes to become more chainlike during the crisis. The average linkage hierarchical clustering algorithm is used to find a clearer cluster structure in the network of financial indices. The cophenetic correlation coefficients are calculated and found to increase significantly, which indicates that the hierarchy increases during the financial crisis. These results show that there is substantial change in the structure of the organization of financial indices during a financial crisis.

  3. Multilocus sequence analysis of Anaplasma phagocytophilum reveals three distinct lineages with different host ranges in clinically ill French cattle.

    PubMed

    Chastagner, Amélie; Dugat, Thibaud; Vourc'h, Gwenaël; Verheyden, Hélène; Legrand, Loïc; Bachy, Véronique; Chabanne, Luc; Joncour, Guy; Maillard, Renaud; Boulouis, Henri-Jean; Haddad, Nadia; Bailly, Xavier; Leblond, Agnès

    2014-12-09

    Molecular epidemiology represents a powerful approach to elucidate the complex epidemiological cycles of multi-host pathogens, such as Anaplasma phagocytophilum. A. phagocytophilum is a tick-borne bacterium that affects a wide range of wild and domesticated animals. Here, we characterized its genetic diversity in populations of French cattle; we then compared the observed genotypes with those found in horses, dogs, and roe deer to determine whether genotypes of A. phagocytophilum are shared among different hosts. We sampled 120 domesticated animals (104 cattle, 13 horses, and 3 dogs) and 40 wild animals (roe deer) and used multilocus sequence analysis on nine loci (ankA, msp4, groESL, typA, pled, gyrA, recG, polA, and an intergenic region) to characterize the genotypes of A. phagocytophilum present. Phylogenic analysis revealed three genetic clusters of bacterial variants in domesticated animals. The two principal clusters included 98% of the bacterial genotypes found in cattle, which were only distantly related to those in roe deer. One cluster comprised only cattle genotypes, while the second contained genotypes from cattle, horses, and dogs. The third contained all roe deer genotypes and three cattle genotypes. Geographical factors could not explain this clustering pattern. These results suggest that roe deer do not contribute to the spread of A. phagocytophilum in cattle in France. Further studies should explore if these different clusters are associated with differing disease severity in domesticated hosts. Additionally, it remains to be seen if the three clusters of A. phagocytophilum genotypes in cattle correspond to distinct epidemiological cycles, potentially involving different reservoir hosts.

  4. Sparse subspace clustering for data with missing entries and high-rank matrix completion.

    PubMed

    Fan, Jicong; Chow, Tommy W S

    2017-09-01

    Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Catchment classification by runoff behaviour with self-organizing maps (SOM)

    NASA Astrophysics Data System (ADS)

    Ley, R.; Casper, M. C.; Hellebrand, H.; Merz, R.

    2011-09-01

    Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.

  6. Catchment classification by runoff behaviour with self-organizing maps (SOM)

    NASA Astrophysics Data System (ADS)

    Ley, R.; Casper, M. C.; Hellebrand, H.; Merz, R.

    2011-03-01

    Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.

  7. Star Cluster Properties in Two LEGUS Galaxies Computed with Stochastic Stellar Population Synthesis Models

    NASA Astrophysics Data System (ADS)

    Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Grasha, Kathryn; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Ubeda, Leonardo; Zackrisson, Erik

    2015-10-01

    We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.

  8. Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.

    PubMed

    Cheng, Min; Zhang, Wen-jian; Xing, Tian-yu; Yan, Xiao-hong; Li, Yu-mao; Li, Hui; Wang, Ning

    2016-08-01

    miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, avian miR-17-92 cluster genomic structure has not been fully determined. The promoter location and sequence of miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the tested 10 species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is core promoter region of mammalian miR-17-92 host gene (MIR17HG). Promoter luciferase reporter gene vector of the gap region was constructed and reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times than that of negative control (empty pGL3 basic vector), suggesting that chicken miR-17-92 cluster promoter exists in the gap region. To further gain insight into the promoter structure, two different truncations for the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19.82% and 60.14%, respectively. These data demonstrated that the important promoter region of chicken miR-17-92 cluster is located in the -3400/-2506 bp region. Our results lay the foundation for revealing the transcriptional regulatory mechanisms of chicken miR-17-92 cluster.

  9. STAR FORMATION ACROSS THE W3 COMPLEX

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Román-Zúñiga, Carlos G.; Ybarra, Jason E.; Tapia, Mauricio

    We present a multi-wavelength analysis of the history of star formation in the W3 complex. Using deep, near-infrared ground-based images combined with images obtained with Spitzer and Chandra observatories, we identified and classified young embedded sources. We identified the principal clusters in the complex and determined their structure and extension. We constructed extinction-limited samples for five principal clusters and constructed K-band luminosity functions that we compare with those of artificial clusters with varying ages. This analysis provided mean ages and possible age spreads for the clusters. We found that IC 1795, the centermost cluster of the complex, still hosts amore » large fraction of young sources with circumstellar disks. This indicates that star formation was active in IC 1795 as recently as 2 Myr ago, simultaneous to the star-forming activity in the flanking embedded clusters, W3-Main and W3(OH). A comparison with carbon monoxide emission maps indicates strong velocity gradients in the gas clumps hosting W3-Main and W3(OH) and shows small receding clumps of gas at IC 1795, suggestive of rapid gas removal (faster than the T Tauri timescale) in the cluster-forming regions. We discuss one possible scenario for the progression of cluster formation in the W3 complex. We propose that early processes of gas collapse in the main structure of the complex could have defined the progression of cluster formation across the complex with relatively small age differences from one group to another. However, triggering effects could act as catalysts for enhanced efficiency of formation at a local level, in agreement with previous studies.« less

  10. Phylogeny of Bacteroides, Prevotella, and Porphyromonas spp. and related bacteria.

    PubMed Central

    Paster, B J; Dewhirst, F E; Olsen, I; Fraser, G J

    1994-01-01

    The phylogenetic structure of the bacteroides subgroup of the cytophaga-flavobacter-bacteroides (CFB) phylum was examined by 16S rRNA sequence comparative analysis. Approximately 95% of the 16S rRNA sequence was determined for 36 representative strains of species of Prevotella, Bacteroides, and Porphyromonas and related species by a modified Sanger sequencing method. A phylogenetic tree was constructed from a corrected distance matrix by the neighbor-joining method, and the reliability of tree branching was established by bootstrap analysis. The bacteroides subgroup was divided primarily into three major phylogenetic clusters which contained most of the species examined. The first cluster, termed the prevotella cluster, was composed of 16 species of Prevotella, including P. melaninogenica, P. intermedia, P. nigrescens, and the ruminal species P. ruminicola. Two oral species, P. zoogleoformans and P. heparinolytica, which had been recently placed in the genus Prevotella, did not fall within the prevotella cluster. These two species and six species of Bacteroides, including the type species B. fragilis, formed the second cluster, termed the bacteroides cluster. The third cluster, termed the porphyromonas cluster, was divided into two subclusters. The first contained Porphyromonas gingivalis, P. endodontalis, P. asaccharolytica, P. circumdentaria, P. salivosa, [Bacteroides] levii (the brackets around genus are used to indicate that the species does not belong to the genus by the sensu stricto definition), and [Bacteroides] macacae, and the second subcluster contained [Bacteroides] forsythus and [Bacteroides] distasonis. [Bacteroides] splanchnicus fell just outside the three major clusters but still belonged within the bacteroides subgroup. With few exceptions, the 16 S rRNA data were in overall agreement with previously proposed reclassifications of species of Bacteroides, Prevotella, and Porphyromonas. Suggestions are made to accommodate those species which do not fit previous reclassification schemes. PMID:8300528

  11. The X-ray luminosity functions of Abell clusters from the Einstein Cluster Survey

    NASA Technical Reports Server (NTRS)

    Burg, R.; Giacconi, R.; Forman, W.; Jones, C.

    1994-01-01

    We have derived the present epoch X-ray luminosity function of northern Abell clusters using luminosities from the Einstein Cluster Survey. The sample is sufficiently large that we can determine the luminosity function for each richness class separately with sufficient precision to study and compare the different luminosity functions. We find that, within each richness class, the range of X-ray luminosity is quite large and spans nearly a factor of 25. Characterizing the luminosity function for each richness class with a Schechter function, we find that the characteristic X-ray luminosity, L(sub *), scales with richness class as (L(sub *) varies as N(sub*)(exp gamma), where N(sub *) is the corrected, mean number of galaxies in a richness class, and the best-fitting exponent is gamma = 1.3 +/- 0.4. Finally, our analysis suggests that there is a lower limit to the X-ray luminosity of clusters which is determined by the integrated emission of the cluster member galaxies, and this also scales with richness class. The present sample forms a baseline for testing cosmological evolution of Abell-like clusters when an appropriate high-redshift cluster sample becomes available.

  12. Indoleamine Hallucinogens in Cluster Headache: Results of the Clusterbusters Medication Use Survey.

    PubMed

    Schindler, Emmanuelle A D; Gottschalk, Christopher H; Weil, Marsha J; Shapiro, Robert E; Wright, Douglas A; Sewell, Richard Andrew

    2015-01-01

    Cluster headache is one of the most debilitating pain syndromes. A significant number of patients are refractory to conventional therapies. The Clusterbusters.org medication use survey sought to characterize the effects of both conventional and alternative medications used in cluster headache. Participants were recruited from cluster headache websites and headache clinics. The final analysis included responses from 496 participants. The survey was modeled after previously published surveys and was available online. Most responses were chosen from a list, though others were free-texted. Conventional abortive and preventative medications were identified and their efficacies agreed with those previously published. The indoleamine hallucinogens, psilocybin, lysergic acid diethylamide, and lysergic acid amide, were comparable to or more efficacious than most conventional medications. These agents were also perceived to shorten/abort a cluster period and bring chronic cluster headache into remission more so than conventional medications. Furthermore, infrequent and non-hallucinogenic doses were reported to be efficacious. Findings provide additional evidence that several indoleamine hallucinogens are rated as effective in treating cluster headache. These data reinforce the need for further investigation of the effects of these and related compounds in cluster headache under experimentally controlled settings.

  13. EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.

    PubMed

    Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina

    2009-04-01

    In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.

  14. Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters

    PubMed Central

    Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

    2011-01-01

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874

  15. Comparative microbial modules resource: generation and visualization of multi-species biclusters.

    PubMed

    Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

    2011-12-01

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.

  16. Optical–SZE scaling relations for DES optically selected clusters within the SPT-SZ Survey

    DOE PAGES

    Saro, A.; Bocquet, S.; Mohr, J.; ...

    2017-03-15

    We study the Sunyaev-Zel'dovich effect (SZE) signature in South Pole Telescope (SPT) data for an ensemble of 719 optically identified galaxy clusters selected from 124.6 degmore » $^2$ of the Dark Energy Survey (DES) science verification data, detecting a stacked SZE signal down to richness $$\\lambda\\sim20$$. The SZE signature is measured using matched-filtered maps of the 2500 deg$^2$ SPT-SZ survey at the positions of the DES clusters, and the degeneracy between SZE observable and matched-filter size is broken by adopting as priors SZE and optical mass-observable relations that are either calibrated using SPT selected clusters or through the Arnaud et al. (2010, A10) X-ray analysis. We measure the SPT signal to noise $$\\zeta$$-$$\\lambda$$, relation and two integrated Compton-$y$ $$Y_\\textrm{500}$$-$$\\lambda$$ relations for the DES-selected clusters and compare these to model expectations accounting for the SZE-optical center offset distribution. For clusters with $$\\lambda > 80$$, the two SPT calibrated scaling relations are consistent with the measurements, while for the A10-calibrated relation the measured SZE signal is smaller by a factor of $$0.61 \\pm 0.12$$ compared to the prediction. For clusters at $$20 < \\lambda < 80$$, the measured SZE signal is smaller by a factor of $$\\sim$$0.20-0.80 (between 2.3 and 10~$$\\sigma$$ significance) compared to the prediction, with the SPT calibrated scaling relations and larger $$\\lambda$$ clusters showing generally better agreement. We quantify the required corrections to achieve consistency, showing that there is a richness dependent bias that can be explained by some combination of contamination of the observables and biases in the estimated masses. We discuss possible physical effects, as contamination from line-of-sight projections or from point sources, larger offsets in the SZE-optical centering or larger scatter in the $$\\lambda$$-mass relation at lower richnesses.« less

  17. Optical–SZE scaling relations for DES optically selected clusters within the SPT-SZ Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saro, A.; Bocquet, S.; Mohr, J.

    We study the Sunyaev-Zel'dovich effect (SZE) signature in South Pole Telescope (SPT) data for an ensemble of 719 optically identified galaxy clusters selected from 124.6 degmore » $^2$ of the Dark Energy Survey (DES) science verification data, detecting a stacked SZE signal down to richness $$\\lambda\\sim20$$. The SZE signature is measured using matched-filtered maps of the 2500 deg$^2$ SPT-SZ survey at the positions of the DES clusters, and the degeneracy between SZE observable and matched-filter size is broken by adopting as priors SZE and optical mass-observable relations that are either calibrated using SPT selected clusters or through the Arnaud et al. (2010, A10) X-ray analysis. We measure the SPT signal to noise $$\\zeta$$-$$\\lambda$$, relation and two integrated Compton-$y$ $$Y_\\textrm{500}$$-$$\\lambda$$ relations for the DES-selected clusters and compare these to model expectations accounting for the SZE-optical center offset distribution. For clusters with $$\\lambda > 80$$, the two SPT calibrated scaling relations are consistent with the measurements, while for the A10-calibrated relation the measured SZE signal is smaller by a factor of $$0.61 \\pm 0.12$$ compared to the prediction. For clusters at $$20 < \\lambda < 80$$, the measured SZE signal is smaller by a factor of $$\\sim$$0.20-0.80 (between 2.3 and 10~$$\\sigma$$ significance) compared to the prediction, with the SPT calibrated scaling relations and larger $$\\lambda$$ clusters showing generally better agreement. We quantify the required corrections to achieve consistency, showing that there is a richness dependent bias that can be explained by some combination of contamination of the observables and biases in the estimated masses. We discuss possible physical effects, as contamination from line-of-sight projections or from point sources, larger offsets in the SZE-optical centering or larger scatter in the $$\\lambda$$-mass relation at lower richnesses.« less

  18. Analyzing coastal environments by means of functional data analysis

    NASA Astrophysics Data System (ADS)

    Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.

    2017-07-01

    Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grainsize of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, the point density function (PDF), was employed after adapting a log-normal distribution to each PSD and resuming each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered-log-ratio (clr) to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.

  19. Classification of patients based on their evaluation of hospital outcomes: cluster analysis following a national survey in Norway

    PubMed Central

    2013-01-01

    Background A general trend towards positive patient-reported evaluations of hospitals could be taken as a sign that most patients form a homogeneous, reasonably pleased group, and consequently that there is little need for quality improvement. The objective of this study was to explore this assumption by identifying and statistically validating clusters of patients based on their evaluation of outcomes related to overall satisfaction, malpractice and benefit of treatment. Methods Data were collected using a national patient-experience survey of 61 hospitals in the 4 health regions in Norway during spring 2011. Postal questionnaires were mailed to 23,420 patients after their discharge from hospital. Cluster analysis was performed to identify response clusters of patients, based on their responses to single items about overall patient satisfaction, benefit of treatment and perception of malpractice. Results Cluster analysis identified six response groups, including one cluster with systematically poorer evaluation across outcomes (18.5% of patients) and one small outlier group (5.3%) with very poor scores across all outcomes. One-Way ANOVA with post-hoc tests showed that most differences between the six response groups on the three outcome items were significant. The response groups were significantly associated with nine patient-experience indicators (p < 0.001), and all groups were significantly different from each of the other groups on a majority of the patient-experience indicators. Clusters were significantly associated with age, education, self-perceived health, gender, and the degree to write open comments in the questionnaire. Conclusions The study identified five response clusters with distinct patient-reported outcome scores, in addition to a heterogeneous outlier group with very poor scores across all outcomes. The outlier group and the cluster with systematically poorer evaluation across outcomes comprised almost one-quarter of all patients, clearly demonstrating the need to tailor quality initiatives and improve patient-perceived quality in hospitals. More research on patient clustering in patient evaluation is needed, as well as standardization of methodology to increase comparability across studies. PMID:23433450

  20. Analysis of Intergrade Variables In The Fuzzy C-Means And Improved Algorithm Cat Swarm Optimization(FCM-ISO) In Search Segmentation

    NASA Astrophysics Data System (ADS)

    Saragih, Jepronel; Salim Sitompul, Opim; Situmorang, Zakaria

    2017-12-01

    One of the techniques known in Data Mining namely clustering. Image segmentation process does not always represent the actual image which is caused by a combination of algorithms as long as it has not been able to obtain optimal cluster centers. In this research will search for the smallest error with the counting result of a Fuzzy C Means process optimized with Cat swam Algorithm Optimization that has been developed by adding the weight of the energy in the process of Tracing Mode.So with the parameter can be determined the most optimal cluster centers and most closely with the data will be made the cluster. Weigh inertia in this research, namely: (0.1), (0.2), (0.3), (0.4), (0.5), (0.6), (0.7), (0.8) and (0.9). Then compare the results of each variable values inersia (W) which is different and taken the smallest results. Of this weighting analysis process can acquire the right produce inertia variable cost function the smallest.

  1. Non-invasive quantification of tumour heterogeneity in water diffusivity to differentiate malignant from benign tissues of urinary bladder: a phase I study.

    PubMed

    Nguyen, Huyen T; Shah, Zarine K; Mortazavi, Amir; Pohar, Kamal S; Wei, Lai; Jia, Guang; Zynger, Debra L; Knopp, Michael V

    2017-05-01

    To quantify the heterogeneity of the tumour apparent diffusion coefficient (ADC) using voxel-based analysis to differentiate malignancy from benign wall thickening of the urinary bladder. Nineteen patients with histopathological findings of their cystectomy specimen were included. A data set of voxel-based ADC values was acquired for each patient's lesion. Histogram analysis was performed on each data set to calculate uniformity (U) and entropy (E). The k-means clustering of the voxel-wised ADC data set was implemented to measure mean intra-cluster distance (MICD) and largest inter-cluster distance (LICD). Subsequently, U, E, MICD, and LICD for malignant tumours were compared with those for benign lesions using a two-sample t-test. Eleven patients had pathological confirmation of malignancy and eight with benign wall thickening. Histogram analysis showed that malignant tumours had a significantly higher degree of ADC heterogeneity with lower U (P = 0.016) and higher E (P = 0.005) than benign lesions. In agreement with these findings, k-means clustering of voxel-wise ADC indicated that bladder malignancy presented with significantly higher MICD (P < 0.001) and higher LICD (P = 0.002) than benign wall thickening. The quantitative assessment of tumour diffusion heterogeneity using voxel-based ADC analysis has the potential to become a non-invasive tool to distinguish malignant from benign tissues of urinary bladder cancer. • Heterogeneity is an intrinsic characteristic of tumoral tissue. • Non-invasive quantification of tumour heterogeneity can provide adjunctive information to improve cancer diagnosis accuracy. • Histogram analysis and k-means clustering can quantify tumour diffusion heterogeneity. • The quantification helps differentiate malignant from benign urinary bladder tissue.

  2. Functional Connectivity Parcellation of the Human Thalamus by Independent Component Analysis.

    PubMed

    Zhang, Sheng; Li, Chiang-Shan R

    2017-11-01

    As a key structure to relay and integrate information, the thalamus supports multiple cognitive and affective functions through the connectivity between its subnuclei and cortical and subcortical regions. Although extant studies have largely described thalamic regional functions in anatomical terms, evidence accumulates to suggest a more complex picture of subareal activities and connectivities of the thalamus. In this study, we aimed to parcellate the thalamus and examine whole-brain connectivity of its functional clusters. With resting state functional magnetic resonance imaging data from 96 adults, we used independent component analysis (ICA) to parcellate the thalamus into 10 components. On the basis of the independence assumption, ICA helps to identify how subclusters overlap spatially. Whole brain functional connectivity of each subdivision was computed for independent component's time course (ICtc), which is a unique time series to represent an IC. For comparison, we computed seed-region-based functional connectivity using the averaged time course across all voxels within a thalamic subdivision. The results showed that, at p < 10 -6 , corrected, 49% of voxels on average overlapped among subdivisions. Compared with seed-region analysis, ICtc analysis revealed patterns of connectivity that were more distinguished between thalamic clusters. ICtc analysis demonstrated thalamic connectivity to the primary motor cortex, which has eluded the analysis as well as previous studies based on averaged time series, and clarified thalamic connectivity to the hippocampus, caudate nucleus, and precuneus. The new findings elucidate functional organization of the thalamus and suggest that ICA clustering in combination with ICtc rather than seed-region analysis better distinguishes whole-brain connectivities among functional clusters of a brain region.

  3. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.

  4. Study on force mechanism for therapeutic effect of pushing manipulation with one-finger meditation base on similarity analysis of force and waveform.

    PubMed

    Fang, Lei; Fang, Min; Guo, Min-Min

    2016-12-27

    To reveal the force mechanism for therapeutic effect of pushing manipulation with one-finger meditation. A total of 15 participants were recruited in this study and assigned to an expert group, a skilled group and a novice group, with 5 participants in each group. Mechanical signals were collected from a biomechanical testing platform, and these data were further observed via similarity analysis and cluster analysis. Comparing the force waveforms of manipulation revealed that the manipulation forces were similar between the expert group and the skilled group (P>0.05). The mean value of vertical force was 9.8 N, and 95% CI rang from 6.37 to 14.70 N, but there were significant differences compared with the novice group (P<0.05). The result of overall similarity coefficient cluster analysis showed that two kinds of manipulation forces curves were existed between the expert group and the skilled group. Pushing manipulation with one-finger meditation is a kind of light stimulation manipulation on the acupoint, and force characteristics of double waveforms continuously alternated during manual operation.

  5. Illuminating the clinical significance of alexithymia subtypes: A cluster analysis of alexithymic traits and psychiatric symptoms.

    PubMed

    Kajanoja, J; Scheinin, N M; Karlsson, L; Karlsson, H; Karukivi, M

    2017-06-01

    Alexithymia is a personality construct involving difficulties identifying and verbalizing feelings, and an externally oriented thinking style. There is preliminary evidence for alexithymia subtypes that may carry different risk profiles for psychiatric illness. The aim of this study was to gain support for the existence of alexithymia subtypes and further characterize their clinical relevance. To identify possible subtypes, a cluster analysis was conducted for individuals with high alexithymic traits (N=113). Current depressive and anxiety symptoms, self-reported psychiatric medical history, and self-reported early life adversity were compared between subtypes. The cluster analysis was replicated with the low (N=2471) and moderate (N=290) alexithymia groups. We identified two alexithymia subtypes. Compared to type A, type B alexithymia was associated with higher levels of difficulties in identifying feelings, and was more strongly associated with current depressive (Cohen's d=0.77, p<0.001) and anxiety symptoms (Cohen's d=0.82, p<0.001), and self-reported early life adversity (Cohen's d 0.42, p=0.048). Compared to type A, type B alexithymia was also associated with a higher prevalence of self-reported diagnosis of major depressive- (30.2% vs. 8.3%) and anxiety disorder (18.9% vs. 3.3%). The results of this study support the hypothesis of alexithymia subtypes, and add support to the growing evidence showing that alexithymia is likely a heterogeneous and dimensional phenomenon. The subtype (type B) with most pronounced difficulties in identifying feelings may be associated with a higher risk for psychiatric illness compared to type A alexithymia, and may exhibit a more severe history of early life adversity. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. [Study on HPLC fingerprint of Oldenlandia diffusa].

    PubMed

    Chen, Yan; Yao, Zhi-Hong; Dai, Yi; Cheng, Hong; Wen, Li-Rong; Zhou, Guang-Xiong; Yao, Xin-Sheng

    2012-06-01

    To establish the HPLC fingerprint chromatogram of Oldenlandia diffusa coupled with chemometrics means for the quality control of multi-batches of medicinal material. The separation was developed on C18 column(4.6 mm x 250 mm, 5 microm) by gradient elution with acetonitrile-water(both containing 0.1 per thousand (V/V) ocetic acid) as mobile phase at a flow rate of 0.8 mL/min, the detection wavelength at 238 nm and column temperature at 30 degrees C. The HPLC fingerprint chromatogram of Oldenlandia diffusa was set up and the main characteristic peaks were identified by comparing with chemical reference substance. The quality of 22 batches of medicinal material was evaluated by similarity assay as well as principal component analysis (PCA) and cluster analysis. The established HPLC fingerprint chromatogram of Oldenlandia diffusa was specific, precise, reproducible and stable. 11 peaks were chemically identified. The similarity of 17 batches of Oldenlandia diffusa was obviously higher than 5 batches of adulterants. PCA showed that 17 batches of Oldenlandia diffusa were in a domain and 5 batches of adulterants were far apart from the domain. The cluster analysis of the 22 batches of medicinal material showed that 17 batches of Oldenlandia diffusa were in a cluster while 5 batches of adulterants were excluded. Further cluster analysis was carried out for the quality consistency of 17 batches of Oldenlandia diffusa and accordingly they were devided into 4 clusters. With the combination of chemometrics means, the HPLC fingerprint chromatogram provides a method for evaluation of authenticity and quality control of Oldenlandia diffusa, which is favorable to improve overall quality control of Oldenlandia diffusa.

  7. Semi-supervised clustering for parcellating brain regions based on resting state fMRI data

    NASA Astrophysics Data System (ADS)

    Cheng, Hewei; Fan, Yong

    2014-03-01

    Many unsupervised clustering techniques have been adopted for parcellating brain regions of interest into functionally homogeneous subregions based on resting state fMRI data. However, the unsupervised clustering techniques are not able to take advantage of exiting knowledge of the functional neuroanatomy readily available from studies of cytoarchitectonic parcellation or meta-analysis of the literature. In this study, we propose a semi-supervised clustering method for parcellating amygdala into functionally homogeneous subregions based on resting state fMRI data. Particularly, the semi-supervised clustering is implemented under the framework of graph partitioning, and adopts prior information and spatial consistent constraints to obtain a spatially contiguous parcellation result. The graph partitioning problem is solved using an efficient algorithm similar to the well-known weighted kernel k-means algorithm. Our method has been validated for parcellating amygdala into 3 subregions based on resting state fMRI data of 28 subjects. The experiment results have demonstrated that the proposed method is more robust than unsupervised clustering and able to parcellate amygdala into centromedial, laterobasal, and superficial parts with improved functionally homogeneity compared with the cytoarchitectonic parcellation result. The validity of the parcellation results is also supported by distinctive functional and structural connectivity patterns of the subregions and high consistency between coactivation patterns derived from a meta-analysis and functional connectivity patterns of corresponding subregions.

  8. Exploring the application of latent class cluster analysis for investigating pedestrian crash injury severities in Switzerland.

    PubMed

    Sasidharan, Lekshmi; Wu, Kun-Feng; Menendez, Monica

    2015-12-01

    One of the major challenges in traffic safety analyses is the heterogeneous nature of safety data, due to the sundry factors involved in it. This heterogeneity often leads to difficulties in interpreting results and conclusions due to unrevealed relationships. Understanding the underlying relationship between injury severities and influential factors is critical for the selection of appropriate safety countermeasures. A method commonly employed to address systematic heterogeneity is to focus on any subgroup of data based on the research purpose. However, this need not ensure homogeneity in the data. In this paper, latent class cluster analysis is applied to identify homogenous subgroups for a specific crash type-pedestrian crashes. The manuscript employs data from police reported pedestrian (2009-2012) crashes in Switzerland. The analyses demonstrate that dividing pedestrian severity data into seven clusters helps in reducing the systematic heterogeneity of the data and to understand the hidden relationships between crash severity levels and socio-demographic, environmental, vehicle, temporal, traffic factors, and main reason for the crash. The pedestrian crash injury severity models were developed for the whole data and individual clusters, and were compared using receiver operating characteristics curve, for which results favored clustering. Overall, the study suggests that latent class clustered regression approach is suitable for reducing heterogeneity and revealing important hidden relationships in traffic safety analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. A cluster-based strategy for assessing the overlap between large chemical libraries and its application to a recent acquisition.

    PubMed

    Engels, Michael F M; Gibbs, Alan C; Jaeger, Edward P; Verbinnen, Danny; Lobanov, Victor S; Agrafiotis, Dimitris K

    2006-01-01

    We report on the structural comparison of the corporate collections of Johnson & Johnson Pharmaceutical Research & Development (JNJPRD) and 3-Dimensional Pharmaceuticals (3DP), performed in the context of the recent acquisition of 3DP by JNJPRD. The main objective of the study was to assess the druglikeness of the 3DP library and the extent to which it enriched the chemical diversity of the JNJPRD corporate collection. The two databases, at the time of acquisition, collectively contained more than 1.1 million compounds with a clearly defined structural description. The analysis was based on a clustering approach and aimed at providing an intuitive quantitative estimate and visual representation of this enrichment. A novel hierarchical clustering algorithm called divisive k-means was employed in combination with Kelley's cluster-level selection method to partition the combined data set into clusters, and the diversity contribution of each library was evaluated as a function of the relative occupancy of these clusters. Typical 3DP chemotypes enriching the diversity of the JNJPRD collection were catalogued and visualized using a modified maximum common substructure algorithm. The joint collection of JNJPRD and 3DP compounds was also compared to other databases of known medicinally active or druglike compounds. The potential of the methodology for the analysis of very large chemical databases is discussed.

  10. The Influence of the Host Plant Is the Major Ecological Determinant of the Presence of Nitrogen-Fixing Root Nodule Symbiont Cluster II Frankia Species in Soil

    PubMed Central

    Battenberg, Kai; Wren, Jannah A.; Hillman, Janell; Edwards, Joseph; Huang, Liujing

    2016-01-01

    ABSTRACT The actinobacterial genus Frankia establishes nitrogen-fixing root nodule symbioses with specific hosts within the nitrogen-fixing plant clade. Of four genetically distinct subgroups of Frankia, cluster I, II, and III strains are capable of forming effective nitrogen-fixing symbiotic associations, while cluster IV strains generally do not. Cluster II Frankia strains have rarely been detected in soil devoid of host plants, unlike cluster I or III strains, suggesting a stronger association with their host. To investigate the degree of host influence, we characterized the cluster II Frankia strain distribution in rhizosphere soil in three locations in northern California. The presence/absence of cluster II Frankia strains at a given site correlated significantly with the presence/absence of host plants on the site, as determined by glutamine synthetase (glnA) gene sequence analysis, and by microbiome analysis (16S rRNA gene) of a subset of host/nonhost rhizosphere soils. However, the distribution of cluster II Frankia strains was not significantly affected by other potential determinants such as host-plant species, geographical location, climate, soil pH, or soil type. Rhizosphere soil microbiome analysis showed that cluster II Frankia strains occupied only a minute fraction of the microbiome even in the host-plant-present site and further revealed no statistically significant difference in the α-diversity or in the microbiome composition between the host-plant-present or -absent sites. Taken together, these data suggest that host plants provide a factor that is specific for cluster II Frankia strains, not a general growth-promoting factor. Further, the factor accumulates or is transported at the site level, i.e., beyond the host rhizosphere. IMPORTANCE Biological nitrogen fixation is a bacterial process that accounts for a major fraction of net new nitrogen input in terrestrial ecosystems. Transfer of fixed nitrogen to plant biomass is especially efficient via root nodule symbioses, which represent evolutionarily and ecologically specialized mutualistic associations. Frankia spp. (Actinobacteria), especially cluster II Frankia spp., have an extremely broad host range, yet comparatively little is known about the soil ecology of these organisms in relation to the host plants and their rhizosphere microbiomes. This study reveals a strong influence of the host plant on soil distribution of cluster II Frankia spp. PMID:27795313

  11. An analysis of the currently available calibrations in Strömgren photometry by using open clusters

    NASA Astrophysics Data System (ADS)

    Jordi, C.; Masana, E.; Figueras, F.; Torra, J.

    1997-05-01

    In recent years, several authors have revised the calibrations used to compute physical parameters (Mv, Teff, log g, [Fe/H]) from intrinsic colours in the uvby H_beta photometric system. For reddened stars, these intrinsic colours can be computed through the standard relations among colour indices for each of the regions defined by \\cite[Stromgren (1966)]{str66} on the HR diagram. We present a discussion of the coherence of these calibrations for main-sequence stars. Stars from open clusters are used to carry out this analysis. Assuming that individual reddening values and distances should be similar for all the members of a given open cluster, systematic differences among the calibrations used in each of the photometric regions might arise when comparing mean reddening values and distances for the members of each region. To classify the stars into Stromgren's regions we extended the algorithm presented by \\cite[Figueras et al. (1991)]{fig91} to a wider range of spectral types and luminosity classes. The observational ZAMS are compared with the theoretical ZAMS from stellar evolutionary models, in the range 6500-30000 K. The discrepancies are also discussed.

  12. Quantifying humpback whale song sequences to understand the dynamics of song exchange at the ocean basin scale.

    PubMed

    Garland, Ellen C; Noad, Michael J; Goldizen, Anne W; Lilley, Matthew S; Rekdahl, Melinda L; Garrigue, Claire; Constantine, Rochelle; Daeschler Hauser, Nan; Poole, M Michael; Robbins, Jooke

    2013-01-01

    Humpback whales have a continually evolving vocal sexual display, or "song," that appears to undergo both evolutionary and "revolutionary" change. All males within a population adhere to the current content and arrangement of the song. Populations within an ocean basin share similarities in their songs; this sharing is complex as multiple variations of the song (song types) may be present within a region at any one time. To quantitatively investigate the similarity of song types, songs were compared at both the individual singer and population level using the Levenshtein distance technique and cluster analysis. The highly stereotyped sequences of themes from the songs of 211 individuals from populations within the western and central South Pacific region from 1998 through 2008 were grouped together based on the percentage of song similarity, and compared to qualitatively assigned song types. The analysis produced clusters of highly similar songs that agreed with previous qualitative assignments. Each cluster contained songs from multiple populations and years, confirming the eastward spread of song types and their progressive evolution through the study region. Quantifying song similarity and exchange will assist in understanding broader song dynamics and contribute to the use of vocal displays as population identifiers.

  13. Insight into Energy Conservation via Alternative Carbon Monoxide Metabolism in Carboxydothermus pertinax Revealed by Comparative Genome Analysis.

    PubMed

    Fukuyama, Yuto; Omae, Kimiho; Yoneda, Yasuko; Yoshida, Takashi; Sako, Yoshihiko

    2018-05-04

    Carboxydothermus species are some of the most studied thermophilic carboxydotrophs. Their varied carboxydotrophic growth properties suggest distinct strategies for energy conservation via CO metabolism. In this study, we used comparative genome analysis of the genus Carboxydothermus to show variations in the CO dehydrogenase/energy-converting hydrogenase gene cluster, which is responsible for CO metabolism with H 2 production (hydrogenogenic CO metabolism). Indeed, ability or inability to produce H 2 with CO oxidation is explained by the presence or absence of this gene cluster in C. hydrogenoformans , C. islandicus , and C. ferrireducens Interestingly, despite its hydrogenogenic CO metabolism, C. pertinax lacks the Ni-CO dehydrogenase catalytic subunit (CooS-I) and its transcriptional regulator encoding genes in this gene cluster probably due to inversion. Transcriptional analysis in C. pertinax showed that the Ni-CO dehydrogenase gene ( cooS-II ) and distantly encoded energy-converting hydrogenase related genes were remarkably upregulated under 100% CO. In addition, when thiosulfate was available as a terminal electron acceptor under 100% CO, C. pertinax maximum cell density and maximum specific growth rate were 3.1-fold and 1.5-fold higher, respectively, than when thiosulfate was absent. The amount of H 2 produced was only 63% of the consumed CO, less than expected according to hydrogenogenic CO oxidation: CO + H 2 O → CO 2 + H 2 Accordingly, C. pertinax would couple CO oxidation by Ni-CO dehydrogenase-II with simultaneous reduction of not only H 2 O but thiosulfate when grown under 100% CO. IMPORTANCE Anaerobic hydrogenogenic carboxydotrophs are thought to fill a vital niche with scavenging potentially toxic CO and producing H 2 as available energy source for thermophilic microbes. This hydrogenogenic carboxydotrophy relies on a Ni-CO dehydrogenase/energy-converting hydrogenase gene cluster. This feature is thought to be as common to these organisms. However, hydrogenogenic carboxydotroph, Carboxydothermus pertinax lacks the gene for the Ni-CO dehydrogenase catalytic subunit encoded in the gene cluster. Here, we performed a comparative genome analysis of the genus Carboxydothermus , transcriptional analysis, and cultivation study under 100% CO to prove their hydrogenogenic CO metabolism. Results revealed that C. pertinax could couple Ni-CO dehydrogenase-II alternatively to the distal energy-converting hydrogenase. Furthermore, C. pertinax represents an example of the functioning of Ni-CO dehydrogenase which does not always correspond with its genomic context owing to the versatility of CO metabolism and the low redox potential of CO. Copyright © 2018 American Society for Microbiology.

  14. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.

    Here we present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev–Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass–observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration–mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass–temperature scaling relation ln (E(z)M 500c/10 14 M ⊙) = A + 1.5ln (kT/7.2 keV) to A=1.81more » $$+0.24\\atop{-0.14}$$(stat.)±0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c 200c=5.6$$+3.7\\atop{-1.8}$$.« less

  15. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.

    We present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z(median) = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in Vmore » - I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration-mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass-temperature scaling relation ln (E(z) M-500c/10(14)M(circle dot)) = A + 1.5ln (kT/7.2 keV) to A = 1.81(-0.14)(+0.24)(stat.)+/- 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c(200c) = 5.6(-1.8)(+3.7).« less

  16. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    DOE PAGES

    Schrabback, T.; Applegate, D.; Dietrich, J. P.; ...

    2017-10-14

    Here we present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev–Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass–observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration–mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass–temperature scaling relation ln (E(z)M 500c/10 14 M ⊙) = A + 1.5ln (kT/7.2 keV) to A=1.81more » $$+0.24\\atop{-0.14}$$(stat.)±0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c 200c=5.6$$+3.7\\atop{-1.8}$$.« less

  17. PCA based clustering for brain tumor segmentation of T1w MRI images.

    PubMed

    Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay

    2017-03-01

    Medical images are huge collections of information that are difficult to store and process consuming extensive computing time. Therefore, the reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that a high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on T1-weighted MRI images clustering for brain tumor segmentation with dimension reduction by different common Principle Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. Five most common PCA algorithms; namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalize Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX) were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, the T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more different sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principle components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA assisted K-Means algorithm to accomplish the best clustering performance in the majority as well as achieving significant results with both clustering algorithms for all size of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  18. Fuzzy cluster analysis of high-field functional MRI data.

    PubMed

    Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald

    2003-11-01

    Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast today is an established brain research method and quickly gains acceptance for complementary clinical diagnosis. However, neither the basic mechanisms like coupling between neuronal activation and haemodynamic response are known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts as well as quantification of expected, i.e. stimulus correlated, and novel information on brain activity is important for both, new insights in neuroscience and future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiate temporal patterns in MRI using (a) a test object with static and dynamic parts, (b) artifacts due to gross head motion artifacts. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) a fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may help to identify artifacts and add novel and unexpected information valuable for interpretation, classification and characterization of functional MRI data which can be used to design new data acquisition schemes, stimulus presentations, neuro(physio)logical paradigms, as well as to improve quantitative biophysical models.

  19. An image processing pipeline to detect and segment nuclei in muscle fiber microscopic images.

    PubMed

    Guo, Yanen; Xu, Xiaoyin; Wang, Yuanyuan; Wang, Yaming; Xia, Shunren; Yang, Zhong

    2014-08-01

    Muscle fiber images play an important role in the medical diagnosis and treatment of many muscular diseases. The number of nuclei in skeletal muscle fiber images is a key bio-marker of the diagnosis of muscular dystrophy. In nuclei segmentation one primary challenge is to correctly separate the clustered nuclei. In this article, we developed an image processing pipeline to automatically detect, segment, and analyze nuclei in microscopic image of muscle fibers. The pipeline consists of image pre-processing, identification of isolated nuclei, identification and segmentation of clustered nuclei, and quantitative analysis. Nuclei are initially extracted from background by using local Otsu's threshold. Based on analysis of morphological features of the isolated nuclei, including their areas, compactness, and major axis lengths, a Bayesian network is trained and applied to identify isolated nuclei from clustered nuclei and artifacts in all the images. Then a two-step refined watershed algorithm is applied to segment clustered nuclei. After segmentation, the nuclei can be quantified for statistical analysis. Comparing the segmented results with those of manual analysis and an existing technique, we find that our proposed image processing pipeline achieves good performance with high accuracy and precision. The presented image processing pipeline can therefore help biologists increase their throughput and objectivity in analyzing large numbers of nuclei in muscle fiber images. © 2014 Wiley Periodicals, Inc.

  20. An orbital and electron density analysis of weak interactions in ethanol-water, methanol-water, ethanol and methanol small clusters.

    PubMed

    Mejía, Sol M; Flórez, Elizabeth; Mondragón, Fanor

    2012-04-14

    A computational study of (ethanol)(n)-water, n = 1 to 5 heteroclusters was carried out employing the B3LYP∕6-31+G(d) approach. The molecular (MO) and atomic (AO) orbital analysis and the topological study of the electron density provided results that were successfully correlated. Results were compared with those obtained for (ethanol)(n), (methanol)(n), n = 1 to 6 clusters and (methanol)(n)-water, n = 1 to 5 heteroclusters. These systems showed the same trends observed in the (ethanol)(n)-water, n = 1 to 5 heteroclusters such as an O---O distance of 5 Å to which the O-H---O hydrogen bonds (HBs) can have significant influence on the constituent monomers. The HOMO of the hetero(clusters) is less stable than the HOMO of the isolated alcohol monomer as the hetero(cluster) size increases, that destabilization is higher for linear geometries than for cyclic geometries. Changes of the occupancy and energy of the AO are correlated with the strength of O-H---O and C-H---O HBs as well as with the proton donor and/or acceptor character of the involved molecules. In summary, the current MO and AO analysis provides alternative ways to characterize HBs. However, this analysis cannot be applied to the study of H---H interactions observed in the molecular graphs.

Top