Sample records for including cluster analysis

  1. Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

    PubMed

    Yokoyama, Eiji; Uchimura, Masako

    2007-11-01

    Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.

  2. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the interface of ClusterViz, the clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithm module adopts an abstract interface for algorithms, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.

  3. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    NASA Astrophysics Data System (ADS)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and 2-dimensional principal component analysis (PCA) were used to represent the genetic relations among Amomum tsao-ko samples by using simple sequence repeat (SSR) markers. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) consisted of 10 individuals; Cluster I, as the main group, contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  4. Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis.

    PubMed

    Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar

    2015-03-01

    Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. To identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rate. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from that in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting the clusters represent clinically and biologically different subtypes of COPD.

  5. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    PubMed

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
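
    As a rough illustration of the "silhouette measure" reported in this abstract, the following minimal Python sketch computes a mean silhouette coefficient from scratch. The function name, the toy points and the choice of Euclidean distance are assumptions made for illustration; the abstract does not state which software or distance measure the authors used.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(toy_points, [0, 0, 1, 1]))  # labels match the groups
print(silhouette_score(toy_points, [0, 1, 0, 1]))  # labels cut across the groups
```

    Values near 1 indicate compact, well-separated clusters, whereas values near 0, such as the 0.2 reported in the abstract, suggest overlapping groups.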

  6. Description and typology of intensive Chios dairy sheep farms in Greece.

    PubMed

    Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G

    2012-06-01

    The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece, establishing a typology that may properly describe and characterize them. The study included all 66 farms of the Chios sheep breeders' cooperative Macedonia. Data were collected using a structured direct questionnaire for in-depth interviews, including questions properly selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was used on the data to obtain the most appropriate typology. Initially, principal component analysis was used to produce uncorrelated variables (principal components), which would be used for the subsequent cluster analysis. The number of clusters was decided using hierarchical cluster analysis, whereas the farms were allocated to 4 clusters using k-means cluster analysis. The identified clusters were described and afterward compared using one-way ANOVA or a chi-squared test. The main differences were evident in land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms and cluster 2 included well-established farms with balanced sheep and feed/crop production. Cluster 3 comprised small-flock farms focusing more on arable crops than on sheep farming, with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs and choosing not to focus on feed/crop production. In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). Conducting a profitability analysis among different clusters, exploring and discovering the most beneficial levels of intensified management and capital investment should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
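
    The k-means allocation step named in this abstract can be sketched in plain Python as follows. This is only a minimal illustration of Lloyd's algorithm under stated assumptions: the function names, the toy two-dimensional scores and the starting centroids are hypothetical, and the principal component analysis and the hierarchical step used to choose the number of clusters are not reproduced here.

```python
from math import dist


def assign(points, centroids):
    """Index of the nearest centroid for every point (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(map(float, c)) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid: coordinate-wise mean of the assigned points.
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical component scores for four farms and two starting centroids.
farm_scores = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels, centroids = kmeans(farm_scores, [(0, 0), (10, 0)])
print(labels, centroids)
```

    In practice the starting centroids are often taken from a preceding hierarchical solution or chosen at random over several restarts; the fixed values here only keep the sketch deterministic.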

  7. Ecological tolerances of Miocene larger benthic foraminifera from Indonesia

    NASA Astrophysics Data System (ADS)

    Novak, Vibor; Renema, Willem

    2018-01-01

    To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples; clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 with dominance of BW samples, and cluster 5 showing a mixed composition with both MCS and BW samples. The results of cluster analysis were afterwards subjected to indicator species analysis resulting in the interpretation that generated three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.

  8. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    PubMed

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. The aim was to identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset, aspirin sensitivity, eosinophilic asthma; and Cluster 4 (n=7), early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. The variables with the greatest force for differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of disease and a personalized approach to the treatment of patients.

  9. Orbit Clustering Based on Transfer Cost

    NASA Technical Reports Server (NTRS)

    Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

    2013-01-01

    We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.

  10. Distinct Phenotypes of Cigarette Smokers Identified by Cluster Analysis of Patients with Severe Asthma.

    PubMed

    Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu

    2015-12-01

    Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis has been applied to explore novel phenotypes, which are not based on any a priori hypotheses. To explore novel severe asthma phenotypes by cluster analysis when including cigarette smokers. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Twelve clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters were identified, including two characterized by high pack-year exposure to cigarette smoking and low FEV1/FVC. There were marked differences between the two clusters of cigarette smokers. One had high levels of circulating eosinophils, high IgE levels, and a high sinus disease score. The other was characterized by low levels of the same parameters. Sputum analysis revealed increased levels of IL-5 in the former cluster and increased levels of IL-6 and osteopontin in the latter. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 1 year later. This study reveals two distinct phenotypes of severe asthma in current and former cigarette smokers with potentially different biological pathways contributing to fixed airflow limitation. Clinical trial registered with www.umin.ac.jp (000003254).

  11. clusterProfiler: an R package for comparing biological themes among gene clusters.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

    2012-05-01

    Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.

  12. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    PubMed

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.

  13. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review

    PubMed Central

    Morris, Tom; Gray, Laura

    2017-01-01

    Objectives: To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: Any, not limited to healthcare settings. Participants: Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures: The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results: Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions: Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
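
    The primary outcome in this abstract, the coefficient of variation (CV) in cluster size, is the standard deviation of the cluster sizes divided by their mean. A minimal Python sketch follows; the function name and the example sizes are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the abstract does not say which estimator the reviewers used.

```python
from statistics import mean, stdev


def coefficient_of_variation(cluster_sizes):
    """CV of cluster sizes: sample standard deviation divided by the mean."""
    sizes = list(cluster_sizes)
    if len(sizes) < 2:
        raise ValueError("at least two clusters are needed")
    average = mean(sizes)
    if average == 0:
        raise ValueError("mean cluster size must be non-zero")
    return stdev(sizes) / average


# Hypothetical trial with three clusters of unequal size.
example_sizes = [10, 20, 30]
print(coefficient_of_variation(example_sizes))
```

    A CV of 0 means all clusters are the same size; larger values indicate greater imbalance between clusters.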

  14. Identification and characterization of near-fatal asthma phenotypes by cluster analysis.

    PubMed

    Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V

    2015-09-01

    Near-fatal asthma (NFA) is a heterogeneous clinical entity and several profiles of patients have been described according to different clinical, pathophysiological and histological features. However, there are no previous studies that identify different phenotypes of NFA in an unbiased way, using statistical methods such as cluster analysis. Therefore, the aim of the present study was to identify and to characterize phenotypes of near-fatal asthma using a cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using a two-step algorithm was performed on data from 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with a high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by an insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, confirming in part previous findings observed in studies with a clinical approach. The identification of patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. [Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

    PubMed

    Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

    2016-09-01

    The aim was to study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). The subjects' clinical phenotypes were then identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 comprised atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar volume (DLCO/VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's Respiratory Questionnaire (SGRQ) score, acute exacerbation in the past one year, PEF variability and allergic dermatitis (P<0.05). (2) Four clusters were also identified by two-step cluster analysis, as follows: cluster 1, COPD patients with moderate to severe airflow limitation; cluster 2, asthma and COPD patients with heavy smoking, airflow limitation and increased airways reversibility; cluster 3, patients with less smoking and normal pulmonary function with wheezing but no chronic cough; cluster 4, chronic bronchitis patients with normal pulmonary function and chronic cough. Significant differences were revealed regarding gender distribution, respiratory symptoms, pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, MMEF% pred, DLCO/VA% pred, RV% pred, PEF variability, total serum IgE level, cumulative tobacco cigarette consumption (pack-years), and SGRQ score (P<0.05). Distinct clinical phenotypes of chronic airway diseases can be identified by different cluster analyses. These phenotypes may therefore guide doctors in providing individualized treatment.

  16. Globular Cluster Abundances from High-resolution, Integrated-light Spectroscopy. II. Expanding the Metallicity Range for Old Clusters and Updated Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

    2017-01-01

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ~0.1 dex for GCs with metallicities as high as [Fe/H] = -0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.

  17. Near real-time space-time cluster analysis for detection of enteric disease outbreaks in a community setting.

    PubMed

    Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy

    2016-08-01

    To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data of Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly by ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). During eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known by the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutes (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.

  18. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involve decisions on how to; handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieves the highest adjusted Rand. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
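
    The adjusted Rand index mentioned in this abstract scores a clustering against known classes, corrected for chance agreement. The following is a minimal sketch of the standard pair-counting formula, not the authors' code; the function name adjusted_rand_index and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")

    # Contingency counts: items sharing each (true class, predicted cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that fall together in each cell, row and column.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both labelings are all-in-one or all-singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: the same partition with the cluster names swapped.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

    A value of 1 means the two partitions agree exactly regardless of how the clusters are named, and values near 0 indicate agreement no better than chance.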

  19. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    PubMed

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: any, not limited to healthcare settings. Participants: any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
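
    The coefficient of variation used as the primary outcome here is the standard deviation of the cluster sizes divided by their mean. A minimal sketch follows; the function name cluster_size_cv and the example sizes are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the abstract does not say which estimator the review used.

```python
from statistics import mean, stdev


def cluster_size_cv(sizes):
    """Coefficient of variation of cluster sizes: sample SD divided by mean."""
    if len(sizes) < 2:
        raise ValueError("need at least two clusters")
    return stdev(sizes) / mean(sizes)


# Hypothetical cluster sizes, not taken from any trial in the review.
example_sizes = [20, 35, 50, 80, 115]
print(round(cluster_size_cv(example_sizes), 2))
```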

  20. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

    NASA Astrophysics Data System (ADS)

    Žibert, Janez; Pražnikar, Jure

    2012-09-01

    The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 on 01/05/2010 (208.0 μg m⁻³) coincide with 1 May, which is a national holiday in Slovenia and has a very strong tradition of bonfire parties.
The clustering of the diurnal concentration showed that various different day-profiles are present in the cold period, while this is not the case for the hot season. Additional analysis of ship traffic and rainfall data showed that there is no statistically significant difference between the ship gross (bruto) registered tonnage (BRT) values in the case of BC and PM10 clusters, but that there are statistically significant differences between the rainfall in the BC and PM10 clusters. The wind-rose for clusters which included most days in the sampling period indicated that emitted PM10 and BC from the Port of Koper were mainly transported in the west direction over the sea and in the east direction, where there is no populated area. The presented analysis showed that both BC and PM10 concentrations were driven by rain intensity and wind speed.
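
    The grouping of day-profiles by k-means with the squared Euclidean distance can be sketched as below. This is a plain version of Lloyd's algorithm under stated assumptions, not the authors' implementation: the names squared_euclidean and kmeans, the random initialisation and the toy two-value profiles are all hypothetical, and a real day-profile would hold 24 hourly values.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k profiles picked at random from the data.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [None] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its member profiles.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy "day-profiles" with two values each, invented for illustration.
days = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(days, k=2)
print(labels)
```

    Clusters with a single member day, as reported in the abstract, correspond to a label that occurs only once in the returned list.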

  1. Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.

    PubMed

    Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R

    2014-06-01

    Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. 
This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic inflammation. Published by Mosby, Inc.

  2. Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis

    PubMed Central

    Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.

    2013-01-01

    Background Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation.
PMID:24332216

  3. cluML: A markup language for clustering and cluster validity assessment of microarray data.

    PubMed

    Bolshakova, Nadia; Cunningham, Pádraig

    2005-01-01

    cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can be effectively used for the representation of clustering and for the validation of other biomedical and physical data that has no limitations.

  4. Accounting for multiple births in randomised trials: a systematic review.

    PubMed

    Yelland, Lisa Nicole; Sullivan, Thomas Richard; Makrides, Maria

    2015-03-01

    Multiple births are an important subgroup to consider in trials aimed at reducing preterm birth or its consequences. Including multiples results in a unique mixture of independent and clustered data, which has implications for the design, analysis and reporting of the trial. We aimed to determine how multiple births were taken into account in the design and analysis of recent trials involving preterm infants, and whether key information relevant to multiple births was reported. We conducted a systematic review of multicentre randomised trials involving preterm infants published between 2008 and 2013. Information relevant to multiple births was extracted. Of the 56 trials included in the review, 6 (11%) excluded multiples and 24 (43%) failed to indicate whether multiples were included. Among the 26 trials that reported multiples were included, only one (4%) accounted for clustering in the sample size calculations and eight (31%) took the clustering into account in the analysis of the primary outcome. Of the 20 trials that randomised infants, 12 (60%) failed to report how infants from the same birth were randomised. Information on multiple births is often poorly reported in trials involving preterm infants, and clustering due to multiple births is rarely taken into account. Since ignoring clustering could result in inappropriate recommendations for clinical practice, clustering should be taken into account in the design and analysis of future neonatal and perinatal trials including infants from a multiple birth. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  5. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.

  6. Somatotyping using 3D anthropometry: a cluster analysis.

    PubMed

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.

  7. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.

    PubMed

    Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra

    2016-11-20

    The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
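
    The inflation described in the first sentence of this abstract can be illustrated with the textbook design effect for a simple parallel cluster randomised trial with equal cluster sizes, 1 + (m - 1) × ICC. This sketch covers only that simple case and is not the longitudinal formulae derived in the paper; the function names and the example numbers are assumptions made for illustration.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Design effect for a parallel cluster trial with equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual, cluster_size, icc):
    """Total sample size after inflating an individually randomised size."""
    # Round first so floating-point noise cannot push ceil() up by one.
    return ceil(round(n_individual * design_effect(cluster_size, icc), 9))


# Hypothetical inputs: 100 participants needed under individual randomisation,
# clusters of 5, intracluster correlation 0.25.
print(inflated_sample_size(100, 5, 0.25))
```

    With a cluster size of 1 the design effect is 1 and no inflation occurs, which is the individually randomised case.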

  8. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

    PubMed Central

    Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2016-01-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885

  9. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

    PubMed

    Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2017-06-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.

  10. Cluster headache and the hypocretin receptor 2 reconsidered: a genetic association study and meta-analysis.

    PubMed

    Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje

    2015-08-01

    Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.

  11. Clustering of Dietary Patterns, Lifestyles, and Overweight among Spanish Children and Adolescents in the ANIBES Study

    PubMed Central

    Pérez-Rodrigo, Carmen; Gil, Ángel; González-Gross, Marcela; Ortega, Rosa M.; Serra-Majem, Lluis; Varela-Moreiras, Gregorio; Aranceta-Bartrina, Javier

    2015-01-01

    Weight gain has been associated with behaviors related to diet, sedentary lifestyle, and physical activity. We investigated dietary patterns and possible meaningful clustering of physical activity, sedentary behavior, and sleep time in Spanish children and adolescents and whether the identified clusters could be associated with overweight. Analysis was based on a subsample (n = 415) of the cross-sectional ANIBES study in Spain. We performed exploratory factor analysis and subsequent cluster analysis of dietary patterns, physical activity, sedentary behaviors, and sleep time. Logistic regression analysis was used to explore the association between the cluster solutions and overweight. Factor analysis identified four dietary patterns, one reflecting a profile closer to the traditional Mediterranean diet. Dietary patterns, physical activity behaviors, sedentary behaviors and sleep time on weekdays in Spanish children and adolescents clustered into two different groups: a low physical activity-poorer diet lifestyle pattern, which included a higher proportion of girls, and a high physical activity, low sedentary behavior, longer sleep duration, healthier diet lifestyle pattern. Although increased risk of being overweight was not significant, the Prevalence Ratios (PRs) for the low physical activity-poorer diet lifestyle pattern were >1 in children and in adolescents. The healthier lifestyle pattern included lower proportions of children and adolescents from low socioeconomic status backgrounds. PMID:26729155

  12. Cluster analysis in phenotyping a Portuguese population.

    PubMed

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with those of other larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  13. Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

    PubMed Central

    Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

    2015-01-01

    The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694

  14. Classification of attempted suicide by cluster analysis: A study of 888 suicide attempters presenting to the emergency department.

    PubMed

    Kim, Hyeyoung; Kim, Bora; Kim, Se Hyun; Park, C Hyung Keun; Kim, Eun Young; Ahn, Yong Min

    2018-08-01

    It is essential to understand the latent structure of the population of suicide attempters for effective suicide prevention. The aim of this study was to identify subgroups among Korean suicide attempters in terms of the details of the suicide attempt. A total of 888 people who attempted suicide and were subsequently treated in the emergency rooms of 17 medical centers between May and November of 2013 were included in the analysis. The variables assessed included demographic characteristics, clinical information, and details of the suicide attempt assessed by the Suicide Intent Scale (SIS) and Columbia-Suicide Severity Rating Scale (C-SSRS). Cluster analysis was performed using the Ward method. Of the participants, 85.4% (n = 758) fell into a cluster characterized by less planning, low lethality methods, and ambivalence towards death ("impulsive"). The other cluster (n = 130) involved a more severe and well-planned attempt, used highly lethal methods, and took more precautions to avoid being interrupted ("planned"). The first cluster was dominated by women, while the second cluster was associated more with men, older age, and physical illness. We only included participants who visited the emergency department after their suicide attempt and had no missing values for SIS or C-SSRS. Cluster analysis extracted two distinct subgroups of Korean suicide attempters showing different patterns of suicidal behaviors. Understanding that a significant portion of suicide attempts occur impulsively calls for new prevention strategies tailored to differing subgroup profiles. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    PubMed

    Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

    2015-01-01

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
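
    The abstract does not name the partitioning algorithm. As an illustration only, the sketch below assumes k-means (Lloyd's algorithm) applied to two hypothetical per-voxel features; the function `kmeans`, the starting centres, and the numeric values are invented for the example.

```python
def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm; `centroids` are caller-supplied starting centres."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical voxels described by two normalised imaging parameters.
voxels = [(0.1, 1.0), (0.2, 1.1), (0.9, 0.3), (1.0, 0.2)]
labels, centres = kmeans(voxels, [voxels[0], voxels[2]])
```

    Repeating the call for k = 2, 3 and 4 and comparing a validation index across runs would correspond to the cluster-validation step the abstract describes.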

  16. A Multivariate Analysis of Galaxy Cluster Properties

    NASA Astrophysics Data System (ADS)

    Ogle, P. M.; Djorgovski, S.

    1993-05-01

    We have assembled from the literature a data base on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.

  17. Phylogenetic diversity in the genus Bacillus as seen by 16S rRNA sequencing studies

    NASA Technical Reports Server (NTRS)

    Rossler, D.; Ludwig, W.; Schleifer, K. H.; Lin, C.; McGill, T. J.; Wisotzkey, J. D.; Jurtshuk, P. Jr; Fox, G. E.

    1991-01-01

    Comparative sequence analysis of 16S ribosomal (r)RNAs or DNAs of Bacillus alvei, B. laterosporus, B. macerans, B. macquariensis, B. polymyxa and B. stearothermophilus revealed the phylogenetic diversity of the genus Bacillus. Based on the presently available data set of 16S rRNA sequences from bacilli and relatives, at least four major "Bacillus clusters" can be defined: a "Bacillus subtilis cluster" including B. stearothermophilus, a "B. brevis cluster" including B. laterosporus, a "B. alvei cluster" including B. macerans, B. macquariensis and B. polymyxa, and a "B. cycloheptanicus branch".

  18. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    PubMed

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.

  19. Nursing home care quality: a cluster analysis.

    PubMed

    Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

    2017-02-13

    Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n = 103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p < 0.05). Findings Two clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing homes residents' care quality perceptions.

  20. GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

    2017-01-10

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.

  1. Cluster analysis of obsessive-compulsive spectrum disorders in patients with obsessive-compulsive disorder: clinical and genetic correlates.

    PubMed

    Lochner, Christine; Hemmings, Sian M J; Kinnear, Craig J; Niehaus, Dana J H; Nel, Daniel G; Corfield, Valerie A; Moolman-Smook, Johanna C; Seedat, Soraya; Stein, Dan J

    2005-01-01

    Comorbidity of certain obsessive-compulsive spectrum disorders (OCSDs; such as Tourette's disorder) in obsessive-compulsive disorder (OCD) may serve to define important OCD subtypes characterized by differing phenomenology and neurobiological mechanisms. Comorbidity of the putative OCSDs in OCD has, however, not often been systematically investigated. The Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Axis I Disorders-Patient Version as well as a Structured Clinical Interview for Putative OCSDs (SCID-OCSD) were administered to 210 adult patients with OCD (N = 210, 102 men and 108 women; mean age, 35.7 +/- 13.3). A subset of Caucasian subjects (with OCD, n = 171; control subjects, n = 168), including subjects from the genetically homogeneous Afrikaner population (with OCD, n = 77; control subjects, n = 144), was genotyped for polymorphisms in genes involved in monoamine function. Because the items of the SCID-OCSD are binary (present/absent), a cluster analysis (Ward's method) using the items of SCID-OCSD was conducted. The association of identified clusters with demographic variables (age, gender), clinical variables (age of onset, obsessive-compulsive symptom severity and dimensions, level of insight, temperament/character, treatment response), and monoaminergic genotypes was examined. Cluster analysis of the OCSDs in our sample of patients with OCD identified 3 separate clusters at a 1.1 linkage distance level. The 3 clusters were named as follows: (1) "reward deficiency" (including trichotillomania, Tourette's disorder, pathological gambling, and hypersexual disorder), (2) "impulsivity" (including compulsive shopping, kleptomania, eating disorders, self-injury, and intermittent explosive disorder), and (3) "somatic" (including body dysmorphic disorder and hypochondriasis).
Several significant associations were found between cluster scores and other variables; for example, cluster I scores were associated with earlier age of onset of OCD and the presence of tics, cluster II scores were associated with female gender and childhood emotional abuse, and cluster III scores were associated with less insight and with somatic obsessions and compulsions. However, none of these clusters were associated with any particular genetic variant. Analysis of comorbid OCSDs in OCD suggested that these lie on a number of different dimensions. These dimensions are partially consistent with previous theoretical approaches taken toward classifying OCD spectrum disorders. The lack of genetic validation of these clusters in the present study may indicate the involvement of other, as yet untested, genes. Further genetic and cluster analyses of comorbid OCSDs in OCD may ultimately contribute to a better delineation of OCD endophenotypes.

  2. Data Mining of University Philanthropic Giving: Cluster-Discriminant Analysis and Pareto Effects

    ERIC Educational Resources Information Center

    Le Blanc, Louis A.; Rucks, Conway T.

    2009-01-01

    A large sample of 33,000 university alumni records were cluster-analyzed to generate six groups relatively unique in their respective attribute values. The attributes used to cluster the former students included average gift to the university's foundation and to the alumni association for the same institution. Cluster detection is useful in this…

  3. Farm, household, and farmer characteristics associated with changes in management practices and technology adoption among dairy smallholders.

    PubMed

    Martínez-García, Carlos Galdino; Ugoretz, Sarah Janes; Arriaga-Jordán, Carlos Manuel; Wattiaux, Michel André

    2015-02-01

    This study explored whether technology adoption and changes in management practices were associated with farm structure, household, and farmer characteristics, and sought to identify processes that may foster productivity and sustainability of small-scale dairy farming in the central highlands of Mexico. Factor analysis of survey data from 44 smallholders identified three factors (related to farm size, farmer's engagement, and household structure) that explained 70% of cumulative variance. The subsequent hierarchical cluster analysis yielded three clusters. Cluster 1 included the most senior farmers with fewest years of education but greatest years of experience. Cluster 2 included farmers who reported access to extension, cooperative services, and more management changes. Cluster 2 obtained 25% and 35% more milk than farmers in clusters 1 and 3, respectively. Cluster 3 included the youngest farmers, with most years of education and greatest availability of family labor. Access to a network and membership in a community of peers appeared as important contributors to success. Smallholders gravitated towards easy to implement technologies that have immediate benefits. Nonusers of high investment technologies found them unaffordable because of cost, insufficient farm size, and lack of knowledge or reliable electricity. Multivariate analysis may be a useful tool in planning extension activities and organizing channels of communication to effectively target farmers with varying needs, constraints, and motivations for change and in identifying farmers who may exemplify models of change for others who manage farms that are structurally similar but performing at a lower level.

  4. Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?

    PubMed

    Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J

    2018-04-01

    Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90% in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger datasets, and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.
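
    For binary (present/absent) checklist items, the mean square contingency coefficient of a 2×2 table is the phi coefficient. The sketch below shows how a correlation matrix of this kind could be built. It is written in Python for illustration and is not the authors' R code; the item names and responses are hypothetical.

```python
from math import sqrt


def phi(x, y):
    """Phi (mean square contingency) coefficient for two 0/1 sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    # Product of the four marginal totals of the 2x2 table.
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Return 0.0 when an item is constant and phi is undefined.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def phi_matrix(items):
    """Pairwise phi coefficients for a dict of item name -> 0/1 responses."""
    names = list(items)
    return {a: {b: phi(items[a], items[b]) for b in names} for a in names}


# Hypothetical responses from four individuals on three checklist items.
checklist = {
    "item_a": [1, 1, 0, 0],
    "item_b": [1, 1, 0, 0],
    "item_c": [1, 0, 1, 0],
}
corr = phi_matrix(checklist)
```

    A matrix like `corr` can be converted to distances (for example 1 minus the coefficient) before hierarchical clustering; that conversion is a common convention and is not stated in the abstract.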

  5. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

    PubMed Central

    Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

    2015-01-01

    Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888

  6. Stressful jobs and non-stressful jobs: a cluster analysis of office jobs.

    PubMed

    Carayon, P

    1994-02-01

    The purpose of the study was to determine if office jobs could be characterized by a small number of combinations of stressors that could be related to job-title information and self-report of psychological strain. Two-hundred-and-sixty-two office workers from three public service organizations provided data on nine job stressors and seven indicators of psychological strain. Using cluster analysis on the nine stressors, office jobs were classified into three clusters. The first cluster included jobs with high skill utilization, task clarity, job control and social support and low future ambiguity, but also high on job demands such as quantitative work-load, attention and work pressure. The second cluster included jobs with high demands and future ambiguity and low skill utilization, task clarity, job control and social support. The third cluster was intermediary between the first two clusters. The three clusters were related to job-title information. The second cluster was the highest on a range of psychological strain indicators, while the other two clusters were high on certain strain indicators but low on others. The study showed that office jobs could be characterized by a small number of combinations of stressors that were related to job-title information and psychological strain.

  7. Evaluation of Potential Locations for Siting Small Modular Reactors near Federal Energy Clusters to Support Federal Clean Energy Goals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belles, Randy J.; Omitaomu, Olufemi A.

    2014-09-01

    Geographic information systems (GIS) technology was applied to analyze federal energy demand across the contiguous US. Several federal energy clusters were previously identified, including Hampton Roads, Virginia, which was subsequently studied in detail. This study provides an analysis of three additional diverse federal energy clusters. The analysis shows that there are potential sites in various federal energy clusters that could be evaluated further for placement of an integral pressurized-water reactor (iPWR) to support meeting federal clean energy goals.

  8. fluff: exploratory analysis and visualization of high-throughput sequencing data

    PubMed Central

    Georgiou, Georgios

    2016-01-01

    Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532

  9. Novel approach to characterising individuals with low back-related leg pain: cluster identification with latent class analysis and 12-month follow-up.

    PubMed

    Stynes, Siobhán; Konstantinou, Kika; Ogollah, Reuben; Hay, Elaine M; Dunn, Kate M

    2018-04-01

    Traditionally, low back-related leg pain (LBLP) is diagnosed clinically as referred leg pain or sciatica (nerve root involvement). However, within the spectrum of LBLP, we hypothesised that there may be other unrecognised patient subgroups. This study aimed to identify clusters of patients with LBLP using latent class analysis and describe their clinical course. The study population was 609 LBLP primary care consulters. Variables from clinical assessment were included in the latent class analysis. Characteristics of the statistically identified clusters were compared, and their clinical course over 1 year was described. A 5 cluster solution was optimal. Cluster 1 (n = 104) had mild leg pain severity and was considered to represent a referred leg pain group with no clinical signs, suggesting nerve root involvement (sciatica). Cluster 2 (n = 122), cluster 3 (n = 188), and cluster 4 (n = 69) had mild, moderate, and severe pain and disability, respectively, and response to clinical assessment items suggested categories of mild, moderate, and severe sciatica. Cluster 5 (n = 126) had high pain and disability, longer pain duration, and more comorbidities and was difficult to map to a clinical diagnosis. Most improvement for pain and disability was seen in the first 4 months for all clusters. At 12 months, the proportion of patients reporting recovery ranged from 27% for cluster 5 to 45% for cluster 2 (mild sciatica). This is the first study that empirically shows the variability in profile and clinical course of patients with LBLP including sciatica. More homogenous groups were identified, which could be considered in future clinical and research settings.

  10. Using sperm morphometry and multivariate analysis to differentiate species of gray Mazama

    PubMed Central

    Duarte, José Maurício Barbanti

    2016-01-01

    There is genetic evidence that the two species of Brazilian gray Mazama, Mazama gouazoubira and Mazama nemorivaga, belong to different genera. This study identified significant differences that separated them into distinct groups, based on characteristics of the spermatozoa and ejaculate of both species. The characteristics that most clearly differentiated between the species were ejaculate colour, white for M. gouazoubira and reddish for M. nemorivaga, and sperm head dimensions. Multivariate analysis of sperm head dimension and format data accurately discriminated three groups for the species, with a total percentage of misclassified observations of 0.71. The individual analysis, by animal, and the multivariate analysis also correctly discriminated all five animals (total percentage of misclassified observations of 13.95%), and the canonical plot showed three different clusters: Cluster 1, including individuals of M. nemorivaga; Cluster 2, including two individuals of M. gouazoubira; and Cluster 3, including a single individual of M. gouazoubira. The results obtained in this work corroborate the hypothesis of the formation of new genera and species for gray Mazama. Moreover, the easily applied method described herein can be used as an auxiliary tool to identify sibling species of other taxonomic groups. PMID:28018612

  11. Cluster Analysis in Sociometric Research: A Pattern-Oriented Approach to Identifying Temporally Stable Peer Status Groups of Girls

    ERIC Educational Resources Information Center

    Zettergren, Peter

    2007-01-01

    A modern clustering technique was applied to age-10 and age-13 sociometric data with the purpose of identifying longitudinally stable peer status clusters. The study included 445 girls from a Swedish longitudinal study. The identified temporally stable clusters of rejected, popular, and average girls were essentially larger than corresponding…

  12. How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Hou, Huei-Tse

    2013-01-01

    The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…

  13. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    PubMed Central

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
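
    The abstract refers to standard power calculations without stating them. One widely used approximation, assumed here for illustration, is the design effect for unequal cluster sizes, 1 + ((cv² + 1)·m − 1)·ICC, where m is the mean cluster size and cv its coefficient of variation. The sketch holds the ICC fixed, so it shows only the size-related loss of power; the cluster sizes and ICC value are hypothetical.

```python
from statistics import mean, pstdev


def design_effect(cluster_sizes, icc):
    """Design effect allowing for variable cluster size (approximation)."""
    m = mean(cluster_sizes)
    # Population SD is an assumption; some texts use the sample SD instead.
    cv = pstdev(cluster_sizes) / m
    return 1 + ((cv ** 2 + 1) * m - 1) * icc


def effective_sample_size(cluster_sizes, icc):
    """Number of independent observations the clustered sample is worth."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)


# Ten clusters of 20, then the same participants after one merge of two clusters.
before = [20] * 10
after = [40] + [20] * 8

ess_before = effective_sample_size(before, icc=0.05)
ess_after = effective_sample_size(after, icc=0.05)
```

    With these numbers the effective sample size falls after the merge even though no participants are lost, which is consistent with the reduction in power reported above. The abstract also notes that for homogeneous merges a change in the observed ICC can offset this, which the fixed-ICC sketch does not capture.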

  14. Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

    PubMed

    Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

    2016-08-31

    Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of detail to be obtained and data redundancy eliminated while keeping biologically interesting variations.

  15. Comparative genomic analysis in the fungus Fusarium for production of toxins of concern to food safety

    USDA-ARS's Scientific Manuscript database

    SUMMARY Comparative analysis of 207 genomes representing 159 species of the fungus Fusarium detected 9403 known and putative secondary metabolite (SM) biosynthetic gene clusters. The clusters included those responsible for synthesis of mycotoxins, plant hormones and pigments, and varied in distribut...

  16. Using cluster analysis to organize and explore regional GPS velocities

    USGS Publications Warehouse

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
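
    A minimal sketch of grouping stations by velocity alone follows. The abstract does not name the clustering algorithm, so plain k-means is used here as an assumption; the station velocities (east, north, in mm/yr) are invented for illustration and are not Bay Region measurements.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def kmeans(points: List[Vec], k: int, iters: int = 50) -> List[int]:
    """Plain k-means on 2-D velocity vectors; returns one cluster label per point."""
    # Deterministic farthest-point initialization so the toy run is repeatable.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels


# Hypothetical station velocities; only velocities are used, not station positions.
velocities: List[Vec] = [
    (-2.0, 3.0), (-2.2, 3.1), (-1.9, 2.8),
    (-8.0, 12.0), (-8.3, 12.4), (-7.9, 11.8),
    (-15.0, 22.0), (-15.4, 22.5),
]
labels = kmeans(velocities, k=3)
print(labels)
```

    Plotting the labels at the station locations afterwards is what would show whether cluster boundaries line up with mapped faults.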

  17. The application of cluster analysis in the intercomparison of loop structures in RNA.

    PubMed

    Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

    2005-04-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
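
    The UPGMA step on a pairwise RMSD matrix can be sketched as below. This is an illustrative pure-Python version under stated assumptions: the loop names, the RMSD values (in Å) and the 1.0 Å cutoff are all hypothetical, chosen only so that a UMAC-like loop joins the GNRA-like group as in the abstract.

```python
from itertools import combinations
from typing import List


def upgma_clusters(names: List[str], d: List[List[float]], cutoff: float) -> List[List[str]]:
    """Average-linkage (UPGMA) merging until the closest pair of clusters exceeds `cutoff`."""
    clusters: List[List[int]] = [[i] for i in range(len(names))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        # UPGMA distance: unweighted mean over all member-to-member distances.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        if avg_dist(clusters[x], clusters[y]) > cutoff:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(names[i] for i in c) for c in clusters)


# Hypothetical tetraloops and a made-up symmetric RMSD matrix.
names = ["GNRA_1", "GNRA_2", "UNCG_1", "UNCG_2", "UMAC_1"]
rmsd = [
    [0.0, 0.4, 2.1, 2.3, 0.7],
    [0.4, 0.0, 2.0, 2.2, 0.8],
    [2.1, 2.0, 0.0, 0.5, 2.4],
    [2.3, 2.2, 0.5, 0.0, 2.5],
    [0.7, 0.8, 2.4, 2.5, 0.0],
]
clusters = upgma_clusters(names, rmsd, cutoff=1.0)
print(clusters)
```

    Recomputing averages from the original matrix at every step is slower than the usual incremental update, but it keeps the definition of average linkage visible.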

  18. The application of cluster analysis in the intercomparison of loop structures in RNA

    PubMed Central

    HUANG, HUNG-CHUNG; NAGASWAMY, UMA; FOX, GEORGE E.

    2005-01-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence. PMID:15769871

  19. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    PubMed

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". These patients had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques

    2008-07-01

    The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, clique discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling users to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.

  1. Hierarchical clustering of HPV genotype patterns in the ASCUS-LSIL triage study

    PubMed Central

    Wentzensen, Nicolas; Wilson, Lauren E.; Wheeler, Cosette M.; Carreon, Joseph D.; Gravitt, Patti E.; Schiffman, Mark; Castle, Philip E.

    2010-01-01

    Anogenital cancers are associated with about 13 carcinogenic HPV types in a broader group that cause cervical intraepithelial neoplasia (CIN). Multiple concurrent cervical HPV infections are common which complicate the attribution of HPV types to different grades of CIN. Here we report the analysis of HPV genotype patterns in the ASCUS-LSIL triage study using unsupervised hierarchical clustering. Women who underwent colposcopy at baseline (n = 2780) were grouped into 20 disease categories based on histology and cytology. Disease groups and HPV genotypes were clustered using complete linkage. Risk of 2-year cumulative CIN3+, viral load, colposcopic impression, and age were compared between disease groups and major clusters. Hierarchical clustering yielded four major disease clusters: Cluster 1 included all CIN3 histology with abnormal cytology; Cluster 2 included CIN3 histology with normal cytology and combinations with either CIN2 or high-grade squamous intraepithelial lesion (HSIL) cytology; Cluster 3 included older women with normal or low grade histology/cytology and low viral load; Cluster 4 included younger women with low grade histology/cytology, multiple infections, and the highest viral load. Three major groups of HPV genotypes were identified: Group 1 included only HPV16; Group 2 included nine carcinogenic types plus non-carcinogenic HPV53 and HPV66; and Group 3 included non-carcinogenic types plus carcinogenic HPV33 and HPV45. Clustering results suggested that colposcopy missed a prevalent precancer in many women with no biopsy/normal histology and HSIL. This result was confirmed by an elevated 2-year risk of CIN3+ in these groups. Our novel approach to study multiple genotype infections in cervical disease using unsupervised hierarchical clustering can address complex genotype distributions on a population level. PMID:20959485

  2. Factor Analysis and Counseling Research

    ERIC Educational Resources Information Center

    Weiss, David J.

    1970-01-01

    Topics discussed include factor analysis versus cluster analysis, analysis of Q correlation matrices, ipsativity and factor analysis, and tests for the significance of a correlation matrix prior to application of factor analytic techniques. Techniques for factor extraction discussed include principal components, canonical factor analysis, alpha…

  3. Calibrating the Planck cluster mass scale with CLASH

    NASA Astrophysics Data System (ADS)

    Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.

    2017-08-01

    We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, bSZ, between true cluster mass, M500, and the Planck mass proxy, MPL, our analysis constrains 1-bSZ = 0.73 ± 0.10 when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.
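
    The constant-bias assumption can be written out as a short worked relation. The form below is the convention commonly used for the Planck mass bias and is assumed here, since the abstract does not spell it out; only the central value 0.73 is taken from the abstract.

```latex
% Assumed convention: the SZ mass proxy underestimates the true mass by a constant factor.
M_{\mathrm{PL}} = (1 - b_{\mathrm{SZ}})\, M_{500}
% With the central value 1 - b_{\mathrm{SZ}} = 0.73 quoted in the abstract:
M_{500} = \frac{M_{\mathrm{PL}}}{1 - b_{\mathrm{SZ}}} \approx \frac{M_{\mathrm{PL}}}{0.73} \approx 1.37\, M_{\mathrm{PL}}
```

    Under this reading, a true cluster mass is roughly 37% larger than its Planck proxy at the central value, before accounting for the quoted ±0.10 uncertainty.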

  4. Alternatives to Multilevel Modeling for the Analysis of Clustered Data

    ERIC Educational Resources Information Center

    Huang, Francis L.

    2016-01-01

    Multilevel modeling has grown in use over the years as a way to deal with the nonindependent nature of observations found in clustered data. However, other alternatives to multilevel modeling are available that can account for observations nested within clusters, including the use of Taylor series linearization for variance estimation, the design…

  5. Ambiguity and judgments of obese individuals: no news could be bad news.

    PubMed

    Ross, Kathryn M; Shivy, Victoria A; Mazzeo, Suzanne E

    2009-08-01

    Stigmatization towards obese individuals has not decreased despite the increasing prevalence of obesity. Nonetheless, stigmatization remains difficult to study, given concerns about social desirability. To address this issue, this study used paired comparisons and cluster analysis to examine how undergraduates (n=189) categorized scenarios describing the health-related behaviors of obese individuals. The cluster analysis found that the scenarios were categorized into two distinct clusters. The first cluster included all scenarios with health behaviors indicating high responsibility for body weight. These individuals were perceived as unattractive, lazy, less likeable, less disciplined, and more deserving of their condition compared to individuals in the second cluster, which included all scenarios with health behaviors indicating low responsibility for body weight. Four scenarios depicted obese individuals with ambiguous information regarding health behaviors; three out of these four individuals were categorized in the high-responsibility cluster. These findings suggested that participants viewed these individuals as negatively as those who were responsible for their condition. These results have practical implications for reducing obesity bias, as the etiology of obesity is typically not known in real-life situations.

  6. Identification of chronic rhinosinusitis phenotypes using cluster analysis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J

    2015-05-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.

  7. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

    PubMed

    Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu

    2018-04-17

    Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks before a cytometry lab can adopt the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. Design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists through mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters which are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experiment results showed that the proportions of the cell populations identified by DAFi, while being consistent with those by expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and the nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided the abrupt cut-offs on the boundaries. 
DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and the ClusterR package. For cell population identification, DAFi supports multiple options including clustering, bisecting, slope-based gating, and reversed filtering to meet various autogating needs from different scientific use cases. © 2018 International Society for Advancement of Cytometry.

  8. Societal burden of cluster headache in the United States: a descriptive economic analysis.

    PubMed

    Ford, Janet H; Nero, Damion; Kim, Gilwan; Chu, Bong Chul; Fowler, Robert; Ahl, Jonna; Martinez, James M

    2018-01-01

    To estimate direct and indirect costs in patients with a diagnosis of cluster headache in the US. Adult patients (18-64 years of age) enrolled in the Marketscan Commercial and Medicare Databases with ≥2 non-diagnostic outpatient (≥30 days apart between the two outpatient claims) or ≥1 inpatient diagnoses of cluster headache (ICD-9-CM code 339.00, 339.01, or 339.02) between January 1, 2009 and June 30, 2014, were included in the analyses. Patients had ≥6 months of continuous enrollment with medical and pharmacy coverage before and after the index date (first cluster headache diagnosis). Three outcomes were evaluated: (1) healthcare resource utilization, (2) direct healthcare costs, and (3) indirect costs associated with work days lost due to absenteeism and short-term disability. Direct costs included costs of all-cause and cluster headache-related outpatient, inpatient hospitalization, surgery, and pharmacy claims. Indirect costs were based on an average daily wage, which was estimated from the 2014 US Bureau of Labor Statistics and inflated to 2015 dollars. There were 9,328 patients with cluster headache claims included in the analysis. Cluster headache-related total direct costs (mean [standard deviation]) were $3,132 [$13,396] per patient per year (PPPY), accounting for 17.8% of the all-cause total direct cost. Cluster headache-related inpatient hospitalizations ($1,604) and pharmacy ($809) together ($2,413) contributed over 75% of the cluster headache-related direct healthcare cost. There were three sub-groups of patients with claims associated with indirect costs that included absenteeism, short-term disability, and absenteeism + short-term disability. Indirect costs PPPY were $4,928 [$4,860] for absenteeism, $803 [$2,621] for short-term disability, and $3,374 [$3,198] for absenteeism + disability. 
Patients with cluster headache have high healthcare costs that are associated with inpatient admissions and pharmacy fulfillments, and high indirect costs associated with absenteeism and short-term disability.

  9. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    PubMed

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment and who had undergone visual field (VF) testing at least once per year for 5 or more years were included. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were obtained from the hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used significantly more antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was higher than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  10. A novel polyketide biosynthesis gene cluster is involved in fruiting body morphogenesis in the filamentous fungi Sordaria macrospora and Neurospora crassa.

    PubMed

    Nowrousian, Minou

    2009-04-01

    During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome comprises another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.

  11. I. Excluded volume effects in Ising cluster distributions and nuclear multifragmentation. II. Multiple-chance effects in alpha-particle evaporation

    NASA Astrophysics Data System (ADS)

    Breus, Dimitry Eugene

    In Part I, geometric clusters of the Ising model are studied as possible model clusters for nuclear multifragmentation. These clusters may not be considered as non-interacting (ideal gas) owing to the excluded volume effect, which is predominantly an artifact of the cluster's finite size. Interaction significantly complicates the use of clusters in the analysis of thermodynamic systems. Stillinger's theory is used as a basis for the analysis, which within the RFL (Reiss, Frisch, Lebowitz) fluid-of-spheres approximation produces a prediction for cluster concentrations well obeyed by geometric clusters of the Ising model. If the thermodynamic condition of phase coexistence is met, these concentrations can be incorporated into a differential equation procedure of moderate complexity to elucidate the liquid-vapor phase diagram of the system with cluster interaction included. The drawback of increased complexity is outweighed by the reward of greater accuracy of the phase diagram, as demonstrated by the Ising model. A novel nuclear-cluster analysis procedure is developed by modifying Fisher's model to contain cluster interaction and employing the differential equation procedure to obtain thermodynamic variables. With this procedure applied to geometric clusters, guidelines are developed to look for the excluded volume effect in nuclear multifragmentation. In Part II, an explanation is offered for the recently observed oscillations in the energy spectra of alpha-particles emitted from hot compound nuclei. Contrary to what was previously expected, the oscillations are assumed to be caused by the multiple-chance nature of alpha-evaporation. In a semi-empirical fashion this assumption is successfully confirmed by a technique of two-spectra decomposition which treats experimental alpha-spectra as having contributions from at least two independent emitters. 
Building upon the success of the multiple-chance explanation of the oscillations, Moretto's single-chance evaporation theory is augmented to include multiple-chance emission and tested on experimental data to yield positive results.

  12. Freud: a software suite for high-throughput simulation analysis

    NASA Astrophysics Data System (ADS)

    Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon

    Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C++ analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
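
    For readers unfamiliar with the radial distribution function named above, here is a minimal pure-Python sketch of what such an analysis computes. It does not use or reproduce Freud's API; the function name, its parameters and the simple-cubic test configuration are assumptions made for illustration, and the O(N²) loop is only suitable for small systems.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def radial_distribution(positions: Sequence[Point], box: float,
                        r_max: float, n_bins: int) -> List[float]:
    """g(r) histogram for particles in a cubic periodic box of side `box`."""
    if r_max > box / 2:
        raise ValueError("r_max must not exceed half the box length")
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            sq = 0.0
            for a, b in zip(positions[i], positions[j]):
                delta = a - b
                delta -= box * round(delta / box)  # minimum-image convention
                sq += delta * delta
            r = math.sqrt(sq)
            if r < r_max:
                counts[int(r / dr)] += 2  # the pair counts once for each particle
    density = n / box ** 3
    g = []
    for k, c in enumerate(counts):
        # Normalize by the ideal-gas expectation for this spherical shell.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(c / (n * density * shell))
    return g


# Hypothetical test system: 4 x 4 x 4 simple-cubic lattice with unit spacing.
lattice = [(float(x), float(y), float(z))
           for x in range(4) for y in range(4) for z in range(4)]
g = radial_distribution(lattice, box=4.0, r_max=2.0, n_bins=8)
print([round(v, 3) for v in g])
```

    On the lattice, g(r) is zero below the nearest-neighbour distance and peaks in the bin containing r = 1, where each particle has six neighbours.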

  13. Cluster analysis of accelerated molecular dynamics simulations: A case study of the decahedron to icosahedron transition in Pt nanoparticles.

    PubMed

    Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F; Perez, Danny

    2017-10-21

    Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.

  14. Cluster analysis of accelerated molecular dynamics simulations: A case study of the decahedron to icosahedron transition in Pt nanoparticles

    NASA Astrophysics Data System (ADS)

    Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F.; Perez, Danny

    2017-10-01

    Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.

  15. Cluster analysis based on dimensional information with applications to feature selection and classification

    NASA Technical Reports Server (NTRS)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  16. The association between mood state and chronobiological characteristics in bipolar I disorder: a naturalistic, variable cluster analysis-based study.

    PubMed

    Gonzalez, Robert; Suppes, Trisha; Zeitzer, Jamie; McClung, Colleen; Tamminga, Carol; Tohen, Mauricio; Forero, Angelica; Dwivedi, Alok; Alvarado, Andres

    2018-02-19

    Multiple types of chronobiological disturbances have been reported in bipolar disorder, including characteristics associated with general activity levels, sleep, and rhythmicity. Previous studies have focused on examining the individual relationships between affective state and chronobiological characteristics. The aim of this study was to conduct a variable cluster analysis in order to ascertain how mood states are associated with chronobiological traits in bipolar I disorder (BDI). We hypothesized that manic symptomatology would be associated with disturbances of rhythm. Variable cluster analysis identified five chronobiological clusters in 105 BDI subjects. Cluster 1, comprising subjective sleep quality was associated with both mania and depression. Cluster 2, which comprised variables describing the degree of rhythmicity, was associated with mania. Significant associations between mood state and cluster analysis-identified chronobiological variables were noted. Disturbances of mood were associated with subjectively assessed sleep disturbances as opposed to objectively determined, actigraphy-based sleep variables. No associations with general activity variables were noted. Relationships between gender and medication classes in use and cluster analysis-identified chronobiological characteristics were noted. Exploratory analyses noted that medication class had a larger impact on these relationships than the number of psychiatric medications in use. In a BDI sample, variable cluster analysis was able to group related chronobiological variables. The results support our primary hypothesis that mood state, particularly mania, is associated with chronobiological disturbances. Further research is required in order to define these relationships and to determine the directionality of the associations between mood state and chronobiological characteristics.

  17. Exploring the Relationship between Autism Spectrum Disorder and Epilepsy Using Latent Class Cluster Analysis

    ERIC Educational Resources Information Center

    Cuccaro, Michael L.; Tuchman, Roberto F.; Hamilton, Kara L.; Wright, Harry H.; Abramson, Ruth K.; Haines, Jonathan L.; Gilbert, John R.; Pericak-Vance, Margaret

    2012-01-01

    Epilepsy co-occurs frequently in autism spectrum disorders (ASD). Understanding this co-occurrence requires a better understanding of the ASD-epilepsy phenotype (or phenotypes). To address this, we conducted latent class cluster analysis (LCCA) on an ASD dataset (N = 577) which included 64 individuals with epilepsy. We identified a 5-cluster…

  18. Corrections for Cluster-Plot Slop

    Treesearch

    Harry T. Valentine; Mark J. Ducey; Jeffery H. Gove; Adrian Lanz; David L.R. Affleck

    2006-01-01

    Cluster-plot designs, including the design used by the Forest Inventory and Analysis program of the USDA Forest Service (FIA), are attended by a complicated boundary slopover problem. Slopover occurs where inclusion zones of objects of interest cross the boundary of the area of interest. The dispersed nature of inclusion zones that arise from the use of cluster plots...

  19. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  20. Comparison of disease clusters in two elderly populations hospitalized in 2008 and 2010.

    PubMed

    Marengoni, A; Nobili, A; Pirali, C; Tettamanti, M; Pasina, L; Salerno, F; Corrao, S; Iorio, A; Marcucci, M; Franchi, C; Mannucci, P M

    2013-01-01

    As chronicity represents one of the major challenges in the healthcare of aging populations, the understanding of how chronic diseases distribute and co-occur in this part of the population is needed. The aims of this study were to evaluate and compare patterns of diseases identified with cluster analysis in two samples of hospitalized elderly. Data were obtained from the multicenter 'Registry Politerapie SIMI (REPOSI)' that included people aged 65 or older hospitalized in internal medicine and geriatric wards in Italy during 2008 and 2010. The study sample from the first wave included 1,411 subjects enrolled in 38 hospitals wards, whereas the second wave included 1,380 subjects in 66 wards located in different regions of Italy. To analyze patterns of multimorbidity, a cluster analysis was performed including the same diseases (19 chronic conditions with a prevalence >5%) collected at hospital discharge during the two waves of the registry. Eight clusters of diseases were identified in the first wave of the REPOSI registry and six in the second wave. Several diseases were included in similar clusters in the two waves, such as malignancy and liver cirrhosis; anemia, gastric and intestinal diseases; diabetes and coronary heart disease; chronic obstructive pulmonary disease and prostate hypertrophy. These findings strengthened the idea of an association other than by chance of diseases in the elderly population. Copyright © 2013 S. Karger AG, Basel.

  1. Cluster analysis of Southeastern U.S. climate stations

    NASA Astrophysics Data System (ADS)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  2. Suicide in the oldest old: an observational study and cluster analysis.

    PubMed

    Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth

    2016-01-01

    The older population are at a high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.
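
    As an illustration of the hierarchical agglomerative approach named in this abstract, the following is a minimal sketch of Ward's merging criterion. It assumes numeric feature vectors and a fixed target number of clusters; the function name `ward_clusters` and the toy data are hypothetical and are not taken from the study, which clustered clinical variables from coroner records.

```python
def ward_clusters(points, k):
    """Naive Ward agglomeration (illustrative only): start with one cluster
    per point and repeatedly merge the pair whose merger gives the smallest
    increase in within-cluster sum of squares, until k clusters remain."""
    clusters = [[tuple(p)] for p in points]

    def centroid(cluster):
        # Coordinate-wise mean of the points in the cluster.
        return [sum(coord) / len(cluster) for coord in zip(*cluster)]

    def merge_cost(a, b):
        # Ward's criterion: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        # Exhaustive search over all cluster pairs for the cheapest merge.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

    The exhaustive pair search is kept for readability and is only practical for small samples; established statistical packages implement the same criterion far more efficiently.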

  3. An investigation about the structures, thermodynamics and kinetics of the formic acid involved molecular clusters

    NASA Astrophysics Data System (ADS)

    Zhang, Rui; Jiang, Shuai; Liu, Yi-Rong; Wen, Hui; Feng, Ya-Juan; Huang, Teng; Huang, Wei

    2018-05-01

    Despite the very important role of atmospheric aerosol nucleation in climate change and air quality, the detailed aerosol nucleation mechanism is still unclear. Here we investigated the formic acid (FA) involved multicomponent nucleation molecular clusters including sulfuric acid (SA), dimethylamine (DMA) and water (W) through a quantum chemical method. The thermodynamics and kinetics analysis was based on the global minima given by the Basin-Hopping (BH) algorithm coupled with Density Functional Theory (DFT) and subsequent benchmarked calculations. Then the interaction analysis based on ElectroStatic Potential (ESP), Topological and Atomic Charges analysis was made to characterize the binding features of the clusters. The results show that FA binds weakly with the other molecules in the cluster while W binds more weakly. Further kinetic analysis of the time evolution of the clusters shows that, despite formic acid's weak interaction with other nucleation precursors, its effect on the sulfuric acid dimer steady state concentration cannot be neglected due to its high concentration in the atmosphere.

  4. Extended phenotype and clinical subgroups in unilateral Meniere disease: A cross-sectional study with cluster analysis.

    PubMed

    Frejo, L; Martin-Sanz, E; Teggi, R; Trinidad, G; Soto-Varela, A; Santos-Perez, S; Manrique, R; Perez, N; Aran, I; Almeida-Branco, M S; Batuecas-Caletrio, A; Fraile, J; Espinosa-Sanchez, J M; Perez-Guillen, V; Perez-Garrigues, H; Oliva-Dominguez, M; Aleman, O; Benitez, J; Perez, P; Lopez-Escamez, J A

    2017-12-01

    To define clinical subgroups by cluster analysis in patients with unilateral Meniere disease (MD) and to compare them with the clinical subgroups found in bilateral MD. A cross-sectional study with a two-step cluster analysis. A tertiary referral multicenter study. Nine hundred and eighty-eight adult patients with unilateral MD. Best predictors to define clinical subgroups with potential different aetiologies. We established five clusters in unilateral MD. Group 1 is the most frequently found, includes 53% of patients, and it is defined as the sporadic, classic MD without migraine and without autoimmune disorder (AD). Group 2 is found in 8% of patients, and it is defined by hearing loss, which antedates the vertigo episodes by months or years (delayed MD), without migraine or AD in most of cases. Group 3 involves 13% of patients, and it is considered familial MD, while group 4, which includes 15% of patients, is linked to the presence of migraine in all cases. Group 5 is found in 11% of patients and is defined by a comorbid AD. We found significant differences in the distribution of AD in clusters 3, 4 and 5 between patients with uni- and bilateral MD. Cluster analysis defines clinical subgroups in MD, and it extends the phenotype beyond audiovestibular symptoms. This classification will help to improve the phenotyping in MD and facilitate the selection of patients for randomised clinical trials. © 2017 John Wiley & Sons Ltd.

  5. Phylogenomic and MALDI-TOF MS Analysis of Streptococcus sinensis HKU4T Reveals a Distinct Phylogenetic Clade in the Genus Streptococcus

    PubMed Central

    Tse, Herman; Chen, Jonathan H.K.; Tang, Ying; Lau, Susanna K.P.; Woo, Patrick C.Y.

    2014-01-01

    Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. PMID:25331233

  6. Phylogenomic and MALDI-TOF MS analysis of Streptococcus sinensis HKU4T reveals a distinct phylogenetic clade in the genus Streptococcus.

    PubMed

    Teng, Jade L L; Huang, Yi; Tse, Herman; Chen, Jonathan H K; Tang, Ying; Lau, Susanna K P; Woo, Patrick C Y

    2014-10-20

    Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the "sanguinis group." As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the "mitis group." On the basis of the findings, we propose a novel group, named "sinensis group," to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data

    PubMed Central

    Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio

    2010-01-01

    Background The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and verify if three-dimensional gait data are able to distinguish diabetic gait patterns from one of the control subjects. Method The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1(4.4) years, mean body mass index (BMI) 24.0 (2.8), and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joints and segments (trunk, hip, knee, ankle) angles, and moments. Results Cluster analysis classification led to definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432

  8. Cluster Subcutaneous Allergen Specific Immunotherapy for the Treatment of Allergic Rhinitis: A Systematic Review and Meta-Analysis

    PubMed Central

    Sun, Yueqi; Luo, Xi; Li, Huabin

    2014-01-01

    Background Although allergen specific immunotherapy (SIT) represents the only immune-modifying and curative option available for patients with allergic rhinitis (AR), the optimal schedule for specific subcutaneous immunotherapy (SCIT) is still unknown. The objective of this study is to systematically assess the efficacy and safety of cluster SCIT for patients with AR. Methods By searching PubMed, EMBASE and the Cochrane clinical trials database from 1980 through May 10th, 2013, we collected and analyzed the randomized controlled trials (RCTs) of cluster SCIT to assess its efficacy and safety. Results Eight trials involving 567 participants were included in this systematic review. Our meta-analysis showed that cluster SCIT has a similar effect in reducing both rhinitis symptoms and the requirement for anti-allergic medication compared with conventional SCIT, but when comparing cluster SCIT with placebo, no statistically significant difference was found in the reduction of symptom scores or medication scores. Some caution is required in this interpretation as there was significant heterogeneity between studies. Data relating to the Rhinoconjunctivitis Quality of Life Questionnaire (RQLQ) in 3 included studies were analyzed, and they consistently point to the efficacy of cluster SCIT in improving quality of life compared to placebo. To assess the safety of cluster SCIT, meta-analysis showed that no differences existed in the incidence of either local adverse reaction or systemic adverse reaction between the cluster group and control group. Conclusion Based on the current limited evidence, we still could not conclude affirmatively that cluster SCIT was a safe and efficacious option for the treatment of AR patients. Further large-scale, well-designed RCTs on this topic are still needed. PMID:24489740

  9. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    PubMed

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).

  10. Scoring clustering solutions by their biological relevance.

    PubMed

    Gat-Viks, I; Sharan, R; Shamir, R

    2003-12-12

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
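
    To make the variance-ratio idea in this abstract concrete, the sketch below computes the ratio of between-group to within-group variance estimators for values that have already been projected onto the real line. It is an assumption-laden illustration, not the authors' software: the function name `variance_ratio` and the toy data are hypothetical, and the projection step and the non-parametric significance test described in the abstract are not reproduced.

```python
from statistics import fmean


def variance_ratio(values, labels):
    """Ratio of between-group to within-group variance estimators for
    1-D values grouped by cluster label (larger means better separated)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more points than groups")

    grand_mean = fmean(values)
    # Between-group estimator: size-weighted spread of the group means.
    between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    ) / (k - 1)
    # Within-group estimator: pooled spread of values around their group mean.
    within = sum(
        (v - fmean(g)) ** 2 for g in groups.values() for v in g
    ) / (n - k)

    if within == 0:
        return float("inf")  # groups are internally constant
    return between / within
```

    Under these assumptions, a labelling that matches the structure of the projected attribute yields a much larger ratio than an arbitrary labelling of the same values, which is the property a scoring method of this kind relies on when comparing clustering solutions.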

  11. Demographic but not geographic insularity in HIV transmission among young black MSM.

    PubMed

    Oster, Alexandra M; Pieniazek, Danuta; Zhang, Xinjian; Switzer, William M; Ziebell, Rebecca A; Mena, Leandro A; Wei, Xierong; Johnson, Kendra L; Singh, Sonita K; Thomas, Peter E; Elmore, Kimberlee A; Heffelfinger, James D

    2011-11-13

    To understand patterns of HIV transmission among young black MSM and others in Mississippi. Phylogenetic analysis of HIV-1 polymerase (pol) sequences from 799 antiretroviral-naive persons newly diagnosed with HIV infection in Mississippi during 2005-2008, 130 (16%) of whom were black MSM aged 16-25 years. We identified phylogenetic clusters and used surveillance data to evaluate demographic attributes and risk factors of all persons in clusters that included black MSM aged 16-25 years. We identified 82 phylogenetic clusters, 21 (26%) of which included HIV strains from at least one young black MSM. Of the 69 persons in these clusters, 59 were black MSM and seven were black men with unknown transmission category; the remaining three were MSM of white or Hispanic race/ethnicity. Of these 21 clusters, 10 included residents of one geographic region of Mississippi, whereas 11 included residents of multiple regions or outside of the state. Phylogenetic clusters involving HIV-infected young black MSM were homogeneous with respect to demographic and risk characteristics, suggesting insularity of this population with respect to HIV transmission, but were geographically heterogeneous. Reducing HIV transmission among young black MSM in Mississippi may require prevention strategies that are tailored to young black MSM and those in their sexual networks, and prevention interventions should be delivered in a manner to reach young black MSM throughout the state. Phylogenetic analysis can be a tool for local jurisdictions to understand the transmission dynamics in their areas.

  12. Cluster analysis for determining distribution center location

    NASA Astrophysics Data System (ADS)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important for surviving the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. This raises new problems, namely how many facilities should be provided and where. This study examines a fast-food restaurant brand located in Greater Jakarta. This brand is among the top 5 fast-food restaurant chains based on retail sales. There were three stages in this study: compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient distribution process, reflected in a shorter distribution distance.

  13. Cluster analysis and subgrouping to investigate inter-individual variability to non-invasive brain stimulation: a systematic review.

    PubMed

    Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour

    2018-01-12

    Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responder or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.

  14. Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

    PubMed

    Meghani, Salimah H; Knafl, George J

    2017-02-10

    To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-mo prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For a subset, the main underlying concern was type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%) and type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis, to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only the belief, i.e., pain medications can mask changes in health or keep you from knowing what is going on in your body was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)]. Most patients appear to be driven by a single salient concern in using analgesia for cancer pain. Addressing these concerns, perhaps through real time clinical assessments, may improve patients' analgesic adherence patterns and cancer pain outcomes.

  15. Predicting the points of interaction of small molecules in the NF-κB pathway

    PubMed Central

    2011-01-01

    Background The similarity property principle has been used extensively in drug discovery to identify small compounds that interact with specific drug targets. Here we show it can be applied to identify the interactions of small molecules within the NF-κB signalling pathway. Results Clusters that contain compounds with a predominant interaction within the pathway were created, which were then used to predict the interaction of compounds not included in the clustering analysis. Conclusions The technique successfully predicted the points of interactions of compounds that are known to interact with the NF-κB pathway. The method was also shown to be successful when compounds for which the interaction points were unknown were included in the clustering analysis. PMID:21342508

  16. Space-Time Cluster Analysis to Detect Innovative Clinical Practices: A Case Study of Aripiprazole in the Department of Veterans Affairs.

    PubMed

    Penfold, Robert B; Burgess, James F; Lee, Austin F; Li, Mingfei; Miller, Christopher J; Nealon Seibert, Marjorie; Semla, Todd P; Mohr, David C; Kazis, Lewis E; Bauer, Mark S

    2018-02-01

    To identify space-time clusters of changes in prescribing aripiprazole for bipolar disorder among providers in the VA. VA administrative data from 2002 to 2010 were used to identify prescriptions of aripiprazole for bipolar disorder. Prescriber characteristics were obtained using the Personnel and Accounting Integrated Database. We conducted a retrospective space-time cluster analysis using the space-time permutation statistic. All VA service users with a diagnosis of bipolar disorder were included in the patient population. Individuals with any schizophrenia spectrum diagnoses were excluded. We also identified all clinicians who wrote a prescription for any bipolar disorder medication. The study population included 32,630 prescribers. Of these, 8,643 wrote qualifying prescriptions. We identified three clusters of aripiprazole prescribing centered in Massachusetts, Ohio, and the Pacific Northwest. Clusters were associated with prescribing by VA-employed (vs. contracted) prescribers. Nurses with prescribing privileges were more likely to make a prescription for aripiprazole in cluster locations compared with psychiatrists. Primary care physicians were less likely. Early prescribing of aripiprazole for bipolar disorder clustered geographically and was associated with prescriber subgroups. These methods support prospective surveillance of practice changes and identification of associated health system characteristics. © Health Research and Educational Trust.

  17. The `TTIME' Package: Performance Evaluation in a Cluster Computing Environment

    NASA Astrophysics Data System (ADS)

    Howe, Marico; Berleant, Daniel; Everett, Albert

    2011-06-01

    The objective of translating developmental event time across mammalian species is to gain an understanding of the timing of human developmental events based on known time of those events in animals. The potential benefits include improvements to diagnostic and intervention capabilities. The CRAN `ttime' package provides the functionality to infer unknown event timings and investigate phylogenetic proximity utilizing hierarchical clustering of both known and predicted event timings. The original generic mammalian model included nine eutherian mammals: Felis domestica (cat), Mustela putorius furo (ferret), Mesocricetus auratus (hamster), Macaca mulatta (monkey), Homo sapiens (humans), Mus musculus (mouse), Oryctolagus cuniculus (rabbit), Rattus norvegicus (rat), and Acomys cahirinus (spiny mouse). However, the data for this model is expected to grow as more data about developmental events is identified and incorporated into the analysis. Performance evaluation of the `ttime' package across a cluster computing environment versus a comparative analysis in a serial computing environment provides an important computational performance assessment. A theoretical analysis is the first stage of a process in which the second stage, if justified by the theoretical analysis, is to investigate an actual implementation of the `ttime' package in a cluster computing environment and to understand the parallelization process that underlies implementation.

  18. Gathering Real World Evidence with Cluster Analysis for Clinical Decision Support.

    PubMed

    Xia, Eryu; Liu, Haifeng; Li, Jing; Mei, Jing; Li, Xuejun; Xu, Enliang; Li, Xiang; Hu, Gang; Xie, Guotong; Xu, Meilin

    2017-01-01

    Clinical decision support systems are information technology systems that assist clinical decision-making tasks, which have been shown to enhance clinical performance. Cluster analysis, which groups similar patients together, aims to separate patient cases into phenotypically heterogeneous groups and to define therapeutically homogeneous patient subclasses. Useful as it is, the application of cluster analysis in clinical decision support systems is less often reported. Here, we describe the usage of cluster analysis in clinical decision support systems, by first dividing patient cases into similar groups and then providing diagnosis or treatment suggestions based on the group profiles. This integration provides data for clinical decisions and compiles a wide range of clinical practices to inform the performance of individual clinicians. We also include an example usage of the system under the scenario of blood lipid management in type 2 diabetes. These efforts represent a step toward promoting patient-centered care and enabling precision medicine.

  19. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    NASA Astrophysics Data System (ADS)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  20. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    NASA Astrophysics Data System (ADS)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables testing of the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods. In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works that will analyze this type of system in order to characterize it. Results. Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions. It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.

  1. Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J

    2016-04-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. 
All rights reserved.

  2. Functional clustering of time series gene expression data by Granger causality

    PubMed Central

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425

  3. Analysis of Tropical Cyclone Tracks in the North Indian Ocean

    NASA Astrophysics Data System (ADS)

    Patwardhan, A.; Paliwal, M.; Mohapatra, M.

    2011-12-01

    Cyclones are regarded as one of the most dangerous meteorological phenomena of the tropical region. The probability of landfall of a tropical cyclone depends on its movement (trajectory). Analysis of trajectories of tropical cyclones could be useful for identifying potentially predictable characteristics. There is a long history of analysis of tropical cyclone tracks. A common approach is using different clustering techniques to group the cyclone tracks on the basis of certain characteristics. Various clustering methods have been used to study tropical cyclones in different ocean basins, such as the western North Pacific Ocean (Elsner and Liu, 2003; Camargo et al., 2007) and the North Atlantic Ocean (Elsner, 2003; Gaffney et al. 2007; Nakamura et al., 2009). In this study, tropical cyclone tracks in the North Indian Ocean basin for the period 1961-2010 have been analyzed and grouped into clusters based on their spatial characteristics. A tropical cyclone trajectory is approximated as an open curve and described by its first two moments. The resulting clusters have different centroid locations and also differently shaped variance ellipses. These track characteristics are then used in the standard clustering algorithms, which allow the whole track shape, length, and location to be incorporated into the clustering methodology. The resulting clusters have different genesis locations and trajectory shapes. We have also examined characteristics such as life span, maximum sustained wind speed, landfall, and seasonality, many of which are significantly different across the identified clusters. The clustering approach groups cyclones with higher maximum wind speed and longest life span into one cluster. Another cluster includes short duration cyclonic events that are mostly deep depressions and significant for rainfall over Eastern and Central India. The clustering approach is likely to prove useful for analysis of events of significance with regard to impacts.
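
    The moment-based track description in this abstract can be illustrated with a short sketch. It is not the authors' code: the function names `track_moments` and `feature_vector` and the sample track are hypothetical, and the sketch assumes each track is a list of equally weighted (longitude, latitude) points. The first moment is the centroid; the second moments (variances and covariance) define the variance ellipse.

```python
def track_moments(points):
    """Return (centroid, covariance) for a track given as (lon, lat) pairs.

    Hypothetical helper: points are treated as equally weighted, which is
    an assumption made for illustration only.
    """
    n = len(points)
    if n < 2:
        raise ValueError("a track needs at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Second central moments (population form, divided by n).
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return (mean_x, mean_y), ((var_x, cov_xy), (cov_xy, var_y))


def feature_vector(points):
    """Flatten the moments into five numbers usable by a clustering routine."""
    (cx, cy), ((vxx, vxy), (_, vyy)) = track_moments(points)
    return [cx, cy, vxx, vyy, vxy]


# Made-up track moving north-west, for illustration only.
example_track = [(88.0, 10.0), (87.0, 12.0), (86.0, 14.0)]
example_features = feature_vector(example_track)
```

    Each track then becomes a fixed-length vector regardless of how many positions were recorded, which is what lets a standard clustering algorithm compare tracks of different lengths.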

  4. Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.

    PubMed

    Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T

    2010-01-01

    Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. 
These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.

  5. Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

    PubMed

    Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

    2018-06-07

    Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
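
    The cluster-selection step described in this abstract can be sketched as follows. This covers only the choice of the nearest clusters for a query point, not the PLS-DA model fitted afterwards. The function names, the parameter `m`, and the toy data are hypothetical, cluster labels are assumed to come from a prior hierarchical clustering, and Euclidean distance is an assumption.

```python
import math


def cluster_centres(samples, labels):
    """Return {label: centre}, where each centre is the mean of its members."""
    sums, counts = {}, {}
    for x, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_clusters(query, centres, m):
    """Return the labels of the m clusters whose centres are closest to query."""
    ranked = sorted(centres, key=lambda lab: math.dist(query, centres[lab]))
    return ranked[:m]


# Toy data: three well-separated clusters (illustrative values only).
samples = [(0, 0), (0, 2), (10, 10), (10, 12), (20, 0)]
labels = [0, 0, 1, 1, 2]
centres = cluster_centres(samples, labels)
selected = nearest_clusters((1, 1), centres, m=2)
```

    Only the samples belonging to the selected clusters would then be passed to the local classifier for that query point.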

  6. Determining the Optimal Number of Clusters with the Clustergram

    NASA Technical Reports Server (NTRS)

    Fluegemann, Joseph K.; Davies, Misty D.; Aguirre, Nathan D.

    2011-01-01

    Cluster analysis aids research in many different fields, from business to biology to aerospace. It consists of using statistical techniques to group objects in large sets of data into meaningful classes. However, this process of ordering data points presents much uncertainty because it involves several steps, many of which are subject to researcher judgment as well as inconsistencies depending on the specific data type and research goals. These steps include the method used to cluster the data, the variables on which the cluster analysis will be operating, the number of resulting clusters, and parts of the interpretation process. In most cases, the number of clusters must be guessed or estimated before employing the clustering method. Many remedies have been proposed, but none is unassailable and certainly not for all data types. Thus, the aim of current research for better techniques of determining the number of clusters is generally confined to demonstrating that the new technique excels other methods in performance for several disparate data types. Our research makes use of a new cluster-number-determination technique based on the clustergram: a graph that shows how the number of objects in the cluster and the cluster mean (the ordinate) change with the number of clusters (the abscissa). We use the features of the clustergram to make the best determination of the cluster-number.
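
    The data behind a clustergram, as described in this abstract, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: it uses one-dimensional data, a simple k-means with a deterministic start, and hypothetical function names (`kmeans_1d`, `clustergram_points`). For each number of clusters k (the abscissa) it records every cluster's mean (the ordinate) and size.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on 1-D data; returns a list of (cluster_mean, size).

    Initial centres are evenly spaced order statistics, an illustrative
    choice that keeps the sketch deterministic.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    centres = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = [data]
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Recompute centres; an empty cluster keeps its old centre.
        new_centres = [sum(g) / len(g) if g else centres[j]
                       for j, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return [(sum(g) / len(g), len(g)) for g in groups if g]


def clustergram_points(values, max_k):
    """Return (k, cluster_mean, cluster_size) triples for k = 1..max_k."""
    points = []
    for k in range(1, max_k + 1):
        for mean, size in kmeans_1d(values, k):
            points.append((k, mean, size))
    return points


# Made-up data with two obvious groups.
points = clustergram_points([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], max_k=3)
```

    Plotting cluster mean against k, with marker or line width scaled by cluster size, gives the graph the abstract describes; for multivariate data the cluster mean would first have to be projected onto a single axis.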

  7. Characterizing the course of back pain after osteoporotic vertebral fracture: a hierarchical cluster analysis of a prospective cohort study.

    PubMed

    Toyoda, Hiromitsu; Takahashi, Shinji; Hoshino, Masatoshi; Takayama, Kazushi; Iseki, Kazumichi; Sasaoka, Ryuichi; Tsujio, Tadao; Yasuda, Hiroyuki; Sasaki, Takeharu; Kanematsu, Fumiaki; Kono, Hiroshi; Nakamura, Hiroaki

    2017-09-23

    This study demonstrated four distinct patterns in the course of back pain after osteoporotic vertebral fracture (OVF). Greater angular instability in the first 6 months after the baseline was one factor affecting back pain after OVF. Understanding the natural course of symptomatic acute OVF is important in deciding the optimal treatment strategy. We used latent class analysis to classify the course of back pain after OVF and identify the risk factors associated with persistent pain. This multicenter cohort study included 218 consecutive patients with ≤ 2-week-old OVFs who were enrolled at 11 institutions. Dynamic x-rays and back pain assessment with a visual analog scale (VAS) were obtained at enrollment and at 1-, 3-, and 6-month follow-ups. The VAS scores were used to characterize patient groups, using hierarchical cluster analysis. VAS for 128 patients was used for hierarchical cluster analysis. Analysis yielded four clusters representing different patterns of back pain progression. Cluster 1 patients (50.8%) had stable, mild pain. Cluster 2 patients (21.1%) started with moderate pain and progressed quickly to very low pain. Patients in cluster 3 (10.9%) had moderate pain that initially improved but worsened after 3 months. Cluster 4 patients (17.2%) had persistent severe pain. Patients in cluster 4 showed significant high baseline pain intensity, higher degree of angular instability, and higher number of previous OVFs, and tended to lack regular exercise. In contrast, patients in cluster 2 had significantly lower baseline VAS and less angular instability. We identified four distinct groups of OVF patients with different patterns of back pain progression. Understanding the course of back pain after OVF may help in its management and contribute to future treatment trials.

  8. Subphenotypes of mild-to-moderate COPD by factor and cluster analysis of pulmonary function, CT imaging and breathomics in a population-based survey.

    PubMed

    Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J

    2013-06-01

    Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.

  9. Cluster Adjusted Regression for Displaced Subject Data (CARDS): Marginal Inference under Potentially Informative Temporal Cluster Size Profiles

    PubMed Central

    Bible, Joe; Beck, James D.; Datta, Somnath

    2016-01-01

    Summary Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information is dependent on the outcome. In the analysis of cluster data we say that a condition for informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data varies from that of inference drawn on observed data. Much work has been done in order to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possesses temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for addressing clustered longitudinal data with TVCS. The principal motivation for our present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497–505). Longitudinal periodontal data often exhibit both ICS and TVCS, as the number of teeth possessed by participants at the onset of study is not constant and teeth as well as individuals may be displaced throughout the study. PMID:26682911

  10. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    PubMed

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis utilizing Ward's method were used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences.
© The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.

  11. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda†

    PubMed Central

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-01-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis utilizing Ward's method were used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882

  12. Testing the Archivas Cluster (Arc) for Ozone Monitoring Instrument (OMI) Scientific Data Storage

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2005-01-01

    The Ozone Monitoring Instrument (OMI) launched on NASA's Aura spacecraft, the third of the major platforms of the EOS program, on July 15, 2004. In addition to the long-term archive and distribution of the data from OMI through the Goddard Earth Science Distributed Active Archive Center (GESDAAC), we are evaluating other archive mechanisms that can archive the data in a more immediately available way, where it can be used for further data production and analysis. In 2004, Archivas, Inc. was selected by NASA's Small Business Innovative Research (SBIR) program for the development of their Archivas Cluster (ArC) product. ArC is an online disk-based system utilizing self-management and automation on a Linux cluster. Its goal is to produce a low-cost solution coupled with ease of management. The OMI project is an application partner of the SBIR program, and has deployed a small cluster (5TB) based on the beta Archivas software. We performed extensive testing of the unit using production OMI data since launch. In 2005, Archivas, Inc. was funded in SBIR Phase II for further development, which will include testing scalability with the deployment of a larger (35TB) cluster at Goddard. We plan to include ArC in the OMI Team Leader Computing Facility (TLCF) hosting OMI data for direct access and analysis by the OMI Science Team. This presentation will include a brief technical description of the Archivas Cluster, a summary of the SBIR Phase I beta testing results, and an overview of the OMI ground data processing architecture including its interaction with the Phase II Archivas Cluster and hosting of OMI data for the scientists.

  13. Combining self-organizing mapping and supervised affinity propagation clustering approach to investigate functional brain networks involved in motor imagery and execution with fMRI measurements.

    PubMed

    Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin

    2015-01-01

    Clustering analysis methods have been widely applied to identifying the functional brain networks of a multitask paradigm. However, the previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study a novel method, called SOM-SAPC that combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC), is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM was first performed to process fMRI data and SAPC is further utilized for clustering the patterns of functional networks. As a result, SOM-SAPC is able to significantly reduce the computational cost for brain network analysis. Simulation and clinical tests involving ME and MI were conducted based on SOM-SAPC, and the analysis results indicated that functional brain networks were clearly identified with different response patterns and reduced computational cost. In particular, three activation clusters were clearly revealed, which include parts of the visual, ME and MI functional networks. These findings validated that SOM-SAPC is an effective and robust method to analyze the fMRI data with multitasks.

  14. Functional Interference Clusters in Cancer Patients With Bone Metastases: A Secondary Analysis of RTOG 9714

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, Edward, E-mail: Edward.Chow@sunnybrook.c; James, Jennifer; Barsevick, Andrea

    Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.
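
    The Cronbach's alpha statistic reported in this abstract has a standard closed form: for k items, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score). The sketch below applies that formula to made-up scores; the function name and the data are illustrative assumptions and have no connection to the RTOG 9714 dataset.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, aligned by respondent."""
    k = len(items)
    item_variances = sum(variance(scores) for scores in items)
    # Total score per respondent across all items.
    totals = [sum(per_person) for per_person in zip(*items)]
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Hypothetical scores for three items answered by five respondents.
scores = [
    [2, 4, 6, 8, 10],
    [1, 3, 5, 9, 10],
    [3, 4, 7, 8, 9],
]
alpha = cronbach_alpha(scores)
```

    Values near 1 indicate that the items in a cluster move together across respondents; the sample variance (n - 1 denominator) is used for both the item and total-score terms.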

  15. Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

    PubMed

    Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

    2017-02-01

    Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients were in the chronic phase and out of mood episode at the time of assessment, and most of the assessments were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Low cluster suggests that relatively preserved social cognition may be important to identify disease processes distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
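
    For readers unfamiliar with K-means, the sketch below shows the usual Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points) on made-up two-dimensional scores. The function name, the fixed starting centers and the data are assumptions for illustration; the abstract does not describe the software or initialization the authors used.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm with caller-supplied starting centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, groups


# Hypothetical standardized scores on two cognitive measures.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (-1.0, -0.8), (-1.2, -1.1), (-0.9, -1.0)]
centers, groups = kmeans(points, [points[0], points[3]])
```

    Here the two groups separate the higher-scoring from the lower-scoring participants, loosely analogous to the High and Low clusters described above.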

  16. HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T. H.

    2017-08-01

    The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ωm, or the amplitude of initial density fluctuations, σ8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects, that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters, where good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias to our hydrostatic masses; possible biases in this mass-mass comparison are discussed including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).
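
    The slopes quoted in this abstract are exponents of a power-law relation between luminosity and mass, which appear as straight-line slopes in log-log space. The sketch below recovers such a slope by ordinary least squares on synthetic data. This is only the simplest possible estimator, and the function name and data are illustrative assumptions; the abstract does not state which fitting method the authors applied, and a survey analysis would also need to treat measurement errors and selection effects.

```python
import math


def loglog_slope(masses, luminosities):
    """Ordinary least-squares slope of log10(L) against log10(M)."""
    xs = [math.log10(m) for m in masses]
    ys = [math.log10(l) for l in luminosities]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic power law L = M**1.5 in arbitrary units (not HIFLUGCS data).
masses = [1.0, 2.0, 5.0, 10.0, 50.0]
luminosities = [m ** 1.5 for m in masses]
slope = loglog_slope(masses, luminosities)
```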

  17. Phenotypes of asthma in low-income children and adolescents: cluster analysis.

    PubMed

    Cabral, Anna Lucia Barros; Sousa, Andrey Wirgues; Mendes, Felipe Augusto Rodrigues; Carvalho, Celso Ricardo Fernandes de

    2017-01-01

    Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.

  18. Phylogenetic relationship of Ornithobacterium rhinotracheale strains.

    PubMed

    DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo

    2018-04-10

    The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include available sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, the phylogenetic analysis of O. rhinotracheale strains, based on the 16S rRNA gene, is a suitable tool for epidemiologic studies.

  19. Severe or life-threatening asthma exacerbation: patient heterogeneity identified by cluster analysis.

    PubMed

    Sekiya, K; Nakatani, E; Fukutomi, Y; Kaneda, H; Iikura, M; Yoshida, M; Takahashi, K; Tomii, K; Nishikawa, M; Kaneko, N; Sugino, Y; Shinkai, M; Ueda, T; Tanikawa, Y; Shirai, T; Hirabayashi, M; Aoki, T; Kato, T; Iizuka, K; Homma, S; Taniguchi, M; Tanaka, H

    2016-08-01

    Severe or life-threatening asthma exacerbation is one of the worst outcomes of asthma because of the risk of death. To date, few studies have explored the potential heterogeneity of this condition. To examine the clinical characteristics and heterogeneity of patients with severe or life-threatening asthma exacerbation. This was a multicentre, prospective study of patients with severe or life-threatening asthma exacerbation and pulse oxygen saturation < 90% who were admitted to 17 institutions across Japan. Cluster analysis was performed using variables from patient- and physician-orientated structured questionnaires. Analysis of data from 175 patients with severe or life-threatening asthma exacerbation revealed five distinct clusters. Cluster 1 (n = 27) was younger-onset asthma with severe symptoms at baseline, including limitation of activities, a higher frequency of treatment with oral corticosteroids and short-acting beta-agonists, and a higher frequency of asthma hospitalizations in the past year. Cluster 2 (n = 35) was predominantly composed of elderly females, with the highest frequency of comorbid, chronic hyperplastic rhinosinusitis/nasal polyposis, and a long disease duration. Cluster 3 (n = 40) was allergic asthma without inhaled corticosteroid use at baseline. Patients in this cluster had a higher frequency of atopy, including allergic rhinitis and furred pet hypersensitivity, and a better prognosis during hospitalization compared with the other clusters. Cluster 4 (n = 34) was characterized by elderly males with concomitant chronic obstructive pulmonary disease (COPD). Although cluster 5 (n = 39) had very mild symptoms at baseline according to the patient questionnaires, 41% had previously been hospitalized for asthma. This study demonstrated that significant heterogeneity exists among patients with severe or life-threatening asthma exacerbation. Differences were observed in the severity of asthma symptoms and use of inhaled corticosteroids at baseline, and the presence of comorbid COPD. These findings may contribute to a deeper understanding and better management of this patient population. © 2016 The Authors. Clinical & Experimental Allergy Published by John Wiley & Sons Ltd.

  20. Comparative genomic analysis of secondary metabolite biosynthetic gene clusters in 207 isolates of Fusarium

    USDA-ARS's Scientific Manuscript database

    Fusarium species are known for their ability to produce secondary metabolites (SMs), including plant hormones, pigments, mycotoxins, and other compounds with potential agricultural, pharmaceutical, and biotechnological impact. Understanding the distribution of SM biosynthetic gene clusters across th...

  1. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

    DOE PAGES

    Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...

    2016-11-29

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.

  2. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.

  3. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.

  4. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  5. Cerebral and non-cerebral coenurosis: on the genotypic and phenotypic diversity of Taenia multiceps.

    PubMed

    Christodoulopoulos, Georgios; Dinkel, Anke; Romig, Thomas; Ebi, Dennis; Mackenstedt, Ute; Loos-Frank, Brigitte

    2016-12-01

    We characterised the causative agents of cerebral and non-cerebral coenurosis in livestock by determining the mitochondrial genotypes and morphological phenotypes of 52 Taenia multiceps isolates from a wide geographical range in Europe, Africa, and western Asia. Three studies were conducted: (1) a morphological comparison of the rostellar hooks of cerebral and non-cerebral cysts of sheep and goats, (2) a morphological comparison of adult worms experimentally produced in dogs, and (3) a molecular analysis of three partial mitochondrial genes (nad1, cox1, and 12S rRNA) of the same isolates. No significant morphological or genetic differences were associated with the species of the intermediate host. Adult parasites originating from cerebral and non-cerebral cysts differed morphologically, e.g. the shape of the small hooks and the distribution of the testes in the mature proglottids. The phylogenetic analysis of the mitochondrial haplotypes produced three distinct clusters: one cluster including both cerebral isolates from Greece and non-cerebral isolates from tropical and subtropical countries, and two clusters including cerebral isolates from Greece. The majority of the non-cerebral specimens clustered together but did not form a monophyletic group. No monophyletic groups were observed based on geography, although specimens from the same region tended to cluster. The clustering indicates high intraspecific diversity. The phylogenetic analysis suggests that all variants of T. multiceps can cause cerebral coenurosis in sheep (which may be the ancestral phenotype), and some variants, predominantly from one genetic cluster, acquired the additional capacity to produce non-cerebral forms in goats and more rarely in sheep.

  6. Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

    PubMed Central

    Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2014-01-01

    Background: One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods: Plasma samples from patients registered in Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences underwent subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results: Among 3618 patients registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals with their known risk as heterosexual contacts. The other four large clusters showed later tMRCAs between 2000 and 2002 with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals infected with HIV strains spread in East and South-eastern Asian countries. Conclusions: Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CRF01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900

  7. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    PubMed Central

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, who were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in a hierarchical cluster analysis. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  8. Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.

    PubMed

    Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah

    2014-06-01

    Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ² analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, which is related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights reserved.

  9. Gene cluster conservation provides insight into cercosporin biosynthesis and extends production to the genus Colletotrichum.

    PubMed

    de Jonge, Ronnie; Ebert, Malaika K; Huitt-Roehl, Callie R; Pal, Paramita; Suttle, Jeffrey C; Spanner, Rebecca E; Neubauer, Jonathan D; Jurick, Wayne M; Stott, Karina A; Secor, Gary A; Thomma, Bart P H J; Van de Peer, Yves; Townsend, Craig A; Bolton, Melvin D

    2018-06-12

    Species in the genus Cercospora cause economically devastating diseases in sugar beet, maize, rice, soy bean, and other major food crops. Here, we sequenced the genome of the sugar beet pathogen Cercospora beticola and found it encodes 63 putative secondary metabolite gene clusters, including the cercosporin toxin biosynthesis (CTB) cluster. We show that the CTB gene cluster has experienced multiple duplications and horizontal transfers across a spectrum of plant pathogenic fungi, including the wide-host range Colletotrichum genus as well as the rice pathogen Magnaporthe oryzae. Although cercosporin biosynthesis has been thought to rely on an eight-gene CTB cluster, our phylogenomic analysis revealed gene collinearity adjacent to the established cluster in all CTB cluster-harboring species. We demonstrate that the CTB cluster is larger than previously recognized and includes cercosporin facilitator protein, previously shown to be involved with cercosporin autoresistance, and four additional genes required for cercosporin biosynthesis, including the final pathway enzymes that install the unusual cercosporin methylenedioxy bridge. Lastly, we demonstrate production of cercosporin by Colletotrichum fioriniae, the first known cercosporin producer within this agriculturally important genus. Thus, our results provide insight into the intricate evolution and biology of a toxin critical to agriculture and broaden the production of cercosporin to another fungal genus containing many plant pathogens of important crops worldwide. Copyright © 2018 the Author(s). Published by PNAS.

  10. Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis.

    PubMed

    El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane

    2018-01-01

    Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious) with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance to healthy eating. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.

  11. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects of acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
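
    The comparison of neighboring genes against randomly selected gene pairs described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names, the one-sided empirical p-value, and the number of random pairs are assumptions made for illustration, and the FDR correction mentioned in the abstract is not implemented.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as 0
    return cov / math.sqrt(vx * vy)


def neighbor_vs_random(expr, n_random=1000, seed=0):
    """expr: expression profiles listed in chromosomal order (hypothetical input).

    Returns the correlations of adjacent gene pairs and the correlations
    of randomly drawn gene pairs, which serve as an empirical null.
    """
    rng = random.Random(seed)
    neighbors = [pearson(expr[i], expr[i + 1]) for i in range(len(expr) - 1)]
    null = []
    for _ in range(n_random):
        i, j = rng.sample(range(len(expr)), 2)  # two distinct genes
        null.append(pearson(expr[i], expr[j]))
    return neighbors, null


def empirical_p(value, null):
    """Fraction of null correlations that are at least as large as value."""
    return sum(1 for r in null if r >= value) / len(null)
```

    A neighboring pair whose correlation exceeds most of the random-pair correlations gets a small empirical p-value; a real analysis would then control the false discovery rate across all pairs.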

  12. Analysis of LAC Observations of Clusters of Galaxies and Supernova Remnants

    NASA Technical Reports Server (NTRS)

    Hughes, J.

    1996-01-01

    The following publications are included and serve as the final report: The X-ray Spectrum of Abell 665; Clusters of Galaxies; Ginga Observation of an Oxygen-rich Supernova Remnant; Ginga Observations of the Coma Cluster and Studies of the Spatial Distribution of Iron; A Measurement of the Hubble Constant from the X-ray Properties and the Sunyaev-Zel'dovich Effect of Abell 2218; Non-polytropic Model for the Coma Cluster; and Abundance Gradients in Cooling Flow Clusters: Ginga LAC (Large Area Counter) and Einstein SSS (Solid State Spectrometer) Spectra of A496, A1795, A2142, and A2199.

  13. Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

    PubMed

    Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

    2017-01-01

    Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4 and increased in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.

  14. Towards a Net Zero Building Cluster Energy Systems Analysis for US Army Installations

    DTIC Science & Technology

    2011-05-01

    depending on the alternative chosen. Since the proposed energy efficiency work includes the implementation of DOAS and high efficiency dehumidification ...cluster Net Zero fossil fuel energy. The recommended, integrated energy solution demonstrates that vastly improved energy efficiency and greenhouse gas

  15. Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.

    PubMed

    Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D; Ober, Carole; Nicolae, Dan L; Barnes, Kathleen C; London, Stephanie J; Gilliland, Frank; Weiss, Scott T; Raby, Benjamin A; Cohn, Lauren; Chupp, Geoffrey L

    2015-05-15

    The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10^-6) and hospitalization (P = 0.01), respectively. There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.

  16. Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma

    PubMed Central

    Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F.; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D.; Ober, Carole; Nicolae, Dan L.; Barnes, Kathleen C.; London, Stephanie J.; Gilliland, Frank; Weiss, Scott T.; Raby, Benjamin A.; Cohn, Lauren

    2015-01-01

    Rationale: The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. Objectives: We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Methods: Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Measurements and Main Results: Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10^-6) and hospitalization (P = 0.01), respectively. Conclusions: There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma. PMID:25763605

  17. Strong-lensing analysis of A2744 with MUSE and Hubble Frontier Fields images

    NASA Astrophysics Data System (ADS)

    Mahler, G.; Richard, J.; Clément, B.; Lagattuta, D.; Schmidt, K.; Patrício, V.; Soucail, G.; Bacon, R.; Pello, R.; Bouwens, R.; Maseda, M.; Martinez, J.; Carollo, M.; Inami, H.; Leclercq, F.; Wisotzki, L.

    2018-01-01

    We present an analysis of Multi Unit Spectroscopic Explorer (MUSE) observations obtained on the massive Frontier Fields (FFs) cluster A2744. This new data set covers the entire multiply imaged region around the cluster core. The combined catalogue consists of 514 spectroscopic redshifts (with 414 new identifications). We use this redshift information to perform a strong-lensing analysis revising multiple images previously found in the deep FF images, and add three new MUSE-detected multiply imaged systems with no obvious Hubble Space Telescope counterpart. The combined strong-lensing constraints include a total of 60 systems producing 188 images altogether, out of which 29 systems and 83 images are spectroscopically confirmed, making A2744 one of the most well-constrained clusters to date. Thanks to the large amount of spectroscopic redshifts, we model the influence of substructures at larger radii, using a parametrization including two cluster-scale components in the cluster core and several group scale in the outskirts. The resulting model accurately reproduces all the spectroscopic multiple systems, reaching an rms of 0.67 arcsec in the image plane. The large number of MUSE spectroscopic redshifts gives us a robust model, which we estimate reduces the systematic uncertainty on the 2D mass distribution by up to ∼2.5 times the statistical uncertainty in the cluster core. In addition, from a combination of the parametrization and the set of constraints, we estimate the relative systematic uncertainty to be up to 9 per cent at 200 kpc.

  18. Optical spectroscopy and velocity dispersions of galaxy clusters from the SPT-SZ survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruel, J.; Bayliss, M.; Bazin, G.

    2014-09-01

    We present optical spectroscopy of galaxies in clusters detected through the Sunyaev-Zel'dovich (SZ) effect with the South Pole Telescope (SPT). We report our own measurements of 61 spectroscopic cluster redshifts, and 48 velocity dispersions each calculated with more than 15 member galaxies. This catalog also includes 19 dispersions of SPT-observed clusters previously reported in the literature. The majority of the clusters in this paper are SPT-discovered; of these, most have been previously reported in other SPT cluster catalogs, and five are reported here as SPT discoveries for the first time. By performing a resampling analysis of galaxy velocities, we find that unbiased velocity dispersions can be obtained from a relatively small number of member galaxies (≲ 30), but with increased systematic scatter. We use this analysis to determine statistical confidence intervals that include the effect of membership selection. We fit scaling relations between the observed cluster velocity dispersions and mass estimates from SZ and X-ray observables. In both cases, the results are consistent with the scaling relation between velocity dispersion and mass expected from dark-matter simulations. We measure a ∼30% log-normal scatter in dispersion at fixed mass, and a ∼10% offset in the normalization of the dispersion-mass relation when compared to the expectation from simulations, which is within the expected level of systematic uncertainty.

  19. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    NASA Astrophysics Data System (ADS)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
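
    The centered log-ratio transformation (CLT) that this abstract recommends before k-means or fuzzy c-means clustering can be sketched as follows. This is only an illustration of the transformation step under the assumption of strictly positive concentrations; the function names are hypothetical, and the clustering algorithms themselves are not shown.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one composition (all parts must be > 0)."""
    if any(v <= 0 for v in composition):
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def aitchison_distance(a, b):
    """Euclidean distance between the CLR-transformed compositions a and b."""
    ca, cb = clr(a), clr(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
```

    Because each part is divided by the geometric mean of its sample, two samples that differ only by a constant factor (for example, the same element ratios reported in different units) map to the same point, which is one reason log-ratio transforms are used for compositional data before distance-based clustering.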

  20. Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

    PubMed

    Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

    2017-05-01

    Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
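
    The idea of cluster trend analysis, averaging values across a subset of locations and then estimating a rate of change, can be shown with a small sketch. The cluster definitions, the data layout, and the use of ordinary least squares are assumptions made for illustration; the P-value criteria and specificity calibration described in the abstract are not reproduced.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def cluster_slopes(times, visits, clusters):
    """Rate of change of the mean value within each cluster of locations.

    times:    visit times (e.g. years since baseline)
    visits:   one list of per-location values per visit
    clusters: hypothetical mapping from cluster name to location indices
    """
    slopes = {}
    for name, indices in clusters.items():
        means = [sum(visit[i] for i in indices) / len(indices) for visit in visits]
        slopes[name] = ols_slope(times, means)
    return slopes
```

    For example, with three yearly visits in which locations 0 and 1 lose 1 dB per year and locations 2 and 3 stay constant, the call `cluster_slopes([0, 1, 2], [[30, 30, 28, 28], [29, 29, 28, 28], [28, 28, 28, 28]], {"a": [0, 1], "b": [2, 3]})` gives a slope of -1.0 for cluster "a" and 0.0 for cluster "b".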

  1. Personalized Medicine in Veterans with Traumatic Brain Injuries

    DTIC Science & Technology

    2013-05-01

    Pair-Group Method using Arithmetic averages (UPGMA) based on cosine correlation of row mean centered log2 signal values; this was the top 50%-tile...clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For comparative purposes, clustered heat maps included...non-mTBI cases were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as the similarity

  2. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second, included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
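
    The mutual information (MI) between residue pairs that this abstract uses as a covariance measure can be computed from an alignment as in the sketch below. The function names and the toy alignment are illustrative assumptions; the subsequent spectral clustering of the MI matrix and any correction for evolutionary noise are not shown.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding error
        mi += p_ab * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def mi_matrix(sequences):
    """Pairwise MI between all columns of a list of aligned sequences."""
    columns = list(zip(*sequences))
    size = len(columns)
    return [
        [mutual_information(columns[i], columns[j]) for j in range(size)]
        for i in range(size)
    ]
```

    Two columns that always change together, such as "AABB" and "CCDD", carry 1 bit of mutual information, whereas independent columns such as "AABB" and "CDCD" carry 0 bits.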

  3. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species.

    PubMed

    Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q

    2015-07-01

    Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis.

    PubMed

    Turner, Elizabeth L; Prague, Melanie; Gallis, John A; Li, Fan; Murray, David M

    2017-07-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.

  5. Cluster analysis of the national weight control registry to identify distinct subgroups maintaining successful weight loss.

    PubMed

    Ogden, Lorraine G; Stroebele, Nanette; Wyatt, Holly R; Catenacci, Victoria A; Peters, John C; Stuht, Jennifer; Wing, Rena R; Hill, James O

    2012-10-01

    The National Weight Control Registry (NWCR) is the largest ongoing study of individuals successful at maintaining weight loss; the registry enrolls individuals maintaining a weight loss of at least 13.6 kg (30 lb) for a minimum of 1 year. The current report uses multivariate latent class cluster analysis to identify unique clusters of individuals within the NWCR that have distinct experiences, strategies, and attitudes with respect to weight loss and weight loss maintenance. The cluster analysis considers weight and health history, weight control behaviors and strategies, effort and satisfaction with maintaining weight, and psychological and demographic characteristics. The analysis includes 2,228 participants enrolled between 1998 and 2002. Cluster 1 (50.5%) represents a weight-stable, healthy, exercise conscious group who are very satisfied with their current weight. Cluster 2 (26.9%) has continuously struggled with weight since childhood; they rely on the greatest number of resources and strategies to lose and maintain weight, and report higher levels of stress and depression. Cluster 3 (12.7%) represents a group successful at weight reduction on the first attempt; they were least likely to be overweight as children, are maintaining the longest duration of weight loss, and report the least difficulty maintaining weight. Cluster 4 (9.9%) represents a group less likely to use exercise to control weight; they tend to be older, eat fewer meals, and report more health problems. Further exploration of the unique characteristics of these clusters could be useful for tailoring future weight loss and weight maintenance programs to the specific characteristics of an individual.

  6. Clinical phenotypes and survival of pre-capillary pulmonary hypertension in systemic sclerosis.

    PubMed

    Launay, David; Montani, David; Hassoun, Paul M; Cottin, Vincent; Le Pavec, Jérôme; Clerson, Pierre; Sitbon, Olivier; Jaïs, Xavier; Savale, Laurent; Weatherald, Jason; Sobanski, Vincent; Mathai, Stephen C; Shafiq, Majid; Cordier, Jean-François; Hachulla, Eric; Simonneau, Gérald; Humbert, Marc

    2018-01-01

    Pre-capillary pulmonary hypertension (PH) in systemic sclerosis (SSc) is a heterogeneous condition with an overall bad prognosis. The objective of this study was to identify and characterize homogeneous phenotypes by a cluster analysis in SSc patients with PH. Patients were identified from two prospective cohorts from the US and France. Clinical, pulmonary function, high-resolution chest tomography, hemodynamic and survival data were extracted. We performed cluster analysis using the k-means method and compared survival between clusters using Cox regression analysis. Cluster analysis of 200 patients identified four homogenous phenotypes. Cluster C1 included patients with mild to moderate risk pulmonary arterial hypertension (PAH) with limited or no interstitial lung disease (ILD) and low DLCO with a 3-year survival of 81.5% (95% CI: 71.4-88.2). C2 had pre-capillary PH due to extensive ILD and worse 3-year survival compared to C1 (adjusted hazard ratio [HR] 3.14; 95% CI 1.66-5.94; p = 0.0004). C3 had severe PAH and a trend towards worse survival (HR 2.53; 95% CI 0.99-6.49; p = 0.052). Cluster C4 and C1 were similar with no difference in survival (HR 0.65; 95% CI 0.19-2.27, p = 0.507) but with a higher DLCO in C4. PH in SSc can be characterized into distinct clusters that differ in prognosis.

  7. Improving clustering with metabolic pathway data.

    PubMed

    Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando

    2014-04-10

    It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. 
    It is worth highlighting that this fact has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.

  8. Business and Marketing Cluster. Task Analyses.

    ERIC Educational Resources Information Center

    Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum and Resource Center.

    Developed in Virginia, this publication contains task analysis guides to support selected tech prep programs that prepare students for careers in the business and marketing cluster. Guides are included for accounting systems, legal systems administration, office systems technology, and retail marketing. Each task analyses guide has the following…

  9. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Lin; Maroudas, Dimitrios; Hammond, Karl D.

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He_n, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He_4 and He_5 clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.

  10. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970s, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960s, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model, and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960s and 1970s respectively, formed the basis of multivariate cluster analysis methodology for many years. However, shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic-algorithm-based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
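
    The dependence of K-means on its initial seed values, noted in the abstract above, can be illustrated with a minimal sketch. This is not the author's genetic algorithm; it is plain Lloyd-style K-means on one-dimensional data, and the function name, the toy data, and the seed choices are assumptions made only for illustration.

```python
def kmeans_1d(points, seeds, max_iter=100):
    """Plain Lloyd-style k-means on 1-D data, started from the given seeds.

    Hypothetical helper for illustration; returns (centers, sum of squared errors).
    """
    centers = list(seeds)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (ties go to the center with the lower index).
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, sse


# Three well-separated groups of three points each (toy data).
data = [1, 2, 3, 11, 12, 13, 21, 22, 23]

# Seeds spread over the data recover the three groups (SSE = 6).
good_centers, good_sse = kmeans_1d(data, seeds=[2, 12, 22])

# Seeds packed into one group converge to a poor local optimum (SSE = 154.5).
bad_centers, bad_sse = kmeans_1d(data, seeds=[1, 2, 3])

print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

    Both runs converge, but to different partitions, which is the kind of seed sensitivity that motivates a search over starting configurations.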

  11. Who are the healthy active seniors? A cluster analysis.

    PubMed

    Lai, Claudia K Y; Chan, Engle Angela; Chin, Kenny C W

    2014-12-01

    This paper reports a cluster analysis of a sample recruited from a randomized controlled trial that explored the effect of using a life story work approach to improve the psychological outcomes of older people in the community. 238 subjects from community centers were included in this analysis. After statistical testing, 169 seniors were assigned to the active ageing (AG) cluster and 69 to the inactive ageing (IG) cluster. Those in the AG were younger and healthier, with fewer chronic diseases and fewer depressive symptoms than those in the IG. They were more satisfied with their lives, and had higher self-esteem. They met with their family members more frequently, they engaged in more leisure activities and were more likely to have the ability to move freely. In summary, active ageing was observed in people with better health and functional performance. Our results echoed the limited findings reported in the literature.

  12. Generating a Magellanic star cluster catalog with ASteCA

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Piatti, A. E.; Vázquez, R. A.

    2016-08-01

    An increasing number of software tools have been employed in recent years for the automated or semi-automated processing of astronomical data. The main advantages of using these tools over a standard by-eye analysis include speed (particularly for large databases), homogeneity, reproducibility, and precision. At the same time, they enable a statistically correct study of the uncertainties associated with the analysis, in contrast with manually set errors, or the still widespread practice of simply not assigning errors. We present a catalog comprising 210 star clusters located in the Large and Small Magellanic Clouds, observed with Washington photometry. Their fundamental parameters were estimated through a homogeneous, automated and completely unassisted process, via the Automated Stellar Cluster Analysis package (ASteCA). Our results are compared with two types of studies on these clusters: one where the photometry is the same, and another where the photometric system is different from that employed by ASteCA.

  13. Which modifiable health risk behaviours are related? A systematic review of the clustering of Smoking, Nutrition, Alcohol and Physical activity ('SNAP') health risk factors.

    PubMed

    Noble, Natasha; Paul, Christine; Turon, Heidi; Oldmeadow, Christopher

    2015-12-01

    There is a growing body of literature examining the clustering of health risk behaviours, but little consensus about which risk factors can be expected to cluster for which subgroups of people. This systematic review aimed to examine the international literature on the clustering of smoking, poor nutrition, excess alcohol and physical inactivity (SNAP) health behaviours among adults, including associated socio-demographic variables. A literature search was conducted in May 2014. Studies examining at least two SNAP risk factors, and using a cluster or factor analysis technique, or comparing observed to expected prevalence of risk factor combinations, were included. Fifty-six relevant studies were identified. A majority of studies (81%) reported a 'healthy' cluster characterised by the absence of any SNAP risk factors. More than half of the studies reported a clustering of alcohol with smoking, and half reported clustering of all four SNAP risk factors. The methodological quality of included studies was generally weak to moderate. Males and those with greater social disadvantage showed riskier patterns of behaviours; younger age was less clearly associated with riskier behaviours. Clustering patterns reported here reinforce the need for health promotion interventions to target multiple behaviours, and for such efforts to be specifically designed and accessible for males and those who are socially disadvantaged. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

    PubMed

    Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

    2017-12-01

    Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the limitations, trade-offs, and considerations for the sensitivities of variables and the biological interpretations of results. The Cluster app is available as a free download for Apple computers at iTunes, with a link to a user guide website.
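
    The idea of comparing observed mean inter-cluster distances with those from randomly simulated maps can be sketched roughly as follows. This is not the Cluster app's implementation: it assumes a rectangular frame, treats each cluster as a bare centroid (sizes and orientation angles are ignored), and replaces the t test with a simple empirical two-sided Monte Carlo p-value. All function names and parameters are hypothetical.

```python
import math
import random


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of points."""
    total, pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.dist(points[i], points[j])
            pairs += 1
    return total / pairs


def simulate_random_means(n_points, width, height, n_sims=1000, seed=0):
    """Mean pairwise distance for n_sims maps of uniformly placed centroids."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_points)]
        means.append(mean_pairwise_distance(pts))
    return means


def empirical_two_sided_p(observed, simulated):
    """Share of simulated values at least as far from the simulated mean
    as the observed value (with the usual +1 correction)."""
    center = sum(simulated) / len(simulated)
    extreme = sum(1 for s in simulated if abs(s - center) >= abs(observed - center))
    return (extreme + 1) / (len(simulated) + 1)


# Ten centroids packed into one corner of a 100 x 100 frame (toy data).
observed_centroids = [(50 + 0.1 * k, 50 + 0.05 * k) for k in range(10)]
observed_mean = mean_pairwise_distance(observed_centroids)
simulated_means = simulate_random_means(10, 100, 100, n_sims=200, seed=1)
print(empirical_two_sided_p(observed_mean, simulated_means))
```

    A small p-value suggests that the observed centroids are not randomly distributed within the frame; whether they are aggregated or regular follows from the sign of the difference between the observed and simulated means.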

  15. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder.

    PubMed

    Bauer, M; Glenn, T; Alda, M; Andreassen, O A; Angelopoulos, E; Ardau, R; Baethge, C; Bauer, R; Bellivier, F; Belmaker, R H; Berk, M; Bjella, T D; Bossini, L; Bersudsky, Y; Cheung, E Y W; Conell, J; Del Zompo, M; Dodd, S; Etain, B; Fagiolini, A; Frye, M A; Fountoulakis, K N; Garneau-Fournier, J; Gonzalez-Pinto, A; Harima, H; Hassel, S; Henry, C; Iacovides, A; Isometsä, E T; Kapczinski, F; Kliwicki, S; König, B; Krogh, R; Kunz, M; Lafer, B; Larsen, E R; Lewitzka, U; Lopez-Jaramillo, C; MacQueen, G; Manchia, M; Marsh, W; Martinez-Cengotitabengoa, M; Melle, I; Monteith, S; Morken, G; Munoz, R; Nery, F G; O'Donovan, C; Osher, Y; Pfennig, A; Quiroz, D; Ramesar, R; Rasgon, N; Reif, A; Ritter, P; Rybakowski, J K; Sagduyu, K; Scippa, A M; Severus, E; Simhandl, C; Stein, D J; Strejilevich, S; Hatim Sulaiman, A; Suominen, K; Tagata, H; Tatebayashi, Y; Torrent, C; Vieta, E; Viswanath, B; Wanchoo, M J; Zetin, M; Whybrow, P C

    2015-01-01

    Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research. Copyright © 2014 Elsevier Masson SAS. All rights reserved.

  16. Cluster Analysis of Longidorus Species (Nematoda: Longidoridae), a New Approach in Species Identification

    PubMed Central

    Ye, Weimin; Robbins, R. T.

    2004-01-01

    Hierarchical cluster analysis based on female morphometric character means, including body length, distance from vulva opening to anterior end, head width, odontostyle length, esophagus length, body width, tail length, and tail width, was used to examine the morphometric relationships and create dendrograms for (i) 62 populations belonging to 9 Longidorus species from Arkansas, (ii) 137 published Longidorus species, and (iii) 137 published Longidorus species plus 86 populations of 16 Longidorus species from Arkansas and various other locations by using JMP 4.02 software (SAS Institute, Cary, NC). Cluster analysis dendrograms visually illustrated the grouping and morphometric relationships of the species and populations. The analysis provided a computerized statistical approach to help identify and distinguish species, indicate morphometric relationships among species, and assist with new species diagnosis. Preliminary species identification can be accomplished by running cluster analysis for unknown species together with the data matrix of known published Longidorus species. PMID:19262809

  17. Country clustering applied to the water & sanitation sector: a new tool with potential applications in research & policy

    PubMed Central

    Onda, Kyle; Crocker, Jonny; Kayser, Georgia Lyn; Bartram, Jamie

    2013-01-01

    The fields of global health and international development commonly cluster countries by geography and income to target resources and describe progress. For any given sector of interest, a range of relevant indicators can serve as a more appropriate basis for classification. We create a new typology of country clusters specific to the water and sanitation (WatSan) sector based on similarities across multiple WatSan-related indicators. After a literature review and consultation with experts in the WatSan sector, nine indicators were selected. Indicator selection was based on relevance to and suggested influence on national water and sanitation service delivery, and on maximizing data availability across as many countries as possible. A hierarchical clustering method and a gap statistic analysis were used to group countries into a natural number of relevant clusters. Two stages of clustering resulted in five clusters, representing 156 countries or 6.75 billion people. The five clusters were not well explained by income or geography, and were distinct from existing country clusters used in international development. Analysis of these five clusters revealed that they were more compact and well separated than United Nations and World Bank country clusters. This analysis and resulting country typology suggest that previous geography- or income-based country groupings can be improved upon for applications in the WatSan sector by utilizing globally available WatSan-related indicators. Potential applications include guiding and discussing research, informing policy, improving resource targeting, describing sector progress, and identifying critical knowledge gaps in the WatSan sector. PMID:24054545

  18. Automatic pole-like object modeling via 3D part-based analysis of point cloud

    NASA Astrophysics Data System (ADS)

    He, Liu; Yang, Haoxiang; Huang, Yuchun

    2016-10-01

    Pole-like objects, including trees, lampposts and traffic signs, are an indispensable part of urban infrastructure. With the advance of vehicle-based laser scanning (VLS), massive point clouds of roadside urban areas are increasingly applied in 3D digital city modeling. Based on the property that different pole-like objects have various canopy parts but similar trunk parts, this paper proposes a 3D part-based shape analysis to robustly extract, identify and model pole-like objects. The proposed method includes 3D clustering and recognition of trunks, voxel growing, and part-based 3D modeling. After preprocessing, the trunk center is identified as the point that has a local density peak and the largest minimum inter-cluster distance. Starting from the trunk centers, the remaining points are iteratively assigned to the same center as their nearest point with higher density. To eliminate noisy points, the cluster border is refined by trimming boundary outliers. Then, candidate trunks are extracted based on the clustering results in three orthogonal planes by shape analysis. Voxel growing obtains the complete pole-like objects regardless of overlaying. Finally, the entire trunk, branch and crown parts are analyzed to obtain seven feature parameters. These parameters are used to model the three parts separately and to obtain a single part-assembled 3D model. The proposed method is tested using the VLS-based point cloud of Wuhan University, China. The point cloud includes many kinds of trees, lampposts and other pole-like posters under different occlusions and overlaying. Experimental results show that the proposed method can extract the exact attributes and model the roadside pole-like objects efficiently.
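
    The clustering step described above (centers at local density peaks that lie far from any denser point, with the remaining points following their nearest denser neighbor) resembles density-peak clustering. The following is a minimal 2D sketch under that assumption, not the authors' implementation; the cutoff radius, the number of centers, and the function name are hypothetical, and border trimming is omitted.

```python
import math


def density_peak_clusters(points, radius, n_centers):
    """Sketch of density-peak clustering; returns (center indices, labels)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than `radius`.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] < radius)
               for i in range(n)]
    # Rank points from densest to sparsest; ties are broken by index, so
    # "higher density" below means "earlier in this ranking".
    order = sorted(range(n), key=lambda i: (-density[i], i))
    delta = [0.0] * n    # distance to the nearest higher-ranked point
    parent = [-1] * n    # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest point: its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers combine high density with a large distance to any denser point.
    centers = sorted(order, key=lambda i: density[i] * delta[i], reverse=True)[:n_centers]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Walk down the density ranking so each parent is labeled before its children.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


# Two small blobs, each a unit square with a point at its middle (toy data).
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
centers, labels = density_peak_clusters(blob_a + blob_b, radius=1.0, n_centers=2)
print(centers)  # [4, 9]: the middle point of each blob
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

    The pairwise distance matrix makes this sketch quadratic in the number of points, so a real point cloud would need a spatial index or voxel grid instead.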

  19. A Model-Based Cluster Analysis of Maternal Emotion Regulation and Relations to Parenting Behavior.

    PubMed

    Shaffer, Anne; Whitehead, Monica; Davis, Molly; Morelen, Diana; Suveg, Cynthia

    2017-10-15

    In a diverse community sample of mothers (N = 108) and their preschool-aged children (M age = 3.50 years), this study conducted person-oriented analyses of maternal emotion regulation (ER) based on a multimethod assessment incorporating physiological, observational, and self-report indicators. A model-based cluster analysis was applied to five indicators of maternal ER: maternal self-report, observed negative affect in a parent-child interaction, baseline respiratory sinus arrhythmia (RSA), and RSA suppression across two laboratory tasks. Model-based cluster analyses revealed four maternal ER profiles, including a group of mothers with average ER functioning, characterized by socioeconomic advantage and more positive parenting behavior. A dysregulated cluster demonstrated the greatest challenges with parenting and dyadic interactions. Two clusters of intermediate dysregulation were also identified. Implications for assessment and applications to parenting interventions are discussed. © 2017 Family Process Institute.

  20. [Difficulties in emotion regulation and personal distress in young adults with social anxiety].

    PubMed

    Contardi, Anna; Farina, Benedetto; Fabbricatore, Mariantonietta; Tamburello, Stella; Scapellato, Paolo; Penzo, Ilaria; Tamburello, Antonino; Innamorati, Marco

    2013-01-01

    The aim of this study was to assess the association between social anxiety and difficulties in emotion regulation in a sample of Italian young adults. Our convenience sample was composed of 298 Italian young adults (184 women and 114 men) aged 18-34 years. Participants were administered the Interaction Anxiousness Scale (IAS), the Audience Anxiousness Scale (AAS), the Difficulties in Emotion Regulation Scale (DERS), and the Interpersonal Reactivity Index (IRI). A Two Step cluster analysis was used to group subjects according to their level of social anxiety. The cluster analysis indicated a two-cluster solution. The first cluster included 163 young adults with higher scores on the AAS and the IAS than those included in cluster 2 (n=135). A generalized linear model with groups as dependent variable indicated that people with higher social anxiety (compared to those with lower social anxiety) have higher scores on the dimension personal distress of the IRI (p<0.01), and on the DERS non acceptance of negative emotions (p<0.001) and lack of emotional clarity (p<0.05). The results are consistent with models of psychopathology, which hypothesize that people who cannot deal effectively with their emotions may develop depressive and anxious disorders.

  1. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

    PubMed

    Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N

    2017-01-04

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Phylogenetic Relationships of Citrus and Its Relatives Based on matK Gene Sequences

    PubMed Central

    Penjor, Tshering; Uehara, Miki; Ide, Manami; Matsumoto, Natsumi; Matsumoto, Ryoji

    2013-01-01

    The genus Citrus includes mandarin, orange, lemon, grapefruit and lime, which have high economic and nutritional value. The family Rutaceae can be divided into 7 subfamilies, including Aurantioideae. The genus Citrus belongs to the subfamily Aurantioideae. In this study, we sequenced the chloroplast matK genes of 135 accessions from 22 genera of Aurantioideae and analyzed them phylogenetically. Our study includes many accessions that have not been examined in other studies. The subfamily Aurantioideae has been classified into 2 tribes, Clauseneae and Citreae, and our current molecular analysis clearly discriminates Citreae from Clauseneae by using only 1 chloroplast DNA sequence. Our study confirms previous observations on the molecular phylogeny of Aurantioideae in many aspects. However, we have provided novel information on these genetic relationships. For example, inconsistent with the previous observation, and consistent with our preliminary study using the chloroplast rbcL genes, our analysis showed that Feroniella oblata is not nested in Citrus species and is closely related to Feronia limonia. Furthermore, we have shown that Murraya paniculata is similar to Merrillia caloxylon and is dissimilar to Murraya koenigii. We found that “true citrus fruit trees” could be divided into 2 subclusters. One subcluster included Citrus, Fortunella, and Poncirus, while the other subcluster included Microcitrus and Eremocitrus. Compared to previous studies, our current study is the most extensive phylogenetic study of Citrus species since it includes 93 accessions. The results indicate that Citrus species can be classified into 3 clusters: a citron cluster, a pummelo cluster, and a mandarin cluster. Although most mandarin accessions belonged to the mandarin cluster, we found some exceptions. We also obtained information on the genetic background of various species of acid citrus grown in Japan. Because the genus Citrus contains many important accessions, we have comprehensively discussed the classification of this genus. PMID:23638116

  3. Application of Geostatistical Methods and Machine Learning for spatio-temporal Earthquake Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Schaefer, A. M.; Daniell, J. E.; Wenzel, F.

    2014-12-01

    Earthquake clustering tends to be an increasingly important part of general earthquake research especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is taken into account using a point based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have been mostly applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are mitigated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data of New Zealand and the United States is used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.

  4. Determinants of the use of dietary supplements among secondary and high school students

    PubMed

    Gajda, Karolina; Zielińska, Monika; Ciecierska, Anna; Hamułka, Jadwiga

    All over the world, including Poland, the sale of dietary supplements is increasing. More and more often, people, including children and youths, use dietary supplements on their own initiative and without any medical indications or knowledge in this field. The aim was to analyze the conditions of using dietary supplements with vitamins and minerals among secondary school and high school students in Poland. The study included 396 students aged 13-18 years (249 girls and 147 boys). The authors' questionnaire was used to evaluate the intake of dietary supplements. Cluster analysis made it possible to distinguish groups of students with similar socio-demographic characteristics and frequency of use of dietary supplements. In the studied population of students, three clusters were created that differed significantly in socio-demographic characteristics. Clusters 1 and 2 consisted mostly of students who used dietary supplements (56% and 100% of respondents, respectively). Cluster 1 consisted mostly of students from rural areas and small cities, with a worse financial situation, mainly boys (56%), while cluster 2 was dominated by girls (81%) living in a big city, coming from families with a good financial situation, who were more likely to be underweight (28.8%). Cluster 3 consisted mostly of older students (62%) not taking dietary supplements. In comparison to cluster 2, they had a lower frequency of breakfast consumption (55% vs. 69%), but a higher frequency of consumption of soft drinks, fast food and coffee, as well as of salt use at the table. The results show that the use of dietary supplements in adolescence is a common phenomenon and slightly conditioned by eating behaviors. This unfavorable habit of common dietary supplement intake observed among students indicates the need for education on the benefits and risks of supplement usage.

  5. Mineral constituents profile of biochar derived from diversified waste biomasses: implications for agricultural applications.

    PubMed

    Zhao, Ling; Cao, Xinde; Wang, Qun; Yang, Fan; Xu, Shi

    2013-01-01

    The wide distribution and high heterogeneity of different elements in biochars derived from diverse feedstocks make it difficult to regulate their application in soil and to evaluate the maximum potential contribution of the nutrients and trace metals as well as the potential risk of toxic metals. This study classified 20 biochars, covering six typical categories, into three clusters according to their similarity and distance on nutrients and minerals using cluster analysis. Four principal components (PC) were extracted using factor analysis to reduce dimension and clearly characterize the mineral profile of these biochars. The contribution of each group of elements in the PCs to every cluster was clarified. PC1 had a high loading for Mg, Cu, Zn, Al, and Fe; PC2 was related to N, K, and Mn; and PC3 and PC4 mainly represented P and Ca. Cluster 1 included bone dregs and eggshell biochars with PC3 and PC4 as the main contributors. Cluster 2 included waterweeds and waste paper biochars, which were close to shrimp hull and chlorella biochars, with the main contributions being from PC2 and PC4. Cluster 3 included biochars with PC1 as the main contributor. At a soil biochar amendment rate of 50 t ha⁻¹, the soil nutrients were significantly elevated, whereas the rise in toxic metals was negligible compared with Class I of the China Environmental Quality Standards for Soil. Biochar can potentially supply soil nutrients and trace metals, and different cluster biochars can be applied appropriately to different soils so that excessive or deficient nutrient and metal applications can be avoided. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  6. Genetic diversity of red-grained rice landraces in Hani's terraced fields based on phenotypic characteristics

    NASA Astrophysics Data System (ADS)

    Zhou, Xiaomei; Zheng, Yun; Zhang, Tingting; Zhang, Xiaoqian; Ma, Mengli; Meng, Hengling; Wang, Tiantao; Lu, Bingyue

    2018-06-01

    In order to provide useful information for the protection and utilization of red-grained rice landraces from Hani's terraced fields, the phenotypic diversity of 61 red-grained rice landraces was assessed based on 20 quantitative traits. The results indicated that phenotypic diversity was abundant in red-grained rice landraces. Coefficients of variation (CV) ranged from 4.878% to 72.878%; the largest CV was for panicle neck length and the smallest for grain width. The Shannon-Weaver diversity index (H') of the 20 traits ranged from 1.464 to 2.165; the largest and smallest H' values were observed for filled grain number and chalkiness, respectively. Cluster analysis based on the unweighted pair group method grouped the 61 red-grained rice landraces into eight clusters at a cut-off value of 6.2631. The first cluster included 11 landraces, the main cluster II comprised 42 landraces, and cluster IV included 3 landraces. Laopinzhonghongmi, Chena2, Laojingnuo, Bianhao6 and Baimi were separated from the main clusters.
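
    The two summary statistics reported above can be computed as in the following minimal sketch. It uses the standard definitions (CV as sample standard deviation over the mean, in percent; H' = -Σ p_i ln p_i over class frequencies). How the authors binned each quantitative trait into classes is not stated, so the class counts and trait values below are made-up illustration data, and the function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def shannon_weaver_index(counts):
    """H' = -sum(p_i * ln(p_i)) over classes with non-zero frequency."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up measurements of one trait across eight landraces.
trait_values = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(trait_values))

# Made-up numbers of landraces falling into four classes of one trait.
class_counts = [5, 5, 5, 5]
print(shannon_weaver_index(class_counts))  # equals ln(4) for four equal classes
```

    H' is largest when landraces are spread evenly over the classes and drops to zero when all of them fall into a single class.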

  7. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy regarding regularity vs. clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. Interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis is carried out of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
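
    As a rough illustration of how nearest-neighbor statistics separate clustering, randomness, and regularity, the sketch below uses two standard results for a homogeneous 2D Poisson process with intensity λ: the nearest-neighbor distance has cumulative distribution 1 - exp(-λπr²) and mean 1/(2√λ). The ratio of observed to expected mean distance (the Clark-Evans ratio) is a related summary statistic swapped in here for brevity; it is not claimed to be the authors' method, and no edge correction is applied. Function names and toy data are assumptions.

```python
import math


def poisson_nn_cdf(r, intensity):
    """P(nearest-neighbor distance <= r) for a 2-D Poisson process."""
    return 1.0 - math.exp(-intensity * math.pi * r * r)


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance over the Poisson expectation.

    Values below 1 suggest clustering, values above 1 suggest regularity.
    """
    intensity = len(points) / area
    nn = nearest_neighbor_distances(points)
    observed = sum(nn) / len(nn)
    expected = 1.0 / (2.0 * math.sqrt(intensity))
    return observed / expected


# A regular 4 x 4 grid with unit spacing, one point per unit of area.
grid = [(x, y) for x in range(4) for y in range(4)]
print(clark_evans_ratio(grid, area=16))     # 2.0: strongly regular

# Two tight pairs in the same area.
clumped = [(0, 0), (0.1, 0), (3, 3), (3.1, 3)]
print(clark_evans_ratio(clumped, area=16))  # well below 1: clustered
```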

  8. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

    PubMed

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended-spectrum beta-lactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.

  9. A pyrosequencing assay for the quantitative methylation analysis of the PCDHB gene cluster, the major factor in neuroblastoma methylator phenotype.

    PubMed

    Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo

    2012-03-01

    Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay herein described is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in perspective, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.

  10. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.

  11. Water clustering in glassy polymers.

    PubMed

    Davis, Eric M; Elabd, Yossef A

    2013-09-12

    In this study, water solubility and water clustering in several glassy polymers, including poly(methyl methacrylate) (PMMA), poly(styrene) (PS), and poly(vinylpyrrolidone) (PVP), were measured using both quartz spring microbalance (QSM) and Fourier transform infrared-attenuated total reflectance (FTIR-ATR) spectroscopy. Specifically, QSM was used to determine water solubility, while FTIR-ATR spectroscopy provided a direct, molecular-level measurement of water clustering. The Flory-Huggins theory was employed to obtain a measure of water-polymer interaction and water solubility, through both prediction and regression, where the theory failed to predict water solubility in both PMMA and PVP. Furthermore, a comparison of water clustering between direct FTIR-ATR spectroscopy measurements and predictions from the Zimm-Lundberg clustering analysis produced contradictory results. The failure of the Flory-Huggins theory and Zimm-Lundberg clustering analysis to describe water solubility and water clustering, respectively, in these glassy polymers is in part due to the equilibrium constraints under which these models are derived in contrast to the nonequilibrium state of glassy polymers. Additionally, FTIR-ATR spectroscopy results were compared to temperature-dependent diffusivity data, where a correlation between the activation energy for diffusion and the measured water clustering was observed.

  12. Data Clustering

    NASA Astrophysics Data System (ADS)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set $X$ composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let $D$ be the number of dimensions used to represent each item, $x_i \in \mathbb{R}^D$. The clustering goal is to produce an organization $P$ of the items in $X$ that optimizes an objective function $f : P \to \mathbb{R}$, which quantifies the quality of solution $P$. Often $f$ is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure $d : X \times X \to \mathbb{R}$ of the distance between two items. A partitioning algorithm produces a set of clusters $P = \{c_1, \ldots, c_k\}$ such that the clusters are nonoverlapping ($c_i \cap c_j = \emptyset$ for $i \neq j$) subsets of the data set ($\bigcup_i c_i = X$). Hierarchical algorithms produce a series of partitions $P = \{p_1, \ldots, p_n\}$. For a complete hierarchy, the number of partitions $n' = n$, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains $n$ clusters, each containing a single item. For model-based clustering, each cluster $c_j$ is represented by a model $m_j$, such as the cluster center or a Gaussian distribution.
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carry with them a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, $m_j$, even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments.
Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity matrices—cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validation methods, and visualization or data reduction techniques such as principal components analysis (PCA), multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
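
As an illustration of the partitioning definition in the abstract above, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not code from the chapter: the function names `kmeans` and `is_partition`, the toy data set `X`, and the choice of initial centers are all assumptions made for this example. Each cluster's model is taken to be its mean, and `is_partition` checks the two partition conditions (no overlap, union equal to the data set), assuming all items are distinct.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center (the cluster model) becomes the cluster mean.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return clusters, centers

def is_partition(clusters, points):
    """True if clusters do not overlap and their union is the data set.
    Assumes all items in `points` are distinct."""
    flat = [p for c in clusters for p in c]
    return len(flat) == len(set(flat)) and set(flat) == set(points)

# Hypothetical toy data: two well-separated groups in two dimensions.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
clusters, centers = kmeans(X, centers=[X[0], X[3]])
print(is_partition(clusters, X))
```

The result depends on the initial centers, which is why practical use usually involves several restarts; this sketch uses a single fixed initialization to stay deterministic.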

  13. Genetic variability in selected date palm (Phoenix dactylifera L.) cultivars of United Arab Emirates using ISSR and DAMD markers.

    PubMed

    Purayil, Fayas T; Robert, Gabriel A; Gothandam, Kodiveri M; Kurup, Shyam S; Subramaniam, Sreeramanan; Cheruth, Abdul Jaleel

    2018-02-01

    Nine (9) different date palm (Phoenix dactylifera L.) cultivars from UAE, which differ in their flowering times, were selected to determine the polymorphism and genetic relationship between these cultivars. Hereditary differences and interrelationships were assessed utilizing inter-simple sequence repeat (ISSR) and directed amplification of minisatellite DNA region (DAMD) primers. Analysis of eight DAMD and five ISSR markers produced a total of 113 amplicons, including 99 polymorphic and 14 monomorphic alleles, with a polymorphic percentage of 85.45. The average polymorphic information content for the two marker systems was similar (DAMD, 0.445 and ISSR, 0.459). UPGMA-based clustering of DAMD and ISSR revealed that the mid-season cultivars MKh (Khlas) and MB (Barhee) grouped together to form a subcluster in both marker systems. The genetic similarity analysis followed by clustering of the cumulative data from DAMD and ISSR resulted in two major clusters: cluster 1 includes two early-season cultivars (ENg and Ekn), two mid-season cultivars (MKh and MB) and one late-season cultivar (Lkhs), and cluster 2 includes two late-season cultivars, one early-season cultivar and one mid-season cultivar. The cluster analysis of both DAMD and ISSR markers revealed that the patterns of variation between some of the tested cultivars were similar in both DNA marker systems. Hence, the present study signifies the applicability of the DAMD and ISSR marker systems in detecting genetic diversity of date palm cultivars flowering in different seasons. This may facilitate the conservation and improvement of date palm cultivars in the future.

  14. Phylogeny of Bacteroides, Prevotella, and Porphyromonas spp. and related bacteria.

    PubMed Central

    Paster, B J; Dewhirst, F E; Olsen, I; Fraser, G J

    1994-01-01

    The phylogenetic structure of the bacteroides subgroup of the cytophaga-flavobacter-bacteroides (CFB) phylum was examined by 16S rRNA sequence comparative analysis. Approximately 95% of the 16S rRNA sequence was determined for 36 representative strains of species of Prevotella, Bacteroides, and Porphyromonas and related species by a modified Sanger sequencing method. A phylogenetic tree was constructed from a corrected distance matrix by the neighbor-joining method, and the reliability of tree branching was established by bootstrap analysis. The bacteroides subgroup was divided primarily into three major phylogenetic clusters which contained most of the species examined. The first cluster, termed the prevotella cluster, was composed of 16 species of Prevotella, including P. melaninogenica, P. intermedia, P. nigrescens, and the ruminal species P. ruminicola. Two oral species, P. zoogleoformans and P. heparinolytica, which had been recently placed in the genus Prevotella, did not fall within the prevotella cluster. These two species and six species of Bacteroides, including the type species B. fragilis, formed the second cluster, termed the bacteroides cluster. The third cluster, termed the porphyromonas cluster, was divided into two subclusters. The first contained Porphyromonas gingivalis, P. endodontalis, P. asaccharolytica, P. circumdentaria, P. salivosa, [Bacteroides] levii (the brackets around genus are used to indicate that the species does not belong to the genus by the sensu stricto definition), and [Bacteroides] macacae, and the second subcluster contained [Bacteroides] forsythus and [Bacteroides] distasonis. [Bacteroides] splanchnicus fell just outside the three major clusters but still belonged within the bacteroides subgroup. With few exceptions, the 16S rRNA data were in overall agreement with previously proposed reclassifications of species of Bacteroides, Prevotella, and Porphyromonas.
Suggestions are made to accommodate those species which do not fit previous reclassification schemes. PMID:8300528

  15. Fitness as a determinant of arterial stiffness in healthy adult men: a cross-sectional study.

    PubMed

    Chung, Jinwook; Kim, Milyang; Jin, Youngsoo; Kim, Yonghwan; Hong, Jeeyoung

    2018-01-01

    Fitness is known to influence arterial stiffness. This study aimed to assess differences in cardiorespiratory endurance, muscular strength, and flexibility according to arterial stiffness, based on sex and age. We enrolled 1590 healthy adults (men: 1242, women: 348) who were free of metabolic syndrome. We measured cardiorespiratory endurance in an exercise stress test on a treadmill, muscular strength by a grip test, and flexibility by upper body forward-bends from a standing position. The brachial-ankle pulse wave velocity test was performed to measure arterial stiffness before the fitness test. Cluster analysis was performed to divide the patients into groups with low (Cluster 1) and high (Cluster 2) arterial stiffness. According to the k-cluster analysis results, Cluster 1 included 624 men and 180 women, and Cluster 2 included 618 men and 168 women. Men in the middle-aged group with low arterial stiffness demonstrated higher cardiorespiratory endurance, muscular strength, and flexibility than those with high arterial stiffness. Similarly, among men in the old-aged group, the cardiorespiratory endurance and muscular strength, but not flexibility, differed significantly according to arterial stiffness. Women in both clusters showed similar cardiorespiratory endurance, muscular strength, and flexibility regardless of their arterial stiffness. Among healthy adults, arterial stiffness was inversely associated with fitness in men but not in women. Therefore, fitness seems to be a determinant for arterial stiffness in men. Additionally, regular exercise should be recommended for middle-aged men to prevent arterial stiffness.

  16. A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

    PubMed

    Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

    2017-09-01

    The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
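
One reason K-median may suit dichotomous data can be seen from how the cluster center is computed. The sketch below is an assumption-laden illustration, not the authors' simulation code: the helper names `center` and `manhattan` and the example cluster are hypothetical. It contrasts the component-wise mean (the K-means center), which is generally fractional for 0/1 data, with the component-wise median (the K-median center), which stays binary. The low median is used here so that ties resolve to an observed value.

```python
from statistics import mean, median_low

def center(cluster, how):
    """Component-wise center of a cluster of 0/1 vectors.
    how="mean" gives the K-means center, anything else the K-median center."""
    agg = mean if how == "mean" else median_low
    return tuple(agg(col) for col in zip(*cluster))

def manhattan(a, b):
    """City-block distance, the distance minimized by the component-wise median."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical cluster of four objects measured on three dichotomous variables.
cluster = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]

print(center(cluster, "mean"))    # (0.75, 0.25, 0.75)
print(center(cluster, "median"))  # (1, 0, 1)
```

The median center can be read directly as a prototypical response pattern, whereas the mean center is a vector of endorsement proportions.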

  17. Person mobility in the design and analysis of cluster-randomized cohort prevention trials.

    PubMed

    Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard

    2012-06-01

    Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinic, cities, state) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.

  18. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glascoe, L G; Glaser, R E; Chin, H S

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goal of this study is to use a clustering method to classify the winds of a gridded data set, i.e., from meteorological simulations generated by a forecast model.

  19. Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

    NASA Astrophysics Data System (ADS)

    Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Data clustering can be executed through partition or hierarchical methods for many types of data, including DNA sequences. The two approaches can be combined by running a partition algorithm at the first level and a hierarchical algorithm at the second level, which is called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or Fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for the partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies Bouldin Index (DBI) value: we choose as optimal the number of clusters that minimizes the DBI. In this work, we conduct the clustering on 1252 HPV DNA sequences from GenBank. Characteristic extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. In our implementation, we used the hybrid of PAM and DIANA in the R open source programming tool. In our results, we obtained 3 main clusters with an average DBI value of 0.979, using PAM in the first stage. After executing DIANA in the second stage, we obtained 4 sub clusters for Cluster-1, 9 sub clusters for Cluster-2 and 2 sub clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the main clusters, respectively. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
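
The Davies Bouldin Index used above to choose the number of clusters can be sketched as follows. The study itself used R; this Python version is only an illustration under stated assumptions: the function names `centroid` and `davies_bouldin` and the two toy clusterings are hypothetical, cluster scatter is taken as the mean Euclidean distance to the centroid, and every cluster is assumed non-empty with distinct centroids. Lower values indicate more compact, better separated clusters.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def centroid(cluster):
    """Component-wise mean of the points in one cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def davies_bouldin(clusters):
    """Average, over clusters, of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Two hypothetical ways of splitting the same four points into two clusters.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]  # compact, far apart
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]  # spread out, close together

print(davies_bouldin(tight))  # smaller index: the better clustering
print(davies_bouldin(loose))  # larger index
```

Selecting the number of clusters then amounts to running the partition step for several candidate values and keeping the one with the smallest index.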

  20. Semi-supervised clustering methods.

    PubMed

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
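
To make the idea of a k-means modification with partially known labels concrete, here is a minimal sketch of one simple variant, in which labeled observations are pinned to their known cluster and only unlabeled observations are reassigned. It is not taken from the review: the function name `seeded_kmeans`, the `known` mapping, and the toy data are assumptions for illustration, and the sketch assumes every cluster has at least one labeled observation.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def seeded_kmeans(points, known, k, max_iter=100):
    """k-means in which observations with a known label never change cluster.
    `known` maps point index -> cluster label in range(k); every cluster is
    assumed to contain at least one labeled observation."""
    def means(labels):
        # Mean of the points currently carrying each label (None is ignored).
        out = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            out.append(tuple(sum(col) / len(members) for col in zip(*members)))
        return out

    # Initialise the centers from the labeled observations only.
    labels = [known.get(i) for i in range(len(points))]
    centers = means(labels)
    for _ in range(max_iter):
        new_labels = [
            known[i] if i in known
            else min(range(k), key=lambda j: dist(p, centers[j]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        centers = means(labels)
    return labels

# Hypothetical data: the first and fourth observations have known labels.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0), (2.4, 0.0)]
labels = seeded_kmeans(points, known={0: 0, 3: 1}, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

Pinning the labeled observations also guarantees that no cluster becomes empty, which keeps the center update well defined.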

  1. Software system for data management and distributed processing of multichannel biomedical signals.

    PubMed

    Franaszczuk, P J; Jouny, C C

    2004-01-01

    The presented software is designed for efficient utilization of a cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. Descriptions of the data format, channels and time of recording are included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of new signal processing procedures is possible with a minimal programming overhead for the input/output processing and user interface. The number of nodes in the cluster used for computations and the amount of storage can be changed with no major modification to the software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.

  2. Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials

    PubMed Central

    Andridge, Rebecca R.

    2011-01-01

    In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309

  3. Symptom clusters and quality of life among patients with advanced heart failure

    PubMed Central

    Yu, Doris SF; Chan, Helen YL; Leung, Doris YP; Hui, Elsie; Sit, Janet WH

    2016-01-01

    Objectives: To identify symptom clusters among patients with advanced heart failure (HF) and the independent relationships with their quality of life (QoL). Methods: This is the secondary data analysis of a cross-sectional study which interviewed 119 patients with advanced HF in the geriatric unit of a regional hospital in Hong Kong. The symptom profile and QoL were assessed by using the Edmonton Symptom Assessment Scale (ESAS) and the McGill QoL Questionnaire. Exploratory factor analysis was used to identify the symptom clusters. Hierarchical regression analysis was used to examine the independent relationships with their QoL, after adjusting the effects of age, gender, and comorbidities. Results: The patients were at an advanced age (82.9 ± 6.5 years). Three distinct symptom clusters were identified: they were the distress cluster (including shortness of breath, anxiety, and depression), the decondition cluster (fatigue, drowsiness, nausea, and reduced appetite), and the discomfort cluster (pain, and sense of generalized discomfort). These three symptom clusters accounted for 63.25% of variance of the patients' symptom experience. The small to moderate correlations between these symptom clusters indicated that they were rather independent of one another. After adjusting the age, gender and comorbidities, the distress (β = −0.635, P < 0.001), the decondition (β = −0.148, P = 0.01), and the discomfort (β = −0.258, P < 0.001) symptom clusters independently predicted their QoL. Conclusions: This study identified the distinctive symptom clusters among patients with advanced HF. The results shed light on the need to develop palliative care interventions for optimizing the symptom control for this life-limiting disease. PMID:27403150

  4. Country clustering applied to the water and sanitation sector: a new tool with potential applications in research and policy.

    PubMed

    Onda, Kyle; Crocker, Jonny; Kayser, Georgia Lyn; Bartram, Jamie

    2014-03-01

    The fields of global health and international development commonly cluster countries by geography and income to target resources and describe progress. For any given sector of interest, a range of relevant indicators can serve as a more appropriate basis for classification. We create a new typology of country clusters specific to the water and sanitation (WatSan) sector based on similarities across multiple WatSan-related indicators. After a literature review and consultation with experts in the WatSan sector, nine indicators were selected. Indicator selection was based on relevance to and suggested influence on national water and sanitation service delivery, and to maximize data availability across as many countries as possible. A hierarchical clustering method and a gap statistic analysis were used to group countries into a natural number of relevant clusters. Two stages of clustering resulted in five clusters, representing 156 countries or 6.75 billion people. The five clusters were not well explained by income or geography, and were distinct from existing country clusters used in international development. Analysis of these five clusters revealed that they were more compact and well separated than United Nations and World Bank country clusters. This analysis and resulting country typology suggest that previous geography- or income-based country groupings can be improved upon for applications in the WatSan sector by utilizing globally available WatSan-related indicators. Potential applications include guiding and discussing research, informing policy, improving resource targeting, describing sector progress, and identifying critical knowledge gaps in the WatSan sector. Copyright © 2013 Elsevier GmbH. All rights reserved.

  5. Antioxidant properties of different edible mushroom species and increased bioconversion efficiency of Pleurotus eryngii using locally available casing materials.

    PubMed

    Mishra, K K; Pal, R S; Arunkumar, R; Chandrashekara, C; Jain, S K; Bhatt, J C

    2013-06-01

    Total phenolics, radical scavenging activity (RSA) on DPPH, ascorbic acid content and chelating activity on Fe(2+) of Pleurotus citrinopileatus, Pleurotus djamor, Pleurotus eryngii, Pleurotus flabellatus, Pleurotus florida, Pleurotus ostreatus, Pleurotus sajor-caju and Hypsizygus ulmarius have been evaluated. The assayed mushrooms contained 3.94-21.67 mg TAE of phenolics, 13.63-69.67% DPPH scavenging activity, 3.76-6.76 mg ascorbic acid and 60.25-82.7% chelating activity. Principal Component Analysis (PCA) revealed that significantly higher total phenolics, RSA on DPPH and growth/day were present in P. eryngii, whereas P. citrinopileatus showed higher ascorbic acid and chelating activity. Agglomerative hierarchical clustering analysis revealed that the studied mushroom species fall into two clusters: Cluster I included P. djamor, P. eryngii and P. flabellatus, while Cluster II included H. ulmarius, P. sajor-caju, P. citrinopileatus, P. ostreatus and P. florida. Enhanced yield of P. eryngii was achieved on spent compost casing material. Use of casing materials enhanced yield by 21-107% over non-cased substrate. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    PubMed

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and a one-time hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.

  7. Cluster II Wideband (WBD) Plasma Wave Investigation Mission Operations and Data Analysis

    NASA Technical Reports Server (NTRS)

    Gurnett, Donald A.

    2004-01-01

    This Summary of Research is being submitted to NASA Goddard Space Flight Center. A summary of the significant accomplishments of the Cluster Wideband (WBD) Plasma Wave Investigation team achieved during the period of the grant, October 1, 2000 through January 14, 2004, and a listing of all of the publications that resulted from work carried out under the grant are presented. Also included is a listing of the numerous public outreach activities that took place during the period of the grant in which the Cluster mission and Cluster WBD science were discussed.

  8. Constraining the mass–richness relationship of redMaPPer clusters with angular clustering

    DOE PAGES

    Baxter, Eric J.; Rozo, Eduardo; Jain, Bhuvnesh; ...

    2016-08-04

    The potential of using cluster clustering for calibrating the mass–richness relation of galaxy clusters has been recognized theoretically for over a decade. In this paper, we demonstrate the feasibility of this technique to achieve high-precision mass calibration using redMaPPer clusters in the Sloan Digital Sky Survey North Galactic Cap. By including cross-correlations between several richness bins in our analysis, we significantly improve the statistical precision of our mass constraints. The amplitude of the mass–richness relation is constrained to 7 per cent statistical precision by our analysis. However, the error budget is systematics dominated, reaching a 19 per cent total error that is dominated by theoretical uncertainty in the bias–mass relation for dark matter haloes. We confirm the result from Miyatake et al. that the clustering amplitude of redMaPPer clusters depends on galaxy concentration as defined therein, and we provide additional evidence that this dependence cannot be sourced by mass dependences: some other effect must account for the observed variation in clustering amplitude with galaxy concentration. Assuming that the observed dependence of redMaPPer clustering on galaxy concentration is a form of assembly bias, we find that such effects introduce a systematic error on the amplitude of the mass–richness relation that is comparable to the error bar from statistical noise. Finally, the results presented here demonstrate the power of cluster clustering for mass calibration and cosmology provided the current theoretical systematics can be ameliorated.

  9. Newspaper coverage of suicide and initiation of suicide clusters in teenagers in the USA, 1988-96: a retrospective, population-based, case-control study.

    PubMed

    Gould, Madelyn S; Kleinman, Marjorie H; Lake, Alison M; Forman, Judith; Midle, Jennifer Bassett

    2014-06-01

    Public health and clinical efforts to prevent suicide clusters are seriously hampered by the unanswered question of why such outbreaks occur. We aimed to establish whether an environmental factor, newspaper reports of suicide, has a role in the emergence of suicide clusters. In this retrospective, population-based, case-control study, we identified suicide clusters in young people aged 13-20 years in the USA from 1988 to 1996 (preceding the advent of social media) using the time-space Scan statistic. For each cluster community, we selected two matched non-cluster control communities in which suicides of similarly aged youth occurred, from non-contiguous counties within the same state as the cluster. We examined newspapers within each cluster community for stories about suicide published in the days between the first and second suicides in the cluster. In non-cluster communities, we examined a matched length of time after the matched control suicide. We used a content-analysis procedure to code the characteristics of each story and compared newspaper stories about suicide published in case and control communities with mixed-effect regression analyses. We identified 53 suicide clusters, of which 48 were included in the media review. For one cluster we could identify only one appropriate control; therefore, 95 matched control communities were included. The mean number of news stories about suicidal individuals published after an index cluster suicide (7.42 [SD 10.02]) was significantly greater than the mean number of suicide stories published after a non-cluster suicide (5.14 [6.00]; p<0.0001). Several story characteristics, including front-page placement, headlines containing the word suicide or a description of the method used, and detailed descriptions of the suicidal individual and act, appeared more often in stories published after the index cluster suicides than after non-cluster suicides. Our identification of an association between newspaper reports about suicide (including specific story characteristics) and the initiation of teenage suicide clusters should provide an empirical basis to support efforts by mental health professionals, community officials, and the media to work together to identify and prevent the onset of suicide clusters. Funding: US National Institute of Mental Health and American Foundation for Suicide Prevention. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    PubMed

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  11. Cluster analysis of novel isometric strength measures produces a valid and evidence-based classification structure for wheelchair track racing.

    PubMed

    Connick, Mark J; Beckman, Emma; Vanlandewijck, Yves; Malone, Laurie A; Blomqvist, Sven; Tweedy, Sean M

    2017-11-25

    The Para athletics wheelchair-racing classification system employs best practice to ensure that classes comprise athletes whose impairments cause a comparable degree of activity limitation. However, decision-making is largely subjective and scientific evidence which reduces this subjectivity is required. This study evaluated whether isometric strength tests were valid for the purposes of classifying wheelchair racers and whether cluster analysis of the strength measures produced a valid classification structure. Thirty-two international level, male wheelchair racers from classes T51-54 completed six isometric strength tests evaluating elbow extensors, shoulder flexors, trunk flexors and forearm pronators and two wheelchair performance tests: Top-Speed (0-15 m) and Top-Speed (absolute). Strength tests significantly correlated with wheelchair performance were included in a cluster analysis and the validity of the resulting clusters was assessed. All six strength tests correlated with performance (r=0.54-0.88). Cluster analysis yielded four clusters with reasonable overall structure (mean silhouette coefficient=0.58) and large intercluster strength differences. Six athletes (19%) were allocated to clusters that did not align with their current class. While the mean wheelchair racing performance of the resulting clusters was unequivocally hierarchical, the mean performance of current classes was not, with no difference between current classes T53 and T54. Cluster analysis of isometric strength tests produced classes comprising athletes who experienced a similar degree of activity limitation. The strength tests reported can provide the basis for a new, more transparent, less subjective wheelchair racing classification system, pending replication of these findings in a larger, representative sample. This paper also provides guidance for development of evidence-based systems in other Para sports. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
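
The record above reports cluster quality as a mean silhouette coefficient (0.58). As a rough illustration of what that number measures, the following minimal sketch computes it for a labelled point set. It is not the authors' procedure: the function name, the Euclidean distance, and the toy data are assumptions made here, and at least two clusters are required.

```python
from math import dist
from statistics import mean


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance).

    Illustrative helper, not from the cited study. Needs >= 2 clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (dist(p, p) is 0, so dividing by len(own) - 1 excludes the point itself)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])
bad = mean_silhouette(toy_points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned points, which is why the deliberately scrambled labelling scores lower than the natural one.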

  12. Semi-supervised clustering methods

    PubMed Central

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
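
To make the idea of a k-means modification that uses known cluster labels concrete, here is a minimal sketch in which labelled observations seed the centroids and are never reassigned. It is written for illustration only and is not claimed to reproduce any specific algorithm from the review; the function name and the toy data are assumptions, and every cluster is assumed to have at least one labelled observation.

```python
from math import dist


def seeded_kmeans(points, known, k, iters=20):
    """k-means in which observations with known labels keep their cluster.

    points: list of coordinate tuples
    known:  dict {index into points: cluster label in range(k)};
            assumed to contain at least one index per cluster
    """
    def centroid(members):
        # Coordinate-wise mean of a non-empty list of points.
        return tuple(sum(c) / len(members) for c in zip(*members))

    # Initialise each centroid from the labelled observations of that cluster.
    centroids = [
        centroid([points[i] for i, lab in known.items() if lab == j])
        for j in range(k)
    ]
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            if i in known:
                labels[i] = known[i]  # supervision: label is fixed
            else:
                labels[i] = min(range(k), key=lambda j: dist(p, centroids[j]))
        # Clusters cannot become empty because labelled points never move.
        centroids = [
            centroid([p for p, lab in zip(points, labels) if lab == j])
            for j in range(k)
        ]
    return labels, centroids


# Hypothetical toy data: one labelled observation per cluster.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = seeded_kmeans(toy, known={0: 0, 3: 1}, k=2)
```

Fixing the labelled observations is the simplest way to inject partial supervision; softer variants would only use the labels for initialisation and let later iterations override them.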

  13. [Genetic polymorphism of Tulipa gesneriana L. evaluated on the basis of the ISSR marking data].

    PubMed

    Kashin, A S; Kritskaya, T A; Schanzer, I A

    2016-10-01

    Using the method of ISSR analysis, the genetic diversity of 18 natural populations of Tulipa gesneriana L. from the north of the Lower Volga region was examined. The ten ISSR primers used in the study provided identification of 102 PCR fragments, of which 50 were polymorphic (49.0%). According to the proportion of polymorphic markers, two population groups were distinguished: (1) the populations in which the proportion of polymorphic markers ranged from 0.35 to 0.41; (2) the populations in which the proportion of polymorphic markers ranged from 0.64 to 0.85. UPGMA clustering analysis provided subdivision of the sample into two large clusters. The unrooted tree constructed using the Neighbor Joining algorithm had similar topology. The first cluster included slightly variable populations and the second cluster included highly variable populations. The AMOVA analysis showed statistically significant differences (F_CT = 0.430; p = 0.000) between the two groups. Local populations are considerably genetically differentiated from each other (F_ST = 0.632) and have almost no links via modern gene flow, as evidenced by the results of the Mantel test (r = −0.118; p = 0.819). It is suggested that the degree of genetic similarities and differences between the populations depends on the time and the species dispersal patterns on these territories.

  14. Who are the obese? A cluster analysis exploring subgroups of the obese.

    PubMed

    Green, M A; Strong, M; Razak, F; Subramanian, S V; Relton, C; Bissell, P

    2016-06-01

    Body mass index (BMI) can be used to group individuals in terms of their height and weight as obese. However, such a distinction fails to account for the variation within this group across other factors such as health, demographic and behavioural characteristics. The study aims to examine the existence of subgroups of obese individuals. Data were taken from the Yorkshire Health Study (2010-12) including information on demographic, health and behavioural characteristics. Individuals with a BMI of ≥30 were included. A two-step cluster analysis was used to define groups of individuals who shared common characteristics. The cluster analysis found six distinct groups of individuals whose BMI was ≥30. These subgroups were heavy drinking males; young healthy females; the affluent and healthy elderly; the physically sick but happy elderly; the unhappy and anxious middle aged; and a cluster with the poorest health. It is important to account for the heterogeneity within individuals who are obese. Interventions introduced by clinicians and policymakers should not target obese individuals as a whole but tailor strategies depending upon the subgroups that individuals belong to. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    PubMed Central

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results: Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  16. OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

    PubMed

    Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

    2014-10-16

    The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P < 0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and Cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptom levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P < 0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.

  17. Input frequency and lexical variability in phonological development: a survival analysis of word-initial cluster production.

    PubMed

    Ota, Mitsuhiko; Green, Sam J

    2013-06-01

    Although it has often been hypothesized that children learn to produce new sound patterns first in frequently heard words, the available evidence in support of this claim is inconclusive. To re-examine this question, we conducted a survival analysis of word-initial consonant clusters produced by three children in the Providence Corpus (0;11-4;0). The analysis took account of several lexical factors in addition to lexical input frequency, including the age of first production, production frequency, neighborhood density and number of phonemes. The results showed that lexical input frequency was a significant predictor of the age at which the accuracy level of cluster production in each word first reached 80%. The magnitude of the frequency effect differed across cluster types. Our findings indicate that some of the between-word variance found in the development of sound production can indeed be attributed to the frequency of words in the child's ambient language.

  18. Cluster analysis and quality assessment of logged water at an irrigation project, eastern Saudi Arabia.

    PubMed

    Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid

    2008-01-01

    A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain a greater insight in terms of the seasonal variation of water quality, 17 samples were collected from both summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.

  19. How do components of evidence-based psychological treatment cluster in practice? A survey and cluster analysis.

    PubMed

    Gifford, Elizabeth V; Tavakoli, Sara; Weingardt, Kenneth R; Finney, John W; Pierson, Heather M; Rosen, Craig S; Hagedorn, Hildi J; Cook, Joan M; Curran, Geoff M

    2012-01-01

    Evidence-based psychological treatments (EBPTs) are clusters of interventions, but it is unclear how providers actually implement these clusters in practice. A disaggregated measure of EBPTs was developed to characterize clinicians' component-level evidence-based practices and to examine relationships among these practices. Survey items captured components of evidence-based treatments based on treatment integrity measures. The Web-based survey was conducted with 75 U.S. Department of Veterans Affairs (VA) substance use disorder (SUD) practitioners and 149 non-VA community-based SUD practitioners. Clinician's self-designated treatment orientations were positively related to their endorsement of those EBPT components; however, clinicians used components from a variety of EBPTs. Hierarchical cluster analysis indicated that clinicians combined and organized interventions from cognitive-behavioral therapy, the community reinforcement approach, motivational interviewing, structured family and couples therapy, 12-step facilitation, and contingency management into clusters including empathy and support, treatment engagement and activation, abstinence initiation, and recovery maintenance. Understanding how clinicians use EBPT components may lead to improved evidence-based practice dissemination and implementation. Published by Elsevier Inc.

  20. Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses

    DTIC Science & Technology

    2007-01-01

    including tree-based methods such as the unweighted pair group method of analysis (UPGMA) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ clustering methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA

  1. Analysis of risk factors for cluster behavior of dental implant failures.

    PubMed

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies have indicated that implant failures are commonly concentrated in a few patients. The aim was to identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included only patients receiving at least three implants. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medication to reduce gastric acid production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  2. A long-term space astrophysics research program: An x-ray perspective of the components and structure of galaxies

    NASA Technical Reports Server (NTRS)

    Fabbiano, G.

    1995-01-01

    X-ray studies of galaxies by the Smithsonian Astrophysical Observatory (SAO) and MIT are described. Activities at SAO include ROSAT PSPC x-ray data reduction and analysis pipeline; x-ray sources in nearby Sc galaxies; optical, x-ray, and radio study of ongoing galactic merger; a radio, far infrared, optical, and x-ray study of the Sc galaxy NGC247; and a multiparametric analysis of the Einstein sample of early-type galaxies. Activities at MIT included continued analysis of observations with ROSAT and ASCA, and continued development of new approaches to spectral analysis with ASCA and AXAF. Also, a new method for characterizing structure in galactic clusters was developed and applied to ROSAT images of a large sample of clusters. An appendix contains preprints generated by the research.

  3. Clustering Suicide Attempters: Impulsive-Ambivalent, Well-Planned, or Frequent.

    PubMed

    Lopez-Castroman, Jorge; Nogue, Erika; Guillaume, Sebastien; Picot, Marie Christine; Courtet, Philippe

    2016-06-01

    Attempts to predict suicidal behavior within high-risk populations have so far shown insufficient accuracy. Although several psychosocial and clinical features have been consistently associated with suicide attempts, investigations of latent structure in well-characterized populations of suicide attempters are lacking. We analyzed a sample of 1,009 hospitalized suicide attempters who were recruited between 1999 and 2012. Eleven clinically relevant items related to the characteristics of suicidal behavior were submitted to a Hierarchical Ascendant Classification. Phenotypic profiles were compared between the resulting clusters. A decisional tree was constructed to facilitate the differentiation of individuals classified within the first 2 clusters. Most individuals were included in a cluster characterized by less lethal means and planning ("impulsive-ambivalent"). A second cluster featured more carefully planned attempts ("well-planned"), more alcohol or drug use before the attempt, and more precautions to avoid interruptions. Finally, a small, third cluster included individuals reporting more attempts ("frequent"), more often serious or violent attempts, and an earlier age at first attempt. Differences across clusters by demographic and clinical characteristics were also found, particularly with the third cluster whose participants had experienced high levels of childhood abuse. Cluster analysis consistently supported 3 distinct clusters of individuals with specific features in their suicidal behaviors and phenotypic profiles that could help clinicians to better focus prevention strategies. © Copyright 2016 Physicians Postgraduate Press, Inc.

  4. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  5. On selecting a prior for the precision parameter of Dirichlet process mixture models

    USGS Publications Warehouse

    Dorazio, R.M.

    2009-01-01

    In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G0. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
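
    To illustrate why inferences about clustering can be sensitive to the precision parameter, the sketch below evaluates the standard prior expectation of the number of distinct clusters among n observations under a Dirichlet process, E[K] = sum over i = 0..n-1 of alpha / (alpha + i). This is a textbook property of the process, not the prior-selection approach developed in the paper; the function name and the sample size are assumptions chosen for illustration.

```python
def expected_clusters(alpha: float, n: int) -> float:
    """Prior expected number of distinct clusters among n observations
    under a Dirichlet process with precision parameter alpha."""
    if alpha <= 0 or n < 1:
        raise ValueError("alpha must be positive and n at least 1")
    # The (i + 1)-th observation opens a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    n = 100  # hypothetical sample size, for illustration only
    for alpha in (0.1, 1.0, 10.0):
        print(f"alpha={alpha}: E[K] = {expected_clusters(alpha, n):.2f}")
```

    Larger values of alpha favour more clusters a priori, which is one way to see why the prior placed on alpha can matter.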

  6. Clusters of midlife women by physical activity and their racial/ethnic differences.

    PubMed

    Im, Eun-Ok; Ko, Young; Chee, Eunice; Chee, Wonshik; Mao, Jun James

    2017-04-01

    The purpose of this study was to identify clusters of midlife women by physical activity and to determine racial/ethnic differences in physical activities in each cluster. This was a secondary analysis of the data from 542 women (157 non-Hispanic [NH] Whites, 127 Hispanics, 135 NH African Americans, and 123 NH Asians) in a larger Internet study on midlife women's attitudes toward physical activity. The instruments included the Barriers to Health Activities Scale, the Physical Activity Assessment Inventory, the Questions on Attitudes toward Physical Activity, Subjective Norm, Perceived Behavioral Control, and Behavioral Intention, and the Kaiser Physical Activity Survey. The data were analyzed using hierarchical cluster analyses, analysis of variance, and multinomial logistic analyses. A three-cluster solution was adopted: cluster 1 (high active living and sports/exercise activity group; 48%), cluster 2 (high household/caregiving and occupational activity group; 27%), and cluster 3 (low active living and sports/exercise activity group; 26%). There were significant racial/ethnic differences in occupational activities of clusters 1 and 3 (all P < 0.01). Compared with cluster 1, cluster 2 tended to have lower family income, less access to health care, higher unemployment, higher perceived barriers scores, and lower social influences scores (all P < 0.01). Compared with cluster 1, cluster 3 tended to have greater obesity, less access to health care, higher perceived barriers scores, more negative attitudes toward physical activity, and lower self-efficacy scores (all P < 0.01). Midlife women's unique patterns of physical activity and their associated factors need to be considered in future intervention development.

  7. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    PubMed

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  8. Identification of the Main Regulator Responsible for Synthesis of the Typical Yellow Pigment Produced by Trichoderma reesei

    PubMed Central

    Derntl, Christian; Rassinger, Alice; Srebotnik, Ewald; Mach, Robert L.

    2016-01-01

    ABSTRACT The industrially used ascomycete Trichoderma reesei secretes a typical yellow pigment during cultivation, while other Trichoderma species do not. A comparative genomic analysis suggested that a putative secondary metabolism cluster, containing two polyketide-synthase encoding genes, is responsible for the yellow pigment synthesis. This cluster is conserved in a set of rather distantly related fungi, including Acremonium chrysogenum and Penicillium chrysogenum. In an attempt to silence the cluster in T. reesei, two genes of the cluster encoding transcription factors were individually deleted. For a complete genetic proof-of-function, the genes were reinserted into the genomes of the respective deletion strains. The deletion of the first transcription factor (termed yellow pigment regulator 1 [Ypr1]) resulted in the full abolishment of the yellow pigment formation and the expression of most genes of this cluster. A comparative high-pressure liquid chromatography (HPLC) analysis of supernatants of the ypr1 deletion and its parent strain suggested the presence of several yellow compounds in T. reesei that are all derived from the same cluster. A subsequent gas chromatography/mass spectrometry analysis strongly indicated the presence of sorbicillin in the major HPLC peak. The presence of the second transcription factor, termed yellow pigment regulator 2 (Ypr2), reduces the yellow pigment formation and the expression of most cluster genes, including the gene encoding the activator Ypr1. IMPORTANCE Trichoderma reesei is used for industry-scale production of carbohydrate-active enzymes. During growth, it secretes a typical yellow pigment. This is not favorable for industrial enzyme production because it makes the downstream process more complicated and thus increases operating costs. In this study, we demonstrate which regulators influence the synthesis of the yellow pigment. 
Based on these data, we also provide indication as to which genes are under the control of these regulators and are finally responsible for the biosynthesis of the yellow pigment. These genes are organized in a cluster that is also found in other industrially relevant fungi, such as the two antibiotic producers Penicillium chrysogenum and Acremonium chrysogenum. The targeted manipulation of a secondary metabolism cluster is an important option for any biotechnologically applied microorganism. PMID:27520818

  9. The effects of co-morbidity in defining major depression subtypes associated with long-term course and severity.

    PubMed

    Wardenaar, K J; van Loo, H M; Cai, T; Fava, M; Gruber, M J; Li, J; de Jonge, P; Nierenberg, A A; Petukhova, M V; Rose, S; Sampson, N A; Schoevers, R A; Wilcox, M A; Alonso, J; Bromet, E J; Bunting, B; Florescu, S E; Fukao, A; Gureje, O; Hu, C; Huang, Y Q; Karam, A N; Levinson, D; Medina Mora, M E; Posada-Villa, J; Scott, K M; Taib, N I; Viana, M C; Xavier, M; Zarkov, Z; Kessler, R C

    2014-11-01

    Although variation in the long-term course of major depressive disorder (MDD) is not strongly predicted by existing symptom subtype distinctions, recent research suggests that prediction can be improved by using machine learning methods. However, it is not known whether these distinctions can be refined by added information about co-morbid conditions. The current report presents results on this question. Data came from 8261 respondents with lifetime DSM-IV MDD in the World Health Organization (WHO) World Mental Health (WMH) Surveys. Outcomes included four retrospectively reported measures of persistence/severity of course (years in episode; years in chronic episodes; hospitalization for MDD; disability due to MDD). Machine learning methods (regression tree analysis; lasso, ridge and elastic net penalized regression) followed by k-means cluster analysis were used to augment previously detected subtypes with information about prior co-morbidity to predict these outcomes. Predicted values were strongly correlated across outcomes. Cluster analysis of predicted values found three clusters with consistently high, intermediate or low values. The high-risk cluster (32.4% of cases) accounted for 56.6-72.9% of high persistence, high chronicity, hospitalization and disability. This high-risk cluster had both higher sensitivity and likelihood ratio positive (LR+; relative proportions of cases in the high-risk cluster versus other clusters having the adverse outcomes) than in a parallel analysis that excluded measures of co-morbidity as predictors. Although the results using the retrospective data reported here suggest that useful MDD subtyping distinctions can be made with machine learning and clustering across multiple indicators of illness persistence/severity, replication with prospective data is needed to confirm this preliminary conclusion.

  10. Clustering cancer gene expression data by projective clustering ensemble

    PubMed Central

    Yu, Xianxue; Yu, Guoxian

    2017-01-01

    Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but limited samples; thus various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to combine these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) compared with other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920

  11. Cluster analysis of the clinical histories of cattle affected with bovine anaemia associated with Theileria orientalis Ikeda type infection.

    PubMed

    Lawrence, K E; Forsyth, S F; Vaatstra, B L; McFadden, A M J; Pulford, D J; Govindaraju, K; Pomroy, W E

    2017-11-01

    AIM To determine the most commonly used words in the clinical histories of animals naturally infected with Theileria orientalis Ikeda type; whether these words differed between cases categorised by age, farm type or haematocrit (HCT), and if there was any clustering of the common words in relation to these categories. METHODS Clinical histories were transcribed for 605 cases of bovine anaemia associated with T. orientalis (TABA), that were submitted to laboratories with blood samples which tested positive for T. orientalis Ikeda type infection by PCR analysis, between October 2012 and November 2014. χ² tests were used to determine whether the proportion of submissions for each word was similar across the categories of HCT (normal, moderate anaemia or severe anaemia), farm type (dairy or beef) and age (young or old). Correspondence analysis (CA) was carried out on a contingency table of the frequency of the 28 most commonly used history words, cross-tabulated by age categories (young, old or unknown). Agglomerative hierarchical clustering, using Ward's method, was then performed on the coordinates from the correspondence analysis. RESULTS The six most commonly used history words were jaundice (204/605), lethargic (162/605), pale mucous membranes (161/605), cow (151/605), anaemia (147/605), and off milk (115/605). The proportion of cases with some history words differed between categories of age, farm type and HCT. The cluster analysis indicated that the recorded history words were grouped in two main clusters. The first included the words weight loss, tachycardia, pale mucous membranes, anaemia, lethargic and thin, and was associated with adult (p<0.001), severe anaemia (p<0.001) and dairy (p<0.001). The second cluster included the words deaths, ill-thrift, calves, calf and diarrhoea, and was associated with young (p<0.001), normal HCT (p<0.001), beef (p<0.001) and moderate anaemia (p<0.001). 
CONCLUSIONS AND CLINICAL RELEVANCE Cluster analysis of words recorded in clinical histories submitted with blood samples from cases of TABA indicates that two potentially different disease syndromes were associated with T. orientalis Ikeda type infection. One was consistent with the affected cattle suffering from a severe regenerative extravascular haemolytic anaemia, the second displaying as ill thrift and diarrhoea, particularly in young beef cattle.

  12. [Spatial analysis of syphilis and gonorrhea infections in a Public Health Service in Madrid].

    PubMed

    Wijers, Irene G M; Sánchez Gómez, Amaya; Taveira Jiménez, Jose Antonio

    2017-06-21

    Sexually transmitted diseases are a significant public health problem. Within the Madrid Autonomous Region, the districts with the highest syphilis and gonorrhea incidences are part of the same Public Health Service (Servicio de Salud Pública del Área 7, SSPA 7). The objective of this study was to identify, by spatial analysis, clusters of syphilis and gonorrhea infections in this SSPA in Madrid. All confirmed syphilis and gonorrhea cases registered in SSPA 7 in Madrid were selected. Moran's I was calculated in order to identify the existence of spatial autocorrelation and a cluster analysis was performed. Clusters and cumulative incidences (CI) per health zone were mapped. The district with most cases was Centro (CI: 67.5 and 160.7 per 100,000 inhabitants for syphilis and gonorrhea, respectively) with the highest CI (120.0 and 322.6 per 100,000 inhabitants) in the Justicia health zone. 91.6% of all syphilis cases and 89.6% of gonorrhea cases were among men who have sex with men (MSM). Moran's I was 0.54 and 0.55 (p=0.001) for syphilis and gonorrhea, respectively. For syphilis, a cluster was identified including the six health zones of the Centro district, with a relative risk (RR) of 6.66 (p=0.001). For gonorrhea, a cluster was found including the Centro district, three health zones of the Chamberí district and one of Latina (RR 5.05; p=0.001). Centro was the district with most cases of syphilis and gonorrhea and the most affected population was MSM. For both infections, clusters were found with an important overlap. By identifying the most vulnerable health zones and populations, these results can help to design public health measures for preventing sexually transmitted diseases.
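
    As a rough illustration of the spatial autocorrelation statistic named above, the sketch below computes global Moran's I from a list of zone values and a spatial weights matrix, using the usual definition I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2. The four-zone chain layout and the incidence values are hypothetical and are not taken from the study; the significance test that would normally accompany the statistic is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for zone values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of zones, weighted by adjacency.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


if __name__ == "__main__":
    # Hypothetical layout: four zones in a row, each adjacent to its neighbours.
    chain = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    clustered = [10, 9, 2, 1]     # similar values sit next to each other
    alternating = [10, 1, 10, 1]  # neighbouring values differ sharply
    print(round(morans_i(clustered, chain), 3))
    print(round(morans_i(alternating, chain), 3))
```

    Positive values indicate that similar incidences tend to be neighbours; negative values indicate the opposite.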

  13. Association of Mediterranean diet and other health behaviours with barriers to healthy eating and perceived health among British adults of retirement age.

    PubMed

    Lara, Jose; McCrum, Leigh-Ann; Mathers, John C

    2014-11-01

    Health behaviours including diet, smoking, alcohol consumption, and physical activity, predict health risks at the population level. We explored health behaviours, barriers to healthy eating and self-rated health among individuals of retirement age. Study design: 82 men and 124 women participated in an observational, cross-sectional online survey. Main outcome measures: A 14-item Mediterranean diet score (MDPS), perceived barriers to healthy eating (PBHE), self-reported smoking, physical activity habits, and current and prior perceived health status (PHS) were assessed. A health behaviours score (HBS) including smoking, physical activity, body mass index (BMI) and MDPS was created to evaluate associations with PHS. Two-step cluster analysis identified natural groups based on PBHE. Analysis of variance was used to evaluate between-group comparisons. PBHE number was associated with BMI (r=0.28, P<0.001), age (r=-0.19; P=0.006), and MDPS (r=-0.31; P<0.001). PBHE cluster analysis produced three clusters. Cluster-1 members (busy lifestyle) were significantly younger (57 years), more overweight (28 kg/m²), scored lower on MDPS (4.7) and reported more PBHE (7). Cluster-3 members (no characteristic PBHE) were leaner (25 kg/m²), reported the lowest number of PBHE (2), and scored higher on HBS (2.7) and MDPS (6.2). Those in PHS categories, bad/fair, good, and very good, reported mean HBS of 2.0, 2.4 and 3.0, respectively (P<0.001). Compared with the previous year, no significant associations between PHS and HBS were observed. PBHE clusters were associated with BMI, MDPS and PHS and could be a useful tool to tailor interventions for those of peri-retirement age. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  14. Typical patterns of modifiable health risk factors (MHRFs) in elderly women in Germany: results from the cross-sectional German Health Update (GEDA) study, 2009 and 2010.

    PubMed

    Jentsch, Franziska; Allen, Jennifer; Fuchs, Judith; von der Lippe, Elena

    2017-04-04

    Modifiable health risk factors (MHRFs) significantly affect morbidity and mortality rates and frequently occur in specific combinations or risk clusters. Using five MHRFs (smoking, high-risk alcohol consumption, physical inactivity, low intake of fruits and vegetables, and obesity) this study investigates the extent to which risk clusters are observed in a representative sample of women aged 65 and older in Germany. Additionally, the structural composition of the clusters is systematically compared with data and findings from other countries. A pooled data set of Germany's representative cross-sectional surveys GEDA09 and GEDA10 was used. The cohort comprised 4,617 women aged 65 and older. Specific risk clusters based on five MHRFs are identified, using hierarchical cluster analysis. The MHRFs were defined as current smoking (daily or occasionally), risk alcohol consumption (according to the Alcohol Use Disorders Identification Test, a sum score of 4 or more points), physical inactivity (less active than 5 days per week for at least 30 min and lack of sports-related activity in the last three months), low intake of fruits and vegetables (less than one serving of fruits and one of vegetables per day), and obesity (a body mass index equal to or greater than 30). A total of 4,292 cases with full information on these factors are included in the cluster analysis. Extended analyses were also performed to include the number of chronic diseases by age and socioeconomic status of group members. A total of seven risk clusters were identified. In a comparison with data from international studies, the seven risk clusters were found to be stable with a high degree of structural equivalency. Evidence of the stability of risk clusters across various study populations provides a useful starting point for long-term targeted health interventions. The structural clusters provide information through which various MHRFs can be evaluated simultaneously.

  15. Information Needs of Rural Malaysians: An Exploratory Study of a Cluster of Three Villages with No Library Service.

    ERIC Educational Resources Information Center

    Anwar, Mumtaz Ali; Supaat, Hana Imam

    1998-01-01

    Presents an analysis of 33 studies on rural information needs of a cluster of three Malaysian villages with no library service. Study found information needs relate to: religious information, family bonding, current affairs, health information, and education. The purposes for seeking information include: fulfillment of need to know, problem…

  16. Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery.

    PubMed

    Perualila-Tan, Nolen Joy; Shkedy, Ziv; Talloen, Willem; Göhlmann, Hinrich W H; Moerbeke, Marijke Van; Kasim, Adetayo

    2016-08-01

    The modern process of discovering candidate molecules in the early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical strategy in compound selection involves compound clustering based on chemical similarity to obtain representative chemically diverse compounds (not incorporating potency information). In this paper, we propose an integrative clustering approach that makes use of both biological (compound efficacy) and chemical (structural features) data sources for the purpose of discovering a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, serving as a generic input in any clustering algorithm. This new analysis workflow is a semi-supervised method since, after the determination of clusters, a secondary analysis is performed wherein it finds differentially expressed genes associated with the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. In this paper, datasets from two drug development oncology projects are used to illustrate the usefulness of the weighted similarity-based clustering approach to integrate multi-source high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using this proposed integrative approach.
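
    The integration step described above, combining two similarity matrices with complementary weights, can be sketched as follows. This is a minimal illustration that assumes the combination is a convex sum, S = w * S_chem + (1 - w) * S_bio; the function name, the weight w = 0.5 and the three-compound matrices are hypothetical and are not taken from the paper.

```python
def weighted_similarity(sim_chem, sim_bio, w):
    """Combine two n x n similarity matrices with complementary weights.

    w weights the chemical similarity and (1 - w) the biological one.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    n = len(sim_chem)
    return [
        [w * sim_chem[i][j] + (1.0 - w) * sim_bio[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical similarities for three compounds (illustrative values only).
    sim_chem = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]]
    sim_bio = [[1.0, 0.1, 0.8], [0.1, 1.0, 0.4], [0.8, 0.4, 1.0]]
    combined = weighted_similarity(sim_chem, sim_bio, w=0.5)
    # Many clustering routines expect dissimilarities, so convert with 1 - S.
    distance = [[1.0 - s for s in row] for row in combined]
    for row in distance:
        print([round(d, 2) for d in row])
```

    Setting w to 1 or 0 recovers clustering on chemical or biological similarity alone, so the weight controls how much each data source drives the grouping.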

  17. Fatality rate of pedestrians and fatal crash involvement rate of drivers in pedestrian crashes: a case study of Iran.

    PubMed

    Kashani, Ali Tavakoli; Besharati, Mohammad Mehdi

    2017-06-01

    The aim of this study was to uncover patterns of pedestrian crashes. In the first stage, 34,178 pedestrian-involved crashes that occurred in Iran during a four-year period were grouped into homogeneous clusters using a clustering analysis. Next, some in-cluster and inter-cluster crash patterns were analysed. The clustering analysis yielded six pedestrian crash groups. Car/van/pickup crashes on rural roads as well as heavy vehicle crashes were found to be less frequent but more likely to be fatal compared to other crash clusters. In addition, after controlling for crash frequency in each cluster, it was found that the fatality rate of each pedestrian age group as well as the fatal crash involvement rate of each driver age group varies across the six clusters. The results of the present study have some policy implications, including promoting pedestrian safety training sessions for heavy vehicle drivers, imposing limitations on elderly heavy vehicle drivers, and reinforcing penalties for drivers under 19 and motorcyclists. In addition, road safety campaigns in rural areas may be promoted to inform people about the higher fatality rate of pedestrians on rural roads. The crash patterns uncovered in this study might also be useful for prioritizing future pedestrian safety research areas.

  18. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of Landsat digital data. [mapping of hydrothermally altered volcanic rocks]

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed using Johnson's HICLUS program. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.

  19. The relative impact of baryons and cluster shape on weak lensing mass estimates of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Lee, B. E.; Le Brun, A. M. C.; Haq, M. E.; Deering, N. J.; King, L. J.; Applegate, D.; McCarthy, I. G.

    2018-05-01

    Weak gravitational lensing depends on the integrated mass along the line of sight. Baryons contribute to the mass distribution of galaxy clusters and the resulting mass estimates from lensing analysis. We use the cosmo-OWLS suite of hydrodynamic simulations to investigate the impact of baryonic processes on the bias and scatter of weak lensing mass estimates of clusters. These estimates are obtained by fitting NFW profiles to mock data using MCMC techniques. In particular, we examine the difference in estimates between dark matter-only runs and those including various prescriptions for baryonic physics. We find no significant difference in the mass bias when baryonic physics is included, though the overall mass estimates are suppressed when feedback from AGN is included. For lowest-mass systems for which a reliable mass can be obtained (M200 ≈ 2 × 10¹⁴ M⊙), we find a bias of ≈-10 per cent. The magnitude of the bias tends to decrease for higher mass clusters, consistent with no bias for the most massive clusters which have masses comparable to those found in the CLASH and HFF samples. For the lowest mass clusters, the mass bias is particularly sensitive to the fit radii and the limits placed on the concentration prior, rendering reliable mass estimates difficult. The scatter in mass estimates between the dark matter-only and the various baryonic runs is less than between different projections of individual clusters, highlighting the importance of triaxiality.

  20. Machine-learned cluster identification in high-dimensional data.

    PubMed

    Ultsch, Alfred; Lötsch, Jörn

    2017-02-01

    High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. 
By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
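
    The abstract's point that classical algorithms can impose structure on structureless data can be tried out with a small k-means sketch. The code below is a plain Lloyd's-algorithm implementation written for illustration, not the ESOM/U-matrix tool or the exact algorithms the authors tested; the uniform random points, k = 3 and the seeds are assumptions.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    rng = random.Random(42)
    # Uniform random points: by construction there is no cluster structure.
    points = [(rng.random(), rng.random()) for _ in range(200)]
    labels, _ = kmeans(points, k=3)
    print({c: labels.count(c) for c in sorted(set(labels))})
```

    The algorithm returns a partition whether or not the data contain groups, so the output alone says nothing about whether real cluster structure exists.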

  1. Clusters of Monoisotopic Elements for Calibration in (TOF) Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Kolářová, Lenka; Prokeš, Lubomír; Kučera, Lukáš; Hampl, Aleš; Peňa-Méndez, Eladia; Vaňhara, Petr; Havel, Josef

    2017-03-01

    Precise calibration in TOF MS requires suitable and reliable standards, which are not always available for high masses. We evaluated inorganic clusters of the monoisotopic elements gold and phosphorus (Auₙ⁺/Auₙ⁻ and Pₙ⁺/Pₙ⁻) as an alternative to peptides or proteins for the external and internal calibration of mass spectra in various experimental and instrumental scenarios. Monoisotopic gold or phosphorus clusters can be easily generated in situ from suitable precursors by laser desorption/ionization (LDI) or matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Their use offers numerous advantages, including simplicity of preparation, biological inertness, and exact mass determination even at lower mass resolution. We used citrate-stabilized gold nanoparticles to generate gold calibration clusters, and red phosphorus powder to generate phosphorus clusters. Both elements can be added to samples to perform internal calibration up to mass-to-charge (m/z) 10-15,000 without significantly interfering with the analyte. We demonstrated the use of the gold and phosphorus clusters in the MS analysis of complex biological samples, including microbial standards and total extracts of mouse embryonic fibroblasts. We believe that clusters of monoisotopic elements could be used as generally applicable calibrants for complex biological samples.

  2. Supra-galactic colour patterns in globular cluster systems

    NASA Astrophysics Data System (ADS)

    Forte, Juan C.

    2017-07-01

    An analysis of globular cluster systems associated with galaxies included in the Virgo and Fornax Hubble Space Telescope-Advanced Camera Surveys reveals distinct (g - z) colour modulation patterns. These features appear on composite samples of globular clusters and, most evidently, in galaxies with absolute magnitudes Mg in the range from -20.2 to -19.2. These colour modulations are also detectable on some samples of globular clusters in the central galaxies NGC 1399 and NGC 4486 (and confirmed on data sets obtained with different instruments and photometric systems), as well as in other bright galaxies in these clusters. After discarding field contamination, photometric errors and statistical effects, we conclude that these supra-galactic colour patterns are real and reflect some previously unknown characteristic. These features suggest that the globular cluster formation process was not entirely stochastic but included a fraction of clusters that formed in a rather synchronized fashion over large spatial scales, and in a tentative time lapse of about 1.5 Gy at redshifts z between 2 and 4. We speculate that the putative mechanism leading to that synchronism may be associated with large scale feedback effects connected with violent star-forming events and/or with supermassive black holes.

  3. WIYN OPEN CLUSTER STUDY. LV. ASTROMETRY AND MEMBERSHIP IN NGC 6819

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Platais, Imants; Gosnell, Natalie M.; Meibom, Soren

    2013-08-01

    We present proper motions and astrometric membership analysis for 15,750 stars around the intermediate-age open cluster NGC 6819. The accuracy of relative proper motions for well-measured stars ranges from ≈0.2 mas yr⁻¹ within 10' of the cluster center to 1.1 mas yr⁻¹ outside this radius. In the proper motion vector-point diagram, the separation between the cluster members and field stars is convincing down to V ≈ 18 and within 10' from the cluster center. The formal sum of membership probabilities indicates a total of ≈2500 cluster members down to V ≈ 22. We confirm the cluster membership of several variable stars, including some eclipsing binaries. The estimated absolute proper motion of NGC 6819 is μ_x^abs = -2.6 ± 0.5 and μ_y^abs = -4.2 ± 0.5 mas yr⁻¹. A cross-identification between the proper motion catalog and a list of X-ray sources in the field of NGC 6819 resulted in a number of new likely optical counterparts, including a candidate CV. For the first time we show that there is significant differential reddening toward NGC 6819.

  4. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

    PubMed Central

    2009-01-01

    Background: Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results: To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade-specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade-specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion: Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996

  5. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

    PubMed

    Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

    2009-10-12

    Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade-specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade-specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.

  6. Identification of different nutritional status groups in institutionalized elderly people by cluster analysis.

    PubMed

    López-Contreras, María José; López, Maria Ángeles; Canteras, Manuel; Candela, María Emilia; Zamora, Salvador; Pérez-Llamas, Francisca

    2014-03-01

    To apply a cluster analysis to groups of individuals of similar characteristics in an attempt to identify undernutrition or the risk of undernutrition in this population. A cross-sectional study. Seven public nursing homes in the province of Murcia, on the Mediterranean coast of Spain. 205 subjects aged 65 and older (131 women and 74 men). Dietary intake (energy and nutrients), anthropometric (body mass index, skinfold thickness, mid-arm muscle circumference, mid-arm muscle area, corrected arm muscle area, waist to hip ratio) and biochemical and haematological (serum albumin, transferrin, total cholesterol, total lymphocyte count). Variables were analyzed by cluster analysis. The results of the cluster analysis, including intake, anthropometric and analytical data, showed that, of the 205 elderly subjects, 66 (32.2%) were overweight/obese, 72 (35.1%) had an adequate nutritional status and 67 (32.7%) were undernourished or at risk of undernutrition. The undernourished or at risk of undernutrition group showed the lowest values for dietary intake and the anthropometric and analytical parameters measured. Our study shows that cluster analysis is a useful statistical method for assessing the nutritional status of institutionalized elderly populations. In contrast, use of the specific reference values frequently described in the literature might fail to detect real cases of undernourishment or those at risk of undernutrition. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.

  7. Low physical activity as a key differentiating factor in the potential high-risk profile for depressive symptoms in older adults.

    PubMed

    Holmquist, Sofie; Mattsson, Sabina; Schele, Ingrid; Nordström, Peter; Nordström, Anna

    2017-09-01

    The identification of potential high-risk groups for depression is of importance. The purpose of the present study was to identify high-risk profiles for depressive symptoms in older individuals, with a focus on functional performance. The population-based Healthy Ageing Initiative included 2,084 community-dwelling individuals (49% women) aged 70. Explorative cluster analysis was used to group participants according to functional performance level, using measures of basic mobility skills, gait variability, and grip strength. Intercluster differences in depressive symptoms (measured by the Geriatric Depression Scale [GDS]-15), physical activity (PA; measured objectively with the ActiGraph GT3X+), and a rich set of covariates were examined. The cluster analysis yielded a seven-cluster solution. One potential high-risk cluster was identified, with overrepresentation of individuals with GDS scores >5 (15.1 vs. 2.7% expected; relative risk = 6.99, P < .001); the prevalence of depressive symptoms was significantly lower in the other clusters (all P < .01). The potential high-risk cluster had significant overrepresentations of obese individuals (39.7 vs. 17.4% expected) and those with type 2 diabetes (24.7 vs. 8.5% expected), and underrepresentation of individuals who fulfilled the World Health Organization's PA recommendations (15.6 vs. 59.1% expected; all P < .01), as well as low levels of functional performance. The present study provided a potential high-risk profile for depressive symptoms among elderly community-dwelling individuals, which included low levels of functional performance combined with low levels of PA. Including PA in medical screening of the elderly may aid in identification of potential high-risk individuals for depressive symptoms. © 2017 Wiley Periodicals, Inc.

  8. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design

    PubMed Central

    Li, Fan; Gallis, John A.; Prague, Melanie; Murray, David M.

    2017-01-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis. PMID:28426295

  9. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1-Design.

    PubMed

    Turner, Elizabeth L; Li, Fan; Gallis, John A; Prague, Melanie; Murray, David M

    2017-06-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.
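As a rough illustration of why clustering has to be accounted for at the design stage, the sketch below applies the standard textbook design effect, 1 + (m - 1) × ICC, for equal cluster sizes m and intraclass correlation ICC. This formula and the numbers used are not taken from the review itself; the function names and the example values (200 individuals per arm, clusters of 20, ICC of 0.05) are assumptions chosen only to show the arithmetic.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing groups of equal size instead of individuals."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_per_arm(n_individual, cluster_size, icc):
    """Clusters needed per arm, given the sample size an individually randomized trial would need."""
    n_inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(n_inflated / cluster_size)


# Hypothetical planning values, for illustration only.
deff = design_effect(cluster_size=20, icc=0.05)
groups = clusters_per_arm(n_individual=200, cluster_size=20, icc=0.05)
print(f"design effect = {deff:.2f}, clusters per arm = {groups}")
```

Even a small intraclass correlation nearly doubles the required sample size in this example, which is why ignoring clustering when planning a GRT leads to underpowered trials.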

  10. Molecular Eigensolution Symmetry Analysis and Fine Structure

    PubMed Central

    Harter, William G.; Mitchell, Justin C.

    2013-01-01

    Spectra of high-symmetry molecules contain fine and superfine level cluster structure related to J-tunneling between hills and valleys on rovibronic energy surfaces (RES). Such graphic visualizations help disentangle multi-level dynamics, selection rules, and state mixing effects including widespread violation of nuclear spin symmetry species. A review of RES analysis compares it to that of potential energy surfaces (PES) used in Born–Oppenheimer approximations. Both take advantage of adiabatic coupling in order to visualize Hamiltonian eigensolutions. RES of symmetric and D2 asymmetric top rank-2-tensor Hamiltonians are compared with Oh spherical top rank-4-tensor fine-structure clusters of 6-fold and 8-fold tunneling multiplets. Then extreme 12-fold and 24-fold multiplets are analyzed by RES plots of higher rank tensor Hamiltonians. Such extreme clustering is rare in fundamental bands but prevalent in hot bands, and analysis of its superfine structure requires more efficient labeling and a more powerful group theory. This is introduced using elementary examples involving two groups of order-6 (C6 and D3~C3v), then applied to families of Oh clusters in SF6 spectra and to extreme clusters. PMID:23344041

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biewer, Theodore M.; Marcus, Chris; Klepper, C Christopher

    The divertor-specific ITER Diagnostic Residual Gas Analyzer (DRGA) will provide essential information relating to DT fusion plasma performance. This includes pulse-resolving measurements of the fuel isotopic mix reaching the pumping ducts, as well as the concentration of the helium generated as the ash of the fusion reaction. In the present baseline design, the cluster of sensors attached to this diagnostic's differentially pumped analysis chamber assembly includes a radiation compatible version of a commercial quadrupole mass spectrometer, as well as an optical gas analyzer using a plasma-based light excitation source. This paper reports on a laboratory study intended to validate the performance of this sensor cluster, with emphasis on the detection limit of the isotopic measurement. This validation study was carried out in a laboratory set-up that closely prototyped the analysis chamber assembly configuration of the baseline design. This includes an ITER-specific placement of the optical gas measurement downstream from the first turbine of the chamber's turbo-molecular pump to provide sufficient light emission while preserving the gas dynamics conditions that allow for ~1 s response time from the sensor cluster [1].

  12. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    PubMed Central

    Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters. PMID:27391786
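The three stand-alone quality metrics named in this abstract can be computed by hand on a toy graph. The sketch below uses common textbook definitions (Newman modularity, cut-over-volume conductance, fraction of intra-community edges for coverage); the paper's exact definitions and normalizations may differ, and the example graph of two triangles joined by one edge is hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    volume = {}  # sum of node degrees per community
    intra = {}   # number of edges with both endpoints in the community
    for u, v in edges:
        for node in (u, v):
            c = community_of[node]
            volume[c] = volume.get(c, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            intra[c] = intra.get(c, 0) + 1
    return sum(intra.get(c, 0) / m - (volume[c] / (2 * m)) ** 2 for c in volume)


def conductance(edges, members):
    """Cut edges divided by the smaller of the two side volumes."""
    members = set(members)
    cut = sum((u in members) != (v in members) for u, v in edges)
    vol_in = sum((u in members) + (v in members) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    return cut / min(vol_in, vol_out)


def coverage(edges, community_of):
    """Fraction of edges that fall inside a community."""
    return sum(community_of[u] == community_of[v] for u, v in edges) / len(edges)


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

q = modularity(edges, community_of)
phi = conductance(edges, [0, 1, 2])
cov = coverage(edges, community_of)
print(f"modularity={q:.3f} conductance={phi:.3f} coverage={cov:.3f}")
```

Note that lower conductance is better while higher modularity and coverage are better, so the metrics cannot be compared on a common scale without care.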

  13. Diversity of lactic acid bacteria associated with fish and the fish farm environment, established by amplified rRNA gene restriction analysis.

    PubMed

    Michel, Christian; Pelletier, Claire; Boussaha, Mekki; Douet, Diane-Gaëlle; Lautraite, Armand; Tailliez, Patrick

    2007-05-01

    Lactic acid bacteria have become a major source of concern for aquaculture in recent decades. In addition to true pathogenic species of worldwide significance, such as Streptococcus iniae and Lactococcus garvieae, several species have been reported to produce occasional fish mortalities in limited geographic areas, and many unidentifiable or ill-defined isolates are regularly isolated from fish or fish products. To clarify the nature and prevalence of different fish-associated bacteria belonging to the lactic acid bacterium group, a collection of 57 isolates of different origins was studied and compared with a set of 22 type strains, using amplified rRNA gene restriction analysis (ARDRA). Twelve distinct clusters were delineated on the basis of ARDRA profiles and were confirmed by sequencing of sodA and 16S rRNA genes. These clusters included the following: Lactococcus raffinolactis, L. garvieae, Lactococcus lactis, S. iniae, S. dysgalactiae, S. parauberis, S. agalactiae, Carnobacterium spp., the Enterococcus "faecium" group, a heterogeneous Enterococcus-like cluster comprising indiscernible representatives of Vagococcus fluvialis or the recently recognized V. carniphilus, V. salmoninarum, and Aerococcus spp. Interestingly, the L. lactis and L. raffinolactis clusters appeared to include many commensals of fish, so opportunistic infections caused by these species cannot be disregarded. The significance for fish populations and fish food processing of three or four genetic clusters of uncertain or complex definition, namely, Aerococcus and Enterococcus clusters, should be established more accurately.

  14. The acceptability among young Hindus and Muslims of actively ending the lives of newborns with genetic defects.

    PubMed

    Kamble, Shanmukh; Ahmed, Ramadan; Sorum, Paul Clay; Mullet, Etienne

    2014-03-01

    To explore the views in non-Western cultures about ending the lives of damaged newborns. 254 university students from India and 150 from Kuwait rated the acceptability of ending the lives of newborns with genetic defects in 54 vignettes consisting of all combinations of four factors: gestational age (term or 7 months); severity of genetic defect (trisomy 21 alone, trisomy 21 with serious morphological abnormalities or trisomy 13 with impending death); the parents' attitude about prolonging care (unknown, in favour or opposed); and the procedure used (withholding treatment, withdrawing it or injecting a lethal substance). Four clusters were identified by cluster analysis and subjected to analysis of variance. Cluster I, labelled 'Never Acceptable', included 4% of the Indians and 59% of the Kuwaitis. Cluster II, 'No Firm Opinion', had little variation in rating from one scenario to the next; it included 38% of the Indians and 18% of the Kuwaitis. In Cluster III, 'Parents' Attitude+Severity+Procedure', all three factors affected the ratings; it was composed of 18% of the Indians and 16% of the Kuwaitis. Cluster IV was called 'Severity+Parents' Attitude' because these had the strongest impact; it was composed of 40% of the Indians and 7% of the Kuwaitis. In accordance with the teachings of Islam versus Hinduism, Kuwaiti students were more likely to oppose ending a newborn's life under all conditions, Indian students more likely to favour it and to judge its acceptability in light of the different circumstances.

  15. Genotypic diversity of oscillatoriacean strains belonging to the genera Geitlerinema and Spirulina determined by 16S rDNA restriction analysis.

    PubMed

    Margheri, Maria C; Piccardi, Raffaella; Ventura, Stefano; Viti, Carlo; Giovannetti, Luciana

    2003-05-01

    Genotypic diversity of several cyanobacterial strains mostly isolated from marine or brackish waters, belonging to the genera Geitlerinema and Spirulina, was investigated by amplified 16S ribosomal DNA restriction analysis and compared with morphological features and response to salinity. Cluster analysis was performed on amplified 16S rDNA restriction profiles of these strains along with profiles obtained from sequence data of five Spirulina-like strains, including three representatives of the new genus Halospirulina. Our strains with tightly coiled trichomes from hypersaline waters could be assigned to the Halospirulina genus. Among the uncoiled strains, the two strains of hypersaline origin clustered together and were found to be distant from their counterparts of marine and freshwater habitat. Moreover, another cluster, formed by alkali-tolerant strains with tightly coiled trichomes, was well delineated.

  16. Effects of additional data on Bayesian clustering.

    PubMed

    Yamazaki, Keisuke

    2017-10-01

    Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.
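To make the role of the latent variable concrete, the sketch below computes the posterior probability of each component label for a single observation under a one-dimensional Gaussian mixture whose parameters are assumed known. This is only the basic latent-label estimate by Bayes' rule, not the paper's theoretical analysis of additional data; the mixture weights, means, and standard deviations are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def responsibilities(x, weights, means, sds):
    """Posterior p(z = k | x) over the latent cluster label z, by Bayes' rule."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture with equal weights and unit variances.
weights, means, sds = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

r_mid = responsibilities(0.0, weights, means, sds)    # point midway between components
r_right = responsibilities(3.0, weights, means, sds)  # point close to the second component
print("midpoint:", r_mid)
print("x = 3.0:", r_right)
```

An observation midway between two symmetric components is assigned to each with equal probability, while one near a component mean is assigned to it almost with certainty. In practice the parameters are themselves estimated, which is where the accuracy trade-off discussed in the abstract arises.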

  17. A Search for Ram-pressure Stripping in the Hydra I Cluster

    NASA Technical Reports Server (NTRS)

    Brown, B.

    2005-01-01

    Ram-pressure stripping is a method by which hot interstellar gas can be removed from a galaxy moving through a group or cluster of galaxies. Indirect evidence of ram-pressure stripping includes lowered X-ray brightness in a galaxy due to less X-ray emitting gas remaining in the galaxy. Here we present the initial results of our program to determine whether cluster elliptical galaxies have lower hot gas masses than their counterparts in less rich environments. This test requires the use of the high-resolution imaging of the Chandra Observatory and we present our analysis of the galaxies in the nearby cluster Hydra I.

  18. A Search for Ram-pressure Stripping in the Hydra I Cluster

    NASA Technical Reports Server (NTRS)

    Brown, B. A.

    2005-01-01

    Ram-pressure stripping is a method by which hot interstellar gas can be removed from a galaxy moving through a group or cluster of galaxies. Indirect evidence of ram-pressure stripping includes lowered X-ray brightness in a galaxy due to less X-ray emitting gas remaining in the galaxy. Here we present the initial results of our program to determine whether cluster elliptical galaxies have lower hot gas masses than their counterparts in less rich environments. This test requires the use of the high-resolution imaging of the Chandra Observatory and we present our analysis of the galaxies in the nearby cluster Hydra I.

  19. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  20. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  1. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study

    PubMed Central

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    Background: The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods: Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results: Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions: Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637

  2. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    NASA Astrophysics Data System (ADS)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, especially when the number of unbinding events is limited and the events are not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
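
    The grouping step described in this abstract can be illustrated with a minimal sketch, assuming plain Lloyd's K-means on two features per rupture event. The event values, the feature choice (log10 loading rate, rupture force) and the starting centroids are hypothetical and are not taken from the paper; the record does not say how the authors initialised or scaled their data.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means on 2-D points; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the solution has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Hypothetical rupture events: (log10 loading rate, rupture force in pN).
events = [(2.0, 40.0), (2.1, 42.0), (1.9, 41.0),
          (3.0, 60.0), (3.1, 62.0), (2.9, 61.0)]
start = [events[0], events[3]]  # assumed starting centroids

labels, centroids = kmeans(events, start)
for k, (rate, force) in enumerate(centroids):
    print(f"group {k}: mean log10 loading rate = {rate:.2f}, mean force = {force:.1f} pN")
```

    Each centroid is the mean loading rate and mean force of its group, which are the quantities the abstract says are subsequently fitted. In this toy example the force axis dominates the distance, so rescaling the features before clustering would be worth considering on real data.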

  3. Scientific Cluster Deployment and Recovery - Using puppet to simplify cluster management

    NASA Astrophysics Data System (ADS)

    Hendrix, Val; Benjamin, Doug; Yao, Yushu

    2012-12-01

    Deployment, maintenance and recovery of a scientific cluster, which has complex, specialized services, can be a time-consuming task requiring the assistance of Linux system administrators, network engineers as well as domain experts. Universities and small institutions that have a part-time FTE with limited time for and knowledge of the administration of such clusters can be strained by such maintenance tasks. This current work is the result of an effort to maintain a data analysis cluster (DAC) with minimal effort by a local system administrator. The realized benefit is that the scientist, who is the local system administrator, is able to focus on the data analysis instead of the intricacies of managing a cluster. Our work provides a cluster deployment and recovery process (CDRP) based on the Puppet configuration engine, allowing a part-time FTE to easily deploy and recover entire clusters with minimal effort. Puppet is a configuration management system (CMS) used widely in computing centers for the automatic management of resources. Domain experts use Puppet's declarative language to define reusable modules for service configuration and deployment. Our CDRP has three actors: domain experts, a cluster designer and a cluster manager. The domain experts first write the Puppet modules for the cluster services. A cluster designer would then define a cluster. This includes the creation of cluster roles, mapping the services to those roles and determining the relationships between the services. Finally, a cluster manager would acquire the resources (machines, networking), enter the cluster input parameters (hostnames, IP addresses) and automatically generate deployment scripts used by Puppet to configure each machine to act in its designated role. In the event of a machine failure, the originally generated deployment scripts along with Puppet can be used to easily reconfigure a new machine. The cluster definition produced in our CDRP is an integral part of automating cluster deployment in a cloud environment. Our future cloud efforts will further build on this work.

  4. A Case-Control Study of Molecular Epidemiology in Relation to Azithromycin Resistance in Neisseria gonorrhoeae Isolates Collected in Amsterdam, the Netherlands, between 2008 and 2015

    PubMed Central

    Wind, Carolien M.; Bruisten, Sylvia M.; Schim van der Loeff, Maarten F.; Dierdorp, Mirjam; de Vries, Henry J. C.

    2017-01-01

    Neisseria gonorrhoeae resistance to ceftriaxone and azithromycin is increasing, which threatens the recommended dual therapy. We used molecular epidemiology to identify N. gonorrhoeae clusters and associations with azithromycin resistance in Amsterdam, the Netherlands. N. gonorrhoeae isolates (n = 143) were selected from patients visiting the Amsterdam STI Outpatient Clinic from January 2008 through September 2015. We included all 69 azithromycin-resistant isolates (MIC ≥ 2.0 mg/liter) and 74 frequency-matched susceptible controls (MIC ≤ 0.25 mg/liter). The methods used were 23S rRNA and mtrR sequencing, N. gonorrhoeae multiantigen sequence typing (NG-MAST), N. gonorrhoeae multilocus variable-number tandem-repeat analysis (NG-MLVA), and a specific PCR to detect mosaic penA genes. A hierarchical cluster analysis of NG-MLVA related to resistance and epidemiological characteristics was performed. Azithromycin-resistant isolates had C2611T mutations in 23S rRNA (n = 62, 89.9%, P < 0.001) and were NG-MAST genogroup G2992 (P < 0.001), G5108 (P < 0.001), or G359 (P = 0.02) significantly more often than susceptible isolates and were more often part of NG-MLVA clusters (P < 0.001). Two resistant isolates (2.9%) had A2059G mutations, and five (7.3%) had wild-type 23S rRNA. No association between mtrR mutations and azithromycin resistance was found. Twenty-four isolates, including 10 azithromycin-resistant isolates, showed reduced susceptibility to extended-spectrum cephalosporins. Of these, five contained a penA mosaic gene. Four of the five NG-MLVA clusters contained resistant and susceptible isolates. Two clusters consisting mainly of resistant isolates included strains from men who have sex with men and from heterosexual males and females. The co-occurrence of resistant and susceptible strains in NG-MLVA clusters and the frequent occurrence of resistant strains outside of clusters suggest that azithromycin resistance develops independently from the background genome. PMID:28373191

  5. Sunyaev-Zel'dovich Effect and X-ray Scaling Relations from Weak-Lensing Mass Calibration of 32 SPT Selected Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dietrich, J.P.; et al.

    Uncertainty in the mass-observable scaling relations is currently the limiting factor for galaxy cluster based cosmology. Weak gravitational lensing can provide a direct mass calibration and reduce the mass uncertainty. We present new ground-based weak lensing observations of 19 South Pole Telescope (SPT) selected clusters and combine them with previously reported space-based observations of 13 galaxy clusters to constrain the cluster mass scaling relations with the Sunyaev-Zel'dovich effect (SZE), the cluster gas mass $M_\mathrm{gas}$, and $Y_\mathrm{X}$, the product of $M_\mathrm{gas}$ and X-ray temperature. We extend a previously used framework for the analysis of scaling relations and cosmological constraints obtained from SPT-selected clusters to make use of weak lensing information. We introduce a new approach to estimate the effective average redshift distribution of background galaxies and quantify a number of systematic errors affecting the weak lensing modelling. These errors include a calibration of the bias incurred by fitting a Navarro-Frenk-White profile to the reduced shear using $N$-body simulations. We blind the analysis to avoid confirmation bias. We are able to limit the systematic uncertainties to 6.4% in cluster mass (68% confidence). Our constraints on the mass-X-ray observable scaling relation parameters are consistent with those obtained by earlier studies, and our constraints for the mass-SZE scaling relation are consistent with the simulation-based prior used in the most recent SPT-SZ cosmology analysis. We can now replace the external mass calibration priors used in previous SPT-SZ cosmology studies with a direct, internal calibration obtained on the same clusters.

  6. Clustering P-Wave Receiver Functions To Constrain Subsurface Seismic Structure

    NASA Astrophysics Data System (ADS)

    Chai, C.; Larmat, C. S.; Maceira, M.; Ammon, C. J.; He, R.; Zhang, H.

    2017-12-01

    The acquisition of high-quality data from permanent and temporary dense seismic networks provides the opportunity to apply statistical and machine learning techniques to a broad range of geophysical observations. Lekic and Romanowicz (2011) used clustering analysis on tomographic velocity models of the western United States to perform tectonic regionalization, and the velocity-profile clusters agree well with known geomorphic provinces. A complementary and somewhat less restrictive approach is to apply cluster analysis directly to geophysical observations. In this presentation, we apply clustering analysis to teleseismic P-wave receiver functions (RFs), continuing efforts of Larmat et al. (2015) and Maceira et al. (2015). These earlier studies validated the approach with surface waves and stacked EARS RFs from the USArray stations. In this study, we experiment with both the K-means and hierarchical clustering algorithms. We also test different distance metrics defined in the vector space of RFs following Lekic and Romanowicz (2011). We cluster data from two distinct data sets. The first, corresponding to the western US, was produced by smoothing/interpolation of the receiver-function wavefield (Chai et al. 2015). Spatial coherence and agreement with geologic regions increase with this simpler, spatially smoothed set of observations. The second data set is composed of RFs for more than 800 stations of the China Digital Seismic Network (CSN). Preliminary results show a first-order agreement between clusters and tectonic regions, and each regional cluster includes a distinct Ps arrival, which probably reflects differences in crustal thickness. Regionalization remains an important step to characterize a model prior to application of full waveform and/or stochastic imaging techniques because of the computational expense of these types of studies. Machine learning techniques can provide valuable information that can be used to design and characterize formal geophysical inversion, providing information on spatial variability in the subsurface geology.

  7. The JCMT Gould Belt Survey: Dense Core Clusters in Orion B

    NASA Astrophysics Data System (ADS)

    Kirk, H.; Johnstone, D.; Di Francesco, J.; Lane, J.; Buckle, J.; Berry, D. S.; Broekhoven-Fiene, H.; Currie, M. J.; Fich, M.; Hatchell, J.; Jenness, T.; Mottram, J. C.; Nutter, D.; Pattle, K.; Pineda, J. E.; Quinn, C.; Salji, C.; Tisi, S.; Hogerheijde, M. R.; Ward-Thompson, D.; The JCMT Gould Belt Survey Team

    2016-04-01

    The James Clerk Maxwell Telescope Gould Belt Legacy Survey obtained SCUBA-2 observations of dense cores within three sub-regions of Orion B: LDN 1622, NGC 2023/2024, and NGC 2068/2071, all of which contain clusters of cores. We present an analysis of the clustering properties of these cores, including the two-point correlation function and Cartwright’s Q parameter. We identify individual clusters of dense cores across all three regions using a minimal spanning tree technique, and find that in each cluster, the most massive cores tend to be centrally located. We also apply the independent M-Σ technique and find a strong correlation between core mass and the local surface density of cores. These two lines of evidence jointly suggest that some amount of mass segregation in clusters has happened already at the dense core stage.
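
    The minimal spanning tree technique named in this abstract can be sketched as follows, assuming the common recipe of building the tree over core positions and then removing branches longer than a critical length. The core coordinates and the cutoff value are hypothetical; the record does not state how the survey team chose its critical branch length.

```python
import math

def minimal_spanning_tree(points):
    """Prim's algorithm on 2-D points; returns a list of (length, i, j) edges."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Shortest branch joining a point in the tree to one outside it.
        length, i, j = min(
            (math.dist(points[a], points[b]), a, b)
            for a in in_tree
            for b in range(n)
            if b not in in_tree
        )
        in_tree.add(j)
        edges.append((length, i, j))
    return edges

def clusters_from_mst(points, cutoff):
    """Drop MST branches longer than `cutoff`; return sets of connected indices."""
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for length, i, j in minimal_spanning_tree(points):
        if length <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(idx), set()).add(idx)
    return sorted(groups.values(), key=min)

# Hypothetical core positions (arbitrary units) and an assumed cutoff length.
cores = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
clusters = clusters_from_mst(cores, cutoff=2.0)
print(clusters)
```

    With these made-up positions the single long branch linking the two groups is removed, leaving two clusters of three cores each.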

  8. Maternal-child overweight/obesity and undernutrition in Kenya: a geographic analysis.

    PubMed

    Pawloski, Lisa R; Curtin, Kevin M; Gewa, Constance; Attaway, David

    2012-11-01

    The purpose of the study was to examine geographic relationships of nutritional status (BMI), including underweight, overweight and obesity, among Kenyan mothers and children. Spatial relationships were examined concerning BMI of the mothers and BMI-for-age percentiles of their children. These included spatial statistical measures of the clustering of segments of the population, in addition to inspection of co-location of significant clusters. The setting comprised rural and urban areas of Kenya, including the cities of Nairobi and Mombasa, and the Kisumu region. The subjects were mother-child pairs from Demographic and Health Survey data, including 1541 observations in 2003 and 1592 observations in 2009. These mother-child pairs were organized into 399 locational clusters. There is extremely strong evidence that high BMI values exhibit strong spatial clustering. There were co-locations of overweight mothers and overweight children only in the Nairobi region, while both underweight mothers and children tended to cluster in rural areas. In Mombasa, clusters of overweight mothers were associated with normal-weight children, while in the Kisumu region clusters of overweight children were associated with normal-weight mothers. These findings show there is geographic variability as well as some defined patterns concerning the distribution of malnutrition among mothers and children in Kenya, and suggest the need for further geographic analyses concerning the potential factors which influence nutritional status in this population. In addition, the methods used in this research may be easily applied to other Demographic and Health Survey data in order to begin to understand the geographic determinants of health in low-income countries.

  9. Spatio-temporal analysis of wildfire ignitions in the St. Johns River Water Management District, Florida

    Treesearch

    Marc G. Genton; David T. Butry; Marcia L. Gumpertz; Jeffrey P. Prestemon

    2006-01-01

    We analyse the spatio-temporal structure of wildfire ignitions in the St. Johns River Water Management District in north-eastern Florida. We show, using tools to analyse point patterns (e.g. the L-function), that wildfire events occur in clusters. Clustering of these events correlates with irregular distribution of fire ignitions, including lightning...

  10. Genome Sequences of Three Cluster AU Arthrobacter Phages, Caterpillar, Nightmare, and Teacup

    PubMed Central

    Adair, Tamarah L.; Stowe, Emily; Pizzorno, Marie C.; Krukonis, Gregory; Harrison, Melinda; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah

    2017-01-01

    Caterpillar, Nightmare, and Teacup are cluster AU siphoviral phages isolated from enriched soil on Arthrobacter sp. strain ATCC 21022. These genomes are 58 kbp long with an average G+C content of 50%. Sequence analysis predicts 86 to 92 protein-coding genes, including a large number of small proteins with predicted transmembrane domains. PMID:29122860

  11. A phylogenetic study of ubiquinone-7 species of the genus Candida based on 18S ribosomal DNA sequence divergence.

    PubMed

    Suzuki, Motofumi; Nakase, Takashi

    2002-02-01

    To clarify phylogenetic relationships among ubiquinone 7 (Q7)-forming species of the genus Candida, we analyzed the nearly complete sequences of 18S ribosomal RNA genes (18S rDNAs) from fifty strains (including 46 type strains) of Candida species, and from 8 type strains of species/varieties of the genera Issatchenkia, Pichia and Saturnispora. Q7-forming Candida species were divided into three major groups (Group I, II, and III) and were phylogenetically distant from a group that includes the type species of the genus Candida. Group I included four clusters with basal branches that were weakly supported. The first cluster comprised C. vartiovaarae, C. maritima, C. utilis, C. freyschussii, C. odintsovae, C. melinii, C. quercuum, Williopsis saturnus var. saturnus, and W. mucosa. The second cluster comprised C. norvegica, C. montana, C. stellimalicola, C. solani, C. berthetii, and C. dendrica. Williopsis pratensis, W. californica, Pichia opuntiae and 2 related species, P. amethionina (two varieties), and P. caribaea were also included in this cluster. The third cluster comprised C. pelliculosa (anamorph of P. anomala), C. nitrativorans, and C. silvicultrix. The fourth cluster comprised C. wickerhamii and C. peltata, which were placed in the P. holstii - C. ernobii clade with Q8-containing species. Group II comprised C. pignaliae, C. nemodendra, C. methanolovescens, C. maris, C. sonorensis, C. pini, C. llanquihuensis, C. cariosilignicola, C. ovalis, C. succiphila (including its two synonyms), C. methanosorbosa, C. nitratophila, C. nanaspora, C. boidinii (including its two synonyms), W. salicorniae, and P. methanolica. Group III was composed of four clusters with strong bootstrap support. The first cluster comprised C. valida (anamorph of P. membranifaciens), C. ethanolica, C. pseudolambica, C. citrea, C. inconspicua, C. norvegensis, C. rugopelliculosa, and C. lambica. Three species and two varieties of the genus Issatchenkia were also included in this cluster. 
    The second cluster comprised C. diversa, C. silvae, 4 Saturnispora species, and P. besseyi. The third comprised C. sorboxylosa, and the fourth comprised C. vini. Based on this 18S rDNA sequence analysis, it is evident that Q7-forming Candida species and the genera Pichia and Williopsis are polyphyletic. The genus Issatchenkia is suggested to be congeneric with the genus Pichia. The genus Saturnispora is phylogenetically definable.

  12. Untangling Magmatic Processes and Hydrothermal Alteration of in situ Superfast Spreading Ocean Crust at ODP/IODP Site 1256 with Fuzzy c-means Cluster Analysis of Rock Magnetic Properties

    NASA Astrophysics Data System (ADS)

    Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.

    2014-12-01

    Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. Through their combined multivariate analysis, the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters, a key point in the analysis because there is no a priori information on this, was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.

  13. Identification of atypical flight patterns

    NASA Technical Reports Server (NTRS)

    Statler, Irving C. (Inventor); Ferryman, Thomas A. (Inventor); Amidan, Brett G. (Inventor); Whitney, Paul D. (Inventor); White, Amanda M. (Inventor); Willse, Alan R. (Inventor); Cooley, Scott K. (Inventor); Jay, Joseph Griffith (Inventor); Lawrence, Robert E. (Inventor); Mosbrucker, Chris (Inventor)

    2005-01-01

    Method and system for analyzing aircraft data, including multiple selected flight parameters for a selected phase of a selected flight, and for determining when the selected phase of the selected flight is atypical, when compared with corresponding data for the same phase for other similar flights. A flight signature is computed using continuous-valued and discrete-valued flight parameters for the selected flight parameters and is optionally compared with a statistical distribution of other observed flight signatures, yielding atypicality scores for the same phase for other similar flights. A cluster analysis is optionally applied to the flight signatures to define an optimal collection of clusters. A level of atypicality for a selected flight is estimated, based upon an index associated with the cluster analysis.

  14. Global optimization of small bimetallic Pd-Co binary nanoalloy clusters: a genetic algorithm approach at the DFT level.

    PubMed

    Aslan, Mikail; Davis, Jack B A; Johnston, Roy L

    2016-03-07

    The global optimisation of small bimetallic PdCo binary nanoalloys is systematically investigated using the Birmingham Cluster Genetic Algorithm (BCGA). The effect of size and composition on the structures, stability, magnetic and electronic properties including the binding energies, second finite difference energies and mixing energies of Pd-Co binary nanoalloys are discussed. A detailed analysis of Pd-Co structural motifs and segregation effects is also presented. The maximal mixing energy corresponds to Pd atom compositions for which the number of mixed Pd-Co bonds is maximised. Global minimum clusters are distinguished from transition states by vibrational frequency analysis. HOMO-LUMO gap, electric dipole moment and vibrational frequency analyses are made to enable correlation with future experiments.

  15. [High risk groups in health behavior defined by clustering of smoking, alcohol, and exercise habits: National Health and Nutrition Examination Survey].

    PubMed

    Kang, Kiwon; Sung, Joohon; Kim, Chang Yup

    2010-01-01

    We investigated the clustering of selected lifestyle factors (cigarette smoking, heavy alcohol consumption, lack of physical exercise) and identified the population characteristics associated with increasing lifestyle risks. Data on lifestyle risk factors, sociodemographic characteristics, and history of chronic diseases were obtained from 7,694 individuals ≥20 years of age who participated in the 2005 Korea National Health and Nutrition Examination Survey (KNHANES). Clustering of lifestyle risks involved the observed prevalence of multiple risks and those expected from marginal exposure prevalence of the three selected risk factors. Prevalence odds ratio was adopted as a measurement of clustering. Multiple correspondence analysis, Kendall tau correlation, Mann-Whitney analysis, and ordinal logistic regression analysis were conducted to identify variables increasing lifestyle risks. In both men and women, increased lifestyle risks were associated with clustering of: (1) cigarette smoking and excessive alcohol consumption, and (2) smoking, excessive alcohol consumption, and lack of physical exercise. Patterns of clustering for physical exercise were different from those for cigarette smoking and alcohol consumption. The increased unhealthy clustering was found among men 20-64 years of age with mild or moderate stress, and among women 35-49 years of age who were never-married, with mild stress, and increased body mass index (>30 kg/m²). Addressing a lack of physical exercise considering individual characteristics including gender, age, employment activity, and stress levels should be a focus of health promotion efforts.
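
    The comparison of observed and expected prevalence mentioned in this abstract can be illustrated with a minimal sketch, assuming the expected joint prevalence is the product of the three marginal prevalences (that is, independence of the risk factors). The ten records below are invented for illustration and are not KNHANES data; the study's own measure, the prevalence odds ratio, is not computed here.

```python
def observed_vs_expected(records):
    """Return (observed, expected) prevalence of having all three risk factors.

    `records` holds one (smoking, heavy_drinking, inactive) tuple of booleans
    per person. The expected value assumes the three factors are independent.
    """
    n = len(records)
    marginals = [sum(r[k] for r in records) / n for k in range(3)]
    observed = sum(all(r) for r in records) / n
    expected = marginals[0] * marginals[1] * marginals[2]
    return observed, expected

# Hypothetical sample of ten people (not survey data).
sample = (
    [(True, True, True)] * 4        # all three risk factors
    + [(True, False, False)]        # smoking only
    + [(False, True, False)]        # heavy drinking only
    + [(False, False, True)]        # inactivity only
    + [(False, False, False)] * 3   # no risk factor
)

observed, expected = observed_vs_expected(sample)
ratio = observed / expected
print(f"observed = {observed:.3f}, expected = {expected:.3f}, ratio = {ratio:.2f}")
```

    A ratio above 1 means the three factors occur together more often than their individual prevalences would predict, which is the sense in which risk factors are said to cluster.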

  16. KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

    PubMed

    Laetsch, Dominik R; Blaxter, Mark L

    2017-10-05

    The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

  17. RELICS: Strong-lensing Analysis of the Massive Clusters MACS J0308.9+2645 and PLCK G171.9‑40.7

    NASA Astrophysics Data System (ADS)

    Acebron, Ana; Cibirka, Nathália; Zitrin, Adi; Coe, Dan; Agulli, Irene; Sharon, Keren; Bradač, Maruša; Frye, Brenda; Livermore, Rachael C.; Mahler, Guillaume; Salmon, Brett; Umetsu, Keiichi; Bradley, Larry; Andrade-Santos, Felipe; Avila, Roberto; Carrasco, Daniela; Cerny, Catherine; Czakon, Nicole G.; Dawson, William A.; Hoag, Austin T.; Huang, Kuang-Han; Johnson, Traci L.; Jones, Christine; Kikuchihara, Shotaro; Lam, Daniel; Lovisari, Lorenzo; Mainali, Ramesh; Oesch, Pascal A.; Ogaz, Sara; Ouchi, Masami; Past, Matthew; Paterno-Mahler, Rachel; Peterson, Avery; Ryan, Russell E.; Sendra-Server, Irene; Stark, Daniel P.; Strait, Victoria; Toft, Sune; Trenti, Michele; Vulcani, Benedetta

    2018-05-01

    Strong gravitational lensing by galaxy clusters has become a powerful tool for probing the high-redshift universe, magnifying distant and faint background galaxies. Reliable strong-lensing (SL) models are crucial for determining the intrinsic properties of distant, magnified sources and for constructing their luminosity function. We present here the first SL analysis of MACS J0308.9+2645 and PLCK G171.9‑40.7, two massive galaxy clusters imaged with the Hubble Space Telescope, in the framework of the Reionization Lensing Cluster Survey (RELICS). We use the light-traces-mass modeling technique to uncover sets of multiply imaged galaxies and constrain the mass distribution of the clusters. Our SL analysis reveals that both clusters have particularly large Einstein radii (θ_E > 30″ for a source redshift of z_s = 2), providing fairly large areas with high magnifications, useful for high-redshift galaxy searches (∼2 arcmin² with μ > 5 to ∼1 arcmin² with μ > 10, similar to a typical Hubble Frontier Fields cluster). We also find that MACS J0308.9+2645 hosts a promising, apparently bright (J ∼ 23.2–24.6 AB), multiply imaged high-redshift candidate at z ∼ 6.4. These images are among the brightest high-redshift candidates found in RELICS. Our mass models, including magnification maps, are made publicly available for the community through the Mikulski Archive for Space Telescopes.

  18. Identification of five clusters of comorbidities in a longitudinal Japanese chronic obstructive pulmonary disease cohort.

    PubMed

    Chubachi, Shotaro; Sato, Minako; Kameyama, Naofumi; Tsutsumi, Akihiro; Sasaki, Mamoru; Tateno, Hiroki; Nakamura, Hidetoshi; Asano, Koichiro; Betsuyaku, Tomoko

    2016-08-01

    Patients with chronic obstructive pulmonary disease (COPD) frequently suffer from various comorbidities. Recently, cluster analysis has been proposed to examine the phenotypic heterogeneity in COPD. In order to comprehensively understand the comorbidities of COPD in Japan, we conducted a multicenter, longitudinal cohort study, called the Keio COPD Comorbidity Research (K-CCR). In this cohort, comorbid diagnoses were established by both objective examination and review of clinical records, in addition to self-report. We aimed to investigate the clustering of nineteen clinically relevant comorbidities and the meaningful outcomes of the clusters over a two-year follow-up period. The present study analyzed data from COPD patients whose comorbidity data were complete (n = 311). Cluster analysis was performed using Ward's minimum-variance method. Five comorbidity clusters were identified: less comorbidity; malignancy; metabolic and cardiovascular; gastroesophageal reflux disease (GERD) and psychological; and underweight and anemic. FEV1 did not differ among the clusters. The GERD and psychological cluster had worse COPD assessment test (CAT) and Saint George's respiratory questionnaire (SGRQ) scores at baseline compared to the other clusters (CAT: p = 0.0003 and SGRQ: p = 0.00046). The rate of change in these scores did not differ within 2 years. The underweight and anemic cluster included subjects with lower baseline ratio of predicted diffusing capacity (DLco/VA) compared to the malignancy cluster (p = 0.036). Five clusters of comorbidities were identified in Japanese COPD patients. The clinical characteristics and health-related quality of life were different among these clusters during a follow-up of two years. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. The WAGGS project - I. The WiFeS Atlas of Galactic Globular cluster Spectra

    NASA Astrophysics Data System (ADS)

    Usher, Christopher; Pastorello, Nicola; Bellstedt, Sabine; Alabi, Adebusola; Cerulo, Pierluigi; Chevalier, Leonie; Fraser-McKelvie, Amelia; Penny, Samantha; Foster, Caroline; McDermid, Richard M.; Schiavon, Ricardo P.; Villaume, Alexa

    2017-07-01

    We present the WiFeS Atlas of Galactic Globular cluster Spectra, a library of integrated spectra of Milky Way and Local Group globular clusters. We used the WiFeS integral field spectrograph on the Australian National University 2.3 m telescope to observe the central regions of 64 Milky Way globular clusters and 22 globular clusters hosted by the Milky Way's low-mass satellite galaxies. The spectra have wider wavelength coverage (3300-9050 Å) and higher spectral resolution (R = 6800) than existing spectral libraries of Milky Way globular clusters. By including Large and Small Magellanic Cloud star clusters, we extend the coverage of parameter space of existing libraries towards young and intermediate ages. While testing stellar population synthesis models and analysis techniques is the main aim of this library, the observations may also further our understanding of the stellar populations of Local Group globular clusters and make possible the direct comparison of extragalactic globular cluster integrated light observations with well-understood globular clusters in the Milky Way. The integrated spectra are publicly available via the project website.

  20. Demographic characterization and spatial cluster analysis of human Salmonella 1,4,[5],12:i:- infections in Portugal: A 10year study.

    PubMed

    Seixas, R; Nunes, T; Machado, J; Tavares, L; Owen, S P; Bernardo, F; Oliveira, M

    Salmonella 1,4,[5],12:i:- is presently considered one of the major serovars responsible for human salmonellosis worldwide. Due to its recent emergence, studies assessing the demographic characterization and spatial epidemiology of salmonellosis 1,4,[5],12:i:- at local- or country-level are lacking. In this study, an analysis was conducted over a 10-year period, from 2000 to the first quarter of 2011 at the Portuguese National Laboratory in mainland Portugal, with a total of 215 Salmonella 1,4,[5],12:i:- serotyped isolates obtained from human infections by a passive surveillance system. Data regarding source, year and month of sampling, gender, age, district and municipality of the patients were registered. Descriptive statistical analysis and a spatial scan statistic combined with a geographic information system were employed to characterize the epidemiology and identify spatial clusters. Results showed that most districts have reports of Salmonella 1,4,[5],12:i:-, with a higher number of cases at the Portuguese coastland, including districts like Porto (n=60, 27.9%), Lisboa (n=29, 13.5%) and Aveiro (n=28, 13.0%). An increased incidence was observed in the period from 2004 to 2011 and most infections occurred during May and October. Spatial analysis revealed 4 clusters of higher than expected infection rates. Three were located in the north of Portugal, including two at the coastland (Cluster 1 [RR=3.58, p≤0.001] and 4 [RR=10.42, p≤0.230]), and one at the countryside (Cluster 3 [RR=17.76, p≤0.001]). A larger cluster was detected involving the center and south of Portugal (Cluster 2 [RR=4.85, p≤0.001]). The present study was elaborated with data provided by a passive surveillance system, which may lead to an underestimation of disease burden. However, this is the first report describing the incidence and the distribution of areas with higher risk of infection in Portugal, revealing that Salmonella 1,4,[5],12:i:- displayed a significant geographic clustering and these areas should be further evaluated to identify risk factors in order to establish prevention programs. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  1. A measurement of CMB cluster lensing with SPT and DES year 1 data

    NASA Astrophysics Data System (ADS)

    Baxter, E. J.; Raghunathan, S.; Crawford, T. M.; Fosalba, P.; Hou, Z.; Holder, G. P.; Omori, Y.; Patil, S.; Rozo, E.; Abbott, T. M. C.; Annis, J.; Aylor, K.; Benoit-Lévy, A.; Benson, B. A.; Bertin, E.; Bleem, L.; Buckley-Geer, E.; Burke, D. L.; Carlstrom, J.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Chang, C. L.; Cho, H.-M.; Crites, A. T.; Crocce, M.; Cunha, C. E.; da Costa, L. N.; D'Andrea, C. B.; Davis, C.; de Haan, T.; Desai, S.; Dietrich, J. P.; Dobbs, M. A.; Dodelson, S.; Doel, P.; Drlica-Wagner, A.; Estrada, J.; Everett, W. B.; Fausti Neto, A.; Flaugher, B.; Frieman, J.; García-Bellido, J.; George, E. M.; Gaztanaga, E.; Giannantonio, T.; Gruen, D.; Gruendl, R. A.; Gschwend, J.; Gutierrez, G.; Halverson, N. W.; Harrington, N. L.; Hartley, W. G.; Holzapfel, W. L.; Honscheid, K.; Hrubes, J. D.; Jain, B.; James, D. J.; Jarvis, M.; Jeltema, T.; Knox, L.; Krause, E.; Kuehn, K.; Kuhlmann, S.; Kuropatkin, N.; Lahav, O.; Lee, A. T.; Leitch, E. M.; Li, T. S.; Lima, M.; Luong-Van, D.; Manzotti, A.; March, M.; Marrone, D. P.; Marshall, J. L.; Martini, P.; McMahon, J. J.; Melchior, P.; Menanteau, F.; Meyer, S. S.; Miller, C. J.; Miquel, R.; Mocanu, L. M.; Mohr, J. J.; Natoli, T.; Nord, B.; Ogando, R. L. C.; Padin, S.; Plazas, A. A.; Pryke, C.; Rapetti, D.; Reichardt, C. L.; Romer, A. K.; Roodman, A.; Ruhl, J. E.; Rykoff, E.; Sako, M.; Sanchez, E.; Sayre, J. T.; Scarpine, V.; Schaffer, K. K.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Shirokoff, E.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Staniszewski, Z.; Stark, A.; Story, K.; Suchyta, E.; Tarle, G.; Thomas, D.; Troxel, M. A.; Vanderlinde, K.; Vieira, J. D.; Walker, A. R.; Williamson, R.; Zhang, Y.; Zuntz, J.

    2018-05-01

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. The cluster catalogue used in this analysis contains 3697 members with mean redshift of $\bar{z} = 0.45$. We detect lensing of the CMB by the galaxy clusters at 8.1σ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 17 per cent precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentring.

  2. Voxel-based statistical analysis of cerebral blood flow using Tc-99m ECD brain SPECT in patients with traumatic brain injury: group and individual analyses.

    PubMed

    Shin, Yong Beom; Kim, Seong-Jang; Kim, In-Ju; Kim, Yong-Ki; Kim, Dong-Soo; Park, Jae Heung; Yeom, Seok-Ran

    2006-06-01

    Statistical parametric mapping (SPM) was applied to brain perfusion single photon emission computed tomography (SPECT) images in patients with traumatic brain injury (TBI) to investigate regional cerebral abnormalities compared to age-matched normal controls. Thirteen patients with TBI who underwent brain perfusion SPECT were included in this study (10 males, three females, mean age 39.8 +/- 18.2, range 21 - 74). SPM2 software implemented in MATLAB 5.3 was used for spatial pre-processing and analysis and to determine the quantitative differences between TBI patients and age-matched normal controls. Three large voxel clusters of significantly decreased cerebral blood perfusion were found in patients with TBI. The largest cluster covered an area including the medial frontal gyrus (voxel number 3642, peak Z-value = 4.31, 4.27, p = 0.000) in both hemispheres. The second largest cluster covered areas including the cingulate gyrus and anterior cingulate gyrus of the left hemisphere (voxel number 381, peak Z-value = 3.67, 3.62, p = 0.000). Other clusters were in the parahippocampal gyrus (voxel number 173, peak Z-value = 3.40, p = 0.000) and hippocampus (voxel number 173, peak Z-value = 3.23, p = 0.001) in the left hemisphere. The false discovery rate (FDR) was less than 0.04. From this study, group and individual analyses of SPM2 could clearly identify the perfusion abnormalities of brain SPECT in patients with TBI. Group analysis of SPM2 showed a hypoperfusion pattern in the areas including the medial frontal gyrus of both hemispheres, cingulate gyrus, anterior cingulate gyrus, parahippocampal gyrus and hippocampus in the left hemisphere compared to age-matched normal controls. Also, left parahippocampal gyrus and left hippocampus were additional hypoperfusion areas. However, these findings deserve further investigation in a larger number of patients to allow better validation of objective SPM analysis in patients with TBI.

  3. Spatial characterization of dissolved trace elements and heavy metals in the upper Han River (China) using multivariate statistical techniques.

    PubMed

    Li, Siyue; Zhang, Quanfa

    2010-04-15

    A data matrix (4032 observations), obtained during a 2-year monitoring period (2005-2006) from 42 sites in the upper Han River, is subjected to various multivariate statistical techniques including cluster analysis, principal component analysis (PCA), factor analysis (FA), correlation analysis and analysis of variance to determine the spatial characterization of dissolved trace elements and heavy metals. Our results indicate that waters in the upper Han River are primarily polluted by Al, As, Cd, Pb, Sb and Se, and the potential pollutants include Ba, Cr, Hg, Mn and Ni. Spatial distribution of trace metals indicates that the polluted sections are mainly concentrated in the Danjiang, Danjiangkou Reservoir catchment and Hanzhong Plain, and the most contaminated river is in the Hanzhong Plain. Q-model clustering depends on geographical location of sampling sites and groups the 42 sampling sites into four clusters, i.e., Danjiang, Danjiangkou Reservoir region (lower catchment), upper catchment and one river in headwaters pertaining to water quality. The headwaters, Danjiang and lower catchment, and upper catchment correspond to very highly polluted, moderately polluted and relatively lightly polluted regions, respectively. Additionally, PCA/FA and correlation analysis demonstrate that Al, Cd, Mn, Ni, Fe, Si and Sr are controlled by natural sources, whereas the other metals appear to be primarily controlled by anthropogenic origins, though geogenic sources also contribute to them. 2009 Elsevier B.V. All rights reserved.

  4. A new artefacts resistant method for automatic lineament extraction using Multi-Hillshade Hierarchic Clustering (MHHC)

    NASA Astrophysics Data System (ADS)

    Šilhavý, Jakub; Minár, Jozef; Mentlík, Pavel; Sládek, Ján

    2016-07-01

    This paper presents a new method of automatic lineament extraction which includes the removal of the 'artefacts effect' which is associated with the process of raster based analysis. The core of the proposed Multi-Hillshade Hierarchic Clustering (MHHC) method incorporates a set of variously illuminated and rotated hillshades in combination with hierarchic clustering of derived 'protolineaments'. The algorithm also includes classification into positive and negative lineaments. MHHC was tested in two different territories in Bohemian Forest and Central Western Carpathians. The original vector-based algorithm was developed for comparison of the individual lineaments proximity. Its use confirms the compatibility of manual and automatic extraction and their similar relationships to structural data in the study areas.

  5. Using Cluster Analysis to Examine Husband-Wife Decision Making

    ERIC Educational Resources Information Center

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  6. Fast clustering using adaptive density peak detection.

    PubMed

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation. The model parameter can then be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration and thus is fast and has great potential for big data analysis. A user-friendly R package ADPclust is developed for public use.
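
    The density-peak idea described above (centers are points with high local density that lie far from any denser point) can be sketched as follows. This is an illustration under stated assumptions, not the ADPclust implementation: the Gaussian kernel bandwidth and the number of centers are fixed by hand here, whereas the abstract describes choosing them automatically, and the function name and toy points are hypothetical.

```python
import math


def density_peaks(points, bandwidth, n_centers):
    """Density-peak clustering with a Gaussian kernel density estimate."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: sum of Gaussian kernel weights (unnormalised KDE).
    rho = [
        sum(math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2) for j in range(n))
        for i in range(n)
    ]

    # delta: distance to the nearest point of higher density;
    # parent: index of that point (None for the global density peak).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centers: points that combine high density with large delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]

    # Remaining points inherit the label of their nearest denser neighbour,
    # visited in decreasing density so the parent is always labelled first.
    labels = [None] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for i in order:
        if labels[i] is None:
            labels[i] = labels[parent[i]]
    return centers, labels


# Two hypothetical, well-separated groups of five points each.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1), (3.05, 3.05),
]
centers, labels = density_peaks(points, bandwidth=0.2, n_centers=2)
print(sorted(centers), labels)
```

    Assignment needs only one pass over the points in density order, which is the sense in which the approach avoids iterative refinement.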

  7. Discrete Wavelet Transform-Based Whole-Spectral and Subspectral Analysis for Improved Brain Tumor Clustering Using Single Voxel MR Spectroscopy.

    PubMed

    Yang, Guang; Nawaz, Tahir; Barrick, Thomas R; Howe, Franklyn A; Slabaugh, Greg

    2015-12-01

    Many approaches have been considered for automatic grading of brain tumors by means of pattern recognition with magnetic resonance spectroscopy (MRS). Providing an improved technique which can assist clinicians in accurately identifying brain tumor grades is our main objective. The proposed technique, which is based on the discrete wavelet transform (DWT) of whole-spectral or subspectral information of key metabolites, combined with unsupervised learning, inspects the separability of the extracted wavelet features from the MRS signal to aid the clustering. In total, we included 134 short echo time single voxel MRS spectra (SV MRS) in our study that cover normal controls, low grade and high grade tumors. The combination of DWT-based whole-spectral or subspectral analysis and unsupervised clustering achieved an overall clustering accuracy of 94.8% and a balanced error rate of 7.8%. To the best of our knowledge, it is the first study using DWT combined with unsupervised learning to cluster brain SV MRS. Instead of dimensionality reduction on SV MRS or feature selection using model fitting, our study provides an alternative method of extracting features to obtain promising clustering results.

  8. Cluster analysis identifies three urodynamic patterns in patients with orthotopic neobladder reconstruction.

    PubMed

    Kim, Kwang Hyun; Yoon, Hyun Suk; Song, Wan; Choo, Hee Jung; Yoon, Hana; Chung, Woo Sik; Sim, Bong Suk; Lee, Dong Hyeon

    2017-01-01

    To classify patients with orthotopic neobladder based on urodynamic parameters using cluster analysis and to characterize the voiding function of each group. From January 2012 to November 2015, 142 patients with bladder cancer underwent radical cystectomy and Studer neobladder reconstruction at our institute. Of the 142 patients, 103 with complete urodynamic data and information on urinary functional outcomes were included in this study. K-means clustering was performed with urodynamic parameters which included maximal cystometric capacity, residual volume, maximal flow rate, compliance, and detrusor pressure at maximum flow rate. Three groups emerged by cluster analysis. Urodynamic parameters and urinary function outcomes were compared between three groups. Group 1 (n = 44) had ideal urodynamic parameters with a mean maximal bladder capacity of 513.3 ml and mean residual urine volume of 33.1 ml. Group 2 (n = 42) was characterized by small bladder capacity with low compliance. Patients in group 2 had higher rates of daytime incontinence and nighttime incontinence than patients in group 1. Group 3 (n = 17) was characterized by large residual urine volume with high compliance. When we examined gender differences in urodynamics and functional outcomes, residual urine volume and the rate of daytime incontinence were only marginally significant. However, females were significantly more likely to belong to group 2 or 3 (P = 0.003). In multivariate analysis to identify factors associated with group 1 which has the most ideal urodynamic pattern, age (OR 0.95, P = 0.017) and male gender (OR 7.57, P = 0.003) were identified as significant factors. While patients with ileal neobladder present with various voiding symptoms, three urodynamic patterns were identified by cluster analysis. Approximately half of patients had ideal urodynamic parameters. The other two groups were characterized by large residual urine and small capacity bladder with low compliance. Young age and male gender appear to have a favorable impact on urodynamic and voiding outcomes in patients undergoing orthotopic neobladder reconstruction.
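
    The k-means step mentioned above can be sketched with Lloyd's algorithm. The sketch assumes only two of the five urodynamic parameters, standardises them before clustering (an assumption; the abstract does not say whether the study did so), and seeds the centroids by hand. The patient values and function names are hypothetical.

```python
import math


def zscore(rows):
    """Standardise each column to mean 0 and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, init_centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(row, centroids[k]))
            for row in rows
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else old
            )
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical patients: [cystometric capacity (ml), residual volume (ml)].
raw = [
    [510, 30], [520, 35], [500, 25],     # large capacity, low residual volume
    [250, 20], [240, 30], [260, 25],     # small capacity
    [600, 300], [620, 320], [610, 310],  # large residual volume
]
data = zscore(raw)
labels, _ = kmeans(data, init_centroids=[data[0], data[3], data[6]])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

    Standardising matters here because capacity and residual volume are on different scales; without it the variable with the larger spread would dominate the Euclidean distance.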

  9. Analysis of infant isolates of Bifidobacterium breve by comparative genome hybridization indicates the existence of new subspecies with marked infant specificity.

    PubMed

    Boesten, Rolf; Schuren, Frank; Wind, Richèle D; Knol, Jan; de Vos, Willem M

    2011-09-01

    A total of 20 Bifidobacterium strains were isolated from fecal samples of 4 breast- and bottle-fed infants and all were characterized as Bifidobacterium breve based on 16S rRNA gene sequence and metabolic analysis. These isolates were further characterized and compared to the type strains of B. breve and 7 other Bifidobacterium spp. by comparative genome hybridization. For this purpose, we constructed and used a DNA-based microarray containing over 2000 randomly cloned DNA fragments from B. breve type strain LMG13208. This molecular analysis revealed a high degree of genomic variation between the isolated strains and allowed the vast majority to be grouped into 4 clusters. One cluster contained a single isolate that was virtually indistinguishable from the B. breve type strain. The 3 other clusters included 19 B. breve strains that differed considerably from all type strains. Remarkably, each of the 4 clusters included strains that were isolated from a single infant, indicating that a niche adaptation may contribute to variation within the B. breve species. Based on genomic hybridization data, the new B. breve isolates were estimated to contain approximately 60-90% of the genes of the B. breve type strain, attesting to the existence of various subspecies within the species B. breve. Further bioinformatic analysis identified several hundred diagnostic clones specific to the genomic clustering of the B. breve isolates. Molecular analysis of representatives of these revealed that annotated genes from the conserved B. breve core encoded mainly housekeeping functions, while the strain-specific genes were predicted to code for functions related to life style, such as carbohydrate metabolism and transport. This is compatible with genetic adaptation of the strains to their niche, a combination of infants and diet. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  10. Emergence of clusters of CRF02_AG and B human immunodeficiency viral strains among men having sex with men exhibiting HIV primary infection in southeastern France.

    PubMed

    Tamalet, Catherine; Ravaux, Isabelle; Moreau, Jacques; Brégigeon, Sylvie; Tourres, Christian; Richet, Hervé; Abat, Cedric; Colson, Philippe

    2015-08-01

    The number of new HIV diagnoses is increasing in the western world and transmission clusters have been recently identified among men having sex with men despite Highly Active Antiretroviral Therapy efficacy. The objective of this study was to assess temporal trends, epidemiological, clinical and virological characteristics of primary HIV infections. A retrospective analysis of 79 patients presenting primary HIV infections from 2005 to 2012 was performed in Marseille University Hospitals, southeastern France. Clinical, epidemiological and immunovirological data including phylogeny based on the polymerase gene were collected. 65 males and 14 females were enrolled. The main transmission route was homosexual contact (60.8%). Patients were mostly infected with subtype B (73.4%) and CRF02_AG (21.5%) HIV-1 strains. An increase in the annual number of HIV seroconversions among new HIV diagnoses from 5% in 2005 to 11.2% in 2012 (P = 0.06) and of the proportion of CRF02_AG HIV strains among primary HIV infections in 2011-2012 as compared to 2005-2010 (P = 0.055) was observed. Phylogenetic analysis revealed four transmission clusters including three transmission clusters among men having sex with men: two large clusters of nine CRF02_AG, six B HIV strains; and one small cluster of three B HIV strains. Clusters more frequently involved men (P = 0.01) of Caucasian ethnicity (P = 0.05), with a higher HIV RNA load at inclusion (P = 0.03). These data highlight the importance of improving epidemiological surveillance and of implementing suitable prevention strategies to control the spread of HIV transmission among men having sex with men. © 2015 Wiley Periodicals, Inc.

  11. Internet Gamblers Differ on Social Variables: A Latent Class Analysis.

    PubMed

    Khazaal, Yasser; Chatton, Anne; Achab, Sophia; Monney, Gregoire; Thorens, Gabriel; Dufour, Magali; Zullino, Daniele; Rothen, Stephane

    2017-09-01

    Online gambling has gained popularity in the last decade, leading to an important shift in how consumers engage in gambling and in the factors related to problem gambling and prevention. Indebtedness and loneliness have previously been associated with problem gambling. The current study aimed to characterize online gamblers in relation to indebtedness, loneliness, and several in-game social behaviors. The data set was obtained from 584 Internet gamblers recruited online through gambling websites and forums. Of these gamblers, 372 participants completed all study assessments and were included in the analyses. Questionnaires included those on sociodemographics and social variables (indebtedness, loneliness, in-game social behaviors), as well as the Gambling Motives Questionnaire, Gambling Related Cognitions Scale, Internet Addiction Test, Problem Gambling Severity Index, Short Depression-Happiness Scale, and UPPS-P Impulsive Behavior Scale. Social variables were explored with a latent class model. The clusters obtained were compared for psychological measures and three clusters were found: lonely indebted gamblers (cluster 1: 6.5%), not lonely not indebted gamblers (cluster 2: 75.4%), and not lonely indebted gamblers (cluster 3: 18%). Participants in clusters 1 and 3 (particularly in cluster 1) were at higher risk of problem gambling than were those in cluster 2. The three groups differed on most assessed variables, including the Problem Gambling Severity Index, the Short Depression-Happiness Scale, and the UPPS-P subscales (except the sensation seeking subscore). Results highlight significant between-group differences, suggesting that Internet gamblers are not a homogeneous group. Specific intervention strategies could be implemented for groups at risk.

  12. Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China

    PubMed Central

    Fan, Yaxin; Zhu, Xinyan; Guo, Wei; Guo, Tao

    2018-01-01

    The analysis of traffic collisions is essential for urban safety and the sustainable development of the urban environment. Reducing the road traffic injuries and the financial losses caused by collisions is the most important goal of traffic management. In addition, traffic collisions are a major cause of traffic congestion, which is a serious issue that affects everyone in the society. Therefore, traffic collision analysis is essential for all parties, including drivers, pedestrians, and traffic officers, to understand the road risks at a finer spatio-temporal scale. However, traffic collisions in the urban context are dynamic and complex. Thus, it is important to detect how the collision hotspots evolve over time through spatio-temporal clustering analysis. In addition, traffic collisions are not isolated events in space. The characteristics of the traffic collisions and their surrounding locations also influence the clusters. This work tries to explore the spatio-temporal clustering patterns of traffic collisions by combining a set of network-constrained methods. These methods were tested using the traffic collision data in Jianghan District of Wuhan, China. The results demonstrated that these methods offer different perspectives of the spatio-temporal clustering patterns. The weighted network kernel density estimation provides an intuitive way to incorporate attribute information. The network cross K-function shows that there are varying clustering tendencies between traffic collisions and different types of POIs. The proposed network differential Local Moran’s I and network local indicators of mobility association provide straightforward and quantitative measures of the hotspot changes. This case study shows that these methods could help researchers, practitioners, and policy-makers to better understand the spatio-temporal clustering patterns of traffic collisions. PMID:29672551

  13. Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

    PubMed

    Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

    2011-05-01

    Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. Copy number gain at 8q12.1-q22.1 is associated with a malignant tumor phenotype in salivary gland myoepitheliomas.

    PubMed

    Vékony, Hedy; Röser, Kerstin; Löning, Thomas; Ylstra, Bauke; Meijer, Gerrit A; van Wieringen, Wessel N; van de Wiel, Mark A; Carvalho, Beatriz; Kok, Klaas; Leemans, C René; van der Waal, Isaäc; Bloemena, Elisabeth

    2009-02-01

    Salivary gland myoepithelial tumors are relatively uncommon tumors with an unpredictable clinical course. More knowledge about their genetic profiles is necessary to identify novel predictors of disease. In this study, we subjected 27 primary tumors (15 myoepitheliomas and 12 myoepithelial carcinomas) to genome-wide microarray-based comparative genomic hybridization (array CGH). We set out to delineate known chromosomal aberrations in more detail and to unravel chromosomal differences between benign myoepitheliomas and myoepithelial carcinomas. Patterns of DNA copy number aberrations were analyzed by unsupervised hierarchical cluster analysis. Both benign and malignant tumors revealed a limited amount of chromosomal alterations (median of 5 and 7.5, respectively). In both tumor groups, high frequency gains (≥20%) were found mainly at loci of growth factors and growth factor receptors (e.g., PDGF, FGF(R)s, and EGFR). In myoepitheliomas, high frequency losses (≥20%) were detected at regions of proto-cadherins. Cluster analysis of the array CGH data identified three clusters. Differential copy numbers on chromosome arm 8q and chromosome 17 set the clusters apart. Cluster 1 contained a mixture of the two phenotypes (n = 10), cluster 2 included mostly benign tumors (n = 10), and cluster 3 only contained carcinomas (n = 7). Supervised analysis between malignant and benign tumors revealed a 36 Mbp-region at 8q being more frequently gained in malignant tumors (P = 0.007, FDR = 0.05). This is the first study investigating genomic differences between benign and malignant myoepithelial tumors of the salivary glands at a genomic level. Both unsupervised and supervised analysis of the genomic profiles revealed chromosome arm 8q to be involved in the malignant phenotype of salivary gland myoepitheliomas.

  15. Bruker Biotyper Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry System for Identification of Nocardia, Rhodococcus, Kocuria, Gordonia, Tsukamurella, and Listeria Species

    PubMed Central

    Lee, Tai-Fen; Du, Shin-Hei; Teng, Shih-Hua; Liao, Chun-Hsing; Sheng, Wang-Hui; Teng, Lee-Jene

    2014-01-01

    We evaluated whether the Bruker Biotyper matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) system provides accurate species-level identifications of 147 isolates of aerobically growing Gram-positive rods (GPRs). The bacterial isolates included Nocardia (n = 74), Listeria (n = 39), Kocuria (n = 15), Rhodococcus (n = 10), Gordonia (n = 7), and Tsukamurella (n = 2) species, which had all been identified by conventional methods, molecular methods, or both. In total, 89.7% of Listeria monocytogenes, 80% of Rhodococcus species, 26.7% of Kocuria species, and 14.9% of Nocardia species (n = 11, all N. nova and N. otitidiscaviarum) were correctly identified to the species level (score values, ≥2.0). A clustering analysis of spectra generated by the Bruker Biotyper identified six clusters of Nocardia species, i.e., cluster 1 (N. cyriacigeorgica), cluster 2 (N. brasiliensis), cluster 3 (N. farcinica), cluster 4 (N. puris), cluster 5 (N. asiatica), and cluster 6 (N. beijingensis), based on the six peaks generated by ClinProTools with the genetic algorithm, i.e., m/z 2,774.477 (cluster 1), m/z 5,389.792 (cluster 2), m/z 6,505.720 (cluster 3), m/z 5,428.795 (cluster 4), m/z 6,525.326 (cluster 5), and m/z 16,085.216 (cluster 6). Two clusters of L. monocytogenes spectra were also found according to the five peaks, i.e., m/z 5,594.85, m/z 6,184.39, and m/z 11,187.31, for cluster 1 (serotype 1/2a) and m/z 5,601.21 and m/z 11,199.33 for cluster 2 (serotypes 1/2b and 4b). The Bruker Biotyper system was unable to accurately identify Nocardia (except for N. nova and N. otitidiscaviarum), Tsukamurella, or Gordonia species. Continuous expansion of the MALDI-TOF MS databases to include more GPRs is necessary. PMID:24759706

  16. Multivariate analysis of molecular and morphological diversity in fig (Ficus carica L.)

    USDA-ARS's Scientific Manuscript database

    Genetic polymorphism across 15 microsatellite loci among 194 fig accessions including Common, Smyrna, San Pedro, and Caprifig were analyzed using a cluster analysis (CA) and the principal components analysis (PCA). The collection was moderately variable with observed number of alleles per locus rang...

  17. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    PubMed

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
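
    To make the distinction between stand-alone quality metrics and information recovery metrics concrete, the sketch below computes modularity and conductance (which need only the graph and a partition) and the adjusted Rand index (which needs a reference labeling). It uses the standard textbook definitions for undirected, unweighted graphs; the toy graph, labelings, and function names are hypothetical and unrelated to the study's benchmarks.

```python
from collections import Counter
from math import comb


def modularity(edges, labels):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = Counter()    # edges with both endpoints in the same community
    degree_sum = Counter()  # total degree per community
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def conductance(edges, labels, community):
    """Cut size divided by the smaller of the two volumes (degree sums)."""
    cut = sum((labels[u] == community) != (labels[v] == community) for u, v in edges)
    vol_in = sum((labels[u] == community) + (labels[v] == community) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    return cut / min(vol_in, vol_out)


def pair_count(counts):
    """Number of unordered item pairs that fall in the same group."""
    return sum(comb(c, 2) for c in counts.values())


def adjusted_rand(truth, predicted):
    """Adjusted Rand index between two labelings of the same items."""
    together = pair_count(Counter(zip(truth, predicted)))
    a, b = pair_count(Counter(truth)), pair_count(Counter(predicted))
    expected = a * b / comb(len(truth), 2)
    return (together - expected) / ((a + b) / 2 - expected)


# Two triangles joined by a single bridge edge (2, 3); nodes are 0..5.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]  # label of node i is labels[i]
q = modularity(edges, labels)
phi = conductance(edges, labels, community=0)
ari = adjusted_rand(labels, [0, 0, 1, 1, 1, 1])
print(round(q, 3), round(phi, 3), round(ari, 3))  # 0.357 0.143 0.324
```

    The first two numbers can be computed for any clustering result, while the third requires ground truth, which is why the two families of metrics can disagree in the way the abstract reports.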

  18. Validating clustering of molecular dynamics simulations using polymer models.

    PubMed

    Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn

    2011-11-14

    Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.

  19. Validating clustering of molecular dynamics simulations using polymer models

    PubMed Central

    2011-01-01

    Background: Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results: We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions: We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results.
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218

  20. Spatial patterns in electoral wards with high lymphoma incidence in Yorkshire health region.

    PubMed Central

    Barnes, N.; Cartwright, R. A.; O'Brien, C.; Roberts, B.; Richards, I. D.; Bird, C. C.

    1987-01-01

    The possibilities of clustering between those electoral wards which display higher than expected incidences of cases of the lymphomas occurring between 1978 and 1982 are examined. Clusters are defined as being those wards with cases in excess (at a probability of less than 10%) which are geographically adjacent to each other. A separate analysis extends the definition of cluster to include high incidence wards that are adjacent or separated by one other ward. The results indicate that many high incidence lymphoma wards do occur close together and when computer simulations are used to compute expected results, many of the observed results are shown to be highly improbable both in the overall number of clustering wards and in the largest number of wards comprising a 'cluster'. PMID:3663469
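
    The cluster definition in this abstract (high-incidence wards that are geographically adjacent) amounts to finding connected components among flagged wards, and the comparison with simulated expectations can be sketched as a Monte Carlo test. The sketch below is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how its simulations were set up, so the random reassignment of flags is an assumption, and the function names and the toy ward map are hypothetical.

```python
import random

def adjacent_clusters(flagged, neighbours):
    """Group flagged wards into clusters of mutually adjacent wards."""
    flagged = set(flagged)
    seen, clusters = set(), []
    for start in sorted(flagged):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # depth-first walk restricted to flagged wards
            ward = stack.pop()
            component.add(ward)
            for other in neighbours.get(ward, ()):
                if other in flagged and other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(component)
    return clusters

def largest_cluster_p_value(flagged, neighbours, trials=2000, seed=0):
    """Monte Carlo p-value for the size of the largest observed cluster.

    Assumption: under the null, the same number of flags is scattered
    uniformly at random over all wards listed in `neighbours`.
    """
    rng = random.Random(seed)
    wards = sorted(neighbours)
    flagged = set(flagged)

    def largest(selection):
        return max((len(c) for c in adjacent_clusters(selection, neighbours)),
                   default=0)

    observed = largest(flagged)
    hits = sum(largest(rng.sample(wards, len(flagged))) >= observed
               for _ in range(trials))
    return (hits + 1) / (trials + 1)

# Toy map: six wards in a row, A-B-C-D-E-F.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
              "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]}
high_incidence = {"A", "B", "C", "F"}
print(adjacent_clusters(high_incidence, neighbours))
print(largest_cluster_p_value(high_incidence, neighbours))
```

    The extended definition in the abstract (wards adjacent or separated by one other ward) could be handled by passing a neighbour map that also lists second-order neighbours.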

  1. CLASH-VLT: DISSECTING THE FRONTIER FIELDS GALAXY CLUSTER MACS J0416.1-2403 WITH ∼800 SPECTRA OF MEMBER GALAXIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balestra, I.; Sartoris, B.; Girardi, M.

    2016-06-01

    We present VIMOS-Very Large Telescope (VLT) spectroscopy of the Frontier Fields cluster MACS J0416.1-2403 (z = 0.397). Taken as part of the CLASH-VLT survey, the large spectroscopic campaign provided more than 4000 reliable redshifts over ∼600 arcmin^2, including ∼800 cluster member galaxies. The unprecedented sample of cluster members at this redshift allows us to perform a highly detailed dynamical and structural analysis of the cluster out to ∼2.2 r_200 (∼4 Mpc). Our analysis of substructures reveals a complex system composed of a main massive cluster (M_200 ∼ 0.9 × 10^15 M_⊙ and σ_{V,r200} ∼ 1000 km s^−1) presenting two major features: (i) a bimodal velocity distribution, showing two central peaks separated by ΔV_rf ∼ 1100 km s^−1 with comparable galaxy content and velocity dispersion, and (ii) a projected elongation of the main substructures along the NE–SW direction, with a prominent sub-clump ∼600 kpc SW of the center and an isolated BCG approximately halfway between the center and the SW clump. We also detect a low-mass structure at z ∼ 0.390, ∼10′ south of the cluster center, projected at ∼3 Mpc, with a relative line-of-sight velocity of ΔV_rf ∼ −1700 km s^−1. The cluster mass profile that we obtain through our dynamical analysis deviates significantly from the “universal” NFW, being best fit by a Softened Isothermal Sphere model instead. The mass profile measured from the galaxy dynamics is found to be in relatively good agreement with those obtained from strong and weak lensing, as well as with that from the X-rays, despite the clearly unrelaxed nature of the cluster. Our results reveal an overall complex dynamical state of this massive cluster and support the hypothesis that the two main subclusters are being observed in a pre-collisional phase, in agreement with recent findings from radio and deep X-ray data. In this article, we also release the entire redshift catalog of 4386 sources in the field of this cluster, which includes 60 identified Chandra X-ray sources and 105 JVLA radio sources.

  2. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Pattern Activity Clustering and Evaluation (PACE)

    NASA Astrophysics Data System (ADS)

    Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna

    2012-06-01

    With the vast amount of network information available on activities of people (i.e. motions, transportation routes, and site visits) there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking, learning and clustering approaches; as well as utilize information theoretical metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.

  4. A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition

    PubMed Central

    Austin, Elena; Coull, Brent A.; Zanobetti, Antonella; Koutrakis, Petros

    2013-01-01

    Background: Heterogeneity in the response to PM2.5 is hypothesized to be related to differences in particle composition across monitoring sites which reflect differences in source types as well as climatic and topographic conditions impacting different geographic locations. Identifying spatial patterns in particle composition is a multivariate problem that requires novel methodologies. Objectives: Use cluster analysis methods to identify spatial patterns in PM2.5 composition. Verify that the resulting clusters are distinct and informative. Methods: 109 monitoring sites with 75% reported speciation data during the period 2003–2008 were selected. These sites were categorized based on their average PM2.5 composition over the study period using k-means cluster analysis. The obtained clusters were validated and characterized based on their physico-chemical characteristics, geographic locations, emissions profiles, population density and proximity to major emission sources. Results: Overall 31 clusters were identified. These include 21 clusters with 2 or more sites which were further grouped into 4 main types using hierarchical clustering. The resulting groupings are chemically meaningful and represent broad differences in emissions. The remaining clusters, encompassing single sites, were characterized based on their particle composition and geographic location. Conclusions: The framework presented here provides a novel tool which can be used to identify and further classify sites based on their PM2.5 composition. The solution presented is fairly robust and yielded groupings that were meaningful in the context of air-pollution research. PMID:23850585

  5. Retrospective space-time cluster analysis of whooping cough, re-emergence in Barcelona, Spain, 2000-2011.

    PubMed

    Solano, Rubén; Gómez-Barroso, Diana; Simón, Fernando; Lafuente, Sarah; Simón, Pere; Rius, Cristina; Gorrindo, Pilar; Toledo, Diana; Caylà, Joan A

    2014-05-01

    A retrospective, space-time study of whooping cough cases reported to the Public Health Agency of Barcelona, Spain between the years 2000 and 2011 is presented. It is based on 633 individual whooping cough cases and the 2006 population census from the Spanish National Statistics Institute, stratified by age and sex at the census tract level. Cluster identification was attempted using the space-time scan statistic, assuming a Poisson distribution and restricting temporal extent to 7 days and spatial distance to 500 m. Statistical calculations were performed with Stata 11 and SaTScan, and mapping was performed with ArcGIS 10.0. Only clusters showing statistical significance (P <0.05) were mapped. The most likely cluster identified included five census tracts located in three neighbourhoods in central Barcelona during the week from 17 to 23 August 2011. This cluster included five cases compared with the expected level of 0.0021 (relative risk = 2436, P <0.001). In addition, 11 secondary significant space-time clusters were detected, occurring at different times and locations. Spatial statistics can usefully complement epidemiological surveillance systems by visualizing excess numbers of cases in space and time, thereby increasing the possibility of identifying outbreaks not reported by the surveillance system.
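
    The quantities behind a Poisson scan statistic can be sketched for a single candidate space-time window. This is an illustration only, assuming the commonly used form of the Poisson log-likelihood ratio in which expected counts are scaled to sum to the total number of cases; it is not the configuration used in the study, and the function names are hypothetical.

```python
import math

def relative_risk(cases_in, expected_in, total_cases):
    """Risk inside the window divided by risk outside it."""
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    if cases_out == 0:
        return math.inf
    return (cases_in / expected_in) / (cases_out / expected_out)

def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio for an excess of cases inside the window.

    Returns 0.0 when the window holds no more cases than expected,
    because only high-rate clusters are of interest here.
    """
    if cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr

# Rounded figures from the abstract: 5 observed cases, 0.0021 expected,
# 633 cases in total.
print(relative_risk(5, 0.0021, 633))
print(poisson_llr(5, 0.0021, 633))
```

    With these rounded inputs the relative risk comes out near 2400 rather than the reported 2436; the difference is presumably due to rounding of the expected count in the abstract. In a full scan, the window with the largest log-likelihood ratio is the most likely cluster, and its significance is usually assessed by Monte Carlo replication.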

  6. Examination of Previously Published Data to Identify Patterns in the Social Representation of “Loud Music” in Young Adults Across Countries

    PubMed Central

    Manchaiah, Vinaya; Zhao, Fei; Oladeji, Susan; Ratinaud, Pierre

    2018-01-01

    Purpose: The current study was aimed at understanding the patterns in the social representation of loud music reported by young adults in different countries. Materials and Methods: The study included a sample of 534 young adults (18–25 years) from India, Iran, Portugal, United Kingdom, and United States. Participants were recruited using convenience sampling, and data were collected using the free association task. Participants were asked to provide up to five words or phrases that come to mind when thinking about “loud music.” The data were first analyzed using qualitative content analysis. This was followed by quantitative cluster analysis and chi-square analysis. Results: The content analysis suggested 19 main categories of responses related to loud music. The cluster analysis resulted in four main clusters, namely: (1) emotional oriented perception; (2) problem oriented perception; (3) music and enjoyment oriented perception; and (4) positive emotional and recreation-oriented perception. Country of origin was associated with the likelihood of participants being in each of these clusters. Conclusion: The current study highlights the differences and similarities in young adults’ perception of loud music. These results may have implications for hearing health education to facilitate healthy listening habits. PMID:29457602

  7. An Empirical Taxonomy of Hospital Governing Board Roles

    PubMed Central

    Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R

    2008-01-01

    Objective: To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources: 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design: A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods: One-thousand three-hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings: Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions: The developed taxonomy provides policy makers, health care executives, and researchers a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260

  8. VizieR Online Data Catalog: LAMOST survey of star clusters in M31. II. (Chen+, 2016)

    NASA Astrophysics Data System (ADS)

    Chen, B.; Liu, X.; Xiang, M.; Yuan, H.; Huang, Y.; Shi, J.; Fan, Z.; Huo, Z.; Wang, C.; Ren, J.; Tian, Z.; Zhang, H.; Liu, G.; Cao, Z.; Zhang, Y.; Hou, Y.; Wang, Y.

    2016-09-01

    We select a sample of 306 massive star clusters observed with the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) in the vicinity fields of M31 and M33. Massive clusters in our sample are all selected from the catalog presented in Paper I (Chen et al. 2015, Cat. J/other/RAA/15.1392), including five newly discovered clusters selected with the SDSS photometry, three newly confirmed, and 298 previously known clusters from Revised Bologna Catalogue (RBC; Galleti et al. 2012, Cat. V/143; http://www.bo.astro.it/M31/). Since then another two objects, B341 and B207, have also been observed with LAMOST, and they are included in the current analysis. The current sample does not include those listed in Paper I but is selected from Johnson et al. 2012 (Cat. J/ApJ/752/95) since most of them are young but not so massive. All objects are observed with LAMOST between 2011 September and 2014 June. Table1 lists the name, position, and radial velocity of all sample clusters analyzed in the current work. The LAMOST spectra cover the wavelength range 3700-9000Å at a resolving power of R~1800. Details about the observations and data reduction can be found in Paper I. The median signal-to-noise ratio (S/N) per pixel at 4750 and 7450Å of spectra of all clusters in the current sample are, respectively, 14 and 37. Essentially all spectra have S/N(4750Å)>5 except for the spectra of 18 clusters. The latter have S/N(7540Å)>10. Peacock et al. 2010 (Cat. J/MNRAS/402/803) retrieved images of M31 star clusters and candidates from the SDSS archive and extracted ugriz aperture photometric magnitudes from those objects using the SExtractor. They present a catalog containing homogeneous ugriz photometry of 572 star clusters and 373 candidates. Among them, 299 clusters are in our sample. (2 data files).

  9. Clustering of financial time series with application to index and enhanced index tracking portfolio

    NASA Astrophysics Data System (ADS)

    Dose, Christian; Cincotti, Silvano

    2005-09-01

    A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). The present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, whilst not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.

  10. Establishment of the Inducible Tet-On System for the Activation of the Silent Trichosetin Gene Cluster in Fusarium fujikuroi

    PubMed Central

    Janevska, Slavica; Arndt, Birgit; Baumann, Leonie; Apken, Lisa Helene; Mauriz Marques, Lucas Maciel; Humpf, Hans-Ulrich; Tudzynski, Bettina

    2017-01-01

    The PKS-NRPS-derived tetramic acid equisetin and its N-desmethyl derivative trichosetin exhibit remarkable biological activities against a variety of organisms, including plants and bacteria, e.g., Staphylococcus aureus. The equisetin biosynthetic gene cluster was first described in Fusarium heterosporum, a species distantly related to the notorious rice pathogen Fusarium fujikuroi. Here we present the activation and characterization of a homologous, but silent, gene cluster in F. fujikuroi. Bioinformatic analysis revealed that this cluster does not contain the equisetin N-methyltransferase gene eqxD and consequently, trichosetin was isolated as final product. The adaption of the inducible, tetracycline-dependent Tet-on promoter system from Aspergillus niger achieved a controlled overproduction of this toxic metabolite and a functional characterization of each cluster gene in F. fujikuroi. Overexpression of one of the two cluster-specific transcription factor (TF) genes, TF22, led to an activation of the three biosynthetic cluster genes, including the PKS-NRPS key gene. In contrast, overexpression of TF23, encoding a second Zn(II)2Cys6 TF, did not activate adjacent cluster genes. Instead, TF23 was induced by the final product trichosetin and was required for expression of the transporter-encoding gene MFS-T. TF23 and MFS-T likely act in consort and contribute to detoxification of trichosetin and therefore, self-protection of the producing fungus. PMID:28379186

  11. Establishment of the Inducible Tet-On System for the Activation of the Silent Trichosetin Gene Cluster in Fusarium fujikuroi.

    PubMed

    Janevska, Slavica; Arndt, Birgit; Baumann, Leonie; Apken, Lisa Helene; Mauriz Marques, Lucas Maciel; Humpf, Hans-Ulrich; Tudzynski, Bettina

    2017-04-05

    The PKS-NRPS-derived tetramic acid equisetin and its N-desmethyl derivative trichosetin exhibit remarkable biological activities against a variety of organisms, including plants and bacteria, e.g., Staphylococcus aureus. The equisetin biosynthetic gene cluster was first described in Fusarium heterosporum, a species distantly related to the notorious rice pathogen Fusarium fujikuroi. Here we present the activation and characterization of a homologous, but silent, gene cluster in F. fujikuroi. Bioinformatic analysis revealed that this cluster does not contain the equisetin N-methyltransferase gene eqxD and consequently, trichosetin was isolated as final product. The adaption of the inducible, tetracycline-dependent Tet-on promoter system from Aspergillus niger achieved a controlled overproduction of this toxic metabolite and a functional characterization of each cluster gene in F. fujikuroi. Overexpression of one of the two cluster-specific transcription factor (TF) genes, TF22, led to an activation of the three biosynthetic cluster genes, including the PKS-NRPS key gene. In contrast, overexpression of TF23, encoding a second Zn(II)₂Cys₆ TF, did not activate adjacent cluster genes. Instead, TF23 was induced by the final product trichosetin and was required for expression of the transporter-encoding gene MFS-T. TF23 and MFS-T likely act in consort and contribute to detoxification of trichosetin and therefore, self-protection of the producing fungus.

  12. U.S. consumer demand for restaurant calorie information: targeting demographic and behavioral segments in labeling initiatives.

    PubMed

    Kolodinsky, Jane; Reynolds, Travis William; Cannella, Mark; Timmons, David; Bromberg, Daniel

    2009-01-01

    To identify different segments of U.S. consumers based on food choices, exercise patterns, and desire for restaurant calorie labeling. Using a stratified (by region) random sample of the U.S. population, trained interviewers collected data for this cross-sectional study through telephone surveys. Center for Rural Studies U.S. national health survey. The final sample included 580 responses (22% response rate); data were weighted to be representative of age and gender characteristics of the U.S. population. Self-reported behaviors related to food choices, exercise patterns, desire for calorie information in restaurants, and sample demographics. Clusters were identified using the Schwarz Bayesian criterion. Impacts of demographic characteristics on cluster membership were analyzed using bivariate tests of association and multinomial logit regression. Cluster analysis revealed three clusters based on respondents' food choices, activity levels, and desire for restaurant labeling. Two clusters, comprising three quarters of the sample, desired calorie labeling in restaurants. The remaining cluster opposed restaurant labeling. Demographic variables significantly predicting cluster membership included region of residence (p < .10), income (p < .05), gender (p < .01), and age (p < .10). Though limited by a low response rate and potential self-reporting bias in the phone survey, this study suggests that several groups are likely to benefit from restaurant calorie labeling. Specific demographic clusters could be targeted through labeling initiatives.

  13. Towards Development of Clustering Applications for Large-Scale Comparative Genotyping and Kinship Analysis Using Y-Short Tandem Repeats.

    PubMed

    Seman, Ali; Sapawi, Azizian Mohd; Salleh, Mohd Zaki

    2015-06-01

    Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.

  14. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.

  15. Mass spectrometric identification of intermediates in the O2-driven [4Fe-4S] to [2Fe-2S] cluster conversion in FNR

    PubMed Central

    Crack, Jason C.; Thomson, Andrew J.

    2017-01-01

    The iron-sulfur cluster containing protein Fumarate and Nitrate Reduction (FNR) is the master regulator for the switch between anaerobic and aerobic respiration in Escherichia coli and many other bacteria. The [4Fe-4S] cluster functions as the sensory module, undergoing reaction with O2 that leads to conversion to a [2Fe-2S] form with loss of high-affinity DNA binding. Here, we report studies of the FNR cluster conversion reaction using time-resolved electrospray ionization mass spectrometry. The data provide insight into the reaction, permitting the detection of cluster conversion intermediates and products, including a [3Fe-3S] cluster and persulfide-coordinated [2Fe-2S] clusters [[2Fe-2S](S)n, where n = 1 or 2]. Analysis of kinetic data revealed a branched mechanism in which cluster sulfide oxidation occurs in parallel with cluster conversion and not as a subsequent, secondary reaction to generate [2Fe-2S](S)n species. This methodology shows great potential for broad application to studies of protein cofactor–small molecule interactions. PMID:28373574

  16. Network-based spatial clustering technique for exploring features in regional industry

    NASA Astrophysics Data System (ADS)

    Chou, Tien-Yin; Huang, Pi-Hui; Yang, Lung-Shih; Lin, Wen-Tzu

    2008-10-01

    Past research on industrial clusters has focused mainly on single or particular industries and less on spatial industrial structure and mutual relations. Industrial clusters can generate three kinds of spillover effects: knowledge, labor market pooling, and input sharing. In addition, industrial clusters benefit industry development. A full grasp of the status and characteristics of a district's industrial clusters can help improve the competitive advantage of district industry. Research on industrial spatial clusters is therefore of great significance for setting industrial policies and promoting district economic development. In this study, an improved model, GeoSOM, that combines DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and SOM (Self-Organizing Map) was developed for analyzing industrial clusters. Unlike earlier distance-based algorithms for industrial clusters, the proposed GeoSOM model can calculate spatial characteristics between firms based on the DBSCAN algorithm and evaluate the similarity between firms based on SOM clustering analysis. A demonstration data set, the manufacturers around Taichung County in Taiwan, was analyzed to verify the practicability of the proposed model. The results indicate that GeoSOM is suitable for evaluating spatial industrial clusters.

  17. Relation between the Dynamics of Glassy Clusters and Characteristic Features of their Energy Landscape

    NASA Astrophysics Data System (ADS)

    De, Sandip; Schaefer, Bastian; Sadeghi, Ali; Sicher, Michael; Kanhere, D. G.; Goedecker, Stefan

    2014-02-01

    Based on a recently introduced metric for measuring distances between configurations, we introduce distance-energy (DE) plots to characterize the potential energy surface of clusters. Producing such plots is computationally feasible on the density functional level since it requires only a few hundred stable low energy configurations including the global minimum. By using standard criteria based on disconnectivity graphs and the dynamics of Lennard-Jones clusters, we show that the DE plots convey the necessary information about the character of the potential energy surface and allow us to distinguish between glassy and nonglassy systems. We then apply this analysis to real clusters at the density functional theory level and show that both glassy and nonglassy clusters can be found in simulations. It turns out that among our investigated clusters only those can be synthesized experimentally which exhibit a nonglassy landscape.

  18. Construction and Utilization of a Beowulf Computing Cluster: A User's Perspective

    NASA Technical Reports Server (NTRS)

    Woods, Judy L.; West, Jeff S.; Sulyma, Peter R.

    2000-01-01

    Lockheed Martin Space Operations - Stennis Programs (LMSO) at the John C Stennis Space Center (NASA/SSC) has designed and built a Beowulf computer cluster which is owned by NASA/SSC and operated by LMSO. The design and construction of the cluster are detailed in this paper. The cluster is currently used for Computational Fluid Dynamics (CFD) simulations. The CFD codes in use and their applications are discussed. Examples of some of the work are also presented. Performance benchmark studies have been conducted for the CFD codes being run on the cluster. The results of two of the studies are presented and discussed. The cluster is not currently being utilized to its full potential; therefore, plans are underway to add more capabilities. These include the addition of structural, thermal, fluid, and acoustic Finite Element Analysis codes as well as real-time data acquisition and processing during test operations at NASA/SSC. These plans are discussed as well.

  19. THE JCMT GOULD BELT SURVEY: DENSE CORE CLUSTERS IN ORION A

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lane, J.; Kirk, H.; Johnstone, D.

    The Orion A molecular cloud is one of the most well-studied nearby star-forming regions, and includes regions of both highly clustered and more dispersed star formation across its full extent. Here, we analyze dense, star-forming cores identified in the 850 and 450 μm SCUBA-2 maps from the JCMT Gould Belt Legacy Survey. We identify dense cores in a uniform manner across the Orion A cloud and analyze their clustering properties. Using two independent lines of analysis, we find evidence that clusters of dense cores tend to be mass segregated, suggesting that stellar clusters may have some amount of primordial mass segregation already imprinted in them at an early stage. We also demonstrate that the dense core clusters have a tendency to be elongated, perhaps indicating a formation mechanism linked to the filamentary structure within molecular clouds.

  20. Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

    PubMed

    Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

    2017-12-01

    To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  1. Genetic diversity among air yam (Dioscorea bulbifera) varieties based on single sequence repeat markers.

    PubMed

    Silva, D M; Siqueira, M V B M; Carrasco, N F; Mantello, C C; Nascimento, W F; Veasey, E A

    2016-05-23

    Dioscorea is the largest genus in the Dioscoreaceae family, and includes a number of economically important species including the air yam, D. bulbifera L. This study aimed to develop new single sequence repeat primers and characterize the genetic diversity of local varieties that originated in several municipalities of Brazil. We developed an enriched genomic library for D. bulbifera resulting in seven primers, six of which were polymorphic, and added four polymorphic loci developed for other Dioscorea species. This resulted in 10 polymorphic primers to evaluate 42 air yam accessions. Thirty-three alleles (bands) were found, with an average of 3.3 alleles per locus. The discrimination power ranged from 0.113 to 0.834, with an average of 0.595. Both principal coordinate and cluster analyses (using the Jaccard Index) failed to clearly separate the accessions according to their origins. However, the 13 accessions from Conceição dos Ouros, Minas Gerais State were clustered above zero on the principal coordinate 2 axis, and were also clustered into one subgroup in the cluster analysis. Accessions from Ubatuba, São Paulo State were clustered below zero on the same principal coordinate 2 axis, except for one accession, although they were scattered in several subgroups in the cluster analysis. Therefore, we found little spatial structure in the accessions, although those from Conceição dos Ouros and Ubatuba exhibited some spatial structure, and that there is a considerable level of genetic diversity in D. bulbifera maintained by traditional farmers in Brazil.
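
    The Jaccard Index mentioned in this abstract compares two accessions by the bands they share relative to the bands present in either. The sketch below is only an illustration of that coefficient: the accession names and the presence/absence scores are hypothetical and are not data from the study, and the study's own software and clustering settings are not stated in the abstract.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard index of two binary band profiles (1 = band present)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Two profiles with no bands at all are treated as identical.
    return shared / either if either else 1.0


def similarity_matrix(profiles):
    """Pairwise Jaccard similarities keyed by (name, name) tuples."""
    return {
        (p, q): jaccard_similarity(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical presence/absence scores for four accessions at six bands.
profiles = {
    "acc_A": [1, 1, 0, 1, 0, 0],
    "acc_B": [1, 1, 0, 1, 1, 0],
    "acc_C": [0, 0, 1, 0, 1, 1],
    "acc_D": [0, 1, 1, 0, 1, 0],
}

sims = similarity_matrix(profiles)
closest_pair = max(sims, key=sims.get)  # most similar pair of accessions
```

    A matrix like `sims` (or one minus each value, as a distance) is the usual input to a hierarchical clustering or principal coordinate step.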

  2. Constraining AGN triggering mechanisms through the clustering analysis of active black holes

    NASA Astrophysics Data System (ADS)

    Gatti, M.; Shankar, F.; Bouillot, V.; Menci, N.; Lamastra, A.; Hirschmann, M.; Fiore, F.

    2016-02-01

    The triggering mechanisms for active galactic nuclei (AGN) are still debated. Some of the most popular ones include galaxy interactions (IT) and disc instabilities (DIs). Using an advanced semi-analytic model (SAM) of galaxy formation, coupled to accurate halo occupation distribution modelling, we investigate the imprint left by each separate triggering process on the clustering strength of AGN at small and large scales. Our main results are as follows: (I) DIs, irrespective of their exact implementation in the SAM, tend to fall short in triggering AGN activity in galaxies at the centre of haloes with Mh > 10^13.5 h^-1 M⊙. On the contrary, the IT scenario predicts abundance of active central galaxies that generally agrees well with observations at every halo mass. (II) The relative number of satellite AGN in DIs at intermediate-to-low luminosities is always significantly higher than in IT models, especially in groups and clusters. The low AGN satellite fraction predicted for the IT scenario might suggest that different feeding modes could simultaneously contribute to the triggering of satellite AGN. (III) Both scenarios are quite degenerate in matching large-scale clustering measurements, suggesting that the sole average bias might not be an effective observational constraint. (IV) Our analysis suggests the presence of both a mild luminosity and a more consistent redshift dependence in the AGN clustering, with AGN inhabiting progressively less massive dark matter haloes as the redshift increases. We also discuss the impact of different observational selection cuts in measuring AGN clustering, including possible discrepancies between optical and X-ray surveys.

  3. Pseudomonas aeruginosa in Dairy Goats: Genotypic and Phenotypic Comparison of Intramammary and Environmental Isolates

    PubMed Central

    Scaccabarozzi, Licia; Leoni, Livia; Ballarini, Annalisa; Barberio, Antonio; Locatelli, Clara; Casula, Antonio; Bronzo, Valerio; Pisoni, Giuliano; Jousson, Olivier; Morandi, Stefano; Rapetti, Luca; García-Fernández, Aurora; Moroni, Paolo

    2015-01-01

    Following the identification of a case of severe clinical mastitis in a Saanen dairy goat (goat A), an average of 26 lactating goats in the herd was monitored over a period of 11 months. Milk microbiological analysis revealed the presence of Pseudomonas aeruginosa in 7 of the goats. Among these 7 does, only goat A showed clinical signs of mastitis. The 7 P. aeruginosa isolates from the goat milk and 26 P. aeruginosa isolates from environmental samples were clustered by RAPD-PCR and PFGE analyses in 3 genotypes (G1, G2, G3) and 4 clusters (A, B, C, D), respectively. PFGE clusters A and B correlated with the G1 genotype and included the 7 milk isolates. Although it was not possible to identify the infection source, these results strongly suggest a spreading of the infection from goat A. Clusters C and D overlapped with genotypes G2 and G3, respectively, and included only environmental isolates. The outcome of the antimicrobial susceptibility test performed on the isolates revealed 2 main patterns of multiple resistance to beta-lactam antibiotics and macrolides. Virulence related phenotypes were analyzed, such as swarming and swimming motility, production of biofilm and production of secreted virulence factors. The isolates had distinct phenotypic profiles, corresponding to genotypes G1, G2 and G3. Overall, correlation analysis showed a strong correlation between sampling source, RAPD genotype, PFGE clusters, and phenotypic clusters. The comparison of the levels of virulence related phenotypes did not indicate a higher pathogenic potential in the milk isolates as compared to the environmental isolates. PMID:26606430

  4. Goal Profiles, Mental Toughness and its Influence on Performance Outcomes among Wushu Athletes

    PubMed Central

    Roy, Jolly

    2007-01-01

    This study examined the association between goal orientations and mental toughness and its influence on performance outcomes in competition. Wushu athletes (n = 40) competing in Intervarsity championships in Malaysia completed the Task and Ego Orientations in Sport Questionnaire (TEOSQ) and the Psychological Performance Inventory (PPI). Using cluster analysis techniques including hierarchical methods and the non-hierarchical method (k-means cluster) to examine goal profiles, a three-cluster solution emerged, viz. cluster 1 - high task and moderate ego (HT/ME), cluster 2 - moderate task and low ego (MT/LE), and cluster 3 - moderate task and moderate ego (MT/ME). Analysis of the fundamental areas of mental toughness based on goal profiles revealed that athletes in cluster 1 scored significantly higher on negative energy control than athletes in cluster 2. Further, athletes in cluster 1 also scored significantly higher on positive energy control than athletes in cluster 3. Chi-square (χ2) test revealed no significant differences among athletes with different goal profiles on performance outcomes in the competition. However, significant differences were observed between athletes (medallists and non-medallists) in self-confidence (p = 0.001) and negative energy control (p = 0.042). Medallists scored significantly higher on self-confidence (mean = 21.82 ± 2.72) and negative energy control (mean = 19.59 ± 2.32) than the non-medallists (self-confidence mean = 18.76 ± 2.49; negative energy control mean = 18.14 ± 1.91). Key points: Mental toughness can be influenced by certain goal profile combinations. Athletes with successful outcomes in performance (medallists) displayed greater mental toughness. PMID:24198700

  5. Galaxy evolution in the densest environments: HST imaging

    NASA Astrophysics Data System (ADS)

    Jorgensen, Inger

    2013-10-01

    We propose to process in a consistent fashion all available HST/ACS and WFC3 imaging of seven rich clusters of galaxies at z=1.2-1.6. The clusters are part of our larger project aimed at constraining models for galaxy evolution in dense environments from observations of stellar populations in rich z=1.2-2 galaxy clusters. The main objective is to establish the star formation (SF) history and structural evolution over this epoch during which large changes in SF rates and galaxy structure are expected to take place in cluster galaxies. The observational data required to meet our main objective are deep HST imaging and high S/N spectroscopy of individual cluster members. The HST imaging already exists for the seven rich clusters at z=1.2-1.6 included in this archive proposal. However, the data have not been consistently processed to derive colors, magnitudes, sizes and morphological parameters for all potential cluster members bright enough to be suitable for spectroscopic observations with 8-m class telescopes. We propose to carry out this processing and make all derived parameters publicly available. We will use the parameters derived from the HST imaging to (1) study the structural evolution of the galaxies, (2) select clusters and galaxies for spectroscopic observations, and (3) use the photometry and spectroscopy together for a unified analysis aimed at the SF history and structural changes. The analysis will also utilize data from the Gemini/HST Cluster Galaxy Project, which covers rich clusters at z=0.2-1.0 and for which we have similar HST imaging and high S/N spectroscopy available.

  6. Symptom clusters predict mortality among dialysis patients in Norway: a prospective observational cohort study.

    PubMed

    Amro, Amin; Waldum, Bård; von der Lippe, Nanna; Brekke, Fredrik Barth; Dammen, Toril; Miaskowski, Christine; Os, Ingrid

    2015-01-01

    Patients with end-stage renal disease on dialysis have reduced survival rates compared with the general population. Symptoms are frequent in dialysis patients, and a symptom cluster is defined as two or more related co-occurring symptoms. The aim of this study was to explore the associations between symptom clusters and mortality in dialysis patients. In a prospective observational cohort study of dialysis patients (n = 301), Kidney Disease and Quality of Life Short Form and Beck Depression Inventory questionnaires were administered. To generate symptom clusters, principal component analysis with varimax rotation was used on 11 kidney-specific self-reported physical symptoms. A Beck Depression Inventory score of 16 or greater was defined as clinically significant depressive symptoms. Physical and mental component summary scores were generated from Short Form-36. Multivariate Cox regression analysis was used for the survival analysis, Kaplan-Meier curves and log-rank statistics were applied to compare survival rates between the groups. Three different symptom clusters were identified; one included loading of several uremic symptoms. In multivariate analyses and after adjustment for health-related quality of life and depressive symptoms, the worst perceived quartile of the "uremic" symptom cluster independently predicted all-cause mortality (hazard ratio 2.47, 95% CI 1.44-4.22, P = 0.001) compared with the other quartiles during a follow-up period that ranged from four to 52 months. The two other symptom clusters ("neuromuscular" and "skin") or the individual symptoms did not predict mortality. Clustering of uremic symptoms predicted mortality. Assessing co-occurring symptoms rather than single symptoms may help to identify dialysis patients at high risk for mortality. Copyright © 2015 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  7. Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

    NASA Astrophysics Data System (ADS)

    Takahashi, A.; Hashimoto, M.

    2015-12-01

    Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity field in the San Francisco Bay Area and Mojave Desert, respectively. They successfully found velocity discontinuities and showed the advantage of cluster analysis for classifying a GNSS velocity field. Because strike-slip events are dominant in the western United States, the geometry there is simple. However, the Japanese Islands are tectonically complicated due to subduction of oceanic plates. There are many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan to separate time-variant and static crustal deformation. Our modification is to perform cluster analysis every several months or years and then to qualify cluster member similarity. If a GNSS station moves differently from its neighboring GNSS stations, it will not belong to the cluster that includes its surrounding stations. With this method, time-variant phenomena were distinguished. We applied our method to GNSS data of Japan from 1996 to 2015. The analyses led to the following conclusions. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the method improves the detectability of time-variable phenomena, such as a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). Last, the method classifies postseismic deformation caused by large earthquakes. The result suggested velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake. This result implies that postseismic deformation does not decay continuously in proportion to distance from the epicenter.

  8. Amplification of the entire kanamycin biosynthetic gene cluster during empirical strain improvement of Streptomyces kanamyceticus.

    PubMed

    Yanai, Koji; Murakami, Takeshi; Bibb, Mervyn

    2006-06-20

    Streptomyces kanamyceticus 12-6 is a derivative of the wild-type strain developed for industrial kanamycin (Km) production. Southern analysis and DNA sequencing revealed amplification of a large genomic segment including the entire Km biosynthetic gene cluster in the chromosome of strain 12-6. At 145 kb, the amplifiable unit of DNA (AUD) is the largest AUD reported in Streptomyces. Striking repetitive DNA sequences belonging to the clustered regularly interspaced short palindromic repeats family were found in the AUD and may play a role in its amplification. Strain 12-6 contains a mixture of different chromosomes with varying numbers of AUDs, sometimes exceeding 36 copies and producing an amplified region >5.7 Mb. The level of Km production depended on the copy number of the Km biosynthetic gene cluster, suggesting that DNA amplification occurred during strain improvement as a consequence of selection for increased Km resistance. Amplification of DNA segments including entire antibiotic biosynthetic gene clusters might be a common mechanism leading to increased antibiotic production in industrial strains.

  9. Cluster: Mission Overview and End-of-Life Analysis

    NASA Technical Reports Server (NTRS)

    Pallaschke, S.; Munoz, I.; Rodriquez-Canabal, J.; Sieg, D.; Yde, J. J.

    2007-01-01

    The Cluster mission is part of the scientific programme of the European Space Agency (ESA) and its purpose is the analysis of the Earth's magnetosphere. The Cluster project consists of four satellites. The selected polar orbit has a shape of 4.0 and 19.2 Re which is required for performing measurements near the cusp and the tail of the magnetosphere. When crossing these regions the satellites form a constellation which in most of the cases so far has been a regular tetrahedron. The satellite operations are carried out by the European Space Operations Centre (ESOC) at Darmstadt, Germany. The paper outlines the future orbit evolution and the envisaged operations from a Flight Dynamics point of view. In addition a brief summary of the LEOP and routine operations is included beforehand.

  10. Common factor analysis versus principal component analysis: choice for symptom cluster research.

    PubMed

    Kim, Hee-Ju

    2008-03-01

    The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.

  11. Relation between financial market structure and the real economy: comparison between clustering methods.

    PubMed

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].
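
    Evaluating how well a clustering retrieves a benchmark partition, as this abstract describes, needs some agreement score between two labelings. The abstract does not name the score used, so the sketch below assumes the adjusted Rand index as one common choice; the sector labels and cluster assignments are hypothetical toy values, not results from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree
    # Pairs of items placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one block
    return (together_both - expected) / (maximum - expected)


# Hypothetical benchmark: industrial sector of six stocks.
sectors = ["fin", "fin", "fin", "tech", "tech", "tech"]
# Two hypothetical clustering outputs for the same six stocks.
clusters_exact = [0, 0, 0, 1, 1, 1]
clusters_mixed = [0, 0, 1, 1, 2, 2]

ari_exact = adjusted_rand_index(sectors, clusters_exact)
ari_mixed = adjusted_rand_index(sectors, clusters_mixed)
```

    A score of 1 means the clustering reproduces the benchmark exactly, and values near 0 mean agreement no better than chance. Scoring cuts of a hierarchy at different numbers of clusters is one way to see at which level the sector information appears.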

  12. Assessment of the climatic potential for tourism in Iran through biometeorology clustering.

    PubMed

    Roshan, Gholamreza; Yousefi, Robabe; Błażejczyk, Krzysztof

    2018-04-01

    This study presents a spatiotemporal analysis of bioclimatic comfort conditions for Iran using mean daily meteorological data from 1995 to 2014, analyzed through Physiological Equivalent Temperature (PET) index and Universal Thermal Climate Index (UTCI) indices, and bioclimatic clustering. The results of this study demonstrate that due to the climate variability across Iran during the year, there is at any point in time a location with climatic condition suitable for tourism. Mean values demonstrate maxima in bioclimatic comfort indices for the country in late winter and spring and minima for summer. Seven statistically significant clusters in bioclimatic indices were identified. Comparing these with clustering performed on PET and UTCI, the maximum overlaps between the two indices. In the following, the outputs of this research showed that most appropriate bioclimatic clustering for Iran includes seven clusters. These clustering locations according to climatic suitability for tourism provide a valuable contribution to tourism management in the country, particularly through marketing destinations to maximize tourist flow.

  13. Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

    PubMed Central

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T.

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging. PMID:25786703

  14. Spatial cluster analysis of human cases of Crimean Congo hemorrhagic fever reported in Pakistan.

    PubMed

    Abbas, Tariq; Younus, Muhammad; Muhammad, Sayyad Aun

    2015-01-01

    Crimean Congo hemorrhagic fever (CCHF) is a tick-borne viral zoonotic disease that has been reported in almost all geographic regions in Pakistan. The aim of this study was to identify spatial clusters of human cases of CCHF reported in the country. Kulldorff's spatial scan statistic, Anselin's Local Moran's I and Getis Ord Gi* tests were applied to the data (i.e. the number of laboratory-confirmed cases reported from each district during 2013). The analyses revealed a large multi-district cluster of high CCHF incidence in the uplands of Balochistan province near its border with Afghanistan. The cluster comprised the following districts: Qilla Abdullah, Qilla Saifullah, Loralai, Quetta, Sibi, Chagai, and Mastung. Another cluster was detected in Punjab and included Rawalpindi district and a part of Islamabad. We provide empirical evidence of spatial clustering of human CCHF cases in the country. The districts in the clusters should be given priority in surveillance, control programs, and further research.
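
    Of the three tests named in this abstract, Anselin's Local Moran's I is the simplest to write out: each district's deviation from the mean is multiplied by the average deviation of its neighbours. The sketch below assumes row-standardised neighbour weights and uses a hypothetical chain of six districts with made-up case counts. It omits the permutation test normally used to judge significance, and it is not the study's data or software.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I per unit, with row-standardised neighbour weights.

    values    -- list of counts or rates, one per spatial unit
    neighbors -- dict mapping unit index to a list of neighbouring indices
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    if m2 == 0:
        return [0.0] * n  # no variation, so no local association
    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: mean deviation of the neighbours (0 if isolated).
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical case counts for six districts arranged in a chain.
cases = [9, 8, 1, 0, 1, 0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

local_i = local_morans_i(cases, adjacency)
mean_cases = sum(cases) / len(cases)
# High-high candidates: above-average units with positive local I.
high_high = [i for i, v in enumerate(cases) if v > mean_cases and local_i[i] > 0]
```

    Positive values mark a unit that resembles its neighbours (high next to high, or low next to low), and negative values mark a spatial outlier such as district 2 here, a low count next to a high one.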

  15. Clustering multilayer omics data using MuNCut.

    PubMed

    Teran Hidalgo, Sebastian J; Ma, Shuangge

    2018-03-14

    Omics profiling is now a routine component of biomedical studies. In the analysis of omics data, clustering is an essential step and serves multiple purposes including, for example, revealing the unknown functionalities of omics units, assisting dimension reduction in outcome model building, and others. In the most recent omics studies, a prominent trend is to conduct multilayer profiling, which collects multiple types of genetic, genomic, epigenetic and other measurements on the same subjects. In the literature, clustering methods tailored to multilayer omics data are still limited. Directly applying the existing clustering methods to multilayer omics data and clustering each layer first and then combining across layers are both "suboptimal" in that they do not accommodate the interconnections within layers and across layers in an informative way. In this study, we develop the MuNCut (Multilayer NCut) clustering approach. It is tailored to multilayer omics data and sufficiently accounts for both across- and within-layer connections. It is based on the novel NCut technique and also takes advantage of regularized sparse estimation. It has an intuitive formulation and is computationally very feasible. To facilitate implementation, we develop the function muncut in the R package NcutYX. Under a wide spectrum of simulation settings, it outperforms competitors. The analysis of TCGA (The Cancer Genome Atlas) data on breast cancer and cervical cancer shows that MuNCut generates biologically meaningful results which differ from those using the alternatives. We propose a more effective clustering analysis of multiple omics data. It provides a new venue for jointly analyzing genetic, genomic, epigenetic and other measurements.

  16. Laboratory-based validation of the baseline sensors of the ITER diagnostic residual gas analyzer

    NASA Astrophysics Data System (ADS)

    Klepper, C. C.; Biewer, T. M.; Marcus, C.; Andrew, P.; Gardner, W. L.; Graves, V. B.; Hughes, S.

    2017-10-01

    The divertor-specific ITER Diagnostic Residual Gas Analyzer (DRGA) will provide essential information relating to DT fusion plasma performance. This includes pulse-resolving measurements of the fuel isotopic mix reaching the pumping ducts, as well as the concentration of the helium generated as the ash of the fusion reaction. In the present baseline design, the cluster of sensors attached to this diagnostic's differentially pumped analysis chamber assembly includes a radiation compatible version of a commercial quadrupole mass spectrometer, as well as an optical gas analyzer using a plasma-based light excitation source. This paper reports on a laboratory study intended to validate the performance of this sensor cluster, with emphasis on the detection limit of the isotopic measurement. This validation study was carried out in a laboratory set-up that closely prototyped the analysis chamber assembly configuration of the baseline design. This includes an ITER-specific placement of the optical gas measurement downstream from the first turbine of the chamber's turbo-molecular pump to provide sufficient light emission while preserving the gas dynamics conditions that allow for ~1 s response time from the sensor cluster [1].

  17. Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

    PubMed

    Stefurak, Tres; Calhoun, Georgia B

    2007-01-01

    The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.

  18. Diversity and evolution analysis of glycoprotein GP85 from avian leukosis virus subgroup J isolates from chickens of different genetic backgrounds during 1989-2016: Coexistence of five extremely different clusters.

    PubMed

    Wang, Peikun; Lin, Lulu; Li, Haijuan; Yang, Yongli; Huang, Teng; Wei, Ping

    2018-02-01

    ALV-J has caused the most serious losses to the poultry industry in China. The gp85-coding sequence of ALV-J is known to be prone to mutation, but any association between the gp85 gene and breed of chicken remains unclear. A comprehensive and systematic study of the evolutionary process of ALV-J in China is needed. In this study, we compared and analyzed gp85 gene sequences from 198 ALV-J isolates, originating from China, USA, UK and France during 1989-2016. These were sorted into five clusters. Cluster 1, 2, 3, 4 and 5 included isolates from chicken types of different genetic backgrounds, e.g. white-feather broiler, Guangxi indigenous chicken breeds, Yellow chickens and layer chickens respectively. A correlation comparison of amino acid sequence similarities in the gp85 protein among the five clusters showed significant differences (P < 0.01) with the exception being when the third and fifth cluster were compared (P > 0.05). Results of entropy analysis of the gp85 sequences revealed that cluster 3 had the largest variation and cluster 1 had the least variation. The N-glycosylation sites in the majority of isolates numbered 14, 16, 17, 16 and 16, respectively, with regards to clusters 1-5. In addition, 5 isolates from cluster 3 had one more glycosylation site than the other isolates from cluster 3. Our study provides evidence that there were five extremely different ALV-J clusters during 1989-2016 and that the gp85 genes isolated from indigenous chicken breed isolates had the largest variation.
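
    The entropy analysis mentioned in the record above can be pictured with a per-column Shannon entropy over an alignment. This is a minimal sketch under that assumption: the abstract does not state which entropy measure or software was used, and the toy sequences below are invented for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(sequences):
    """Per-column entropies for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) walks the alignment column by column
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical aligned fragments (not real gp85 sequences).
toy_alignment = ["MKTA", "MKSA", "MRTA", "MKTG"]
entropies = alignment_entropy(toy_alignment)
```

    A fully conserved column has zero entropy, and columns with more evenly mixed residues score higher, so averaging the values per cluster gives one simple way to compare variation between clusters.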

  19. Rates of proton transfer to Fe-S-based clusters: comparison of clusters containing {MFe(mu(2)-S)(2)}n+ and {MFe(3)(mu(3)-S)(4)}n+ (M = Fe, Mo, or W) cores.

    PubMed

    Bates, Katie; Garrett, Brendan; Henderson, Richard A

    2007-12-24

    The rates of proton transfer from [pyrH]+ (pyr = pyrrolidine) to the binuclear complexes [Fe2S2Cl4]2- and [S2MS2FeCl2]2- (M = Mo or W) are reported. The reactions were studied using stopped-flow spectrophotometry, and the rate constants for proton transfer were determined from analysis of the kinetics of the substitution reactions of these clusters with the nucleophiles Br- or PhS- in the presence of [pyrH]+. In general, Br- is a poor nucleophile for these clusters, and proton transfer occurs before Br- binds, allowing direct measure of the rate of proton transfer from [pyrH]+ to the cluster. In contrast, PhS- is a better nucleophile, and a pathway in which PhS- binds preferentially to the cluster prior to proton transfer from [pyrH]+ usually operates. For the reaction of [Fe2S2Cl4]2- with PhS- in the presence of [pyrH]+ both pathways are observed. Comparison of the results presented in this paper with analogous studies reported earlier on cuboidal Fe-S-based clusters allows discussion of the factors which affect the rates of proton transfer in synthetic clusters including the nuclearity of the cluster core, the metal composition, and the nature of the terminal ligands. The possible relevance of these findings to the protonation sites of natural Fe-S-based clusters, including FeMo-cofactor from nitrogenase, is presented.

  20. Molecular clustering of patients with diabetes and pulmonary tuberculosis: A systematic review and meta-analysis.

    PubMed

    Blanco-Guillot, Francles; Delgado-Sánchez, Guadalupe; Mongua-Rodríguez, Norma; Cruz-Hervert, Pablo; Ferreyra-Reyes, Leticia; Ferreira-Guerrero, Elizabeth; Yanes-Lane, Mercedes; Montero-Campos, Rogelio; Bobadilla-Del-Valle, Miriam; Torres-González, Pedro; Ponce-de-León, Alfredo; Sifuentes-Osornio, José; Garcia-Garcia, Lourdes

    2017-01-01

    Many studies have explored the relationship between diabetes mellitus (DM) and tuberculosis (TB) demonstrating increased risk of TB among patients with DM and poor prognosis of patients suffering from the association of DM/TB. Owing to a paucity of studies addressing this question, it remains unclear whether patients with DM and TB are more likely than TB patients without DM to be grouped into molecular clusters defined according to the genotype of the infecting Mycobacterium tuberculosis bacillus. That is, whether there is convincing molecular epidemiological evidence for TB transmission among DM patients. Objective: We performed a systematic review and meta-analysis to quantitatively evaluate the propensity for patients with DM and pulmonary TB (PTB) to cluster according to the genotype of the infecting M. tuberculosis bacillus. We conducted a systematic search in MEDLINE and LILACS from 1990 to June, 2016 with the following combinations of key words "tuberculosis AND transmission" OR "tuberculosis diabetes mellitus" OR "Mycobacterium tuberculosis molecular epidemiology" OR "RFLP-IS6110" OR "Spoligotyping" OR "MIRU-VNTR". Studies were included if they met the following criteria: (i) studies based on populations from defined geographical areas; (ii) use of genotyping by IS6110- restriction fragment length polymorphism (RFLP) analysis and spoligotyping or mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) or other amplification methods to identify molecular clustering; (iii) genotyping and analysis of 50 or more cases of PTB; (iv) study duration of 11 months or more; (v) identification of quantitative risk factors for molecular clustering including DM; (vi) > 60% coverage of the study population; and (vii) patients with PTB confirmed bacteriologically. 
The exclusion criteria were: (i) extrapulmonary TB; (ii) TB caused by nontuberculous mycobacteria; (iii) patients with PTB and HIV; (iv) pediatric PTB patients; (v) TB in closed environments (e.g. prisons, elderly homes, etc.); (vi) diabetes insipidus; and (vii) outbreak reports. The Hartung-Knapp-Sidik-Jonkman method was used to estimate the odds ratio (OR) of the association between DM and molecular clustering of cases with TB. In order to evaluate the degree of heterogeneity, a statistical Q test was done. The publication bias was examined with Begg and Egger tests. Review Manager 5.3.5, CMA v.3 (Biostat), and the R software package were used. Selection criteria were met by six articles which included 4076 patients with PTB, of which 13% had DM. Twenty-seven percent of the cases were clustered. The majority of cases (48%) were reported in a study in China with 31% clustering. The highest incidence of TB occurred in two studies from China. The global OR for molecular clustering was 0.84 (95% CI 0.40-1.72). The heterogeneity between studies was moderate (I² = 55%, p = 0.05), although there was no publication bias (Begg's test p = 0.353 and Egger's test p = 0.429). There were very few studies meeting our selection criteria. The wide confidence interval indicates that there is not enough evidence to draw conclusions about the association. Clustering of patients with DM in TB transmission chains should be investigated in areas where both diseases are prevalent and focus on specific contexts.
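
    The heterogeneity statistics reported in the record above (the Q test and I²) can be sketched as follows. This is an illustration only: it uses inverse-variance weights on log odds ratios, does not implement the Hartung-Knapp-Sidik-Jonkman adjustment, and the study-level numbers are hypothetical rather than taken from the review.

```python
import math


def heterogeneity(log_ors, variances):
    """Inverse-variance pooled log OR, Cochran's Q, and I-squared (percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I-squared: share of total variation attributed to between-study heterogeneity
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical study-level odds ratios and variances of their logs.
example_log_ors = [math.log(0.5), math.log(1.2), math.log(0.9)]
example_variances = [0.10, 0.08, 0.15]
pooled_log_or, q_stat, i2 = heterogeneity(example_log_ors, example_variances)
pooled_or = math.exp(pooled_log_or)
```

    By the usual rule of thumb, I² values around 50% are read as moderate heterogeneity, which is how the abstract describes its 55%.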

  1. [Cluster analysis in biomedical researches].

    PubMed

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping individual observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
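
    Of the algorithms named in the record above, k-means is the simplest to show in a few lines. The sketch below is a plain Lloyd-style iteration written for illustration, not code from the review; the function names, the seed-based initialization, and the toy points are assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Hypothetical two-parameter observations forming two obvious groups.
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(observations, k=2)
```

    The result depends on the starting centers in general, which is why practical analyses usually repeat the run from several initializations and keep the best partition.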

  2. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom.

    PubMed

    Turton, Jane F; Wright, Laura; Underwood, Anthony; Witney, Adam A; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-08-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  3. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    PubMed Central

    2015-01-01

    Background: Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results: In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions: Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893

  4. On the Distribution of Orbital Poles of Milky Way Satellites

    NASA Astrophysics Data System (ADS)

    Palma, Christopher; Majewski, Steven R.; Johnston, Kathryn V.

    2002-01-01

    In numerous studies of the outer Galactic halo some evidence for accretion has been found. If the outer halo did form in part or wholly through merger events, we might expect to find coherent streams of stars and globular clusters following orbits similar to those of their parent objects, which are assumed to be present or former Milky Way dwarf satellite galaxies. We present a study of this phenomenon by assessing the likelihood of potential descendant "dynamical families" in the outer halo. We conduct two analyses: one that involves a statistical analysis of the spatial distribution of all known Galactic dwarf satellite galaxies (DSGs) and globular clusters, and a second, more specific analysis of those globular clusters and DSGs for which full phase space dynamical data exist. In both cases our methodology is appropriate only to members of descendant dynamical families that retain nearly aligned orbital poles today. Since the Sagittarius dwarf (Sgr) is considered a paradigm for the type of merger/tidal interaction event for which we are searching, we also undertake a case study of the Sgr system and identify several globular clusters that may be members of its extended dynamical family. In our first analysis, the distribution of possible orbital poles for the entire sample of outer (Rgc>8 kpc) halo globular clusters is tested for statistically significant associations among globular clusters and DSGs. Our methodology for identifying possible associations is similar to that used by Lynden-Bell & Lynden-Bell, but we put the associations on a more statistical foundation. Moreover, we study the degree of possible dynamical clustering among various interesting ensembles of globular clusters and satellite galaxies. Among the ensembles studied, we find the globular cluster subpopulation with the highest statistical likelihood of association with one or more of the Galactic DSGs to be the distant, outer halo (Rgc>25 kpc), second-parameter globular clusters.
The results of our orbital pole analysis are supported by the great circle cell count methodology of Johnston, Hernquist, & Bolte. The space motions of the clusters Pal 4, NGC 6229, NGC 7006, and Pyxis are predicted to be among those most likely to show the clusters to be following stream orbits, since these clusters are responsible for the majority of the statistical significance of the association between outer halo, second-parameter globular clusters and the Milky Way DSGs. In our second analysis, we study the orbits of the 41 globular clusters and six Milky Way-bound DSGs having measured proper motions to look for objects with both coplanar orbits and similar angular momenta. Unfortunately, the majority of globular clusters with measured proper motions are inner halo clusters that are less likely to retain memory of their original orbit. Although four potential globular cluster/DSG associations are found, we believe three of these associations involving inner halo clusters to be coincidental. While the present sample of objects with complete dynamical data is small and does not include many of the globular clusters that are more likely to have been captured by the Milky Way, the methodology we adopt will become increasingly powerful as more proper motions are measured for distant Galactic satellites and globular clusters, and especially as results from the Space Interferometry Mission (SIM) become available.
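
    For objects with full phase-space data, as in the second analysis of the record above, an orbital pole can be taken as the direction of the angular momentum vector r × v. The sketch below assumes Galactocentric Cartesian positions and velocities and uses invented values; the function names are hypothetical and this is not the authors' statistical method.

```python
import math


def orbital_pole(position, velocity):
    """Unit vector along the angular momentum r x v."""
    x, y, z = position
    vx, vy, vz = velocity
    lx = y * vz - z * vy
    ly = z * vx - x * vz
    lz = x * vy - y * vx
    norm = math.sqrt(lx * lx + ly * ly + lz * lz)
    if norm == 0:
        raise ValueError("purely radial orbit: the pole is undefined")
    return (lx / norm, ly / norm, lz / norm)


def pole_separation_deg(pole_a, pole_b):
    """Angle in degrees between two unit pole vectors."""
    dot = sum(a * b for a, b in zip(pole_a, pole_b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))


# Hypothetical satellite (kpc, km/s) and a hypothetical cluster on a similar plane.
satellite_pole = orbital_pole((20.0, 0.0, 5.0), (0.0, 150.0, 0.0))
cluster_pole = orbital_pole((30.0, 2.0, 8.0), (5.0, 160.0, 1.0))
separation = pole_separation_deg(satellite_pole, cluster_pole)
```

    A small separation between poles is the kind of alignment the abstract looks for, although similar angular momenta are also required before an association is considered plausible.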

  5. Prevalence and risk factors of seizure clusters in adult patients with epilepsy.

    PubMed

    Chen, Baibing; Choi, Hyunmi; Hirsch, Lawrence J; Katz, Austen; Legge, Alexander; Wong, Rebecca A; Jiang, Alfred; Kato, Kenneth; Buchsbaum, Richard; Detyniecki, Kamil

    2017-07-01

    In the current study, we explored the prevalence of physician-confirmed seizure clusters. We also investigated potential clinical factors associated with the occurrence of seizure clusters overall and by epilepsy type. We reviewed medical records of 4116 adult (≥16 years old) outpatients with epilepsy at our centers for documentation of seizure clusters. Variables including patient demographics, epilepsy details, medical and psychiatric history, AED history, and epilepsy risk factors were then tested against history of seizure clusters. Patients were then divided into focal epilepsy, idiopathic generalized epilepsy (IGE), or symptomatic generalized epilepsy (SGE), and the same analysis was run. Overall, seizure clusters were independently associated with earlier age of seizure onset, symptomatic generalized epilepsy (SGE), central nervous system (CNS) infection, cortical dysplasia, status epilepticus, absence of 1-year seizure freedom, and having failed 2 or more AEDs (P<0.0026). Patients with SGE (27.1%) were more likely to develop seizure clusters than patients with focal epilepsy (16.3%) and IGE (7.4%; all P<0.001). Analysis by epilepsy type showed that absence of 1-year seizure freedom since starting treatment at one of our centers was associated with seizure clustering in patients across all 3 epilepsy types. In patients with SGE, clusters were associated with perinatal/congenital brain injury. In patients with focal epilepsy, clusters were associated with younger age of seizure onset, complex partial seizures, cortical dysplasia, status epilepticus, CNS infection, and having failed 2 or more AEDs. In patients with IGE, clusters were associated with presence of an aura. Only 43.5% of patients with seizure clusters were prescribed rescue medications. Patients with intractable epilepsy are at a higher risk of developing seizure clusters.
Factors such as having SGE, CNS infection, cortical dysplasia, status epilepticus or an early seizure onset, can also independently increase one's chance of having seizure clusters. Copyright © 2017. Published by Elsevier B.V.

  6. X-Ray Morphological Analysis of the Planck ESZ Clusters

    NASA Astrophysics Data System (ADS)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev-Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev-Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  7. X-Ray Morphological Analysis of the Planck ESZ Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  8. Genetic analysis and CRISPR typing of Salmonella enterica serovar Enteritidis from different sources revealed potential transmission from poultry and pig to human.

    PubMed

    Li, Qiuchun; Wang, Xin; Yin, Kequan; Hu, Yachen; Xu, Haiyan; Xie, Xiaolei; Xu, Lijuan; Fei, Xiao; Chen, Xiang; Jiao, Xinan

    2018-02-02

    Salmonella enterica serovar Enteritidis (S. Enteritidis) is one of the most prevalent serotypes in Salmonella isolated from poultry and the most commonly reported cause of human salmonellosis. In this study, we aimed to assess the genetic diversity of 329 S. Enteritidis strains isolated from different sources from 2009 to 2016 in China. Clustered regularly interspaced short palindromic repeat (CRISPR) typing was used to characterize these 262 chicken clinical isolates, 38 human isolates, 18 pig isolates, six duck isolates, three goose isolates and two isolates of unknown source. A total of 18 Enteritidis CRISPR types (ECTs) were identified, with ECT2, ECT8 and ECT4 as the top three ECTs. CRISPR typing identified ECT2 as the most prevalent ECT, which accounted for 41% of S. Enteritidis strains from all the sources except duck. ECT9 and ECT13 were identified in both pig and human isolates and revealed potential transmission from pig to human. A cluster analysis distributed 18 ECTs, including the top three ECTs, into four lineages with LI as the predominant lineage. Forty-eight out of 329 isolates were subjected to whole genome sequence typing, which divided them into four clusters, with Cluster I as the predominant cluster. Cluster I included 92% (34/37) of strains located in LI identified from the CRISPR typing, confirming the good correspondence between both typing methods. In addition, the CRISPR typing also revealed the close relationship between ECTs and isolated areas, confirming that CRISPR spacers might be obtained by bacteria from the unique phage or plasmid pools in the environment. However, further analysis is needed to determine the function of CRISPR-Cas systems in Salmonella and the relationship between spacers and the environment. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Comparative genomic analysis of clinical and environmental Vibrio vulnificus isolates revealed biotype 3 evolutionary relationships.

    PubMed

    Koton, Yael; Gordon, Michal; Chalifa-Caspi, Vered; Bisharat, Naiel

    2014-01-01

    In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen erupted as a result of genetic recombination between two distinct populations. We applied a whole genome shotgun sequencing approach using several V. vulnificus strains from Israel in order to study the pan genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus based on 16 draft and complete genomes consisted of 3068 genes, representing between 59 and 78% of the whole genome of 16 strains. The accessory genome varied in size from 781 to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns with two main clusters, clinical (C) and environmental (E), all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, of which the vast majority encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins) were present in all human pathogenic strains (both biotype 3 and non-biotype 3) and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS) proteins, and integrases. Our study showed that V. 
vulnificus biotype 3 evolved from environmental populations and formed a genetically distinct group within the E-cluster. The unique epidemiological circumstances facilitated disease outbreak and brought this genotype to the attention of the scientific community.

  10. Heavy metal contamination of agricultural soils affected by mining activities around the Ganxi River in Chenzhou, Southern China.

    PubMed

    Ma, Li; Sun, Jing; Yang, Zhaoguang; Wang, Lin

    2015-12-01

    Heavy metal contamination has attracted widespread attention because of the strong toxicity and persistence of these metals. The Ganxi River, located in Chenzhou City, Southern China, has been severely polluted by lead/zinc ore mining activities. This work investigated the heavy metal pollution in agricultural soils around the Ganxi River. The total concentrations of heavy metals were determined by inductively coupled plasma-mass spectrometry. The potential risk associated with the heavy metals in soil was assessed by the Nemerow comprehensive index and the potential ecological risk index. In both methods, the study area was rated as very high risk. Multivariate statistical methods including Pearson's correlation analysis, hierarchical cluster analysis, and principal component analysis were employed to evaluate the relationships between heavy metals, as well as the correlation between heavy metals and pH, to identify the metal sources. Three distinct clusters were observed by hierarchical cluster analysis. In principal component analysis, a total of two components were extracted to explain over 90% of the total variance, both of which were associated with anthropogenic sources.
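
    The two risk indices named in the record above have simple closed forms, sketched here in their commonly used definitions: the Nemerow index combines the mean and maximum single-metal pollution ratios, and the potential ecological risk index sums toxicity-weighted enrichment ratios. All concentrations, reference values, and toxic-response factors below are placeholders for illustration, not values from the study.

```python
import math


def nemerow_index(concentrations, standards):
    """Nemerow comprehensive index: sqrt((mean(P)^2 + max(P)^2) / 2), P = C / S."""
    ratios = [concentrations[m] / standards[m] for m in concentrations]
    mean_ratio = sum(ratios) / len(ratios)
    return math.sqrt((mean_ratio ** 2 + max(ratios) ** 2) / 2)


def potential_ecological_risk(concentrations, backgrounds, toxic_factors):
    """Risk index RI: sum over metals of T * C / B."""
    return sum(
        toxic_factors[m] * concentrations[m] / backgrounds[m]
        for m in concentrations
    )


# Placeholder soil data in mg/kg (hypothetical, for illustration only).
soil = {"Pb": 300.0, "Zn": 500.0, "Cd": 3.0}
limits = {"Pb": 100.0, "Zn": 250.0, "Cd": 0.5}
background = {"Pb": 30.0, "Zn": 90.0, "Cd": 0.1}
factors = {"Pb": 5.0, "Zn": 1.0, "Cd": 30.0}  # assumed toxic-response factors

p_n = nemerow_index(soil, limits)
ri = potential_ecological_risk(soil, background, factors)
```

    Because the Nemerow index includes the maximum ratio, a single heavily enriched metal can dominate the rating even when the others are near their limits.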

  11. Phenotyping asthma, rhinitis and eczema in MeDALL population-based birth cohorts: an allergic comorbidity cluster.

    PubMed

    Garcia-Aymerich, J; Benet, M; Saeys, Y; Pinart, M; Basagaña, X; Smit, H A; Siroux, V; Just, J; Momas, I; Rancière, F; Keil, T; Hohmann, C; Lau, S; Wahn, U; Heinrich, J; Tischer, C G; Fantini, M P; Lenzi, J; Porta, D; Koppelman, G H; Postma, D S; Berdel, D; Koletzko, S; Kerkhof, M; Gehring, U; Wickman, M; Melén, E; Hallberg, J; Bindslev-Jensen, C; Eller, E; Kull, I; Lødrup Carlsen, K C; Carlsen, K-H; Lambrecht, B N; Kogevinas, M; Sunyer, J; Kauffmann, F; Bousquet, J; Antó, J M

    2015-08-01

    Asthma, rhinitis and eczema often co-occur in children, but their interrelationships at the population level have been poorly addressed. We assessed co-occurrence of childhood asthma, rhinitis and eczema using unsupervised statistical techniques. We included 17 209 children at 4 years and 14 585 at 8 years from seven European population-based birth cohorts (MeDALL project). At each age period, children were grouped, using partitioning cluster analysis, according to the distribution of 23 variables covering symptoms 'ever' and 'in the last 12 months', doctor diagnosis, age of onset and treatments of asthma, rhinitis and eczema; immunoglobulin E sensitization; weight; and height. We tested the sensitivity of our estimates to subject and variable selections, and to different statistical approaches, including latent class analysis and self-organizing maps. Two groups were identified as the optimal way to cluster the data at both age periods and in all sensitivity analyses. The first (reference) group at 4 and 8 years (including 70% and 79% of children, respectively) was characterized by a low prevalence of symptoms and sensitization, whereas the second (symptomatic) group exhibited more frequent symptoms and sensitization. Ninety-nine percent of children with comorbidities (co-occurrence of asthma, rhinitis and/or eczema) were included in the symptomatic group at both ages. The children's characteristics in both groups were consistent in all sensitivity analyses. At 4 and 8 years, at the population level, asthma, rhinitis and eczema can be classified together as an allergic comorbidity cluster. Future research including time-repeated assessments and biological data will help in understanding the interrelationships between these diseases. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  12. Clusters, Groups, and Filaments in the Chandra Deep Field-South up to Redshift 1

    NASA Astrophysics Data System (ADS)

    Dehghan, S.; Johnston-Hollitt, M.

    2014-03-01

    We present a comprehensive structure detection analysis of the 0.3 deg² area of the MUSYC-ACES field, which covers the Chandra Deep Field-South (CDFS). Using a density-based clustering algorithm on the MUSYC and ACES photometric and spectroscopic catalogs, we find 62 overdense regions up to redshifts of 1, including clusters, groups, and filaments. We also present the detection of a relatively small void of ~10 Mpc² at z ~ 0.53. All structures are confirmed using the DBSCAN method, including the detection of nine structures previously reported in the literature. We present a catalog of all structures present, including their central position, mean redshift, velocity dispersions, and classification based on their morphological and spectroscopic distributions. In particular, we find 13 galaxy clusters and 6 large groups/small clusters. Comparison of these massive structures with published XMM-Newton imaging (where available) shows that 80% of these structures are associated with diffuse, soft-band (0.4-1 keV) X-ray emission, including 90% of all objects classified as clusters. The presence of soft-band X-ray emission in these massive structures (M200 ≥ 4.9 × 10^13 M⊙) provides a strong independent confirmation of our methodology and classification scheme. In the closest two clusters identified (z < 0.13) high-quality optical imaging from the Deep2c field of the Garching-Bonn Deep Survey reveals the cD galaxies and demonstrates that they sit at the center of the detected X-ray emission. Nearly 60% of the clusters, groups, and filaments are detected in the known enhanced density regions of the CDFS at z ≈ 0.13, 0.52, 0.68, and 0.73. Additionally, all of the clusters, bar the most distant, are found in these overdense redshift regions.
Many of the clusters and groups exhibit signs of ongoing formation seen in their velocity distributions, position within the detected cosmic web, and in one case through the presence of tidally disrupted central galaxies exhibiting trails of stars. These results all provide strong support for hierarchical structure formation up to redshifts of 1.
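
For readers unfamiliar with the density-based clustering named in the record above, the following is a minimal sketch of the generic DBSCAN procedure in plain Python. It is not the authors' pipeline: the toy 2-D coordinates, the `eps` radius and the `min_pts` threshold are all illustrative assumptions, and the brute-force neighbor search is only suitable for small inputs.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force range query; the result includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical toy data: two compact groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Points in dense neighborhoods are grouped together, while the isolated point is left as noise rather than being forced into a cluster, which is the property that makes this family of methods attractive for finding overdensities.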

  13. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    PubMed

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
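
The idea of distributing data to nodes by partitioning a spatial index can be illustrated with a short sketch. The abstract does not say which index the system uses; the Z-order (Morton) curve below is one common choice and is an assumption made here for illustration, as are the function names `morton3d` and `node_for` and the equal-range partitioning rule.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Z-order code."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def node_for(x, y, z, n_nodes, bits=10):
    """Map a block coordinate to a node by splitting the code range evenly."""
    total_codes = 1 << (3 * bits)
    return morton3d(x, y, z, bits) * n_nodes // total_codes


# Hypothetical 4 x 4 x 4 grid of image blocks spread over 4 nodes.
first = node_for(0, 0, 0, n_nodes=4, bits=2)
last = node_for(3, 3, 3, n_nodes=4, bits=2)
```

Because nearby blocks tend to receive nearby codes, giving each node a contiguous range of codes keeps most spatially adjacent blocks on the same node.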

  14. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

    PubMed Central

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992

  15. GibbsCluster: unsupervised clustering and alignment of peptide sequences.

    PubMed

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-07-03

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertions and deletions, accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Using preoperative unsupervised cluster analysis of chronic rhinosinusitis to inform patient decision and endoscopic sinus surgery outcome.

    PubMed

    Adnane, Choaib; Adouly, Taoufik; Khallouk, Amine; Rouadi, Sami; Abada, Redallah; Roubal, Mohamed; Mahtar, Mohamed

    2017-02-01

    The purpose of this study is to use unsupervised cluster methodology to identify phenotype and mucosal eosinophilia endotype subgroups of patients with medically refractory chronic rhinosinusitis (CRS), and to evaluate the difference in quality of life (QOL) outcomes after endoscopic sinus surgery (ESS) between these clusters for better surgical case selection. A prospective cohort study included 131 patients with medically refractory CRS who elected ESS. The Sino-Nasal Outcome Test (SNOT-22) was used to evaluate QOL before and 12 months after surgery. An unsupervised two-step clustering method was applied. One hundred and thirteen subjects were retained in this study: 46 patients with CRS without nasal polyps and 67 patients with nasal polyps. Nasal polyps, gender, mucosal eosinophilia profile, and prior sinus surgery were the most discriminating factors in the generated clusters. Three clusters were identified. A significant clinical improvement was observed in all clusters 12 months after surgery with a reduction of SNOT-22 scores. There was a significant difference in QOL outcomes between clusters; cluster 1 had the worst QOL improvement after ESS in comparison with clusters 2 and 3. All patients in cluster 1 presented CRSwNP with the highest mucosal eosinophilia endotype. The clustering method is able to classify CRS phenotypes and endotypes with different associated surgical outcomes.

  17. Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.

    PubMed

    Fiero, Mallorie H; Huang, Shuang; Oren, Eyal; Bell, Melanie L

    2016-02-09

    Cluster randomized trials (CRTs) randomize participants in groups rather than as individuals, and are key tools used to assess interventions in health research where treatment contamination is likely or individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. Researchers and applied statisticians should use appropriate missing data methods that are valid under plausible assumptions, in order to increase statistical power in trials and reduce the possibility of bias.
Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.

  18. An Ecological Analysis of the Effects of Deviant Peer Clustering on Sexual Promiscuity, Problem Behavior, and Childbearing from Early Adolescence to Adulthood: An Enhancement of the Life History Framework

    ERIC Educational Resources Information Center

    Dishion, Thomas J.; Ha, Thao; Veronneau, Marie-Helene

    2012-01-01

    The authors propose that peer relationships should be included in a life history perspective on adolescent problem behavior. Longitudinal analyses were used to examine deviant peer clustering as the mediating link between attenuated family ties, peer marginalization, and social disadvantage in early adolescence and sexual promiscuity in middle…

  19. The Membership and Distance of the Open Cluster Collinder 419

    DTIC Science & Technology

    2010-09-01

    distance based upon new spectral classifications of the brighter members, UBV photometry, and an analysis of astrometric and photometric data from the... photometry of the fainter cluster members in Section 4. Our results are summarized in Section 5. 2. SPECTROSCOPY AND REDDENING OF THE BRIGHTER STARS...

  20. Progressive myoclonic epilepsies

    PubMed Central

    Michelucci, Roberto; Canafoglia, Laura; Striano, Pasquale; Gambardella, Antonio; Magaudda, Adriana; Tinuper, Paolo; La Neve, Angela; Ferlazzo, Edoardo; Gobbi, Giuseppe; Giallonardo, Anna Teresa; Capovilla, Giuseppe; Visani, Elisa; Panzica, Ferruccio; Avanzini, Giuliano; Tassinari, Carlo Alberto; Bianchi, Amedeo; Zara, Federico

    2014-01-01

    Objective: To define the clinical spectrum and etiology of progressive myoclonic epilepsies (PMEs) in Italy using a database developed by the Genetics Commission of the Italian League against Epilepsy. Methods: We collected clinical and laboratory data from patients referred to 25 Italian epilepsy centers regardless of whether a positive causative factor was identified. PMEs of undetermined origins were grouped using 2-step cluster analysis. Results: We collected clinical data from 204 patients, including 77 with a diagnosis of Unverricht-Lundborg disease and 37 with a diagnosis of Lafora body disease; 31 patients had PMEs due to rarer genetic causes, mainly neuronal ceroid lipofuscinoses. Two more patients had celiac disease. Despite extensive investigation, we found no definitive etiology for 57 patients. Cluster analysis indicated that these patients could be grouped into 2 clusters defined by age at disease onset, age at myoclonus onset, previous psychomotor delay, seizure characteristics, photosensitivity, associated signs other than those included in the cardinal definition of PME, and pathologic MRI findings. Conclusions: Information concerning the distribution of different genetic causes of PMEs may provide a framework for an updated diagnostic workup. Phenotypes of the patients with PME of undetermined cause varied widely. The presence of separate clusters suggests that novel forms of PME are yet to be clinically and genetically characterized. PMID:24384641

  1. Mismatch of Posttraumatic Stress Disorder (PTSD) Symptoms and DSM-IV Symptom Clusters in a Cancer Sample: Exploratory Factor Analysis of the PTSD Checklist-Civilian Version

    PubMed Central

    Shelby, Rebecca A.; Golden-Kreutz, Deanna M.; Andersen, Barbara L.

    2007-01-01

    The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV; American Psychiatric Association, 1994a) conceptualization of posttraumatic stress disorder (PTSD) includes three symptom clusters: reexperiencing, avoidance/numbing, and arousal. The PTSD Checklist-Civilian Version (PCL-C) corresponds to the DSM-IV PTSD symptoms. In the current study, we conducted exploratory factor analysis (EFA) of the PCL-C with two aims: (a) to examine whether the PCL-C evidenced the three-factor solution implied by the DSM-IV symptom clusters, and (b) to identify a factor solution for the PCL-C in a cancer sample. Women (N = 148) with Stage II or III breast cancer completed the PCL-C after completion of cancer treatment. We extracted two-, three-, four-, and five-factor solutions using EFA. Our data did not support the DSM-IV PTSD symptom clusters. Instead, EFA identified a four-factor solution including reexperiencing, avoidance, numbing, and arousal factors. Four symptom items, which may be confounded with illness and cancer treatment-related symptoms, exhibited poor factor loadings. Using these symptom items in cancer samples may lead to overdiagnosis of PTSD and inflated rates of PTSD symptoms. PMID:16281232

  2. Progressive myoclonic epilepsies: definitive and still undetermined causes.

    PubMed

    Franceschetti, Silvana; Michelucci, Roberto; Canafoglia, Laura; Striano, Pasquale; Gambardella, Antonio; Magaudda, Adriana; Tinuper, Paolo; La Neve, Angela; Ferlazzo, Edoardo; Gobbi, Giuseppe; Giallonardo, Anna Teresa; Capovilla, Giuseppe; Visani, Elisa; Panzica, Ferruccio; Avanzini, Giuliano; Tassinari, Carlo Alberto; Bianchi, Amedeo; Zara, Federico

    2014-02-04

    To define the clinical spectrum and etiology of progressive myoclonic epilepsies (PMEs) in Italy using a database developed by the Genetics Commission of the Italian League against Epilepsy. We collected clinical and laboratory data from patients referred to 25 Italian epilepsy centers regardless of whether a positive causative factor was identified. PMEs of undetermined origins were grouped using 2-step cluster analysis. We collected clinical data from 204 patients, including 77 with a diagnosis of Unverricht-Lundborg disease and 37 with a diagnosis of Lafora body disease; 31 patients had PMEs due to rarer genetic causes, mainly neuronal ceroid lipofuscinoses. Two more patients had celiac disease. Despite extensive investigation, we found no definitive etiology for 57 patients. Cluster analysis indicated that these patients could be grouped into 2 clusters defined by age at disease onset, age at myoclonus onset, previous psychomotor delay, seizure characteristics, photosensitivity, associated signs other than those included in the cardinal definition of PME, and pathologic MRI findings. Information concerning the distribution of different genetic causes of PMEs may provide a framework for an updated diagnostic workup. Phenotypes of the patients with PME of undetermined cause varied widely. The presence of separate clusters suggests that novel forms of PME are yet to be clinically and genetically characterized.

  3. Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

    PubMed

    Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

    Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  4. Quantum chemical calculations in the structural analysis of phloretin

    NASA Astrophysics Data System (ADS)

    Gómez-Zavaglia, Andrea

    2009-07-01

    In this work, a conformational search on the molecule of phloretin [2',4',6'-Trihydroxy-3-(4-hydroxyphenyl)-propiophenone] has been performed. The molecule of phloretin has eight dihedral angles, four of them taking part in the carbon backbone and the other four related to the orientation of the hydroxyl groups. A systematic search involving a random variation of the dihedral angles has been used to generate input structures for the quantum chemical calculations. Calculations at the DFT(B3LYP)/6-311++G(d,p) level of theory permitted the identification of 58 local minima belonging to the C1 symmetry point group. The molecular structures of the conformers have been analyzed using hierarchical cluster analysis. This method allowed us to group conformers according to their similarities, and thus to correlate the conformers' stability with structural parameters. The dendrogram obtained from the hierarchical cluster analysis depicted two main clusters. Cluster I included all the conformers with relative energies lower than 25 kJ mol⁻¹ and cluster II, the remaining conformers. The possibility of forming intramolecular hydrogen bonds proved to be the main factor contributing to stability. Accordingly, all conformers depicting intramolecular H-bonds belong to cluster I. These conformations are clearly favored when the carbon backbone is as planar as possible. The values of the νC=O and νOH vibrational modes were compared among all the conformers of phloretin. The redshifts associated with intramolecular H-bonds were correlated with the H-bond distances and energies.
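
As a rough illustration of how hierarchical cluster analysis groups structures by similarity, the sketch below merges the two closest groups repeatedly until a requested number of clusters remains. Everything specific in it is an assumption: the abstract does not state the linkage rule or distance measure (average linkage and Euclidean distance are used here), the feature vectors are made-up pairs of dihedral angles in degrees, and plain Euclidean distance ignores the periodicity of angles.

```python
from math import dist


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


# Hypothetical conformers described by two dihedral angles (degrees).
conformers = [(0, 5), (2, 3), (1, 4), (178, 90), (175, 88)]
groups = agglomerate(conformers, n_clusters=2)
```

Recording the order and distance of each merge, rather than stopping at a fixed count, would give the information needed to draw a dendrogram like the one the abstract describes.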

  5. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    PubMed

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  6. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis.

    PubMed

    Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I

    2016-02-01

    Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. 
Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at-risk groups. Finally, determining predictors of comorbidity for the moderate and severe strata of these phenotypes implies a need to take these factors into account when considering obstructive sleep apnea syndrome treatment options. © 2015 The Authors. Journal of Sleep Research published by John Wiley & Sons Ltd on behalf of European Sleep Research Society.

  7. Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks.

    PubMed

    Dinov, Martin; Leech, Robert

    2017-01-01

    Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.
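
The contrast between deterministic K-means labels and Fuzzy C-means memberships can be made concrete with a small sketch. It shows only the standard FCM membership formula for fixed cluster centers, not the full iterative algorithm or the authors' EEG pipeline; the 2-D points, the two centers and the fuzzifier `m = 2` are illustrative assumptions.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (sums to 1)."""
    d = [dist(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with one or more centers: share membership
        # equally among those centers and give zero to the rest.
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]


def kmeans_label(point, centers):
    """Hard assignment: index of the nearest center."""
    d = [dist(point, c) for c in centers]
    return d.index(min(d))


# Hypothetical cluster centers and two sample points.
centers = [(0.0, 0.0), (4.0, 0.0)]
near_first = fcm_memberships((1.0, 0.0), centers)
midway = fcm_memberships((2.0, 0.0), centers)
hard = kmeans_label((1.0, 0.0), centers)
```

A point midway between the centers receives equal membership in both, whereas a hard assignment must pick one label; that retained uncertainty is what the probabilistic approach carries forward.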

  8. Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks

    PubMed Central

    Dinov, Martin; Leech, Robert

    2017-01-01

    Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110

  9. Conformational Clusters of Phosphorylated Tyrosine.

    PubMed

    Abdelrasoul, Maha; Ponniah, Komala; Mao, Alice; Warden, Meghan S; Elhefnawy, Wessam; Li, Yaohang; Pascal, Steven M

    2017-12-06

    Tyrosine phosphorylation plays an important role in many cellular and intercellular processes including signal transduction, subcellular localization, and regulation of enzymatic activity. In 1999, Blom et al., using the limited number of protein data bank (PDB) structures available at that time, reported that the side chain structures of phosphorylated tyrosine (pY) are partitioned into two conserved conformational clusters (Blom, N.; Gammeltoft, S.; Brunak, S. J. Mol. Biol. 1999, 294, 1351-1362). We have used the spectral clustering algorithm to cluster the growing number of protein structures with pY sites, and have found that the pY residues cluster into three distinct side chain conformations. Two of these pY conformational clusters associate strongly with a narrow range of tyrosine backbone conformation. The novel cluster also highly correlates with the identity of the n + 1 residue, and is strongly associated with a sequential pYpY conformation which places two adjacent pY side chains in a specific relative orientation. Further analysis shows that the three pY clusters are associated with distinct distributions of cognate protein kinases.

  10. A measurement of CMB cluster lensing with SPT and DES year 1 data

    DOE PAGES

    Baxter, E. J.; Raghunathan, S.; Crawford, T. M.; ...

    2018-02-09

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. The cluster catalog used in this analysis contains 3697 members with mean redshift of $\bar{z} = 0.45$. We detect lensing of the CMB by the galaxy clusters at $8.1\sigma$ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 17% precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentering.

  11. Measuring consistent masses for 25 Milky Way globular clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kimmig, Brian; Seth, Anil; Ivans, Inese I.

    2015-02-01

    We present central velocity dispersions, masses, mass-to-light ratios (M/Ls), and rotation strengths for 25 Galactic globular clusters (GCs). We derive radial velocities of 1951 stars in 12 GCs from single order spectra taken with Hectochelle on the MMT telescope. To this sample we add an analysis of available archival data of individual stars. For the full set of data we fit King models to derive consistent dynamical parameters for the clusters. We find good agreement between single-mass King models and the observed radial dispersion profiles. The large, uniform sample of dynamical masses we derive enables us to examine trends of M/L with cluster mass and metallicity. The overall values of M/L and the trends with mass and metallicity are consistent with existing measurements from a large sample of M31 clusters. This includes a clear trend of increasing M/L with cluster mass and lower than expected M/Ls for the metal-rich clusters. We find no clear trend of increasing rotation with increasing cluster metallicity suggested in previous work.

  12. A measurement of CMB cluster lensing with SPT and DES year 1 data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baxter, E. J.; Raghunathan, S.; Crawford, T. M.

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. The cluster catalog used in this analysis contains 3697 members with mean redshift of $\bar{z} = 0.45$. We detect lensing of the CMB by the galaxy clusters at $8.1\sigma$ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 17% precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentering.

  13. Coherent clusters of inertial particles in homogeneous turbulence

    NASA Astrophysics Data System (ADS)

    Baker, Lucia; Frankel, Ari; Mani, Ali; Coletti, Filippo

    2016-11-01

    Clustering of heavy particles in turbulent flows manifests itself in a broad spectrum of physical phenomena, including sediment transport, cloud formation, and spray combustion. However, a clear topological definition of particle cluster has been lacking, limiting our ability to describe their features and dynamics. Here we introduce a definition of coherent cluster based on self-similarity, and apply it to the distribution of heavy particles in direct numerical simulations of homogeneous isotropic turbulence. We consider a range of particle Stokes numbers, with and without the effect of gravity. Clusters show self-similarity at length scales larger than twice the Kolmogorov length, with a specific fractal dimension. In the absence of gravity, clusters demonstrate a tendency to sample regions of the flow where strain is dominant over vorticity, and to align themselves with the local vorticity vector; when gravity is present, the clusters tend to align themselves with gravity, and their fall speed is different from the average settling velocity. This approach yields observations which are consistent with findings obtained from previous studies while opening new avenues for analysis of the topology and evolution of particle clusters in a wealth of applications.

  14. Cluster folding analysis of 20Ne+16O elastic transfer

    NASA Astrophysics Data System (ADS)

    Hamada, Sh.; Keeley, N.; Kemper, K. W.; Rusek, K.

    2018-05-01

    The available experimental data for the 20Ne+16O system in the energy range where the effect of α-cluster transfer is well observed are reanalyzed using the cluster folding model. The cluster folding potential, which includes both real and imaginary terms, reproduces the data at forward angles and the inclusion of the 16O(20Ne,16O)20Ne elastic transfer process provides a satisfactory description of the backward angles. The spectroscopic factor for the 20Ne→16O+α overlap was extracted and compared with other values from the literature. The present results suggest that the (20Ne,16O) reaction might be an alternative means of exploring the α-particle structure of nuclei.

  15. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    PubMed

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use Level 3 Basic Linear Algebra Subprograms (BLAS) are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
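
    As a rough illustration of the idea in this abstract, the sketch below factorizes a symmetric similarity matrix A as H H^T with a damped multiplicative update, H <- H * (1 - beta + beta * (A H) / (H H^T H)), and reads cluster labels from the largest entry in each row of H. The update rule, the toy matrix, and every function name here are assumptions chosen for illustration; the sketch is not claimed to be the paper's algorithm, and the α-SNMF and β-SNMF variants are not shown.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(a, h):
    """Squared Euclidean distance between A and its approximation H H^T."""
    approx = matmul(h, transpose(h))
    n = len(a)
    return sum((a[i][j] - approx[i][j]) ** 2 for i in range(n) for j in range(n))


def symmetric_nmf(a, rank, n_iterations=200, beta=0.5, seed=0, eps=1e-12):
    """Damped multiplicative updates for A ~ H H^T with H >= 0 (illustrative)."""
    rng = random.Random(seed)
    h = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(len(a))]
    for _ in range(n_iterations):
        numer = matmul(a, h)                        # A H
        denom = matmul(h, matmul(transpose(h), h))  # H (H^T H)
        h = [
            [h[i][k] * (1 - beta + beta * numer[i][k] / (denom[i][k] + eps))
             for k in range(rank)]
            for i in range(len(a))
        ]
    return h


# Hypothetical similarity matrix with two obvious groups of three items.
similarity = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

h = symmetric_nmf(similarity, rank=2)
# Soft memberships become hard labels by taking the dominant column per row.
labels = [row.index(max(row)) for row in h]
```

    Because every factor in the update is nonnegative, H stays nonnegative without any projection step, which is the usual appeal of multiplicative rules.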

  16. Cosmology from galaxy clusters as observed by Planck

    NASA Astrophysics Data System (ADS)

    Pierpaoli, Elena

    We propose to use current all-sky data on galaxy clusters in the radio/infrared bands in order to constrain cosmology. This will be achieved by performing parameter estimation with number counts and power spectra for galaxy clusters detected by Planck through their Sunyaev-Zeldovich signature. The ultimate goal of this proposal is to use clusters as tracers of matter density in order to provide information about fundamental properties of our Universe, such as the law of gravity on large scales, early Universe phenomena, structure formation and the nature of dark matter and dark energy. We will leverage the availability of a larger and deeper cluster catalog from the latest Planck data release in order to include, for the first time, the cluster power spectrum in the cosmological parameter determination analysis. Furthermore, we will extend the cluster analysis to cosmological models not yet investigated by the Planck collaboration. These aims require a diverse set of activities, ranging from the characterization of the clusters' selection function, the choice of the cosmological cluster sample to be used for parameter estimation, and the construction of mock samples in the various cosmological models with correct correlation properties in order to produce reliable selection functions and noise covariance matrices, to the construction of the appropriate likelihood for number counts and power spectra. We plan to make the final code available to the community and compatible with the most widely used cosmological parameter estimation code. This research makes use of data from the NASA satellites Planck and, less directly, Chandra, in order to constrain cosmology, and therefore fits the NASA objectives and the specifications of this solicitation.

  17. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function, such as binding-site clustering, makes better use of comparative sequence data than commonly used methods that examine only sequence identity.

  18. Exploring functional connectivity in fMRI via clustering.

    PubMed

    Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina

    2009-04-01

    In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
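
    For readers unfamiliar with the K-Means step named in this abstract, the following is a minimal sketch of Lloyd's algorithm on toy 2-D points. The farthest-point initialisation, the data, and the function name are assumptions made for illustration; the study's fMRI preprocessing, its Spectral Clustering variant, and the Nyström approximation are not reproduced here.

```python
import math


def kmeans(points, k, n_iterations=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(n_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical feature vectors forming two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
```

    In a functional connectivity setting each point would be a per-voxel feature vector rather than a 2-D coordinate, but the assignment and update steps are unchanged.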

  19. In vitro motility evaluation of aggregated cancer cells by means of automatic image processing.

    PubMed

    De Hauwer, C; Darro, F; Camby, I; Kiss, R; Van Ham, P; Decaesteker, C

    1999-05-01

    We set up an automatic image processing-based method that enables the motility of in vitro aggregated cells to be evaluated for a number of hours. Our biological model included the PC-3 human prostate cancer cell line growing as a monolayer on the bottom of Falcon plastic dishes containing conventional culture media. Our equipment consisted of an incubator, an inverted phase contrast microscope, a Charge Coupled Device (CCD) video camera, and a computer equipped with image processing software developed in our laboratory. This computer-assisted microscope analysis of aggregated cells enables global cluster motility to be evaluated. This analysis also enables the trajectory of each cell to be isolated and parametrized within a given cluster or, indeed, the trajectories of individual cells outside a cluster. The results show that motility inside a PC-3 cluster is not restricted to slight motion due to cluster expansion, but rather consists of a marked cell movement within the cluster. The proposed equipment enables in vitro aggregated cell motility to be studied. This method can, therefore, be used in pharmacological studies in order to select anti-motility related compounds. The compounds selected by the equipment described could then be tested in vivo as potential anti-metastatic agents.

  20. A comparative meta-analysis of QTL between intraspecific Gossypium hirsutum interspecific populations and Gossypium hirsutum x Gossypium barbadense populations

    USDA-ARS's Scientific Manuscript database

    Recent Meta-analysis of quantitative trait loci (QTL) in tetraploid cotton (Gossypium spp.) has identified regions of the genome with high concentrations of various trait QTL called clusters, and specific trait QTL called hotspots. The Meta-analysis included all population types of Gossypium mixing ...

  1. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    NASA Astrophysics Data System (ADS)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows obsolete for many researchers. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including SciPy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser, maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.

  2. A Morphometric Analysis of Hylarana signata Group (Previously Known as Rana signata and Rana picturata) of Malaysia

    NASA Astrophysics Data System (ADS)

    Zainudin, Ramlah; Sazali, Siti Nurlydia

    A study on morphometrical variations of the Malaysian Hylarana signata group was conducted to reveal the morphological relationships within the species group. Twenty-seven morphological characters from 18 individuals of H. signata and H. picturata were measured and recorded. The numerical data were analysed using Discriminant Function Analysis in SPSS version 16.0 and UPGMA Cluster Analysis in Minitab version 14.0. The results show complex clustering among the examined species, which might be due to ancient polymorphism of the lineages or cryptic species within the group. Hence, further study should include more representatives in order to fully elucidate the morphological relationships of the H. signata group.
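
    UPGMA, the cluster method named in this abstract, repeatedly merges the two closest clusters and defines the distance to a merged cluster as the size-weighted average of the distances to its parts. The sketch below shows that procedure on a hypothetical 4 x 4 distance matrix; it is not Minitab's implementation, and the matrix values and function name are assumptions for illustration only.

```python
def upgma(distances):
    """Average-linkage (UPGMA) agglomeration on a symmetric distance matrix.

    Returns a list of merges (cluster_a, cluster_b, merge_distance). Original
    items are numbered 0..n-1 and each merge creates the next free cluster id.
    """
    n = len(distances)
    sizes = {i: 1 for i in range(n)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    dist = {(i, j): distances[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(sizes) > 1:
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        merges.append((a, b, d))
        new_size = sizes[a] + sizes[b]
        for c in [c for c in sizes if c not in (a, b)]:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted mean keeps this equal to the average over all item pairs.
            dist[(c, next_id)] = (sizes[a] * d_ac + sizes[b] * d_bc) / new_size
        del dist[(a, b)]
        del sizes[a], sizes[b]
        sizes[next_id] = new_size
        next_id += 1
    return merges


# Hypothetical morphometric distances between four specimens.
distance_matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
merges = upgma(distance_matrix)
```

    Reading the merges in order gives the dendrogram from the leaves upward: specimens 0 and 1 join first, specimen 2 joins that pair next, and specimen 3 joins last.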

  3. Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

    PubMed

    Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

    2015-11-04

    There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance (¹H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in metabolomics datasets.

  4. Galactic Astronomy in the Ultraviolet

    NASA Astrophysics Data System (ADS)

    Rastorguev, A. S.; Sachkov, M. E.; Zabolotskikh, M. V.

    2017-12-01

    We propose a number of prospective observational programs for the ultraviolet space observatory WSO-UV, which seem to be of great importance to modern galactic astronomy. The programs include the search for binary Cepheids; the search and detailed photometric study and the analysis of radial distribution of UV-bright stars in globular clusters ("blue stragglers", blue horizontal-branch stars, RR Lyrae variables, white dwarfs, and stars with UV excesses); the investigation of stellar content and kinematics of young open clusters and associations; the study of spectral energy distribution in hot stars, including calculation of the extinction curves in the UV, optical and NIR; and accurate definition of the relations between the UV-colors and effective temperature. The high angular resolution of the observatory allows accurate astrometric measurements of stellar proper motions and their kinematic analysis.

  5. Mass Profile Decomposition of the Frontier Fields Cluster MACS J0416-2403: Insights on the Dark-matter Inner Profile

    NASA Astrophysics Data System (ADS)

    Annunziatella, M.; Bonamigo, M.; Grillo, C.; Mercurio, A.; Rosati, P.; Caminha, G.; Biviano, A.; Girardi, M.; Gobat, R.; Lombardi, M.; Munari, E.

    2017-12-01

    We present a high-resolution dissection of the two-dimensional total mass distribution in the core of the Hubble Frontier Fields galaxy cluster MACS J0416.1‑2403, at z = 0.396. We exploit HST/WFC3 near-IR (F160W) imaging, VLT/Multi Unit Spectroscopic Explorer spectroscopy, and Chandra data to separate the stellar, hot gas, and dark-matter mass components in the inner 300 kpc of the cluster. We combine the recent results of our refined strong lensing analysis, which includes the contribution of the intracluster gas, with the modeling of the surface brightness and stellar mass distributions of 193 cluster members, of which 144 are spectroscopically confirmed. We find that, moving from 10 to 300 kpc from the cluster center, the stellar to total mass fraction decreases from 12% to 1% and the hot gas to total mass fraction increases from 3% to 9%, resulting in a baryon fraction of approximately 10% at the outermost radius. We measure that the stellar component represents ∼30%, near the cluster center, and 15%, at larger clustercentric distances, of the total mass in the cluster substructures. We subtract the baryonic mass component from the total mass distribution and conclude that within 30 kpc (∼3 times the effective radius of the brightest cluster galaxy) from the cluster center the surface mass density profiles of the total mass and global (cluster plus substructures) dark-matter are steeper and that of the diffuse (cluster) dark-matter is shallower than an NFW profile. Our current analysis does not point to a significant offset between the cluster stellar and dark-matter components. This detailed and robust reconstruction of the inner dark-matter distribution in a larger sample of galaxy clusters will set a new benchmark for different structure formation scenarios.

  6. miR-17-92 Cluster Promotes Cholangiocarcinoma Growth

    PubMed Central

    Zhu, Hanqing; Han, Chang; Lu, Dongdong; Wu, Tong

    2015-01-01

    miR-17-92 is an oncogenic miRNA cluster implicated in the development of several cancers; however, it remains unknown whether the miR-17-92 cluster is able to regulate cholangiocarcinogenesis. This study was designed to investigate the biological functions and molecular mechanisms of the miR-17-92 cluster in cholangiocarcinoma. In situ hybridization and quantitative RT-PCR analysis showed that the miR-17-92 cluster is highly expressed in human cholangiocarcinoma cells compared with the nonneoplastic biliary epithelial cells. Forced overexpression of the miR-17-92 cluster or its members, miR-92a and miR-19a, in cultured human cholangiocarcinoma cells enhanced tumor cell proliferation, colony formation, and invasiveness, in vitro. Overexpression of the miR-17-92 cluster or miR-92a also enhanced cholangiocarcinoma growth in vivo in hairless outbred mice with severe combined immunodeficiency (SHO-PrkdcscidHrhr). The tumor-suppressor, phosphatase and tensin homolog deleted on chromosome 10 (PTEN), was identified as a bona fide target of both miR-92a and miR-19a in cholangiocarcinoma cells via sequence prediction, 3′ untranslated region luciferase activity assay, and Western blot analysis. Accordingly, overexpression of the PTEN open reading frame protein (devoid of 3′ untranslated region) prevented miR-92a– or miR-19a–induced cholangiocarcinoma cell growth. Microarray analysis revealed additional targets of the miR-17-92 cluster in human cholangiocarcinoma cells, including APAF-1 and PRDM2. Moreover, we observed that the expression of the miR-17-92 cluster is regulated by IL-6/Stat3, a key oncogenic signaling pathway pivotal in cholangiocarcinogenesis. Taken together, our findings disclose a novel IL-6/Stat3–miR-17-92 cluster–PTEN signaling axis that is crucial for cholangiocarcinogenesis and tumor progression. PMID:25239565

  7. Structures in the Great Attractor region

    NASA Astrophysics Data System (ADS)

    Radburn-Smith, D. J.; Lucey, J. R.; Woudt, P. A.; Kraan-Korteweg, R. C.; Watson, F. G.

    2006-07-01

    To further our understanding of the Great Attractor (GA), we have undertaken a redshift survey using the 2-degree Field (2dF) instrument on the Anglo-Australian Telescope (AAT). Clusters and filaments in the GA region were targeted with 25 separate pointings resulting in approximately 2600 new redshifts. Targets included poorly studied X-ray clusters from the Clusters in the Zone of Avoidance (CIZA) Catalogue as well as the Cen-Crux and PKS 1343-601 clusters, both of which lie close to the classic GA centre. For nine clusters in the region, we report velocity distributions as well as virial and projected mass estimates. The virial mass of CIZA J1324.7-5736, now identified as a separate structure from the Cen-Crux cluster, is found to be ~3 × 10^14 M⊙, in good agreement with the X-ray inferred mass. In the PKS 1343-601 field, five redshifts are measured of which four are new. An analysis of redshifts from this survey, in combination with those from the literature, reveals the dominant structure in the GA region to be a large filament, which appears to extend from Abell S0639 (l = 281°, b = +11°) to (l ~ 5°, b ~ -50°), encompassing the Cen-Crux, CIZA J1324.7-5736, Norma and Pavo II clusters. Behind the Norma cluster at cz ~ 15 000 km s^-1, the masses of four rich clusters are calculated. These clusters (Triangulum Australis, Ara, CIZA J1514.6-4558 and CIZA J1410.4-4246) may contribute to a continued large-scale flow beyond the GA. The results of these observations will be incorporated into a subsequent analysis of the GA flow.

  8. Cluster Analysis Identifies Distinct Pathogenetic Patterns in C3 Glomerulopathies/Immune Complex-Mediated Membranoproliferative GN.

    PubMed

    Iatropoulos, Paraskevas; Daina, Erica; Curreri, Manuela; Piras, Rossella; Valoti, Elisabetta; Mele, Caterina; Bresin, Elena; Gamba, Sara; Alberti, Marta; Breno, Matteo; Perna, Annalisa; Bettoni, Serena; Sabadini, Ettore; Murer, Luisa; Vivarelli, Marina; Noris, Marina; Remuzzi, Giuseppe

    2018-01-01

    Membranoproliferative GN (MPGN) was recently reclassified as alternative pathway complement-mediated C3 glomerulopathy (C3G) and immune complex-mediated membranoproliferative GN (IC-MPGN). However, genetic and acquired alternative pathway abnormalities are also observed in IC-MPGN. Here, we explored the presence of distinct disease entities characterized by specific pathophysiologic mechanisms. We performed unsupervised hierarchical clustering, a data-driven statistical approach, on histologic, genetic, and clinical data and data regarding serum/plasma complement parameters from 173 patients with C3G/IC-MPGN. This approach divided patients into four clusters, indicating the existence of four different pathogenetic patterns. Specifically, this analysis separated patients with fluid-phase complement activation (clusters 1-3) who had low serum C3 levels and a high prevalence of genetic and acquired alternative pathway abnormalities from patients with solid-phase complement activation (cluster 4) who had normal or mildly altered serum C3, late disease onset, and poor renal survival. In patients with fluid-phase complement activation, those in clusters 1 and 2 had massive activation of the alternative pathway, including activation of the terminal pathway, and the highest prevalence of subendothelial deposits, but those in cluster 2 had additional activation of the classic pathway and the highest prevalence of nephrotic syndrome at disease onset. Patients in cluster 3 had prevalent activation of C3 convertase and highly electron-dense intramembranous deposits. In addition, we provide a simple algorithm to assign patients with C3G/IC-MPGN to specific clusters. These distinct clusters may facilitate clarification of disease etiology, improve risk assessment for ESRD, and pave the way for personalized treatment. Copyright © 2018 by the American Society of Nephrology.

  9. Space-time analysis of Down syndrome: results consistent with transient pre-disposing contagious agent.

    PubMed

    McNally, Richard J Q; Rankin, Judith; Shirley, Mark D F; Rushton, Stephen P; Pless-Mulloli, Tanja

    2008-10-01

    Whilst maternal age is an established risk factor for Patau syndrome (trisomy 13), Edwards syndrome (trisomy 18) and Down syndrome (trisomy 21), the aetiology and contribution of genetic and environmental factors remains unclear. We analysed for space-time clustering using high quality fully population-based data from a geographically defined region. The study included all cases of Patau, Edwards and Down syndrome, delivered during 1985-2003 and resident in the former Northern Region of England, including terminations of pregnancy for fetal anomaly. We applied the K-function test for space-time clustering with fixed thresholds of close in space and time using residential addresses at time of delivery. The Knox test was used to indicate the range over which the clustering effect occurred. Tests were repeated using nearest neighbour (NN) thresholds to adjust for variable population density. The study analysed 116 cases of Patau syndrome, 240 cases of Edwards syndrome and 1084 cases of Down syndrome. There was evidence of space-time clustering for Down syndrome (fixed threshold of close in space: P = 0.01, NN threshold: P = 0.02), but little or no clustering for Patau (P = 0.57, P = 0.19) or Edwards (P = 0.37, P = 0.06) syndromes. Clustering of Down syndrome was associated with cases from more densely populated areas and evidence of clustering persisted when cases were restricted to maternal age <40 years. The highly novel space-time clustering for Down syndrome suggests an aetiological role for transient environmental factors, such as infections.
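
    The Knox test mentioned in this abstract counts pairs of cases that are close in both space and time and asks whether that count is larger than expected if times were unrelated to locations. The sketch below is a minimal version with a Monte Carlo permutation p-value; the coordinates, thresholds, and function names are hypothetical, and neither the K-function test nor the nearest-neighbour thresholds used in the study are reproduced.

```python
import math
import random


def knox_statistic(cases, space_limit, time_limit):
    """Count case pairs that are close in both space and time."""
    close_pairs = 0
    for i in range(len(cases)):
        x1, y1, t1 = cases[i]
        for j in range(i + 1, len(cases)):
            x2, y2, t2 = cases[j]
            near_in_space = math.hypot(x1 - x2, y1 - y2) <= space_limit
            near_in_time = abs(t1 - t2) <= time_limit
            if near_in_space and near_in_time:
                close_pairs += 1
    return close_pairs


def knox_permutation_test(cases, space_limit, time_limit, n_permutations=999, seed=0):
    """Monte Carlo p-value: shuffle times across locations to break any space-time link."""
    rng = random.Random(seed)
    observed = knox_statistic(cases, space_limit, time_limit)
    times = [t for _, _, t in cases]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(times)
        permuted = [(x, y, t) for (x, y, _), t in zip(cases, times)]
        if knox_statistic(permuted, space_limit, time_limit) >= observed:
            at_least_as_extreme += 1
    # Adding one to numerator and denominator counts the observed arrangement itself.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical cases as (x in km, y in km, day of delivery): two tight groups.
cases = [
    (0, 0, 0), (1, 0, 5), (0, 1, 10), (1, 1, 15), (2, 1, 20),
    (50, 50, 300), (51, 50, 305), (50, 51, 310), (51, 51, 315), (52, 51, 320),
]
observed, p_value = knox_permutation_test(cases, space_limit=5, time_limit=30)
```

    Shuffling only the times keeps the spatial pattern and the overall time distribution fixed, so a small p-value points to an interaction between the two rather than to clustering in either one alone.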

  10. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters.

    PubMed

    Berenguer, Roberto; Pastor-Juan, María Del Rosario; Canales-Vázquez, Jesús; Castro-García, Miguel; Villas, María Victoria; Legorburo, Francisco Mansilla; Sabater, Sebastià

    2018-04-24

    Purpose: To identify the reproducible and nonredundant radiomics features (RFs) for computed tomography (CT). Materials and Methods: Two phantoms were used to test RF reproducibility by using test-retest analysis, by changing the CT acquisition parameters (hereafter, intra-CT analysis), and by comparing five different scanners with the same CT parameters (hereafter, inter-CT analysis). Reproducible RFs were selected by using the concordance correlation coefficient (as a measure of the agreement between variables) and the coefficient of variation (defined as the ratio of the standard deviation to the mean). Redundant features were grouped by using hierarchical cluster analysis. Results: A total of 177 RFs including intensity, shape, and texture features were evaluated. The test-retest analysis showed that 91% (161 of 177) of the RFs were reproducible according to concordance correlation coefficient. Reproducibility of intra-CT RFs, based on coefficient of variation, ranged from 89.3% (151 of 177) to 43.1% (76 of 177) where the pitch factor and the reconstruction kernel were modified, respectively. Reproducibility of inter-CT RFs, based on coefficient of variation, also showed large material differences, from 85.3% (151 of 177; wood) to only 15.8% (28 of 177; polyurethane). Ten clusters were identified after the hierarchical cluster analysis and one RF per cluster was chosen as representative. Conclusion: Many RFs were redundant and nonreproducible. If all the CT parameters are fixed except field of view, tube voltage, and milliamperage, then the information provided by the analyzed RFs can be summarized in only 10 RFs (each representing a cluster) because of redundancy. © RSNA, 2018
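
    The two reproducibility measures named in this abstract can be written down compactly. The sketch below computes the coefficient of variation as defined above (standard deviation divided by mean) and Lin's concordance correlation coefficient, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The use of population (1/n) moments, the sample values, and the function names are assumptions for illustration; the abstract does not state the cut-offs used to call a feature reproducible, so none are applied here.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean (population SD used here)."""
    mean = fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / abs(mean)


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalises systematic offsets between the scans.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical test-retest values for one radiomics feature (not from the study).
test_scan = [10.0, 12.0, 14.0, 16.0]
retest_scan = [10.5, 11.5, 14.5, 15.5]

ccc = concordance_correlation(test_scan, retest_scan)
cv = coefficient_of_variation(test_scan)
```

    Unlike the Pearson correlation, the concordance coefficient drops when one scan is consistently shifted or scaled relative to the other, which is why it suits test-retest agreement.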

  11. Characterization of the CPAP-treated patient population in Catalonia

    PubMed Central

    Gavaldá, Ricard; Teixidó, Ivan; Woehrle, Holger; Rué, Montserrat; Solsona, Francesc; Escarrabill, Joan; Colls, Cristina; García-Altés, Anna; de Batlle, Jordi; Sánchez de-la-Torre, Manuel

    2017-01-01

    There are different phenotypes of obstructive sleep apnoea (OSA), many of which have not been characterised. Identification of these different phenotypes is important in defining prognosis and guiding the therapeutic strategy. The aim of this study was to characterise the entire population of continuous positive airway pressure (CPAP)-treated patients in Catalonia and identify specific patient profiles using cluster analysis. A total of 72,217 CPAP-treated patients who contacted the Catalan Health System (CatSalut) during the years 2012 and 2013 were included. Six clusters were identified, classified as “Neoplastic patients” (Cluster 1, 10.4%), “Metabolic syndrome patients” (Cluster 2, 27.7%), “Asthmatic patients” (Cluster 3, 5.8%), “Musculoskeletal and joint disorder patients” (Cluster 4, 10.3%), “Patients with few comorbidities” (Cluster 5, 35.6%) and “Oldest and cardiac disease patients” (Cluster 6, 10.2%). Healthcare facility use and mortality were highest in patients from Clusters 1 and 6. Conversely, patients in Clusters 2 and 4 had low morbidity, mortality and healthcare resource use. Our findings highlight the heterogeneity of CPAP-treated patients, and suggest that OSA is associated with a different prognosis in the clusters identified. These results suggest the need for a comprehensive and individualised approach to CPAP treatment of OSA. PMID:28934303

  12. Cause-specific mortality trends in The Netherlands, 1875-1992: a formal analysis of the epidemiologic transition.

    PubMed

    Wolleswinkel-van den Bosch, J H; Looman, C W; Van Poppel, F W; Mackenbach, J P

    1997-08-01

    The objective of this study is to produce a detailed yet robust description of the epidemiologic transition in The Netherlands. National mortality data on sex, age, cause of death and calendar year (1875-1992) were extracted from official publications. For the entire period, 27 causes of death could be distinguished, while 65 causes (nested within the 27) could be studied from 1901 onwards. Cluster analysis was used to determine groups of causes of death with similar trend curves over a period of time with respect to age- and sex-standardized mortality rates. With respect to the 27 causes, three important clusters were found: (1) infectious diseases which declined rapidly in the late 19th century (e.g. typhoid fever), (2) infectious diseases which showed a less precipitous decline (e.g. respiratory tuberculosis), and (3) non-infectious diseases which showed an increasing trend during most of the period 1875-1992 (e.g. cancer). The 65 causes provided more detail. Seven important clusters were found: four consisted mainly of infectious diseases, including a new cluster that declined rapidly after the Second World War (WW2) (e.g. acute bronchitis/influenza) and a new cluster showing an increasing trend in the 1920s and 1930s before declining in the years thereafter (e.g. appendicitis). Three clusters mainly contained non-infectious diseases, including a new one that declined from 1900 onwards (e.g. cancer of the stomach) and a new one that increased until WW2 but declined thereafter (e.g. chronic rheumatic heart disease). The results suggest that the conventional interpretation of the epidemiologic transition, which assumes a uniform decline of infectious diseases and a uniform increase of non-infectious diseases, needs to be modified.

  13. Vaccines for preventing anthrax.

    PubMed

    Donegan, Sarah; Bellamy, Richard; Gamble, Carrol L

    2009-04-15

    Anthrax is a bacterial zoonosis that occasionally causes human disease and is potentially fatal. Anthrax vaccines include a live-attenuated vaccine, an alum-precipitated cell-free filtrate vaccine, and a recombinant protein vaccine. To evaluate the effectiveness, immunogenicity, and safety of vaccines for preventing anthrax. We searched the following databases (November 2008): Cochrane Infectious Diseases Group Specialized Register; CENTRAL (The Cochrane Library 2008, Issue 4); MEDLINE; EMBASE; LILACS; and mRCT. We also searched reference lists. We included randomized controlled trials (RCTs) of individuals and cluster-RCTs comparing anthrax vaccine with placebo, other (non-anthrax) vaccines, or no intervention; or comparing administration routes or treatment regimens of anthrax vaccine. Two authors independently considered trial eligibility, assessed risk of bias, and extracted data. We presented cases of anthrax and seroconversion rates using risk ratios (RR) and 95% confidence intervals (CI). We summarized immunoglobulin G (IgG) concentrations using geometric means. We carried out a sensitivity analysis to investigate the effect of clustering on the results from one cluster-RCT. No meta-analysis was undertaken. One cluster-RCT (with 157,259 participants) and four RCTs of individuals (1917 participants) met the inclusion criteria. The cluster-RCT from the former USSR showed that, compared with no vaccine, a live-attenuated vaccine (called STI) protected against clinical anthrax whether given by a needleless device (RR 0.16; 102,737 participants, 154 clusters) or the scarification method (RR 0.25; 104,496 participants, 151 clusters). Confidence intervals were statistically significant in unadjusted calculations, but when a small amount of association within clusters was assumed, the differences were not statistically significant. 
    The four RCTs (of individuals) of inactivated vaccines (anthrax vaccine adsorbed and recombinant protective antigen) showed a dose response relationship for the anti-protective antigen IgG antibody titre. Intramuscular administration was associated with fewer injection site reactions than subcutaneous injection, and injection site reaction rates were lower when the dosage interval was longer. One cluster-RCT provides limited evidence that a live-attenuated vaccine is effective in preventing cutaneous anthrax. Vaccines based on anthrax antigens are immunogenic in most vaccinees with few adverse events or reactions. Ongoing randomized controlled trials are investigating the immunogenicity and safety of anthrax vaccines.

  14. Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes.

    PubMed

    Roche, Kimberly E; Weinstein, Marvin; Dunwoodie, Leland J; Poehlman, William L; Feltus, Frank A

    2018-05-25

    We applied two state-of-the-art, knowledge independent data-mining methods - Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE) - to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that wasn't detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.

  15. Jade: using on-demand cloud analysis to give scientists back their flow

    NASA Astrophysics Data System (ADS)

    Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.

    2017-12-01

    The UK's Met Office generates 400 TB of weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap. Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: (1) an under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts; (2) hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores; (3) clusters which autoscale up/down in response to calculation load using Kubernetes, and which are balanced across providers based on the current price of compute; and (4) lazy data access from cloud storage via containerised OpenDAP. This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger sizes. The use of ephemeral compute resources also makes this implementation cost efficient.

  16. Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003

    NASA Astrophysics Data System (ADS)

    Di Salvo, Roberto; Montalto, Placido; Nunnari, Giuseppe; Neri, Marco; Puglisi, Giuseppe

    2013-02-01

    Time series clustering is an important data analysis task, used to extract implicit, previously unknown, and potentially useful information from a large collection of data. Finding useful similar trends in multivariate time series represents a challenge in several areas, including geophysical and environmental research. While traditional time series analysis methods deal only with univariate time series, multivariate time series analysis is a more suitable approach in fields of research where different kinds of data are available. Moreover, conventional time series clustering techniques do not provide the desired results for geophysical datasets, owing to the huge amount of data whose sampling rate differs according to the nature of the signal. In this paper, a novel approach to geophysical multivariate time series clustering is proposed using dynamic time series segmentation and Self Organizing Maps techniques. This method allows finding coupling among trends of different geophysical data recorded from monitoring networks at Mt. Etna from 1996 to 2003, when the transition from summit eruptions to flank eruptions occurred. This information can be used to carry out a more careful evaluation of the state of the volcano and to support hazard assessment at Mt. Etna.

  17. The hierarchical cluster analysis of oral health attitudes and behaviour using the Hiroshima University--Dental Behavioural Inventory (HU-DBI) among final year dental students in 17 countries.

    PubMed

    Komabayashi, Takashi; Kawamura, Makoto; Kim, Kang-Ju; Wright, Fredrick A C; Declerck, Dominique; Goiâs, Maria do Carmo Matias Freire; Hu, De-Yu; Honkala, Eino; Lévy, Gérard; Kalwitzki, Matthias; Polychronopoulou, Argy; Yip, Kevin Hak-Kong; Eli, Ilana; Kinirons, Martin J; Petti, Stefano; Srisilapanan, Patcharawan; Kwan, Stella Y L; Centore, Linda S

    2006-10-01

    To explore and describe international oral health attitudes/ behaviours among final year dental students. Validated translated versions of the Hiroshima University-Dental Behavioural Inventory (HU-DBI) questionnaire were administered to 1,096 final-year dental students in 17 countries. Hierarchical cluster analysis was conducted within the data to detect patterns and groupings. The overall response rate was 72%. The cluster analysis identified two main groups among the countries. Group 1 consisted of twelve countries: one Oceanic (Australia), one Middle-Eastern (Israel), seven European (Northern Ireland, England, Finland, Greece, Germany, Italy, and France) and three Asian (Korea, Thailand and Malaysia) countries. Group 2 consisted of five countries: one South American (Brazil), one European (Belgium) and three Asian (China, Indonesia and Japan) countries. The percentages of 'agree' responses in three HU-DBI questionnaire items were significantly higher in Group 2 than in Group 1. They include: "I worry about the colour of my teeth."; "I have noticed some white sticky deposits on my teeth."; and "I am bothered by the colour of my gums." Grouping the countries into international clusters yielded useful information for dentistry and dental education.

  18. Phylogeny of kemenyan (Styrax sp.) from North Sumatra based on morphological characters

    NASA Astrophysics Data System (ADS)

    Susilowati, A.; Kholibrina, C. R.; Rachmat, H. H.; Munthe, M. A.

    2018-02-01

    Kemenyan is the most famous local tree species from North Sumatra. Kemenyan is known as a resin producer that is very valuable for pharmaceuticals, cosmetics, food preservatives and varnish. Historically, only two species of kemenyan were known, kemenyan durame and toba, but in its natural distribution we also found other species showing different characteristics from the previously known ones. The objectives of this research were: (1) to determine the morphological diversity of kemenyan in North Sumatra and (2) to determine phylogeny clustering based on the morphological characters. Data were collected from direct observation and morphological characterization, based on a purposive sampling technique, of sample trees at Pakpak Bharat, North Sumatra. Morphological characters were examined using descriptive analysis, phenotypic variability using standard deviation, and cluster analysis. The results showed that there were differences between 4 species of kemenyan (batak, minyak, durame and toba) according to 75 observed characters including flower, fruits, leaf, stem, bark, crown type, wood and the resin. Analysis of both quantitative and qualitative characters clustered kemenyan into two groups, in which kemenyan toba separated from the others.

  19. Cluster analysis of stress corrosion mechanisms for steel wires used in bridge cables through acoustic emission particle swarm optimization.

    PubMed

    Li, Dongsheng; Yang, Wei; Zhang, Wenyao

    2017-05-01

    Stress corrosion is the major failure type of bridge cable damage. The acoustic emission (AE) technique was applied to monitor the stress corrosion process of steel wires used in bridge cable structures. The damage evolution of stress corrosion in bridge cables was obtained according to the AE characteristic parameter figure. A particle swarm optimization cluster method was developed to determine the relationship between the AE signal and stress corrosion mechanisms. Results indicate that the main AE sources of stress corrosion in bridge cables included four types: passive film breakdown and detachment of the corrosion product, crack initiation, crack extension, and cable fracture. By analyzing different types of clustering data, the mean value of each damage pattern's AE characteristic parameters was determined. Different corrosion damage source AE waveforms and the peak frequency were extracted. AE particle swarm optimization cluster analysis based on principal component analysis was also proposed. This method can completely distinguish the four types of damage sources and simplifies the determination of the evolution process of corrosion damage and broken wire signals. Copyright © 2017. Published by Elsevier B.V.

  20. COOL CORE CLUSTERS FROM COSMOLOGICAL SIMULATIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rasia, E.; Borgani, S.; Murante, G.

    2015-11-01

    We present results obtained from a set of cosmological hydrodynamic simulations of galaxy clusters, aimed at comparing predictions with observational data on the diversity between cool-core (CC) and non-cool-core (NCC) clusters. Our simulations include the effects of stellar and active galactic nucleus (AGN) feedback and are based on an improved version of the smoothed particle hydrodynamics code GADGET-3, which ameliorates gas mixing and better captures gas-dynamical instabilities by including a suitable artificial thermal diffusion. In this Letter, we focus our analysis on the entropy profiles, the primary diagnostic we used to classify the degree of cool-coreness of clusters, and the iron profiles. In keeping with observations, our simulated clusters display a variety of behaviors in entropy profiles: they range from steadily decreasing profiles at small radii, characteristic of CC systems, to nearly flat core isentropic profiles, characteristic of NCC systems. Using observational criteria to distinguish between the two classes of objects, we find that they occur in similar proportions in both simulations and observations. Furthermore, we also find that simulated CC clusters have profiles of iron abundance that are steeper than those of NCC clusters, which is also in agreement with observational results. We show that the capability of our simulations to generate a realistic CC structure in the cluster population is due to AGN feedback and artificial thermal diffusion: their combined action allows us to naturally distribute the energy extracted from super-massive black holes and to compensate for the radiative losses of low-entropy gas with short cooling time residing in the cluster core.

  1. Stability and change in adolescent spirituality/religiosity: a person-centered approach.

    PubMed

    Good, Marie; Willoughby, Teena; Busseri, Michael A

    2011-03-01

    Although there has been a substantial increase over the past decade in studies that have examined the psychosocial correlates of spirituality/religiosity in adolescence, very little is known about spirituality/religiosity as a domain of development in its own right. To address this limitation, the authors identified configurations of multiple dimensions of spirituality/religiosity across 2 time points with an empirical classification procedure (cluster analysis) and assessed development in these configurations at the sample and individual level. Participants included 756 predominately Canadian-born adolescents (53% female, 47% male) from southern Ontario, Canada, who completed a survey in Grade 11 (M age = 16.41 years) and Grade 12 (M age = 17.36 years). Measures included religious activity involvement, enjoyment of religious activities, the Spiritual Transcendence Index, wondering about spiritual issues, frequency of prayer, and frequency of meditation. Sample-level development (structural stability and change) was assessed by examining whether the structural configurations of the clusters were consistent over time. Individual-level development was assessed by examining intraindividual stability and change in cluster membership over time. Results revealed that a five cluster-solution was optimal at both grades. Clusters were identified as aspiritual/irreligious, disconnected wonderers, high institutional and personal, primarily personal, and meditators. With the exception of the high institutional and personal cluster, the cluster structures were stable over time. There also was significant intraindividual stability in all clusters over time; however, a significant proportion of individuals classified as high institutional and personal in Grade 11 moved into the primarily personal cluster in Grade 12. PsycINFO Database Record (c) 2011 APA, all rights reserved.

  2. Cool Core Clusters from Cosmological Simulations

    NASA Astrophysics Data System (ADS)

    Rasia, E.; Borgani, S.; Murante, G.; Planelles, S.; Beck, A. M.; Biffi, V.; Ragone-Figueroa, C.; Granato, G. L.; Steinborn, L. K.; Dolag, K.

    2015-11-01

    We present results obtained from a set of cosmological hydrodynamic simulations of galaxy clusters, aimed at comparing predictions with observational data on the diversity between cool-core (CC) and non-cool-core (NCC) clusters. Our simulations include the effects of stellar and active galactic nucleus (AGN) feedback and are based on an improved version of the smoothed particle hydrodynamics code GADGET-3, which ameliorates gas mixing and better captures gas-dynamical instabilities by including a suitable artificial thermal diffusion. In this Letter, we focus our analysis on the entropy profiles, the primary diagnostic we used to classify the degree of cool-coreness of clusters, and the iron profiles. In keeping with observations, our simulated clusters display a variety of behaviors in entropy profiles: they range from steadily decreasing profiles at small radii, characteristic of CC systems, to nearly flat core isentropic profiles, characteristic of NCC systems. Using observational criteria to distinguish between the two classes of objects, we find that they occur in similar proportions in both simulations and observations. Furthermore, we also find that simulated CC clusters have profiles of iron abundance that are steeper than those of NCC clusters, which is also in agreement with observational results. We show that the capability of our simulations to generate a realistic CC structure in the cluster population is due to AGN feedback and artificial thermal diffusion: their combined action allows us to naturally distribute the energy extracted from super-massive black holes and to compensate for the radiative losses of low-entropy gas with short cooling time residing in the cluster core.

  3. Pt-Zn Clusters on Stoichiometric MgO(100) and TiO2(110): Dramatically Different Sintering Behavior

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dadras, Mostafa J.; Shen, Lu; Alexandrova, Anastassia N.

    2015-03-02

    Zn was suggested to be a promising additive to Pt in the catalysis of dehydrogenation reactions. In this work, mixed Pt-Zn clusters deposited on two simple oxides, MgO(100) and TiO2(110), were investigated. The stability of these systems against cluster sintering, one of the major mechanisms of catalyst deactivation, is simulated using a Metropolis Monte Carlo scheme under the assumption of the Ostwald ripening mechanism. Particle migration, association to and dissociation from clusters, and evaporation and redeposition of monomers were all included in the simulations. Simulations are done at several high temperatures relevant to reactions of catalytic dehydrogenation. The effect of temperature is included via both the Metropolis algorithm and the Boltzmann-weighted populations of the global and thermally accessible local minima on the density functional theory potential energy surfaces of clusters of all sizes and compositions up to tetramers. On both surfaces, clusters are shown to sinter quite rapidly. However, the resultant compositions of the clusters most resistant to sintering are quite different on the two supports. On TiO2(110), Pt and Zn appear to phase separate, preferentially forming clusters rich in just one or the other metal. On MgO(100), Pt and Zn remain well-mixed and form a range of bimetallic clusters of various compositions that appear relatively stable. However, Zn is more easily lost from MgO through evaporation. These phenomena were rationalized by several means of chemical bonding analysis.
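
    The two ways temperature enters such a simulation, the Metropolis acceptance test and Boltzmann-weighted populations of minima, can be illustrated with a short sketch. This is a generic textbook form and not the authors' code: the function names, the energies in eV, and the temperature of 700 K are assumptions chosen only for illustration.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def metropolis_accept(delta_e, temperature, rng=random):
    """Standard Metropolis test: always accept downhill moves, accept
    uphill moves with probability exp(-delta_e / (k_B * T))."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B_EV * temperature))


def boltzmann_populations(energies, temperature):
    """Normalised Boltzmann weights of a set of minima (energies in eV)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / (K_B_EV * temperature)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical global minimum and two degenerate local minima 0.1 eV above it.
populations = boltzmann_populations([0.0, 0.1, 0.1], temperature=700.0)
```

    Raising the temperature flattens the populations and makes uphill moves more likely to be accepted, which is the qualitative effect the abstract attributes to the high-temperature simulations.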

  4. Characterisation of Colletotrichum species associated with anthracnose of banana.

    PubMed

    Zakaria, Latiffah; Sahak, Shamsiah; Zakaria, Maziah; Salleh, Baharuddin

    2009-12-01

    A total of 13 Colletotrichum isolates were obtained from different banana cultivars (Musa spp.) with symptoms of anthracnose. Colletotrichum isolates from anthracnose of guava (Psidium guajava) and water apple (Syzygium aqueum) were also included in this study. Based on cultural and morphological characteristics, isolates from banana and guava were identified as Colletotrichum musae and from water apple as Colletotrichum gloeosporioides. Isolates of C. musae from banana and guava had similar banding patterns in a randomly amplified polymorphic DNA (RAPD) analysis with four random primers, and they clustered together in a UPGMA analysis. C. gloeosporioides from water apple formed a separate cluster. Based on the present study, C. musae was frequently isolated from anthracnose of different banana cultivars and the RAPD banding patterns of C. musae isolates were highly similar but showed intraspecific variations.

  5. Characterisation of Colletotrichum Species Associated with Anthracnose of Banana

    PubMed Central

    Zakaria, Latiffah; Sahak, Shamsiah; Zakaria, Maziah; Salleh, Baharuddin

    2009-01-01

    A total of 13 Colletotrichum isolates were obtained from different banana cultivars (Musa spp.) with symptoms of anthracnose. Colletotrichum isolates from anthracnose of guava (Psidium guajava) and water apple (Syzygium aqueum) were also included in this study. Based on cultural and morphological characteristics, isolates from banana and guava were identified as Colletotrichum musae and from water apple as Colletotrichum gloeosporioides. Isolates of C. musae from banana and guava had similar banding patterns in a randomly amplified polymorphic DNA (RAPD) analysis with four random primers, and they clustered together in a UPGMA analysis. C. gloeosporioides from water apple formed a separate cluster. Based on the present study, C. musae was frequently isolated from anthracnose of different banana cultivars and the RAPD banding patterns of C. musae isolates were highly similar but showed intraspecific variations. PMID:24575184

  6. Bias and inference from misspecified mixed-effect models in stepped wedge trial analysis.

    PubMed

    Thompson, Jennifer A; Fielding, Katherine L; Davey, Calum; Aiken, Alexander M; Hargreaves, James R; Hayes, Richard J

    2017-10-15

    Many stepped wedge trials (SWTs) are analysed by using a mixed-effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common-to-all or varied-between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within-cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within-cluster comparisons in the standard model. In the SWTs simulated here, mixed-effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within-cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

  7. Bias and inference from misspecified mixed‐effect models in stepped wedge trial analysis

    PubMed Central

    Fielding, Katherine L.; Davey, Calum; Aiken, Alexander M.; Hargreaves, James R.; Hayes, Richard J.

    2017-01-01

    Many stepped wedge trials (SWTs) are analysed by using a mixed‐effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common‐to‐all or varied‐between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within‐cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within‐cluster comparisons in the standard model. In the SWTs simulated here, mixed‐effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within‐cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28556355

  8. Use of Conserved Randomly Amplified Polymorphic DNA (RAPD) Fragments and RAPD Pattern for Characterization of Lactobacillus fermentum in Ghanaian Fermented Maize Dough

    PubMed Central

    Hayford, Alice E.; Petersen, Anne; Vogensen, Finn K.; Jakobsen, Mogens

    1999-01-01

    The present work describes the use of randomly amplified polymorphic DNA (RAPD) for the characterization of 172 dominant Lactobacillus isolates from present and previous studies of Ghanaian maize fermentation. Heterofermentative lactobacilli dominate the fermentation flora, since approximately 85% of the isolates belong to this group. Cluster analysis of the RAPD profiles obtained showed the presence of two main clusters. Cluster 1 included Lactobacillus fermentum, whereas cluster 2 comprised the remaining Lactobacillus spp. The two distinct clusters emerged at the similarity level of <50%. All isolates in cluster 1 showed similarity in their RAPD profile to the reference strains of L. fermentum included in the study. These isolates, yielding two distinct bands of approximately 695 and 773 bp with the primers used, were divided into four subclusters, indicating that several strains are involved in the fermentation and remain dominant throughout the process. The two distinct RAPD fragments were cloned, sequenced, and used as probes in Southern hybridization experiments. With one exception, Lactobacillus reuteri LMG 13045, the probes hybridized only to fragments of different sizes in EcoRI-digested chromosomal DNA of L. fermentum strains, thus indicating the specificity of the probes and variation within the L. fermentum isolates. PMID:10388723

  9. Comparative Investigation of Shared Filesystems for the LHCb Online Cluster

    NASA Astrophysics Data System (ADS)

    Vijay Kartik, S.; Neufeld, Niko

    2012-12-01

    This paper describes the investigative study undertaken to evaluate shared filesystem performance and suitability in the LHCb Online environment. Particular focus is given to the measurements and field tests designed and performed on an in-house OpenAFS setup; related comparisons with NFSv4 and GPFS (a clustered filesystem from IBM) are presented. The motivation for the investigation and the test setup arises from the need to serve common user-space areas such as home directories, experiment software and control areas, and clustered log areas. Since the operational requirements on such user-space are stringent in terms of read-write operations (in frequency and access speed) and unobtrusive data relocation, test results are presented with emphasis on file-level performance, stability and “high-availability” of the shared filesystems. Use cases specific to the experiment operation in LHCb, including the specific handling of shared filesystems served to a cluster of 1500 diskless nodes, are described. Issues of prematurely expiring authenticated sessions are explicitly addressed, keeping in mind long-running analysis jobs on the Online cluster. In addition, quantitative test results are presented for alternatives including NFSv4. Comparative measurements of filesystem performance benchmarks are presented, which are intended to serve as a reference for decisions on potential migration of the current storage solution deployed in the LHCb online cluster.

  10. Bruker biotyper matrix-assisted laser desorption ionization-time of flight mass spectrometry system for identification of Nocardia, Rhodococcus, Kocuria, Gordonia, Tsukamurella, and Listeria species.

    PubMed

    Hsueh, Po-Ren; Lee, Tai-Fen; Du, Shin-Hei; Teng, Shih-Hua; Liao, Chun-Hsing; Sheng, Wang-Hui; Teng, Lee-Jene

    2014-07-01

    We evaluated whether the Bruker Biotyper matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) system provides accurate species-level identifications of 147 isolates of aerobically growing Gram-positive rods (GPRs). The bacterial isolates included Nocardia (n = 74), Listeria (n = 39), Kocuria (n = 15), Rhodococcus (n = 10), Gordonia (n = 7), and Tsukamurella (n = 2) species, which had all been identified by conventional methods, molecular methods, or both. In total, 89.7% of Listeria monocytogenes, 80% of Rhodococcus species, 26.7% of Kocuria species, and 14.9% of Nocardia species (n = 11, all N. nova and N. otitidiscaviarum) were correctly identified to the species level (score values, ≥ 2.0). A clustering analysis of spectra generated by the Bruker Biotyper identified six clusters of Nocardia species, i.e., cluster 1 (N. cyriacigeorgica), cluster 2 (N. brasiliensis), cluster 3 (N. farcinica), cluster 4 (N. puris), cluster 5 (N. asiatica), and cluster 6 (N. beijingensis), based on the six peaks generated by ClinProTools with the genetic algorithm, i.e., m/z 2,774.477 (cluster 1), m/z 5,389.792 (cluster 2), m/z 6,505.720 (cluster 3), m/z 5,428.795 (cluster 4), m/z 6,525.326 (cluster 5), and m/z 16,085.216 (cluster 6). Two clusters of L. monocytogenes spectra were also found according to the five peaks, i.e., m/z 5,594.85, m/z 6,184.39, and m/z 11,187.31, for cluster 1 (serotype 1/2a) and m/z 5,601.21 and m/z 11,199.33 for cluster 2 (serotypes 1/2b and 4b). The Bruker Biotyper system was unable to accurately identify Nocardia (except for N. nova and N. otitidiscaviarum), Tsukamurella, or Gordonia species. Continuous expansion of the MALDI-TOF MS databases to include more GPRs is necessary. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  11. Subspace K-means clustering.

    PubMed

    Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

    2013-12-01

    To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

  12. Geographic Clusters of Basal Cell Carcinoma in a Northern California Health Plan Population.

    PubMed

    Ray, G Thomas; Kulldorff, Martin; Asgari, Maryam M

    2016-11-01

    Rates of skin cancer, including basal cell carcinoma (BCC), the most common cancer, have been increasing over the past 3 decades. A better understanding of geographic clustering of BCCs can help target screening and prevention efforts. The objective was to present a methodology to identify spatial clusters of BCC and to identify such clusters in a northern California population. This retrospective study used a BCC registry to determine rates of BCC by census block group, and used spatial scan statistics to identify statistically significant geographic clusters of BCCs, adjusting for age, sex, and socioeconomic status. The study population consisted of white, non-Hispanic members of Kaiser Permanente Northern California during years 2011 and 2012. The main outcome measure was statistically significant geographic clusters of BCC as determined by spatial scan statistics. Spatial analysis of 28 408 individuals who received a diagnosis of at least 1 BCC in 2011 or 2012 revealed distinct geographic areas with elevated BCC rates. Among the 14 counties studied, BCC incidence ranged from 661 to 1598 per 100 000 person-years. After adjustment for age, sex, and neighborhood socioeconomic status, a pattern of 5 discrete geographic clusters emerged, with a relative risk ranging from 1.12 (95% CI, 1.03-1.21; P = .006) for a cluster in eastern Sonoma and northern Napa Counties to 1.40 (95% CI, 1.15-1.71; P < .001) for a cluster in east Contra Costa and west San Joaquin Counties, compared with persons residing outside that cluster. In this study of a northern California population, we identified several geographic clusters with modestly elevated incidence of BCC. Knowledge of geographic clusters can help inform future research on the underlying etiology of the clustering including factors related to the environment, health care access, or other characteristics of the resident population, and can help target screening efforts to areas of highest yield.
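
    The spatial scan statistic named in this record can be illustrated with a minimal Python sketch. It assumes the Poisson form of Kulldorff's scan statistic and that covariate-adjusted expected counts are already available for each candidate zone. The function names and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def poisson_llr(observed, expected, total_cases):
    """Log-likelihood ratio for one candidate zone (Poisson scan statistic).

    observed    -- cases counted inside the zone
    expected    -- cases expected inside the zone under the null hypothesis
    total_cases -- cases in the whole study region
    Returns 0.0 when the zone does not show an excess of cases.
    """
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    outside = total_cases - observed
    if outside > 0:
        llr += outside * math.log(outside / (total_cases - expected))
    return llr


def relative_risk(observed, expected, total_cases):
    """Risk inside the zone divided by risk outside it."""
    inside = observed / expected
    outside = (total_cases - observed) / (total_cases - expected)
    return inside / outside


# Hypothetical zone: 30 cases observed where 20 were expected, 100 cases overall.
print(round(relative_risk(30, 20, 100), 3))
print(poisson_llr(30, 20, 100) > 0)
```

    In a full analysis the zone with the largest log-likelihood ratio would be compared against ratios from randomly relabeled data to obtain a P value; that Monte Carlo step is omitted here.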

  13. A New Approach to Identify High Burnout Medical Staffs by Kernel K-Means Cluster Analysis in a Regional Teaching Hospital in Taiwan

    PubMed Central

    Lee, Yii-Ching; Huang, Shian-Chang; Huang, Chih-Hsuan; Wu, Hsin-Hung

    2016-01-01

    This study uses kernel k-means cluster analysis to identify medical staff with high burnout. The data, collected in October and November 2014, are from the emotional exhaustion dimension of the Chinese version of the Safety Attitudes Questionnaire in a regional teaching hospital in Taiwan. A total of 680 valid questionnaires were obtained from staff across the hospital, including physicians, nurses, technicians, pharmacists, medical administrators, and respiratory therapists. The results show that 8 clusters are generated by the kernel k-means method. Employees in clusters 1, 4, and 5 are in relatively good condition, whereas employees in clusters 2, 3, 6, 7, and 8 need to be closely monitored from time to time because they have a relatively higher degree of burnout. When employees with a higher degree of burnout are identified, the hospital management can take actions to improve resilience, reduce potential medical errors, and, eventually, enhance patient safety. This study also suggests that the hospital management needs to keep track of medical staff's fatigue conditions and provide timely assistance for burnout recovery through employee assistance programs, mindfulness-based stress reduction programs, positivity currency buildup, and forming appreciative inquiry groups. PMID:27895218
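
    Kernel k-means, the method named in this record, assigns each point to the cluster whose centroid is nearest in the feature space induced by a kernel, using only kernel evaluations. The following is a minimal Python sketch, assuming a Gaussian (RBF) kernel, a fixed round-robin initial assignment, and toy two-dimensional data. None of these choices comes from the study, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points, k, gamma=1.0, max_iter=100):
    """Return one cluster label per point using kernel k-means."""
    n = len(points)
    gram = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    labels = [i % k for i in range(n)]  # assumed deterministic start
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Mean kernel value within each cluster (same for every point).
        within = [
            sum(gram[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(gram[i][j] for j in m) / len(m)
                # Squared feature-space distance to the cluster centroid.
                dist = gram[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Toy data: two tight groups of three points each.
toy = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
print(kernel_kmeans(toy, k=2, gamma=0.5))
```

    Like ordinary k-means, this procedure can stop in a local optimum, so practical use typically repeats it from several starting assignments.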

  14. Multidimensional analysis of peak pain symptoms and experiences.

    PubMed

    Kinsman, R; Dirks, J F; Wunder, J; Carbaugh, R; Stieg, R

    1989-01-01

    Peak pain symptoms and experiences were explored within a group of 243 intractable pain patients seen consecutively at a pain clinic. Using a 5-point scale, patients rated the frequency with which 99 symptom adjectives occurred when their pain was at its worst. Key cluster analysis identified 11 reliable, conceptually clear symptom clusters: Four affective symptom categories, Angry Depression, Diminished Drive, Intropunitive Depression and Anxiety, describing emotional states concomitant with peak pain; two somatic symptom categories, Ecto-Pain and Endo-Pain, describing surface and deep bodily pain, respectively; and five additional symptom categories including Cognitive Dysfunction, Sleep Disturbance, Fatigue, Withdrawal and Disequilibrium. Among the affective symptom clusters, symptoms of Angry Depression were reported to occur frequently by 32% of the patients while only 11% reported the frequent occurrence of Intropunitive Depression. For the somatic symptom clusters, 25 and 52% reported the frequent occurrence of Ecto-Pain and Endo-Pain, respectively. Pain reports measured by Ecto-Pain and Endo-Pain were nearly independent of all other symptom categories. The results suggest that the experiential context of pain differs widely among intractable pain patients. The study derived a Pain Symptom Checklist to measure each symptom cluster as one way to identify coping styles among chronic pain patients.

  15. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    DTIC Science & Technology

    2015-01-01

    ...algorithms we proposed improve the time efficiency significantly for large scale datasets. In the last chapter, we also propose an incremental reseeding...plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large

  16. Conditions for the Evolution of Gene Clusters in Bacterial Genomes

    PubMed Central

    Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.

    2010-01-01

    Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992

  17. Transcriptional profiles of Arabidopsis stomataless mutants reveal developmental and physiological features of life in the absence of stomata

    PubMed Central

    de Marcos, Alberto; Triviño, Magdalena; Pérez-Bueno, María Luisa; Ballesteros, Isabel; Barón, Matilde; Mena, Montaña; Fenoll, Carmen

    2015-01-01

    Loss of function of the positive stomata development regulators SPCH or MUTE in Arabidopsis thaliana renders stomataless plants; spch-3 and mute-3 mutants are extreme dwarfs, but produce cotyledons and tiny leaves, providing a system to interrogate plant life in the absence of stomata. To this end, we compared their cotyledon transcriptomes with that of wild-type plants. K-means clustering of differentially expressed genes generated four clusters: clusters 1 and 2 grouped genes commonly regulated in the mutants, while clusters 3 and 4 contained genes distinctively regulated in mute-3. Classification in functional categories and metabolic pathways of genes in clusters 1 and 2 suggested that both mutants had depressed secondary, nitrogen and sulfur metabolisms, while only a few photosynthesis-related genes were down-regulated. In situ quenching analysis of chlorophyll fluorescence revealed limited inhibition of photosynthesis. This and other fluorescence measurements matched the mutant transcriptomic features. Differential transcriptomes of both mutants were enriched in growth-related genes, including known stomata development regulators, which paralleled their epidermal phenotypes. Analysis of cluster 3 was not informative for developmental aspects of mute-3. Cluster 4 comprised genes differentially up-regulated in mute-3, 35% of which were direct targets for SPCH and may relate to the unique cell types of mute-3. A screen of T-DNA insertion lines in genes differentially expressed in the mutants identified a gene putatively involved in stomata development. A collection of lines for conditional overexpression of transcription factors differentially expressed in the mutants rendered distinct epidermal phenotypes, suggesting that these proteins may be novel stomatal development regulators. Thus, our transcriptome analysis represents a useful source of new genes for the study of stomata development and for characterizing physiology and growth in the absence of stomata. PMID:26157447

  18. Whole Genome Sequence and Phylogenetic Analysis Show Helicobacter pylori Strains from Latin America Have Followed a Unique Evolution Pathway

    PubMed Central

    Muñoz-Ramírez, Zilia Y.; Mendez-Tenorio, Alfonso; Kato, Ikuko; Bravo, Maria M.; Rizzato, Cosmeri; Thorell, Kaisa; Torres, Roberto; Aviles-Jimenez, Francisco; Camorlinga, Margarita; Canzian, Federico; Torres, Javier

    2017-01-01

    Helicobacter pylori (HP) genetics may determine its clinical outcomes. Despite high prevalence of HP infection in Latin America (LA), there have been no phylogenetic studies in the region. We aimed to understand the structure of HP populations in LA mestizo individuals, where gastric cancer incidence remains high. The genomes of 107 HP strains from Mexico, Nicaragua and Colombia were analyzed with 59 publicly available worldwide genomes. To study bacterial relationship on whole genome level we propose a virtual hybridization technique using thousands of high-entropy 13 bp DNA probes to generate fingerprints. Phylogenetic virtual genome fingerprint (VGF) was compared with Multi Locus Sequence Analysis (MLST) and with phylogenetic analyses of cagPAI virulence island sequences. With MLST some Nicaraguan and Mexican strains clustered close to African isolates, whereas European isolates were spread without clustering and intermingled with LA isolates. VGF analysis resulted in increased resolution of populations, separating European from LA strains. Furthermore, clusters with exclusively Colombian, Mexican, or Nicaraguan strains were observed, where the Colombian cluster separated from Europe, Asia, and Africa, while Nicaraguan and Mexican clades grouped close to Africa. In addition, a mixed large LA cluster including Mexican, Colombian, Nicaraguan, Peruvian, and Salvadorian strains was observed; all LA clusters separated from the Amerind clade. With cagPAI sequence analyses LA clades clearly separated from Europe, Asia and Amerind, and Colombian strains formed a single cluster. A NeighborNet analysis suggested frequent and recent recombination events, particularly among LA strains. Results suggest that in the new world, H. pylori has evolved to fit mestizo LA populations, already 500 years after the Spanish colonization. This co-adaption may account for regional variability in gastric cancer risk. PMID:28293542

  19. Mapping the Dark Matter Distribution of the Merging Galaxy Cluster Abell 115

    NASA Astrophysics Data System (ADS)

    Kim, Mincheol; Jee, Myungkook James; Forman, William; Golovich, Nathan; van Weeren, Reinout

    2018-01-01

    The colliding galaxy cluster Abell 115 shows a number of clear merging features including radio relics, double X-ray peaks, and offsets between the cluster member galaxies and the X-ray distributions. In order to constrain the merging scenario of this complex system, it is critical to know where the dark matter is. We present a high-fidelity weak-lensing analysis of the system using a state-of-the-art method that robustly models the detailed PSF variations. Our mass reconstruction reveals two distinct mass peaks. Through a careful bootstrapping analysis, we demonstrate that the positions of these two mass peaks are highly consistent with those of the cluster galaxies, although the comparison with the X-ray emission shows that the mass peaks lead the X-ray peaks. We obtain the first weak-lensing mass of each subcluster by simultaneously fitting two NFW profiles, as well as the total mass of the system. Interestingly, the total mass is a few factors lower than the published dynamical mass based on velocity dispersion. This large mass discrepancy may be attributed to a significant disruption of the cluster galaxy orbits due to the violent merger. Our preliminary analysis indicates that the two subclusters might have experienced a first off-axis collision a few Gyrs ago and might be now returning for a second collision.

  20. Partial wave analysis of the reaction p(3.5 GeV) + p → pK⁺Λ to search for the "ppK⁻" bound state

    DOE PAGES

    Agakishiev, G.; Arnold, O.; Belver, D.; ...

    2015-01-26

    Employing the Bonn–Gatchina partial wave analysis framework (PWA), we have analyzed HADES data of the reaction p(3.5 GeV) + p → pK⁺Λ. This reaction might contain information about the kaonic cluster “ppK⁻” (with quantum numbers J^P = 0^- and total isospin I = 1/2) via its decay into pΛ. Due to interference effects in our coherent description of the data, a hypothetical K̄NN (or, specifically “ppK⁻”) cluster signal need not necessarily show up as a pronounced feature (e.g. a peak) in an invariant mass spectrum like pΛ. Our PWA analysis includes a variety of resonant and non-resonant intermediate states and delivers a good description of our data (various angular distributions and two-hadron invariant mass spectra) without a contribution of a K̄NN cluster. At a confidence level of CL_s = 95% such a cluster cannot contribute more than 2–12% to the total cross section with a pK⁺Λ final state, which translates into a production cross-section between 0.7 μb and 4.2 μb, respectively. The range of the upper limit depends on the assumed cluster mass, width and production process.

  1. Benthic foraminiferal assemblages as bio-indicators of metals contamination in sediments, Qarun Lake as a case study, Egypt

    NASA Astrophysics Data System (ADS)

    Abd El Naby, Ahmed; Al Menoufy, Safia; Gad, Ahmed

    2018-03-01

    Qarun Lake, in the Fayoum Depression of the Western Desert of Egypt, lies within the deepest area in the River Nile flood plain. The drainage water in the Qarun Lake is derived from the discharge of the natural and artificial drainage systems in the Fayoum. Mixed domestic and agricultural pollutants, including heavy metals, nitrates, phosphates, sulfates and pesticides, are discharged into Qarun Lake. Forty-six samples, collected from the undisturbed layer of sediments, were used for benthic foraminiferal analysis. Concentrations of some selected trace metal elements (Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb, Sr, V, and Zn) were also determined. Statistical analyses of the abiotic variables (texture distribution of sediments, physico-chemical parameters, and metal concentrations) and of the biotic variables (distribution of benthic foraminiferal species) were also performed. The Q-mode cluster analysis of benthic foraminiferal distribution has provided evidence that the Qarun Lake can be subdivided into two cluster groups (A and B), reflecting environmental changes in the lake ecosystem. Cluster B can also be subdivided into two sub-clusters (B1 and B2). The presence of only pollution tolerant taxa with higher faunal density and lower diversity and the absence of the other foraminiferal assemblages in cluster A were attributed to the high concentration of trace metal elements and the strong environmental stress at the eastern and central parts of the Qarun Lake.

  2. Autonomic specificity of basic emotions: evidence from pattern classification and cluster analysis.

    PubMed

    Stephens, Chad L; Christie, Israel C; Friedman, Bruce H

    2010-07-01

    Autonomic nervous system (ANS) specificity of emotion remains controversial in contemporary emotion research, and has received mixed support over decades of investigation. This study was designed to replicate and extend psychophysiological research, which has used multivariate pattern classification analysis (PCA) in support of ANS specificity. Forty-nine undergraduates (27 women) listened to emotion-inducing music and viewed affective films while a montage of ANS variables, including heart rate variability indices, peripheral vascular activity, systolic time intervals, and electrodermal activity, were recorded. Evidence for ANS discrimination of emotion was found via PCA with 44.6% of overall observations correctly classified into the predicted emotion conditions, using ANS variables (z=16.05, p<.001). Cluster analysis of these data indicated a lack of distinct clusters, which suggests that ANS responses to the stimuli were nomothetic and stimulus-specific rather than idiosyncratic and individual-specific. Collectively these results further confirm and extend support for the notion that basic emotions have distinct ANS signatures. Copyright © 2010 Elsevier B.V. All rights reserved.

  3. Users matter : multi-agent systems model of high performance computing cluster users.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    North, M. J.; Hood, C. S.; Decision and Information Sciences

    2005-01-01

    High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite their well-specified components, the aggregate behavior of clusters is poorly understood. The difficulties arise from complicated interactions between cluster components during operation. These interactions have been studied by many researchers, some of whom have identified the need for holistic multi-scale modeling that simultaneously includes network level, operating system level, process level, and user level behaviors. Each of these levels presents its own modeling challenges, but the user level is the most complex due to the adaptability of human beings. In this vein, there are several major user modeling goals, namely descriptive modeling, predictive modeling and automated weakness discovery. This study shows how multi-agent techniques were used to simulate a large-scale computing cluster at each of these levels.

  4. ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

    NASA Technical Reports Server (NTRS)

    Wharton, S. W.; Turner, B. J.

    1981-01-01

    An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.

  5. Automated atlas-based clustering of white matter fiber tracts from DTMRI.

    PubMed

    Maddah, Mahnaz; Mewes, Andrea U J; Haker, Steven; Grimson, W Eric L; Warfield, Simon K

    2005-01-01

    A new framework is presented for clustering fiber tracts into anatomically known bundles. This work is motivated by medical applications in which variation analysis of known bundles of fiber tracts in the human brain is desired. To include the anatomical knowledge in the clustering, we invoke an atlas of fiber tracts, labeled by the number of bundles of interest. In this work, we construct such an atlas and use it to cluster all fiber tracts in the white matter. To build the atlas, we start with a set of labeled ROIs specified by an expert and extract the fiber tracts initiating from each ROI. Affine registration is used to project the extracted fiber tracts of each subject to the atlas, whereas their B-spline representation is used to efficiently compare them to the fiber tracts in the atlas and assign cluster labels. Expert visual inspection of the result confirms that the proposed method is very promising and efficient in clustering of the known bundles of fiber tracts.

  6. Breaking Self-Similarity in Poor Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Kempner, J. C.; David, L. P.

    2005-12-01

    The large scatter in the L_X–T_X relation among poor clusters in the ~2–4 keV range indicates that the self-similarity seen among hotter clusters does not apply to their cooler siblings. Many forms of non-gravitational heating have been proposed to break this self-similarity, including cluster mergers, AGN heating, and supernova “pre-heating.” We present an analysis of a sample of poor clusters from the Chandra and XMM archives that suggests a cycle of heating and cooling in the cores of these clusters is responsible for the departures from self-similarity. That these differences exist only in the core is strongly suggestive of AGN heating as the dominant mechanism. Support for this work was provided by the National Aeronautics and Space Administration through Chandra Award Number G05-5138A issued by the Chandra X-ray Observatory Center, which is operated by the Smithsonian Astrophysical Observatory for and on behalf of NASA under contract NAS8-39073, and by NASA contract NAG5-12933.

  7. Analysis of Helium Segregation on Surfaces of Plasma-Exposed Tungsten

    NASA Astrophysics Data System (ADS)

    Maroudas, Dimitrios; Hu, Lin; Hammond, Karl; Wirth, Brian

    2015-11-01

    We report a systematic theoretical and atomic-scale computational study of implanted helium segregation on surfaces of tungsten, which is considered as a plasma facing component in nuclear fusion reactors. We employ a hierarchy of atomic-scale simulations, including molecular statics to understand the origin of helium surface segregation, targeted molecular-dynamics (MD) simulations of near-surface cluster reactions, and large-scale MD simulations of implanted helium evolution in plasma-exposed tungsten. We find that small, mobile helium clusters (of 1-7 He atoms) in the near-surface region are attracted to the surface due to an elastic interaction force. This thermodynamic driving force induces drift fluxes of these mobile clusters toward the surface, facilitating helium segregation. Moreover, the clusters' drift toward the surface enables cluster reactions, most importantly trap mutation, at rates much higher than in the bulk material. This cluster dynamics has significant effects on the surface morphology, near-surface defect structures, and the amount of helium retained in the material upon plasma exposure.

  8. Cluster ensemble based on Random Forests for genetic data.

    PubMed

    Alhusain, Luluah; Hafez, Alaaeldin M

    2017-01-01

    Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, a comparison of RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis, and demonstrates the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
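
    One common way to combine several clustering runs into a single result is a co-association (evidence accumulation) matrix, which records how often each pair of items lands in the same cluster. The Python sketch below is illustrative only: the record does not state how RFcluE merges its runs, and the function names, the 0.5 threshold, and the label lists are hypothetical.

```python
def co_association(labelings):
    """Fraction of runs in which items i and j share a cluster."""
    n = len(labelings[0])
    runs = len(labelings)
    return [
        [sum(1 for lab in labelings if lab[i] == lab[j]) / runs for j in range(n)]
        for i in range(n)
    ]


def consensus_clusters(labelings, threshold=0.5):
    """Group items linked by co-association above the threshold.

    Items are joined through connected components, so two items end up
    together if a chain of strongly co-clustered items connects them.
    """
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical runs over four items; label names differ between runs.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(runs))
```

    Because the matrix compares pairs of items rather than label values, runs that number their clusters differently can still be combined.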

  9. Classification of Support Needs for Elderly Outpatients with Diabetes Who Live Alone.

    PubMed

    Miyawaki, Yoshiko; Shimizu, Yasuko; Seto, Natsuko

    2016-02-01

    To investigate the support needs of elderly patients with diabetes and to classify elderly patients with diabetes living alone on the basis of support needs. Support needs were derived from a literature review of relevant journals and interviews of outpatients as well as expert nurses in the field of diabetes to prepare a 45-item questionnaire. Each item was analyzed on a 4-point Likert scale. The study included 634 elderly patients with diabetes who were recruited from 3 hospitals in Japan. Exploratory factor analysis was performed to determine the underlying structure of support needs, followed by hierarchical cluster analysis to clarify the characteristics of patients living alone (n=104) who had common support needs. Exploratory factor analysis suggested a 5-factor solution with 23 items: (1) hope for class and gatherings, (2) hope for personal advice including emergency response, (3) supportlessness and hopelessness, (4) barriers to food preparation, (5) hope of safe medical therapy. The hierarchical cluster analysis of subjects yielded 7 clusters, including a no special-support needs group, a collective support group, a self-care support group, a personal-support focus group, a life-support group, a food-preparation support group and a healthcare-environment support group. The support needs of elderly patients with diabetes who live alone can be divided into 2 categories: life and self-care support. Implementation of these categories in outpatient-management programs in which contact time with patients is limited is important in the overall management of elderly patients with diabetes who are living alone. Copyright © 2015 Canadian Diabetes Association. Published by Elsevier Inc. All rights reserved.

  10. Genetic diversity of allozymes in turnip (Brassica rapa L. var. rapa) from the Nordic area.

    PubMed

    Persson, K; Fält, A S; von Bothmer, R

    2001-01-01

    Genetic diversity and relationships based on isozymes were studied in 31 accessions of turnip (Brassica rapa L. var. rapa). The material included varieties, elite stocks, landraces and older turnip of slash-and-burn type from the Nordic area. A total of 9 isozyme loci and 26 alleles were studied. The isozyme systems were ACO, DIA, GPI, GOT, PGM, PGD and SKD. The level of heterozygosity was reduced in the landraces, but it was high for the variety group 'Ostersundom'. Turnip has a higher genetic variation than other crops within B. rapa and than in other species with the same breeding system. The genetic diversity showed that 18.7% of the genetic variation was within the accessions, and the total H tau value was 0.358. Gpi-I and Pgd-I showed the lowest variation compared with the other loci. The cluster analysis revealed five clusters, with one main cluster including 25 of the 31 accessions. The dendrogram indicated that the variety group 'Ostersundom' clustered together whereas the variety group 'Bortfelder' was associated with country of origin. The landraces were spread in different clusters. The 'slash-and-burn' type of turnip belonged to two groups.

  11. Multimorbidity and health-related quality of life (HRQoL) in a nationally representative population sample: implications of count versus cluster method for defining multimorbidity on HRQoL.

    PubMed

    Wang, Lili; Palmer, Andrew J; Cocker, Fiona; Sanderson, Kristy

    2017-01-09

    No universally accepted definition of multimorbidity (MM) exists, and implications of different definitions have not been explored. This study examined the performance of the count and cluster definitions of multimorbidity on the sociodemographic profile and health-related quality of life (HRQoL) in a general population. Data were derived from the nationally representative 2007 Australian National Survey of Mental Health and Wellbeing (n = 8841). The HRQoL scores were measured using the Assessment of Quality of Life (AQoL-4D) instrument. The simple count (2+ & 3+ conditions) and hierarchical cluster methods were used to define/identify clusters of multimorbidity. Linear regression was used to assess the associations between HRQoL and multimorbidity as defined by the different methods. Multimorbidity defined using the count method had a prevalence of 26% (MM2+) and 10.1% (MM3+). Statistically significant clusters identified through hierarchical cluster analysis included heart or circulatory conditions (CVD)/arthritis (cluster-1, 9%) and major depressive disorder (MDD)/anxiety (cluster-2, 4%). A sensitivity analysis suggested that the clusters resulting from hierarchical clustering were stable. The sociodemographic profiles were similar between MM2+, MM3+ and cluster-1, but were different from cluster-2. HRQoL was negatively associated with MM2+ (β: -0.18, SE: -0.01, p < 0.001), MM3+ (β: -0.23, SE: -0.02, p < 0.001), cluster-1 (β: -0.10, SE: 0.01, p < 0.001) and cluster-2 (β: -0.36, SE: 0.01, p < 0.001). Our findings confirm the existence of an inverse relationship between multimorbidity and HRQoL in the Australian population and, from this head-to-head comparison, indicate that the hierarchical clustering approach is valid when the outcome of interest is HRQoL. Moreover, a simple count fails to identify whether specific conditions of interest are driving poorer HRQoL. Researchers should exercise caution when selecting a definition of multimorbidity because it may significantly influence the study outcomes.

  12. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome.

    PubMed

    Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard

    2014-07-01

    Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential for predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster size and scores were used as measures of dyssynchrony and to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.

  13. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential for predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster size and scores were used as measures of dyssynchrony and to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). Conclusions: A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.

  14. Effects of Dexamethasone and Placebo on Symptom Clusters in Advanced Cancer Patients: A Preliminary Report.

    PubMed

    Yennurajalingam, Sriram; Williams, Janet L; Chisholm, Gary; Bruera, Eduardo

    2016-03-01

    Advanced cancer patients frequently experience debilitating symptoms that occur in clusters, but few pharmacological studies have targeted symptom clusters. Our objective was to examine the effects of dexamethasone on symptom clusters in patients with advanced cancer. We reviewed the data from a previous randomized clinical trial to determine the effects of dexamethasone on cancer symptoms. Symptom clusters were identified according to baseline symptoms by using principal component analysis. Correlations and change in the severity of symptom clusters were analyzed after study treatment. A total of 114 participants were included in this study. Three clusters were identified: fatigue/anorexia-cachexia/depression (FAD), sleep/anxiety/drowsiness (SAD), and pain/dyspnea (PD). Changes in severity of FAD and PD significantly correlated over time (at baseline, day 8, and day 15). The FAD cluster was associated with significant improvement in severity at day 8 and day 15, whereas no significant change was observed with the SAD cluster or PD cluster after dexamethasone treatment. The results of this preliminary study suggest significant correlation over time and improvement in the FAD cluster at day 8 and day 15 after treatment with dexamethasone. These findings suggest that fatigue, anorexia-cachexia, and depression may share a common pathophysiologic basis. Further studies are needed to investigate this cluster and target anti-inflammatory therapies. ©AlphaMed Press.

  15. Coupled protein-ligand dynamics in truncated hemoglobin N from atomistic simulations and transition networks.

    PubMed

    Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus

    2015-05-01

    The nature of ligand motion in proteins is difficult to characterize directly using experiment. Specifically, it is unclear to what degree these motions are coupled. All-atom simulations are used to sample ligand motion in truncated Hemoglobin N. A transition network analysis including ligand- and protein-degrees of freedom is used to analyze the microscopic dynamics. Clustering of two different subsets of MD trajectories highlights the importance of a diverse and exhaustive description to define the macrostates for a ligand-migration network. Monte Carlo simulations on the transition matrices from one particular clustering are able to faithfully capture the atomistic simulations. Contrary to clustering by ligand positions only, including a protein degree of freedom yields considerably improved coarse grained dynamics. Analysis with and without imposing detailed balance agree closely which suggests that the underlying atomistic simulations are converged with respect to sampling transitions between neighboring sites. Protein and ligand dynamics are not independent from each other and ligand migration through globular proteins is not passive diffusion. Transition network analysis is a powerful tool to analyze and characterize the microscopic dynamics in complex systems. This article is part of a Special Issue entitled Recent developments of molecular dynamics. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Density-functional study of the structures and properties of holmium-doped silicon clusters HoSiₙ (n = 3-9) and their anions.

    PubMed

    Hou, Liyuan; Yang, Jucai; Liu, Yuming

    2017-04-01

    The structures and properties of Ho-doped Si clusters, including their adiabatic electron affinities (AEAs), simulated photoelectron spectra (PESs), stabilities, magnetic moments, and charge-transfer characteristics, were systematically investigated using four density-functional methods. The results show that the double-hybrid functional (which includes an MP2 correlation component) can accurately predict the ground-state structure and properties of Ho-doped Si clusters. The ground-state structures of HoSiₙ (n = 3-9) are sextuplet electronic states. The structures of these Ho-doped Si clusters (aside from HoSi₇) are substitutional. The ground-state structures of HoSiₙ⁻ are quintuplet electronic states. Their predicted AEAs are in excellent agreement with the experimental ones. The mean absolute error in the theoretical AEAs of HoSiₙ (n = 4-9) is only 0.04 eV. The simulated PESs for HoSiₙ⁻ (n = 5-9) are in good agreement with the experimental PESs. Based on its simulated PES and theoretical AEA, we reassigned the experimental PES of HoSi₄⁻ and obtained an experimental AEA of 2.2 ± 0.1 eV. The dissociation energies of Ho from HoSiₙ and HoSiₙ⁻ (n = 3-9) were evaluated to test the relative stabilities of the clusters. HOMO-LUMO gap analysis indicated that doping the Si clusters with the rare-earth metal atom significantly increases their photochemical reactivity. Natural population analysis showed that the magnetic moments of HoSiₙ (n = 3-9) and their anions derive mainly from the Ho atom. It was also found that the magnetic moments of Ho in the HoSiₙ clusters are larger than the magnetic moment of an isolated Ho atom.

  17. Molecular clustering of patients with diabetes and pulmonary tuberculosis: A systematic review and meta-analysis

    PubMed Central

    Blanco-Guillot, Francles; Delgado-Sánchez, Guadalupe; Mongua-Rodríguez, Norma; Cruz-Hervert, Pablo; Ferreyra-Reyes, Leticia; Ferreira-Guerrero, Elizabeth; Yanes-Lane, Mercedes; Montero-Campos, Rogelio; Bobadilla-del-Valle, Miriam; Torres-González, Pedro; Ponce-de-León, Alfredo; Sifuentes-Osornio, José; Garcia-Garcia, Lourdes

    2017-01-01

    Introduction Many studies have explored the relationship between diabetes mellitus (DM) and tuberculosis (TB) demonstrating increased risk of TB among patients with DM and poor prognosis of patients suffering from the association of DM/TB. Owing to a paucity of studies addressing this question, it remains unclear whether patients with DM and TB are more likely than TB patients without DM to be grouped into molecular clusters defined according to the genotype of the infecting Mycobacterium tuberculosis bacillus. That is, whether there is convincing molecular epidemiological evidence for TB transmission among DM patients. Objective: We performed a systematic review and meta-analysis to quantitatively evaluate the propensity for patients with DM and pulmonary TB (PTB) to cluster according to the genotype of the infecting M. tuberculosis bacillus. Materials and methods We conducted a systematic search in MEDLINE and LILACS from 1990 to June, 2016 with the following combinations of key words “tuberculosis AND transmission” OR “tuberculosis diabetes mellitus” OR “Mycobacterium tuberculosis molecular epidemiology” OR “RFLP-IS6110” OR “Spoligotyping” OR “MIRU-VNTR”. Studies were included if they met the following criteria: (i) studies based on populations from defined geographical areas; (ii) use of genotyping by IS6110-restriction fragment length polymorphism (RFLP) analysis and spoligotyping or mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) or other amplification methods to identify molecular clustering; (iii) genotyping and analysis of 50 or more cases of PTB; (iv) study duration of 11 months or more; (v) identification of quantitative risk factors for molecular clustering including DM; (vi) > 60% coverage of the study population; and (vii) patients with PTB confirmed bacteriologically. The exclusion criteria were: (i) extrapulmonary TB; (ii) TB caused by nontuberculous mycobacteria; (iii) patients with PTB and HIV; (iv) pediatric PTB patients; (v) TB in closed environments (e.g. prisons, elderly homes, etc.); (vi) diabetes insipidus; and (vii) outbreak reports. The Hartung-Knapp-Sidik-Jonkman method was used to estimate the odds ratio (OR) of the association between DM and molecular clustering of TB cases. A statistical Q test was done to evaluate the degree of heterogeneity. Publication bias was examined with the Begg and Egger tests. Review Manager 5.3.5, CMA v.3 (Biostat) and the R software package were used. Results Selection criteria were met by six articles, which included 4076 patients with PTB, of whom 13% had DM. Twenty-seven percent of the cases were clustered. The largest share of cases (48%) was reported in a study in China with 31% clustering. The highest incidence of TB occurred in two studies from China. The global OR for molecular clustering was 0.84 (95% CI 0.40–1.72). The heterogeneity between studies was moderate (I² = 55%, p = 0.05), although there was no publication bias (Begg's test p = 0.353 and Egger's test p = 0.429). Conclusion There were very few studies meeting our selection criteria. The wide confidence interval indicates that there is not enough evidence to draw conclusions about the association. Clustering of patients with DM in TB transmission chains should be investigated in areas where both diseases are prevalent and focus on specific contexts. PMID:28902922
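
    The heterogeneity figures in this abstract can be cross-checked with a short worked calculation. This is only a sketch: it assumes that the statistic was computed with the usual Higgins-Thompson definition from Cochran's Q with k − 1 degrees of freedom, which the abstract does not state. Under that assumption, the reported value for k = 6 studies implies Q of roughly 11.1, which sits at the 5% critical value of a chi-square distribution with 5 degrees of freedom (11.07) and so agrees with the reported p = 0.05.

```latex
\[
I^{2} = \max\!\left(0,\ \frac{Q-(k-1)}{Q}\right)\times 100\%
\qquad\Longrightarrow\qquad
Q = \frac{k-1}{1-I^{2}} = \frac{5}{1-0.55} \approx 11.1
\]
```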

  18. An effective fuzzy kernel clustering analysis approach for gene expression data.

    PubMed

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

    Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.

  19. The extracellular Leucine-Rich Repeat superfamily; a comparative survey and analysis of evolutionary relationships and expression patterns

    PubMed Central

    Dolan, Jackie; Walshe, Karen; Alsbury, Samantha; Hokamp, Karsten; O'Keeffe, Sean; Okafuji, Tatsuya; Miller, Suzanne FC; Tear, Guy; Mitchell, Kevin J

    2007-01-01

    Background Leucine-rich repeats (LRRs) are highly versatile and evolvable protein-ligand interaction motifs found in a large number of proteins with diverse functions, including innate immunity and nervous system development. Here we catalogue all of the extracellular LRR (eLRR) proteins in worms, flies, mice and humans. We use convergent evidence from several transmembrane-prediction and motif-detection programs, including a customised algorithm, LRRscan, to identify eLRR proteins, and a hierarchical clustering method based on TribeMCL to establish their evolutionary relationships. Results This yields a total of 369 proteins (29 in worm, 66 in fly, 135 in mouse and 139 in human), many of them of unknown function. We group eLRR proteins into several classes: those with only LRRs, those that cluster with Toll-like receptors (Tlrs), those with immunoglobulin or fibronectin-type 3 (FN3) domains and those with some other domain. These groups show differential patterns of expansion and diversification across species. Our analyses reveal several clusters of novel genes, including two Elfn genes, encoding transmembrane proteins with eLRRs and an FN3 domain, and six genes encoding transmembrane proteins with eLRRs only (the Elron cluster). Many of these are expressed in discrete patterns in the developing mouse brain, notably in the thalamus and cortex. We have also identified a number of novel fly eLRR proteins with discrete expression in the embryonic nervous system. Conclusion This study provides the necessary foundation for a systematic analysis of the functions of this class of genes, which are likely to include prominently innate immunity, inflammation and neural development, especially the specification of neuronal connectivity. PMID:17868438

  20. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    PubMed

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
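
    The sliding-window step described in this abstract can be illustrated with a minimal sketch. The abstract gives the window sizes (1,000 and 2,000 bp) but not the step size or the alignment length, so the function name `sliding_windows` and the toy numbers used below are illustrative assumptions, not the authors' settings.

```python
def sliding_windows(seq_len, window, step):
    """Return (start, end) half-open coordinates of every full window.

    seq_len, window and step are in base pairs. The step size is a
    hypothetical parameter: the abstract does not report it.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # The last window must still fit entirely inside the sequence.
    return [(start, start + window)
            for start in range(0, seq_len - window + 1, step)]


# Toy example: a 10 bp sequence, 4 bp windows, moved 2 bp at a time.
windows = sliding_windows(10, 4, 2)
print(windows)
```

    Each returned interval would then be cut out of the alignment and analysed separately, so that the extent of clustering can be compared across regions and window sizes.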

  1. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  2. A cluster-based approach to selecting representative stimuli from the International Affective Picture System (IAPS) database.

    PubMed

    Constantinescu, Alexandra C; Wolters, Maria; Moore, Adam; MacPherson, Sarah E

    2017-06-01

    The International Affective Picture System (IAPS; Lang, Bradley, & Cuthbert, 2008) is a stimulus database that is frequently used to investigate various aspects of emotional processing. Despite its extensive use, selecting IAPS stimuli for a research project is not usually done according to an established strategy, but rather is tailored to individual studies. Here we propose a standard, replicable method for stimulus selection based on cluster analysis, which re-creates the group structure that is most likely to have produced the valence, arousal, and dominance norms associated with the IAPS images. Our method includes screening the database for outliers, identifying a suitable clustering solution, and then extracting the desired number of stimuli on the basis of their level of certainty of belonging to the cluster they were assigned to. Our method preserves statistical power in studies by maximizing the likelihood that the stimuli belong to the cluster structure fitted to them, and by filtering stimuli according to their certainty of cluster membership. In addition, although our cluster-based method is illustrated using the IAPS, it can be extended to other stimulus databases.

  3. Population Structure With Localized Haplotype Clusters

    PubMed Central

    Browning, Sharon R.; Weir, Bruce S.

    2010-01-01

    We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877

  4. Clustering analysis of moving target signatures

    NASA Astrophysics Data System (ADS)

    Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

    2010-04-01

    Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
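
    The abstract does not describe the adaptive knee-point algorithm itself, only that it builds on a well-known heuristic for choosing the number of clusters. The sketch below shows one common form of that heuristic, not the authors' method: given a clustering-error value for each candidate number of clusters, pick the point that lies farthest from the straight line joining the first and last points of the curve. The function name `knee_point` and the error values are made up for illustration.

```python
def knee_point(values):
    """Return the index of the point farthest from the chord that joins
    the first and last points of the curve (index, value)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points to locate a knee")
    x0, y0 = 0, values[0]
    x1, y1 = n - 1, values[-1]
    dx, dy = x1 - x0, y1 - y0
    chord_length = (dx * dx + dy * dy) ** 0.5
    best_index, best_distance = 0, -1.0
    for i, y in enumerate(values):
        # Perpendicular distance from (i, y) to the chord.
        distance = abs(dy * (i - x0) - dx * (y - y0)) / chord_length
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


# Hypothetical within-cluster error for k = 1..6 clusters.
errors = [100, 40, 10, 8, 7, 6]
k = knee_point(errors) + 1  # index 0 corresponds to k = 1
print(k)
```

    Because the distance depends on the scale of both axes, implementations of this heuristic often normalise the curve first; that step is left out here to keep the sketch short.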

  5. Basic limnology of fifty-one lakes in Costa Rica.

    PubMed

    Haberyan, Kurt A; Horn, Sally P; Umaña, Gerardo

    2003-03-01

    We visited 51 lakes in Costa Rica as part of a broad-based survey to document their physical and chemical characteristics and how these relate to the mode of formation and geographical distribution of the lakes. The four oxbow lakes were low in elevation and tended to be turbid, high in conductivity and CO₂, but low in dissolved O₂; one of these, L. Gandoca, had a hypolimnion essentially composed of sea water. These were similar to the four wetland lakes, but the latter instead had low conductivities and pH, and turbidity was often due to tannins rather than suspended sediments. The thirteen artificial lakes formed a very heterogeneous group, whose features varied depending on local factors. The thirteen lakes dammed by landslides, lava flows, or lahars occurred in areas with steep slopes, and were more likely to be stratified than most other types of lakes. The eight lakes that occupy volcanic craters tended to be deep, stratified, clear, and cool; two of these, L. Hule and L. Río Cuarto, appeared to be oligomictic (tending toward meromictic). The nine glacial lakes, all located above 3440 m elevation near Cerro Chirripó, were clear, cold, dilute, and are probably polymictic. Cluster analysis resulted in three significant groups of lakes. Cluster 1 included four calcium-rich lakes (average 48 mg l⁻¹), Cluster 2 included fourteen lakes with more Si than Ca²⁺ and higher Cl⁻ than the other clusters, and Cluster 3 included the remaining thirty-three lakes that were generally less concentrated. Each cluster included lakes of various origins located in different geographical regions; these data indicate that, apart from the high-altitude glacial lakes and lakes in the Miravalles area, similarity in lake chemistry is independent of lake distribution.

  6. WordCluster: detecting clusters of DNA words and genomic elements

    PubMed Central

    2011-01-01

    Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
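
    The distance-based part of the idea described above can be illustrated with a short sketch. This is not the WordCluster implementation: the abstract does not say how the distance cutoff is derived or how significance is assigned, so the cutoff `max_gap` is simply passed in as a hypothetical parameter and no p-value is computed. The sketch chains consecutive copies of a word into one cluster for as long as the gap between neighbouring start positions stays within the cutoff.

```python
def chain_clusters(positions, max_gap, min_copies=2):
    """Group genomic start positions into clusters of consecutive copies.

    positions  : start coordinates of one DNA word on one chromosome
    max_gap    : largest allowed distance between consecutive copies
                 (hypothetical parameter; its derivation is not modelled here)
    min_copies : smallest number of copies reported as a cluster

    Returns a list of (first_position, last_position, n_copies) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than the cutoff closes the cluster being built.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_copies:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final run of positions.
    if len(current) >= min_copies:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    print(chain_clusters([10, 15, 22, 300, 305, 900], max_gap=10))
```

    A statistical test on each reported cluster, as the abstract describes, would be applied after this chaining step.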

  7. VizieR Online Data Catalog: SPT-GMOS spectroscopy of gal. in massive clusters (Bayliss+, 2017)

    NASA Astrophysics Data System (ADS)

    Bayliss, M. B.; Zengo, K.; Ruel, J.; Benson, B. A.; Bleem, L. E.; Bocquet, S.; Bulbul, E.; Brodwin, M.; Capasso, R.; Chiu, I.-N.; McDonald, M.; Rapetti, D.; Saro, A.; Stalder, B.; Stark, A. A.; Strazzullo, V.; Stubbs, C. W.; Zenteno, A.

    2017-10-01

    The majority of the data set used in this analysis comes from the SPT-GMOS spectroscopic survey (Bayliss+ 2016, J/ApJS/227/3), which consists of spectroscopic follow-up of 62 galaxy clusters from the SPT-SZ survey. The full SPT-GMOS sample includes 2243 galaxy spectra, 1579 of which are cluster member galaxies. In addition to previously published galaxy spectroscopy, we also present the first publication of new spectroscopy in the fields of three SPT galaxy clusters. We observed SPT-CLJ0000-5748, SPT-CLJ0516-5430, and SPT-CLJ2337-5942 with the Inamori Magellan Areal Camera and Spectrograph (IMACS) mounted on the Magellan-I (Baade) telescope at Las Campanas Observatory on the nights of 14-15 September 2012. (2 data files).

  8. Anthocyanins in the bracts of Curcuma species and relationship of the species based on anthocyanin composition.

    PubMed

    Koshioka, Masaji; Umegaki, Naoko; Boontiang, Kriangsuk; Pornchuti, Witayaporn; Thammasiri, Kanchit; Yamaguchi, Satoshi; Tatsuzawa, Fumi; Nakayama, Masayoshi; Tateishi, Akira; Kubota, Satoshi

    2015-03-01

    Five anthocyanins, delphinidin 3-O-rutinoside, cyanidin 3-O-rutinoside, petunidin 3-O-rutinoside, malvidin 3-O-glucoside and malvidin 3-O-rutinoside, were identified. Three anthocyanins, delphinidin 3-O-glucoside, cyanidin 3-O-glucoside and pelargonidin 3-O-rutinoside, were putatively identified based on C18 HPLC retention time, absorption spectrum, including λmax, and comparisons with those of corresponding standard anthocyanins, as the compounds responsible for the pink to purple-red pigmentation of the bracts of Curcuma alismatifolia and five related species. Cluster analysis based on four major anthocyanins formed two clusters. One consisted of only one species, C. alismatifolia, and the other consisted of five. Each cluster further formed sub-clusters depending on either species or habitats.

  9. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    PubMed

    Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

    2015-01-01

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.

  10. Dietary patterns, insulin sensitivity and inflammation in older adults

    PubMed Central

    Anderson, Amy L.; Harris, Tamara B.; Tylavsky, Frances A.; Perry, Sara E.; Houston, Denise K.; Lee, Jung Sun; Kanaya, Alka M.; Sahyoun, Nadine R.

    2011-01-01

    Background/Objectives Several studies have linked dietary patterns to insulin sensitivity and systemic inflammation, which affect risk of multiple chronic diseases. The purpose of this study was to investigate the dietary patterns of a cohort of older adults, and examine relationships of dietary patterns with markers of insulin sensitivity and systemic inflammation. Subjects/Methods The Health, Aging and Body Composition (Health ABC) Study is a prospective cohort study of 3075 older adults. In Health ABC, multiple indicators of glucose metabolism and systemic inflammation were assessed. Food intake was estimated with a modified Block food frequency questionnaire (FFQ). In this study, dietary patterns of 1751 participants with complete data were derived by cluster analysis. Results Six clusters were identified, including a ‘Healthy foods’ cluster, characterized by higher intake of lowfat dairy products, fruit, whole grains, poultry, fish and vegetables. In the main analysis, the ‘Healthy foods’ cluster had significantly lower fasting insulin and HOMA-IR than the ‘Breakfast cereal’ and ‘High-fat dairy products’ clusters, and lower fasting glucose than the ‘High-fat dairy products’ cluster (P ≤ 0.05). No differences were found in 2-hour glucose. With respect to inflammation, the ‘Healthy foods’ cluster had lower IL-6 than the ‘Sweets and desserts’ and ‘High-fat dairy products’ clusters, and no differences were seen in CRP or TNF-α. Conclusions A dietary pattern high in lowfat dairy products, fruit, whole grains, poultry, fish and vegetables may be associated with greater insulin sensitivity and lower systemic inflammation in older adults. PMID:21915138

  11. Multilocus sequence analysis of Anaplasma phagocytophilum reveals three distinct lineages with different host ranges in clinically ill French cattle.

    PubMed

    Chastagner, Amélie; Dugat, Thibaud; Vourc'h, Gwenaël; Verheyden, Hélène; Legrand, Loïc; Bachy, Véronique; Chabanne, Luc; Joncour, Guy; Maillard, Renaud; Boulouis, Henri-Jean; Haddad, Nadia; Bailly, Xavier; Leblond, Agnès

    2014-12-09

    Molecular epidemiology represents a powerful approach to elucidate the complex epidemiological cycles of multi-host pathogens, such as Anaplasma phagocytophilum. A. phagocytophilum is a tick-borne bacterium that affects a wide range of wild and domesticated animals. Here, we characterized its genetic diversity in populations of French cattle; we then compared the observed genotypes with those found in horses, dogs, and roe deer to determine whether genotypes of A. phagocytophilum are shared among different hosts. We sampled 120 domesticated animals (104 cattle, 13 horses, and 3 dogs) and 40 wild animals (roe deer) and used multilocus sequence analysis on nine loci (ankA, msp4, groESL, typA, pled, gyrA, recG, polA, and an intergenic region) to characterize the genotypes of A. phagocytophilum present. Phylogenetic analysis revealed three genetic clusters of bacterial variants in domesticated animals. The two principal clusters included 98% of the bacterial genotypes found in cattle, which were only distantly related to those in roe deer. One cluster comprised only cattle genotypes, while the second contained genotypes from cattle, horses, and dogs. The third contained all roe deer genotypes and three cattle genotypes. Geographical factors could not explain this clustering pattern. These results suggest that roe deer do not contribute to the spread of A. phagocytophilum in cattle in France. Further studies should explore if these different clusters are associated with differing disease severity in domesticated hosts. Additionally, it remains to be seen if the three clusters of A. phagocytophilum genotypes in cattle correspond to distinct epidemiological cycles, potentially involving different reservoir hosts.

  12. Classification of patients based on their evaluation of hospital outcomes: cluster analysis following a national survey in Norway

    PubMed Central

    2013-01-01

    Background A general trend towards positive patient-reported evaluations of hospitals could be taken as a sign that most patients form a homogeneous, reasonably pleased group, and consequently that there is little need for quality improvement. The objective of this study was to explore this assumption by identifying and statistically validating clusters of patients based on their evaluation of outcomes related to overall satisfaction, malpractice and benefit of treatment. Methods Data were collected using a national patient-experience survey of 61 hospitals in the 4 health regions in Norway during spring 2011. Postal questionnaires were mailed to 23,420 patients after their discharge from hospital. Cluster analysis was performed to identify response clusters of patients, based on their responses to single items about overall patient satisfaction, benefit of treatment and perception of malpractice. Results Cluster analysis identified six response groups, including one cluster with systematically poorer evaluation across outcomes (18.5% of patients) and one small outlier group (5.3%) with very poor scores across all outcomes. One-Way ANOVA with post-hoc tests showed that most differences between the six response groups on the three outcome items were significant. The response groups were significantly associated with nine patient-experience indicators (p < 0.001), and all groups were significantly different from each of the other groups on a majority of the patient-experience indicators. Clusters were significantly associated with age, education, self-perceived health, gender, and the propensity to write open comments in the questionnaire. Conclusions The study identified five response clusters with distinct patient-reported outcome scores, in addition to a heterogeneous outlier group with very poor scores across all outcomes. The outlier group and the cluster with systematically poorer evaluation across outcomes comprised almost one-quarter of all patients, clearly demonstrating the need to tailor quality initiatives and improve patient-perceived quality in hospitals. More research on patient clustering in patient evaluation is needed, as well as standardization of methodology to increase comparability across studies. PMID:23433450

  13. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    ERIC Educational Resources Information Center

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  14. Exploring syndrome differentiation using non-negative matrix factorization and cluster analysis in patients with atopic dermatitis.

    PubMed

    Yun, Younghee; Jung, Wonmo; Kim, Hyunho; Jang, Bo-Hyoung; Kim, Min-Hee; Noh, Jiseong; Ko, Seong-Gyu; Choi, Inhwa

    2017-08-01

    Syndrome differentiation (SD) results in a diagnostic conclusion based on a cluster of concurrent symptoms and signs, including pulse form and tongue color. In Korea, there is a strong interest in the standardization of Traditional Medicine (TM). In order to standardize TM treatment, standardization of SD should be given priority. The aim of this study was to explore the SD, or symptom clusters, of patients with atopic dermatitis (AD) using non-negative factorization methods and k-means clustering analysis. We screened 80 patients and enrolled 73 eligible patients. One TM dermatologist evaluated the symptoms/signs using an existing clinical dataset from patients with AD. This dataset was designed to collect 15 dermatologic and 18 systemic symptoms/signs associated with AD. Non-negative matrix factorization was used to decompose the original data into a matrix with three features and a weight matrix. The point of intersection of the three coordinates from each patient was placed in three-dimensional space. With five clusters, the silhouette score reached 0.484, and this was the best silhouette score obtained from two to nine clusters. Patients were clustered according to the varying severity of concurrent symptoms/signs. Through the distribution of the null hypothesis generated by 10,000 permutation tests, we found significant cluster-specific symptoms/signs from the confidence intervals in the upper and lower 2.5% of the distribution. Patients in each cluster showed differences in symptoms/signs and severity. In a clinical situation, SD and treatment are based on the practitioners' observations and clinical experience. SD, identified through informatics, can contribute to development of standardized, objective, and consistent SD for each disease. Copyright © 2017. Published by Elsevier Ltd.
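
    The silhouette score used above to choose among two to nine clusters can be made concrete with a small sketch. This is a plain-Python illustration of the standard mean silhouette coefficient with Euclidean distance, under the assumption that each patient is already represented as a point (for instance the three feature coordinates mentioned in the abstract) and has a cluster label; it is not the authors' code, and the function name is hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for labelled points (Euclidean distance).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    point's silhouette is (b - a) / max(a, b). Points in singleton clusters
    contribute 0, following the usual convention.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: silhouette defined as 0
        # dist(p, p) is 0, so dividing by len(own) - 1 averages over the others.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


if __name__ == "__main__":
    # Toy points for illustration only, not patient data.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(mean_silhouette(pts, [0, 0, 1, 1]))
```

    Running a clustering for each candidate number of clusters and keeping the one with the highest mean silhouette mirrors the selection described in the abstract, where five clusters gave the best score (0.484).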

  15. Identification of Loci and Functional Characterization of Trichothecene Biosynthesis Genes in Filamentous Fungi of the Genus Trichoderma

    PubMed Central

    Cardoza, R. E.; Malmierca, M. G.; Hermosa, M. R.; Alexander, N. J.; McCormick, S. P.; Proctor, R. H.; Tijerino, A. M.; Rumbero, A.; Monte, E.; Gutiérrez, S.

    2011-01-01

    Trichothecenes are mycotoxins produced by Trichoderma, Fusarium, and at least four other genera in the fungal order Hypocreales. Fusarium has a trichothecene biosynthetic gene (TRI) cluster that encodes transport and regulatory proteins as well as most enzymes required for the formation of the mycotoxins. However, little is known about trichothecene biosynthesis in the other genera. Here, we identify and characterize TRI gene orthologues (tri) in Trichoderma arundinaceum and Trichoderma brevicompactum. Our results indicate that both Trichoderma species have a tri cluster that consists of orthologues of seven genes present in the Fusarium TRI cluster. Organization of genes in the cluster is the same in the two Trichoderma species but differs from the organization in Fusarium. Sequence and functional analysis revealed that the gene (tri5) responsible for the first committed step in trichothecene biosynthesis is located outside the cluster in both Trichoderma species rather than inside the cluster as it is in Fusarium. Heterologous expression analysis revealed that two T. arundinaceum cluster genes (tri4 and tri11) differ in function from their Fusarium orthologues. The Tatri4-encoded enzyme catalyzes only three of the four oxygenation reactions catalyzed by the orthologous enzyme in Fusarium. The Tatri11-encoded enzyme catalyzes a completely different reaction (trichothecene C-4 hydroxylation) than the Fusarium orthologue (trichothecene C-15 hydroxylation). The results of this study indicate that although some characteristics of the tri/TRI cluster have been conserved during evolution of Trichoderma and Fusarium, the cluster has undergone marked changes, including gene loss and/or gain, gene rearrangement, and divergence of gene function. PMID:21642405

  16. Cluster-based upper body marker models for three-dimensional kinematic analysis: Comparison with an anatomical model and reliability analysis.

    PubMed

    Boser, Quinn A; Valevicius, Aïda M; Lavoie, Ewen B; Chapman, Craig S; Pilarski, Patrick M; Hebert, Jacqueline S; Vette, Albert H

    2018-04-27

    Quantifying angular joint kinematics of the upper body is a useful method for assessing upper limb function. Joint angles are commonly obtained via motion capture, tracking markers placed on anatomical landmarks. This method is associated with limitations including administrative burden, soft tissue artifacts, and intra- and inter-tester variability. An alternative method involves the tracking of rigid marker clusters affixed to body segments, calibrated relative to anatomical landmarks or known joint angles. The accuracy and reliability of applying this cluster method to the upper body has, however, not been comprehensively explored. Our objective was to compare three different upper body cluster models with an anatomical model, with respect to joint angles and reliability. Non-disabled participants performed two standardized functional upper limb tasks with anatomical and cluster markers applied concurrently. Joint angle curves obtained via the marker clusters with three different calibration methods were compared to those from an anatomical model, and between-session reliability was assessed for all models. The cluster models produced joint angle curves which were comparable to and highly correlated with those from the anatomical model, but exhibited notable offsets and differences in sensitivity for some degrees of freedom. Between-session reliability was comparable between all models, and good for most degrees of freedom. Overall, the cluster models produced reliable joint angles that, however, cannot be used interchangeably with anatomical model outputs to calculate kinematic metrics. Cluster models appear to be an adequate, and possibly advantageous alternative to anatomical models when the objective is to assess trends in movement behavior. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    Background The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. Results We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Conclusions Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.

  18. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. 
    Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072

  19. Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

    2016-12-01

    The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.

  20. Cholera epidemic in Guinea-Bissau (2008): the importance of "place".

    PubMed

    Luquero, Francisco J; Banga, Cunhate Na; Remartínez, Daniel; Palma, Pedro Pablo; Baron, Emanuel; Grais, Rebeca F

    2011-05-04

    As resources are limited when responding to cholera outbreaks, knowledge about where to orient interventions is crucial. We describe the cholera epidemic affecting Guinea-Bissau in 2008 focusing on the geographical spread in order to guide prevention and control activities. We conducted two studies: 1) a descriptive analysis of the cholera epidemic in Guinea-Bissau focusing on its geographical spread (country level and within the capital); and 2) a cross-sectional study to measure the prevalence of houses with at least one cholera case in the most affected neighbourhood of the capital (Bairro Bandim) to detect clustering of households with cases (cluster analysis). All cholera cases attending the cholera treatment centres in Guinea-Bissau who fulfilled a modified World Health Organization clinical case definition during the epidemic were included in the descriptive study. For the cluster analysis, a sample of houses was selected from a satellite photo (Google Earth™); 140 houses (and the four closest houses) were assessed from the 2,202 identified structures. We applied K-functions and Kernel smoothing to detect clustering. We confirmed the clustering using Kulldorff's spatial scan statistic. A total of 14,222 cases and 225 deaths were reported in the country (AR = 0.94%, CFR = 1.64%). The more affected regions were Biombo, Bijagos and Bissau (the capital). Bairro Bandim was the most affected neighborhood of the capital (AR = 4.0). We found at least one case in 22.7% of the houses (95%CI: 19.5-26.2) in this neighborhood. The cluster analysis identified two areas within Bairro Bandim at highest risk: a market and an intersection where runoff accumulates waste (p<0.001). Our analysis allowed for the identification of the most affected regions in Guinea-Bissau during the 2008 cholera outbreak, and the most affected areas within the capital. 
    This information was essential for making decisions on where to reinforce treatment and to guide control and prevention activities.

  1. Cholera Epidemic in Guinea-Bissau (2008): The Importance of “Place”

    PubMed Central

    Luquero, Francisco J.; Banga, Cunhate Na; Remartínez, Daniel; Palma, Pedro Pablo; Baron, Emanuel; Grais, Rebeca F.

    2011-01-01

    Background As resources are limited when responding to cholera outbreaks, knowledge about where to orient interventions is crucial. We describe the cholera epidemic affecting Guinea-Bissau in 2008 focusing on the geographical spread in order to guide prevention and control activities. Methodology/Principal Findings We conducted two studies: 1) a descriptive analysis of the cholera epidemic in Guinea-Bissau focusing on its geographical spread (country level and within the capital); and 2) a cross-sectional study to measure the prevalence of houses with at least one cholera case in the most affected neighbourhood of the capital (Bairro Bandim) to detect clustering of households with cases (cluster analysis). All cholera cases attending the cholera treatment centres in Guinea-Bissau who fulfilled a modified World Health Organization clinical case definition during the epidemic were included in the descriptive study. For the cluster analysis, a sample of houses was selected from a satellite photo (Google Earth™); 140 houses (and the four closest houses) were assessed from the 2,202 identified structures. We applied K-functions and Kernel smoothing to detect clustering. We confirmed the clustering using Kulldorff's spatial scan statistic. A total of 14,222 cases and 225 deaths were reported in the country (AR = 0.94%, CFR = 1.64%). The more affected regions were Biombo, Bijagos and Bissau (the capital). Bairro Bandim was the most affected neighborhood of the capital (AR = 4.0). We found at least one case in 22.7% of the houses (95%CI: 19.5–26.2) in this neighborhood. The cluster analysis identified two areas within Bairro Bandim at highest risk: a market and an intersection where runoff accumulates waste (p<0.001). Conclusions/Significance Our analysis allowed for the identification of the most affected regions in Guinea-Bissau during the 2008 cholera outbreak, and the most affected areas within the capital. 
    This information was essential for making decisions on where to reinforce treatment and to guide control and prevention activities. PMID:21572530

  2. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

    PubMed

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. 
    As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.

  3. Low Back Pain Subgroups using Fear-Avoidance Model Measures: Results of a Cluster Analysis

    PubMed Central

    Beneciuk, Jason M.; Robinson, Michael E.; George, Steven Z.

    2012-01-01

    Objectives The purpose of this secondary analysis was to test the hypothesis that an empirically derived psychological subgrouping scheme based on multiple Fear-Avoidance Model (FAM) constructs would provide additional capabilities for clinical outcomes in comparison to a single FAM construct. Methods Patients (n = 108) with acute or sub-acute low back pain (LBP) enrolled in a clinical trial comparing behavioral physical therapy interventions to classification based physical therapy completed baseline questionnaires for pain catastrophizing (PCS), fear-avoidance beliefs (FABQ-PA, FABQ-W), and patient-specific fear (FDAQ). Clinical outcomes were pain intensity and disability measured at baseline, 4-weeks, and 6-months. A hierarchical agglomerative cluster analysis was used to create distinct cluster profiles among FAM measures and discriminant analysis was used to interpret clusters. Changes in clinical outcomes were investigated with repeated measures ANOVA and differences in results based on cluster membership were compared to FABQ-PA subgrouping used in the original trial. Results Three distinct FAM subgroups (Low Risk, High Specific Fear, and High Fear & Catastrophizing) emerged from cluster analysis. Subgroups differed on baseline pain and disability (p’s<.01) with the High Fear & Catastrophizing subgroup associated with greater pain than the Low Risk subgroup (p<.01) and the greatest disability (p’s<.05). Subgroup × time interactions were detected for both pain and disability (p’s<.05) with the High Fear & Catastrophizing subgroup reporting greater changes in pain and disability than other subgroups (p’s<.05). In contrast, FABQ-PA subgroups used in the original trial were not associated with interactions for clinical outcomes. Discussion These data suggest that subgrouping based on multiple FAM measures may provide additional information on clinical outcomes in comparison to determining subgroup status by FABQ-PA alone. 
    Subgrouping methods for patients with LBP should include multiple psychological factors to further explore if patients can be matched with appropriate interventions. PMID:22510537

  4. Transcriptomic markers meet the real world: finding diagnostic signatures of corticosteroid treatment in commercial beef samples

    PubMed Central

    2012-01-01

    Background The use of growth-promoters in beef cattle, despite the EU ban, remains a frequent practice. The use of transcriptomic markers has already been proposed to identify indirect evidence of anabolic hormone treatment. So far, such an approach has been tested only in experimentally treated animals. Here, for the first time, commercial samples were analyzed. Results Quantitative determination of Dexamethasone (DEX) residues in the urine collected at the slaughterhouse was performed by Liquid Chromatography-Mass Spectrometry (LC-MS). DNA-microarray technology was used to obtain transcriptomic profiles of skeletal muscle in commercial samples and negative controls. LC-MS confirmed the presence of low levels of DEX residues in the urine of the commercial samples that were suspect on histological classification. Principal Component Analysis (PCA) on microarray data identified two clusters of samples. One cluster included negative controls and a subset of commercial samples, while a second cluster included part of the specimens collected at the slaughterhouse together with positives for corticosteroid treatment based on thymus histology and LC-MS. Functional analysis of the differentially expressed genes (3961) between the two groups provided further evidence that animals clustering with positive samples might have been treated with corticosteroids. These suspect samples could be reliably classified with a specific classification tool (Prediction Analysis of Microarray) using just two genes. Conclusions Despite broad variation observed in gene expression profiles, the present study showed that DNA-microarrays can be used to find transcriptomic signatures of putative anabolic treatments and that gene expression markers could represent a useful screening tool. PMID:23110699

  5. Differentiation of Shewanella putrefaciens and Shewanella alga on the basis of whole-cell protein profiles, ribotyping, phenotypic characterization, and 16S rRNA gene sequence analysis.

    PubMed Central

    Vogel, B F; Jørgensen, K; Christensen, H; Olsen, J E; Gram, L

    1997-01-01

    Seventy-six presumed Shewanella putrefaciens isolates from fish, oil drillings, and clinical specimens, the type strain of Shewanella putrefaciens (ATCC 8071), the type strain of Shewanella alga (IAM 14159), and the type strain of Shewanella hanedai (ATCC 33224) were compared by several typing methods. Numerical analysis of sodium dodecyl sulfate-polyacrylamide gel electrophoresis of whole-cell protein and ribotyping patterns showed that the strains were separated into two distinct clusters with 56% +/- 10% and 40% +/- 14% similarity for whole-cell protein profiling and ribotyping, respectively. One cluster consisted of 26 isolates with 52 to 55 mol% G + C and included 15 human isolates, mostly clinical specimens, 8 isolates from marine waters, and the type strain of S. alga. This homogeneous cluster of mesophilic, halotolerant strains was by all analyses identical to the recently defined species S. alga (U. Simidu et al., Int. J. Syst. Bacteriol, 40:331-336, 1990). Fifty-two typically psychrotolerant strains formed the other, more heterogeneous major cluster, with 43 to 47 mol% G + C. The type strain of S. putrefaciens was included in this group. The two groups were confirmed by 16S rRNA gene sequence analysis. It is concluded that the isolates must be considered two different species, S. alga and S. putrefaciens, and that most mesophilic isolates formerly identified as S. putrefaciens belong to S. alga. The ecological role and potential pathogenicity of S. alga can be evaluated only if the organism is correctly identified. PMID:9172338

  6. Three subgroups of pain profiles identified in 227 women with arthritis: a latent class analysis.

    PubMed

    de Luca, Katie; Parkinson, Lynne; Downie, Aron; Blyth, Fiona; Byles, Julie

    2017-03-01

    The objectives were to identify subgroups of women with arthritis based upon the multi-dimensional nature of their pain experience and to compare health and socio-demographic variables between subgroups. A latent class analysis of 227 women with self-reported arthritis was used to identify clusters of women based upon the sensory, affective, and cognitive dimensions of the pain experience. Multivariate multinomial logistic regression analysis was used to determine the relationship between cluster membership and health and sociodemographic characteristics. A three-class cluster model was most parsimonious. 39.5 % of women had a unidimensional pain profile; 38.6 % of women had moderate multidimensional pain profile that included additional pain symptomatology such as sensory qualities and pain catastrophizing; and 21.9 % of women had severe multidimensional pain profile that included prominent pain symptomatology such as sensory and affective qualities of pain, pain catastrophizing, and neuropathic pain. Women with severe multidimensional pain profile have a 30.5 % higher risk of poorer quality of life and a 7.3 % higher risk of suffering depression, and women with moderate multidimensional pain profile have a 6.4 % higher risk of poorer quality of life when compared to women with unidimensional pain. This study identified three distinct subgroups of pain profiles in older women with arthritis. Women had very different experiences of pain, and cluster membership impacted significantly on health-related quality of life. These preliminary findings provide a stronger understanding of profiles of pain and may contribute to the development of tailored treatment options in arthritis.

  7. Functional Angucycline-Like Antibiotic Gene Cluster in the Terminal Inverted Repeats of the Streptomyces ambofaciens Linear Chromosome

    PubMed Central

    Pang, Xiuhua; Aigle, Bertrand; Girardet, Jean-Michel; Mangenot, Sophie; Pernodet, Jean-Luc; Decaris, Bernard; Leblond, Pierre

    2004-01-01

    Streptomyces ambofaciens has an 8-Mb linear chromosome ending in 200-kb terminal inverted repeats. Analysis of the F6 cosmid overlapping the terminal inverted repeats revealed a locus similar to type II polyketide synthase (PKS) gene clusters. Sequence analysis identified 26 open reading frames, including genes encoding the β-ketoacyl synthase (KS), chain length factor (CLF), and acyl carrier protein (ACP) that make up the minimal PKS. These KS, CLF, and ACP subunits are highly homologous to minimal PKS subunits involved in the biosynthesis of angucycline antibiotics. The genes encoding the KS and ACP subunits are transcribed constitutively but show a remarkable increase in expression after entering transition phase. Five genes, including those encoding the minimal PKS, were replaced by resistance markers to generate single and double mutants (replacement in one and both terminal inverted repeats). Double mutants were unable to produce either diffusible orange pigment or antibacterial activity against Bacillus subtilis. Single mutants showed an intermediate phenotype, suggesting that each copy of the cluster was functional. Transformation of double mutants with a conjugative and integrative form of F6 partially restored both phenotypes. The pigmented and antibacterial compounds were shown to be two distinct molecules produced from the same biosynthetic pathway. High-pressure liquid chromatography analysis of culture extracts from wild-type and double mutants revealed a peak with an associated bioactivity that was absent from the mutants. Two additional genes encoding KS and CLF were present in the cluster. However, disruption of the second KS gene had no effect on either pigment or antibiotic production. PMID:14742212

  8. Non-invasive quantification of tumour heterogeneity in water diffusivity to differentiate malignant from benign tissues of urinary bladder: a phase I study.

    PubMed

    Nguyen, Huyen T; Shah, Zarine K; Mortazavi, Amir; Pohar, Kamal S; Wei, Lai; Jia, Guang; Zynger, Debra L; Knopp, Michael V

    2017-05-01

    To quantify the heterogeneity of the tumour apparent diffusion coefficient (ADC) using voxel-based analysis to differentiate malignancy from benign wall thickening of the urinary bladder. Nineteen patients with histopathological findings of their cystectomy specimen were included. A data set of voxel-based ADC values was acquired for each patient's lesion. Histogram analysis was performed on each data set to calculate uniformity (U) and entropy (E). The k-means clustering of the voxel-wise ADC data set was implemented to measure mean intra-cluster distance (MICD) and largest inter-cluster distance (LICD). Subsequently, U, E, MICD, and LICD for malignant tumours were compared with those for benign lesions using a two-sample t-test. Eleven patients had pathological confirmation of malignancy and eight had benign wall thickening. Histogram analysis showed that malignant tumours had a significantly higher degree of ADC heterogeneity with lower U (P = 0.016) and higher E (P = 0.005) than benign lesions. In agreement with these findings, k-means clustering of voxel-wise ADC indicated that bladder malignancy presented with significantly higher MICD (P < 0.001) and higher LICD (P = 0.002) than benign wall thickening. The quantitative assessment of tumour diffusion heterogeneity using voxel-based ADC analysis has the potential to become a non-invasive tool to distinguish malignant from benign tissues of urinary bladder cancer.

    • Heterogeneity is an intrinsic characteristic of tumoral tissue.
    • Non-invasive quantification of tumour heterogeneity can provide adjunctive information to improve cancer diagnosis accuracy.
    • Histogram analysis and k-means clustering can quantify tumour diffusion heterogeneity.
    • The quantification helps differentiate malignant from benign urinary bladder tissue.
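
    The four heterogeneity measures named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define U, E, MICD or LICD, so the sketch uses the usual histogram definitions (U as the sum of squared bin probabilities, E as the Shannon entropy in bits) and assumes MICD is the mean distance of each value to its cluster centroid and LICD is the largest distance between two centroids. The function names, the bin width and the k-means initialisation are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def histogram_features(values, bin_width):
    """Return (uniformity, entropy) of a fixed-width histogram of the values."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    probs = [c / len(values) for c in counts.values()]
    uniformity = sum(p * p for p in probs)            # 1.0 when all values share a bin
    entropy = -sum(p * math.log2(p) for p in probs)   # 0 when all values share a bin
    return uniformity, entropy


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; evenly spaced starting centroids (assumed)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


def cluster_distances(values, centroids, labels):
    """Assumed definitions: mean value-to-centroid distance, largest centroid gap."""
    micd = sum(abs(v - centroids[lab]) for v, lab in zip(values, labels)) / len(values)
    licd = max(abs(a - b) for a in centroids for b in centroids)
    return micd, licd
```

    Under these definitions a more heterogeneous lesion gives lower U, higher E, and larger MICD and LICD, which matches the direction of the differences reported above.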

  9. Clustering of diet- and activity-related parenting practices: cross-sectional findings of the INPACT study

    PubMed Central

    2013-01-01

    Background Various diet- and activity-related parenting practices are positive determinants of child dietary and activity behaviour, including home availability, parental modelling and parental policies. There is evidence that parenting practices cluster within the dietary domain and within the activity domain. This study explores whether diet- and activity-related parenting practices cluster across the dietary and activity domain. Also examined is whether the clusters are related to child and parental background characteristics. Finally, to indicate the relevance of the clusters in influencing child dietary and activity behaviour, we examined whether clusters of parenting practices are related to these behaviours. Methods Data were used from 1480 parent–child dyads participating in the Dutch IVO Nutrition and Physical Activity Child cohorT (INPACT). Parents of children aged 8–11 years completed questionnaires at home assessing their diet- and activity-related parenting practices, child and parental background characteristics, and child dietary and activity behaviours. Principal component analysis (PCA) was used to identify clusters of parenting practices. Backward regression analysis was used to examine the relationship between child and parental background characteristics with cluster scores, and partial correlations to examine associations between cluster scores and child dietary and activity behaviours. Results PCA revealed five clusters of parenting practices: 1) high visibility and accessibility of screens and unhealthy food, 2) diet- and activity-related rules, 3) low availability of unhealthy food, 4) diet- and activity-related positive modelling, and 5) positive modelling on sports and fruit. Low parental education was associated with unhealthy cluster 1, while high(er) education was associated with healthy clusters 2, 3 and 5. 
    Separate clusters were related to both child dietary and activity behaviour in the hypothesized directions: healthy clusters were positively related to obesity-reducing behaviours and negatively to obesity-inducing behaviours. Conclusion Parenting practices cluster across the dietary and activity domain. Parental education can be seen as an indicator of a broader parental context in which clusters of parenting practices operate. Separate clusters are related to both child dietary and activity behaviour. Interventions that focus on clusters of parenting practices to assist parents (especially low-educated parents) in changing their child’s dietary and activity behaviour seem justified. PMID:23531232

  10. Disordered eating in a Swedish community sample of adolescent girls: subgroups, stability, and associations with body esteem, deliberate self-harm and other difficulties.

    PubMed

    Viborg, Njördur; Wångby-Lundh, Margit; Lundh, Lars-Gunnar; Wallin, Ulf; Johnsson, Per

    2018-01-01

    The developmental study of subtypes of disordered eating (DE) during adolescence may be relevant to understand the development of eating disorders. The purpose of the present study was to identify subgroups with different profiles of DE in a community sample of adolescent girls aged 13-15 years, and to study the stability of these profiles and subgroups over a one-year interval in order to find patterns that may need to be addressed in further research and prevention. Cluster analysis according to the LICUR procedure was performed on five aspects of DE, and the structural and individual stability of these clusters was analysed. The clusters were compared with regard to BMI, body esteem, deliberate self-harm, and other kinds of psychological difficulties. The analysis revealed six clusters (Multiple eating problems including purging, Multiple eating problems without purging, Social eating problems, Weight concerns, Fear of not being able to stop eating, and No eating problems) all of which had structurally stable profiles and five of which showed stability at the individual level. The more pronounced DE clusters (Multiple eating problems including/without purging) were consistently associated with higher levels of psychological difficulties and lower levels of body esteem. Furthermore, girls that reported purging reported engaging in self-harm to a larger extent. Subgroups of 13-15 year old girls show stable patterns of disordered eating that are associated with higher rates of psychological impairment and lower body esteem. The subgroup of girls who engage in purging also engage in more deliberate self-harm.

  11. Whole brain white matter connectivity analysis using machine learning: An application to autism.

    PubMed

    Zhang, Fan; Savadjiev, Peter; Cai, Weidong; Song, Yang; Rathi, Yogesh; Tunç, Birkan; Parker, Drew; Kapur, Tina; Schultz, Robert T; Makris, Nikos; Verma, Ragini; O'Donnell, Lauren J

    2018-05-15

    In this paper, we propose an automated white matter connectivity analysis method for machine learning classification and characterization of white matter abnormality via identification of discriminative fiber tracts. The proposed method uses diffusion MRI tractography and a data-driven approach to find fiber clusters corresponding to subdivisions of the white matter anatomy. Features extracted from each fiber cluster describe its diffusion properties and are used for machine learning. The method is demonstrated by application to a pediatric neuroimaging dataset from 149 individuals, including 70 children with autism spectrum disorder (ASD) and 79 typically developing controls (TDC). A classification accuracy of 78.33% is achieved in this cross-validation study. We investigate the discriminative diffusion features based on a two-tensor fiber tracking model. We observe that the mean fractional anisotropy from the second tensor (associated with crossing fibers) is most affected in ASD. We also find that local along-tract (central cores and endpoint regions) differences between ASD and TDC are helpful in differentiating the two groups. These altered diffusion properties in ASD are associated with multiple robustly discriminative fiber clusters, which belong to several major white matter tracts including the corpus callosum, arcuate fasciculus, uncinate fasciculus and aslant tract; and the white matter structures related to the cerebellum, brain stem, and ventral diencephalon. These discriminative fiber clusters, a small part of the whole brain tractography, represent the white matter connections that could be most affected in ASD. Our results indicate the potential of a machine learning pipeline based on white matter fiber clustering. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…

  13. Genetic Interaction Score (S-Score) Calculation, Clustering, and Visualization of Genetic Interaction Profiles for Yeast.

    PubMed

    Roguev, Assen; Ryan, Colm J; Xu, Jiewei; Colson, Isabelle; Hartsuiker, Edgar; Krogan, Nevan

    2018-02-01

    This protocol describes computational analysis of genetic interaction screens, ranging from data capture (plate imaging) to downstream analyses. Plate imaging approaches using both digital camera and office flatbed scanners are included, along with a protocol for the extraction of colony size measurements from the resulting images. A commonly used genetic interaction scoring method, calculation of the S-score, is discussed. These methods require minimal computer skills, but some familiarity with MATLAB and Linux/Unix is a plus. Finally, an outline for using clustering and visualization software for analysis of resulting data sets is provided. © 2018 Cold Spring Harbor Laboratory Press.

  14. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    ERIC Educational Resources Information Center

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  15. Malignant pleural mesothelioma and mesothelial hyperplasia: A new molecular tool for the differential diagnosis.

    PubMed

    Bruno, Rossella; Alì, Greta; Giannini, Riccardo; Proietti, Agnese; Lucchi, Marco; Chella, Antonio; Melfi, Franca; Mussi, Alfredo; Fontanini, Gabriella

    2017-01-10

    Malignant pleural mesothelioma (MPM) is a rare asbestos related cancer, aggressive and unresponsive to therapies. Histological examination of pleural lesions is the gold standard of MPM diagnosis, although it is sometimes hard to discriminate the epithelioid type of MPM from benign mesothelial hyperplasia (MH). This work aims to define a new molecular tool for the differential diagnosis of MPM, using the expression profile of 117 genes deregulated in this tumour. The gene expression analysis was performed by nanoString System on tumour tissues from 36 epithelioid MPM and 17 MH patients, and on 14 mesothelial pleural samples analysed in a blind way. Data analysis included raw nanoString data normalization, unsupervised cluster analysis by Pearson correlation, non-parametric Mann Whitney U-test and molecular classification by the Uncorrelated Shrunken Centroid (USC) Algorithm. The Mann-Whitney U-test found 35 genes upregulated and 31 downregulated in MPM. The unsupervised cluster analysis revealed two clusters, one composed only of MPM and one only of MH samples, thus revealing class-specific gene profiles. The Uncorrelated Shrunken Centroid algorithm identified two classifiers, one including 22 genes and the other 40 genes, able to properly classify all the samples as benign or malignant using gene expression data; both classifiers were also able to correctly determine, in a blind analysis, the diagnostic categories of all the 14 unknown samples. In conclusion we delineated a diagnostic tool combining molecular data (gene expression) and computational analysis (USC algorithm), which can be applied in the clinical practice for the differential diagnosis of MPM.
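
    As a rough illustration of what "unsupervised cluster analysis by Pearson correlation" can look like, the sketch below groups expression profiles by the distance 1 - r using average-linkage agglomerative clustering. The abstract does not state the linkage rule or the software used, so the average linkage, the function names and the toy data layout are assumptions made for this example, not the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_clusters(profiles, n_clusters):
    """Merge samples bottom-up until n_clusters remain.

    profiles maps a sample name to its list of expression values.
    The distance between two samples is 1 - Pearson r; the distance between
    two clusters is the average over all cross-cluster sample pairs
    (average linkage, an assumption for this sketch).
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}

    def linkage(c1, c2):
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters
```

    Calling `correlation_clusters(profiles, 2)` on profiles whose expression patterns fall into two correlated groups returns two lists of sample names, which is the kind of class-specific split the abstract describes for MPM and MH samples.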

  16. From the Cluster Temperature Function to the Mass Function at Low Z

    NASA Technical Reports Server (NTRS)

    Mushotzky, Richard (Technical Monitor); Markevitch, Maxim

    2004-01-01

    This XMM project consisted of three observations of the nearby, hot galaxy cluster Triangulum Australis, one of the cluster center and two offsets. The goal was to measure the radial gas temperature profile out to large radii and derive the total gravitating mass within the radius of average mass overdensity 500. The central pointing also provides data for a detailed two-dimensional gas temperature map of this interesting cluster. We have analyzed all three observations. The derivation of the temperature map using the central pointing is complete, and the paper is soon to be submitted. During the course of this study and of the analysis of archival XMM cluster observations, it became apparent that the commonly used XMM background flare screening techniques are often not accurate enough for studies of the cluster outer regions. The information on the cluster's total masses is contained at large off-center distances, and it is precisely the temperatures for those low-brightness regions that are most affected by the detector background anomalies. In particular, our two offset observations of the Triangulum have been contaminated by the background flares ("bad cosmic weather") to a degree where they could not be used for accurate spectral analysis. This forced us to expand the scope of our project. We needed to devise a more accurate method of screening and modeling the background flares, and to evaluate the uncertainty of the XMM background modeling. To do this, we have analyzed a large number of archival EPIC blank-field and closed-cover observations. As a result, we have derived stricter background screening criteria. It also turned out that mild flares affecting EPIC-pn can be modeled with an adequate accuracy. Such modeling has been used to derive our Triangulum temperature map. The results of our XMM background analysis, including the modeling recipes, are presented in a paper which is in final preparation and will be submitted soon. 
    It will be useful not only for our future analysis but for other XMM cluster observations as well.

  17. Indoleamine Hallucinogens in Cluster Headache: Results of the Clusterbusters Medication Use Survey.

    PubMed

    Schindler, Emmanuelle A D; Gottschalk, Christopher H; Weil, Marsha J; Shapiro, Robert E; Wright, Douglas A; Sewell, Richard Andrew

    2015-01-01

    Cluster headache is one of the most debilitating pain syndromes. A significant number of patients are refractory to conventional therapies. The Clusterbusters.org medication use survey sought to characterize the effects of both conventional and alternative medications used in cluster headache. Participants were recruited from cluster headache websites and headache clinics. The final analysis included responses from 496 participants. The survey was modeled after previously published surveys and was available online. Most responses were chosen from a list, though others were free-texted. Conventional abortive and preventative medications were identified and their efficacies agreed with those previously published. The indoleamine hallucinogens, psilocybin, lysergic acid diethylamide, and lysergic acid amide, were comparable to or more efficacious than most conventional medications. These agents were also perceived to shorten/abort a cluster period and bring chronic cluster headache into remission more so than conventional medications. Furthermore, infrequent and non-hallucinogenic doses were reported to be efficacious. Findings provide additional evidence that several indoleamine hallucinogens are rated as effective in treating cluster headache. These data reinforce the need for further investigation of the effects of these and related compounds in cluster headache under experimentally controlled settings.

  18. MicroRNA cluster miR-17-92 regulates multiple functionally related voltage-gated potassium channels in chronic neuropathic pain

    PubMed Central

    Sakai, Atsushi; Saitow, Fumihito; Maruyama, Motoyo; Miyake, Noriko; Miyake, Koichi; Shimada, Takashi; Okada, Takashi; Suzuki, Hidenori

    2017-01-01

    miR-17-92 is a microRNA cluster with six distinct members. Here, we show that the miR-17-92 cluster and its individual members modulate chronic neuropathic pain. All cluster members are persistently upregulated in primary sensory neurons after nerve injury. Overexpression of miR-18a, miR-19a, miR-19b and miR-92a cluster members elicits mechanical allodynia in rats, while their blockade alleviates mechanical allodynia in a rat model of neuropathic pain. Plausible targets for the miR-17-92 cluster include genes encoding numerous voltage-gated potassium channels and their modulatory subunits. Single-cell analysis reveals extensive co-expression of the miR-17-92 cluster and its predicted targets in primary sensory neurons. miR-17-92 downregulates the expression of potassium channels and reduces outward potassium currents, in particular A-type currents. Combined application of potassium channel modulators synergistically alleviates mechanical allodynia induced by nerve injury or miR-17-92 overexpression. The miR-17-92 cluster appears to cooperatively regulate the function of multiple voltage-gated potassium channel subunits, perpetuating mechanical allodynia. PMID:28677679

  19. PCA based clustering for brain tumor segmentation of T1w MRI images.

    PubMed

    Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay

    2017-03-01

    Medical images are huge collections of information that are difficult to store and process, consuming extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex, so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on clustering T1-weighted MRI images for brain tumor segmentation, with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two clustering methods. The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components. According to the findings, the PPCA obtained the best results among all others. Furthermore, EM-PCA and PPCA helped the K-Means algorithm achieve the best clustering performance in the majority of cases, and gave significant results with both clustering algorithms for all sizes of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
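
    To make the idea of dimension reduction before clustering, and of a reconstruction error, more concrete, here is a minimal sketch of conventional PCA restricted to the first principal component. It finds the component by power iteration on the sample covariance matrix, projects each row onto it, and reports the mean squared reconstruction error. It does not implement PPCA, EM-PCA, GHA or APEX, and the function names, the iteration count and the starting vector are assumptions made for this example rather than details from the paper.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (mean, unit vector) for the leading principal component.

    Uses power iteration on the sample covariance matrix. Assumes the
    starting vector is not orthogonal to the leading component and that
    the data are not all identical; otherwise the norm below becomes zero.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project_and_reconstruct(rows, mean, v):
    """Return (1-D scores, mean squared reconstruction error per row)."""
    d = len(v)
    scores = [sum((r[j] - mean[j]) * v[j] for j in range(d)) for r in rows]
    recon = [[mean[j] + s * v[j] for j in range(d)] for s in scores]
    mse = sum((r[j] - q[j]) ** 2
              for r, q in zip(rows, recon) for j in range(d)) / len(rows)
    return scores, mse
```

    The one-dimensional scores are what a clustering step such as K-Means would then operate on; a small reconstruction error indicates that little information was lost by keeping only this component.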

  20. Analyzing Protein Clusters on the Plasma Membrane: Application of Spatial Statistical Analysis Methods on Super-Resolution Microscopy Images.

    PubMed

    Paparelli, Laura; Corthout, Nikky; Pavie, Benjamin; Annaert, Wim; Munck, Sebastian

    2016-01-01

    The spatial distribution of proteins within the cell affects their capability to interact with other molecules and directly influences cellular processes and signaling. At the plasma membrane, multiple factors drive protein compartmentalization into specialized functional domains, leading to the formation of clusters in which intermolecule interactions are facilitated. Therefore, quantifying protein distributions is a necessity for understanding their regulation and function. The recent advent of super-resolution microscopy has opened up the possibility of imaging protein distributions at the nanometer scale. In parallel, new spatial analysis methods have been developed to quantify distribution patterns in super-resolution images. In this chapter, we provide an overview of super-resolution microscopy and summarize the factors influencing protein arrangements on the plasma membrane. Finally, we highlight methods for analyzing clusterization of plasma membrane proteins, including examples of their applications.

  1. Phenotypes of sleeplessness: stressing the need for psychodiagnostics in the assessment of insomnia.

    PubMed

    van de Laar, Merijn; Leufkens, Tim; Bakker, Bart; Pevernagie, Dirk; Overeem, Sebastiaan

    2017-09-01

    Insomnia is too general a term for various subtypes that might have different etiologies and therefore require different types of treatment. In this explorative study we used cluster analysis to distinguish different phenotypes in 218 patients with insomnia, taking into account several factors including sleep variables and characteristics related to personality and psychiatric comorbidity. Three clusters emerged from the analysis. The 'moderate insomnia with low psychopathology'-cluster was characterized by relatively normal personality traits, as well as normal levels of anxiety and depressive symptoms in the presence of moderate insomnia severity. The 'severe insomnia with moderate psychopathology'-cluster showed relatively high scores on the Insomnia Severity Index and scores on the sleep log that were indicative of severe insomnia. In this cluster, anxiety and depressive symptoms were slightly above the cut-off, and patients were characterized by below-average self-sufficiency and less goal-directed behavior. The 'early onset insomnia with high psychopathology'-cluster showed a much younger age and earlier insomnia onset than the other two groups. Anxiety and depressive symptoms were well above the cut-off score and the group consisted of a higher percentage of subjects with comorbid psychiatric disorders. This cluster showed a 'typical psychiatric' personality profile. Our findings stress the need for psychodiagnostic procedures next to a sleep-related diagnostic approach, especially in the younger insomnia patients. Specific treatment suggestions are given based on the three phenotypes.

  2. The Impact of Clinical, Demographic and Risk Factors on Rates of HIV Transmission: A Population-based Phylogenetic Analysis in British Columbia, Canada

    PubMed Central

    Poon, Art F. Y.; Joy, Jeffrey B.; Woods, Conan K.; Shurgold, Susan; Colley, Guillaume; Brumme, Chanson J.; Hogg, Robert S.; Montaner, Julio S. G.; Harrigan, P. Richard

    2015-01-01

    Background. The diversification of human immunodeficiency virus (HIV) is shaped by its transmission history. We therefore used a population based province wide HIV drug resistance database in British Columbia (BC), Canada, to evaluate the impact of clinical, demographic, and behavioral factors on rates of HIV transmission. Methods. We reconstructed molecular phylogenies from 27 296 anonymized bulk HIV pol sequences representing 7747 individuals in BC—about half the estimated HIV prevalence in BC. Infections were grouped into clusters based on phylogenetic distances, as a proxy for variation in transmission rates. Rates of cluster expansion were reconstructed from estimated dates of HIV seroconversion. Results. Our criteria grouped 4431 individuals into 744 clusters largely separated with respect to risk factors, including large established clusters predominated by injection drug users and more-recently emerging clusters comprising men who have sex with men. The mean log10 viral load of an individual's phylogenetic neighborhood (composed of 5 other individuals with shortest phylogenetic distances) increased their odds of appearing in a cluster by >2-fold per log10 viruses per milliliter. Conclusions. Hotspots of ongoing HIV transmission can be characterized in near real time by the secondary analysis of HIV resistance genotypes, providing an important potential resource for targeting public health initiatives for HIV prevention. PMID:25312037

  3. Identification of piecewise affine systems based on fuzzy PCA-guided robust clustering technique

    NASA Astrophysics Data System (ADS)

    Khanmirza, Esmaeel; Nazarahari, Milad; Mousavi, Alireza

    2016-12-01

    Hybrid systems are a class of dynamical systems whose behaviors are based on the interaction between discrete and continuous dynamical behaviors. Since a general method for the analysis of hybrid systems is not available, some researchers have focused on specific types of hybrid systems. Piecewise affine (PWA) systems are one of the subsets of hybrid systems. The identification of PWA systems includes the estimation of the parameters of affine subsystems and the coefficients of the hyperplanes defining the partition of the state-input domain. In this paper, we have proposed a PWA identification approach based on a modified clustering technique. By using a fuzzy PCA-guided robust k-means clustering algorithm along with neighborhood outlier detection, the two main drawbacks of the well-known clustering algorithms, i.e., the poor initialization and the presence of outliers, are eliminated. Furthermore, this modified clustering technique enables us to determine the number of subsystems without any prior knowledge about the system. In addition, applying the structure of the state-input domain, that is, considering the time sequence of input-output pairs, provides a more efficient clustering algorithm, which is the other novelty of this work. Finally, the proposed algorithm has been evaluated by parameter identification of an IGV servo actuator. Simulation together with experimental analysis has proved the effectiveness of the proposed method.

  4. Proteomic analysis of protein-protein interactions within the Cysteine Sulfinate Desulfinase Fe-S cluster biogenesis system.

    PubMed

    Bolstad, Heather M; Botelho, Danielle J; Wood, Matthew J

    2010-10-01

    Fe-S cluster biogenesis is of interest to many fields, including bioenergetics and gene regulation. The CSD system is one of three Fe-S cluster biogenesis systems in E. coli and is comprised of the cysteine desulfurase CsdA, the sulfur acceptor protein CsdE, and the E1-like protein CsdL. The biological role, biochemical mechanism, and protein targets of the system remain uncharacterized. Here we present that the active site CsdE C61 has a lowered pK(a) value of 6.5, which is nearly identical to that of C51 in the homologous SufE protein and which is likely critical for its function. We observed that CsdE forms disulfide bonds with multiple proteins and identified the proteins that copurify with CsdE. The identification of Fe-S proteins and both putative and established Fe-S cluster assembly (ErpA, glutaredoxin-3, glutaredoxin-4) and sulfur trafficking (CsdL, YchN) proteins supports the two-pathway model, in which the CSD system is hypothesized to synthesize both Fe-S clusters and other sulfur-containing cofactors. We suggest that the identified Fe-S cluster assembly proteins may be the scaffold and/or shuttle proteins for the CSD system. By comparison with previous analysis of SufE, we demonstrate that there is some overlap in the CsdE and SufE interactomes.

  5. Risk Profiles for Injurious Falls in People Over 60: A Population-Based Cohort Study

    PubMed Central

    Ek, Stina; Rizzuto, Debora; Fratiglioni, Laura; Johnell, Kristina; Xu, Weili

    2018-01-01

    Background: Although falls in older adults are related to multiple risk factors, these factors have commonly been studied individually. We aimed to identify risk profiles for injurious falls in older adults by detecting clusters of established risk factors and quantifying their impact on fall risk. Methods: Participants were 2,566 people, aged 60 years and older, from the population-based Swedish National Study on Aging and Care in Kungsholmen. An injurious fall was defined as hospitalization for or receipt of outpatient care because of a fall. Cluster analysis was used to identify aggregation of possible risk factors including chronic diseases, fall-risk increasing drugs (FRIDs), physical and cognitive impairments, and lifestyle-related factors. Associations between the clusters and injurious falls over 3, 5, and 10 years were estimated using flexible parametric survival models. Results: Five clusters were identified: a “healthy”, a “well-functioning with multimorbidity”, a “well-functioning, with multimorbidity and high FRID consumption”, a “physically and cognitively impaired”, and a “disabled” cluster. The risk of injurious falls for all groups was significantly higher than for the first cluster of healthy individuals, which served as the reference category. Hazard ratios (95% confidence intervals) ranged from 1.71 (1.02–2.66) for the second cluster to 12.67 (7.38–21.75) for the last cluster over 3 years of follow-up. The highest risk was observed in the last two clusters, which had a high burden of physical and cognitive impairments. Conclusion: Risk factors for injurious falls tend to aggregate, representing different levels of risk for falls. Our findings can be useful to tailor and prioritize clinical and public health interventions. PMID:28605455

  6. Cross-scale analysis of cluster correspondence using different operational neighborhoods

    NASA Astrophysics Data System (ADS)

    Lu, Yongmei; Thill, Jean-Claude

    2008-09-01

    Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.

  7. Detection and clustering of features in aerial images by neuron network-based algorithm

    NASA Astrophysics Data System (ADS)

    Vozenilek, Vit

    2015-12-01

    The paper presents an algorithm for detection and clustering of features in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general feature analysis with clustering and backward projection of the clusters onto the aerial image. The basis of the algorithm is a calculation of the total error of the network and a change of the weights of the network to minimize the error. A classic bipolar sigmoid was used for the activation function of the neurons and the basic method of backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, a web application was built (ASP.NET on the Microsoft .NET platform). The main achievements include the finding that man-made objects in aerial images can be successfully identified by detection of shapes and anomalies. It was also found that an appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
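
    The bipolar sigmoid and the error-minimizing weight change mentioned in this abstract can be sketched for a single neuron. This is only an illustration under assumptions: the abstract does not give the network layout, the learning rate, or the exact error function, so the squared-error loss, the value 0.1, and the function names below are hypothetical.

```python
import math

def bipolar_sigmoid(x):
    """Bipolar sigmoid: maps any real input into the open interval (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_derivative(x):
    """Derivative of the bipolar sigmoid, written in terms of its output."""
    y = bipolar_sigmoid(x)
    return 0.5 * (1.0 + y) * (1.0 - y)

def train_step(weights, inputs, target, learning_rate=0.1):
    """One gradient-descent update for a single neuron (assumed squared error).

    Returns the updated weights and the error measured BEFORE the update.
    """
    net = sum(w * x for w, x in zip(weights, inputs))
    out = bipolar_sigmoid(net)
    error = 0.5 * (target - out) ** 2
    # Delta rule: move each weight against the gradient of the error.
    delta = (target - out) * bipolar_sigmoid_derivative(net)
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, error

# Hypothetical toy input, not data from the paper.
weights, error_before = train_step([0.0, 0.0], [1.0, -1.0], target=1.0)
_, error_after = train_step(weights, [1.0, -1.0], target=1.0)
```

    In a full backpropagation network the same delta term is propagated backwards through the hidden layers; the sketch stops at one neuron to keep the update rule visible.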

  8. The Earth Observation Technology Cluster

    NASA Astrophysics Data System (ADS)

    Aplin, P.; Boyd, D. S.; Danson, F. M.; Donoghue, D. N. M.; Ferrier, G.; Galiatsatos, N.; Marsh, A.; Pope, A.; Ramirez, F. A.; Tate, N. J.

    2012-07-01

    The Earth Observation Technology Cluster is a knowledge exchange initiative, promoting development, understanding and communication about innovative technology used in remote sensing of the terrestrial or land surface. This initiative provides an opportunity for presentation of novel developments from, and cross-fertilisation of ideas between, the many and diverse members of the terrestrial remote sensing community. The Earth Observation Technology Cluster involves a range of knowledge exchange activities, including organisation of technical events, delivery of educational materials, publication of scientific findings and development of a coherent terrestrial EO community. The initiative as a whole covers the full range of remote sensing operation, from new platform and sensor development, through image retrieval and analysis, to data applications and environmental modelling. However, certain topical and strategic themes have been selected for detailed investigation: (1) Unpiloted Aerial Vehicles, (2) Terrestrial Laser Scanning, (3) Field-Based Fourier Transform Infra-Red Spectroscopy, (4) Hypertemporal Image Analysis, and (5) Circumpolar and Cryospheric Application. This paper presents general activities and achievements of the Earth Observation Technology Cluster, and reviews state-of-the-art developments in the five specific thematic areas.

  9. Seasonal and spatial variations of water quality and trophic status in Daya Bay, South China Sea.

    PubMed

    Wu, Mei-Lin; Wang, You-Shao; Wang, Yu-Tu; Sun, Fu-Lin; Sun, Cui-Ci; Cheng, Hao; Dong, Jun-De

    2016-11-15

    Coastal water quality and trophic status are subject to intensive environmental stress induced by human activities and climate change. Quarterly cruises were conducted to identify environmental characteristics in Daya Bay in 2013. Water quality is spatially and temporally dynamic in the bay. Cluster analysis (CA) groups 12 monitoring stations into two clusters. Cluster I consists of stations (S1, S2, S4-S7, S9, and S12) located in the central, eastern, and southern parts of the bay, representing less polluted regions. Cluster II includes stations (S3, S8, S10, and S11) located in the western and northern parts of the bay, indicating the highly polluted regions receiving a high amount of wastewater and freshwater discharge. Principal component analysis (PCA) identified that water quality experiences seasonal change (summer, winter, and spring-autumn seasons) because of two monsoons in the study area. Eutrophication in the bay is graded as high by Assessment of Estuarine Trophic Status (ASSETS). Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Genome mining of the sordarin biosynthetic gene cluster from Sordaria araneosa Cain ATCC 36386: characterization of cycloaraneosene synthase and GDP-6-deoxyaltrose transferase.

    PubMed

    Kudo, Fumitaka; Matsuura, Yasunori; Hayashi, Takaaki; Fukushima, Masayuki; Eguchi, Tadashi

    2016-07-01

    Sordarin is a glycoside antibiotic with a unique tetracyclic diterpene aglycone structure called sordaricin. To understand its intriguing biosynthetic pathway that may include a Diels-Alder-type [4+2]cycloaddition, genome mining of the gene cluster from the draft genome sequence of the producer strain, Sordaria araneosa Cain ATCC 36386, was carried out. A contiguous 67 kb gene cluster consisting of 20 open reading frames encoding a putative diterpene cyclase, a glycosyltransferase, a type I polyketide synthase, and six cytochrome P450 monooxygenases were identified. In vitro enzymatic analysis of the putative diterpene cyclase SdnA showed that it catalyzes the transformation of geranylgeranyl diphosphate to cycloaraneosene, a known biosynthetic intermediate of sordarin. Furthermore, a putative glycosyltransferase SdnJ was found to catalyze the glycosylation of sordaricin in the presence of GDP-6-deoxy-d-altrose to give 4'-O-demethylsordarin. These results suggest that the identified sdn gene cluster is responsible for the biosynthesis of sordarin. Based on the isolated potential biosynthetic intermediates and bioinformatics analysis, a plausible biosynthetic pathway for sordarin is proposed.

  11. Role of mtDNA haplogroups in the prevalence of osteoarthritis in different geographic populations: a meta-analysis.

    PubMed

    Shen, Jin-Ming; Feng, Lei; Feng, Chun

    2014-01-01

    Osteoarthritis (OA) is the most common form of arthritis and has become an increasingly important public-health problem. However, the pathogenesis of OA is still unclear. In recent years, its correlation with mtDNA haplogroups has attracted much attention. We aimed to perform a meta-analysis to investigate the association between mtDNA haplogroups and OA. Published English or Chinese literature from PubMed, Web of Science, SDOS, and CNKI was retrieved up until April 15, 2014. Case-control or cohort studies that detected the frequency of mtDNA haplogroups in OA patients and controls were included. The quality of the included studies was evaluated by the Newcastle-Ottawa Scale (NOS) assessment. A meta-analysis was conducted to calculate the pooled odds ratio (OR) with 95% confidence interval (CI) through the random or fixed effect model, which was selected based on the between-study heterogeneity assessed by the Q test and the I² test. Subgroup analysis was performed to explore the origin of heterogeneity. A total of 6 case-control studies (10590 cases and 7161 controls) with an average NOS score of 6.9 were involved. For the analysis between mtDNA haplogroup J and OA, the random model was selected due to high heterogeneity. No significant association was found initially (OR = 0.73, 95%CI: 0.52-1.03); however, once any study from the UK population was removed, the association emerged. Further subgroup analysis demonstrated that there was a significant association in the Spanish population (OR = 0.57, 95%CI: 0.46-0.71), but not in the UK population. Also, subgroup analysis revealed that there was a significant correlation between cluster TJ and OA in the Spanish population (OR = 0.70, 95%CI: 0.58-0.84), although not in the UK population. No significant correlation was found between haplogroup T/cluster HV/cluster KU and OA. Our current meta-analysis suggests that mtDNA haplogroup J and cluster TJ correlate with the risk of OA in the Spanish population, but the associations in other populations require further investigation.
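
    The pooled odds ratio and the heterogeneity statistics named in this abstract can be illustrated with a short sketch. It assumes inverse-variance (Woolf) weighting of log odds ratios under a fixed-effect model, which the abstract does not confirm as the authors' exact procedure, and the 2 × 2 counts are hypothetical toy numbers, not data from the included studies.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log OR and its variance from a 2x2 table.

    a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def pooled_fixed_effect(tables, z=1.96):
    """Inverse-variance pooled OR with an approximate 95% confidence interval."""
    effects = [log_odds_ratio(*t) for t in tables]
    weights = [1 / var for _, var in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

def heterogeneity(tables):
    """Cochran's Q and the I^2 statistic (share of variation beyond chance)."""
    effects = [log_odds_ratio(*t) for t in tables]
    weights = [1 / var for _, var in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(tables) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i_squared

# Hypothetical toy studies (a, b, c, d); illustrative only.
toy_tables = [(30, 270, 50, 250), (20, 180, 40, 160), (45, 255, 48, 252)]
pooled_or, ci_low, ci_high = pooled_fixed_effect(toy_tables)
q_stat, i_squared = heterogeneity(toy_tables)
```

    When I² is high, a random-effects model adds a between-study variance term to each weight; that step is omitted here.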

  12. Age and Mass for 920 Large Magellanic Cloud Clusters Derived from 100 Million Monte Carlo Simulations

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan; Hanson, M. M.; Elmegreen, Bruce G.

    2012-06-01

    We present new age and mass estimates for 920 stellar clusters in the Large Magellanic Cloud (LMC) based on previously published broadband photometry and the stellar cluster analysis package, MASSCLEANage. Expressed in the generic fitting formula, $d^2N/dM\,dt \propto M^{\alpha} t^{\beta}$, the distribution of observed clusters is described by α = -1.5 to -1.6 and β = -2.1 to -2.2. For 288 of these clusters, ages have recently been determined based on stellar photometric color-magnitude diagrams, allowing us to gauge the confidence of our ages. The results look very promising, opening up the possibility that this sample of 920 clusters, with reliable and consistent age, mass, and photometric measures, might be used to constrain important characteristics about the stellar cluster population in the LMC. We also investigate a traditional age determination method that uses a χ² minimization routine to fit observed cluster colors to standard infinite-mass limit simple stellar population models. This reveals serious defects in the derived cluster age distribution using this method. The traditional χ² minimization method, due to the variation of U, B, V, R colors, will always produce an overdensity of younger and older clusters, with an underdensity of clusters in the log (age/yr) = [7.0, 7.5] range. Finally, we present a unique simulation aimed at illustrating and constraining the fading limit in observed cluster distributions that includes the complex effects of stochastic variations in the observed properties of stellar clusters.

  13. An image processing pipeline to detect and segment nuclei in muscle fiber microscopic images.

    PubMed

    Guo, Yanen; Xu, Xiaoyin; Wang, Yuanyuan; Wang, Yaming; Xia, Shunren; Yang, Zhong

    2014-08-01

    Muscle fiber images play an important role in the medical diagnosis and treatment of many muscular diseases. The number of nuclei in skeletal muscle fiber images is a key bio-marker of the diagnosis of muscular dystrophy. In nuclei segmentation one primary challenge is to correctly separate the clustered nuclei. In this article, we developed an image processing pipeline to automatically detect, segment, and analyze nuclei in microscopic image of muscle fibers. The pipeline consists of image pre-processing, identification of isolated nuclei, identification and segmentation of clustered nuclei, and quantitative analysis. Nuclei are initially extracted from background by using local Otsu's threshold. Based on analysis of morphological features of the isolated nuclei, including their areas, compactness, and major axis lengths, a Bayesian network is trained and applied to identify isolated nuclei from clustered nuclei and artifacts in all the images. Then a two-step refined watershed algorithm is applied to segment clustered nuclei. After segmentation, the nuclei can be quantified for statistical analysis. Comparing the segmented results with those of manual analysis and an existing technique, we find that our proposed image processing pipeline achieves good performance with high accuracy and precision. The presented image processing pipeline can therefore help biologists increase their throughput and objectivity in analyzing large numbers of nuclei in muscle fiber images. © 2014 Wiley Periodicals, Inc.
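
    The thresholding step of this pipeline can be sketched as follows. Note the substitution: the abstract uses a local Otsu threshold, whereas this minimal sketch computes a single global Otsu threshold over a flat list of 8-bit intensities; a local variant would apply the same function to each image tile or neighborhood. The function name and the pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the intensity t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0      # number of pixels at or below the candidate threshold
    sum_low = 0    # intensity sum of those pixels
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Hypothetical toy intensities: dark background plus bright "nuclei".
pixels = [10] * 50 + [12] * 50 + [200] * 30 + [210] * 30
threshold = otsu_threshold(pixels)
foreground = [p > threshold for p in pixels]
```

    Separating touching nuclei afterwards is a different problem, which the abstract handles with a Bayesian network and a refined watershed step that are not shown here.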

  14. Two Point Autocorrelation Analysis of Auger Highest Energy Events Backtracked in Galactic Magnetic Field

    NASA Astrophysics Data System (ADS)

    Petrov, Yevgeniy

    2009-10-01

    Searches for sources of the highest-energy cosmic rays traditionally have included looking for clusters of event arrival directions on the sky. The smallest cluster is a pair of events falling within some angular window. In contrast to the standard two point (2-pt) autocorrelation analysis, this work takes into account the influence of the galactic magnetic field (GMF). The highest energy events, those above 50 EeV, collected by the surface detector of the Pierre Auger Observatory between January 1, 2004 and May 31, 2009 are used in the analysis. Having assumed protons as primaries, events are backtracked through the BSS/S, BSS/A, ASS/S and ASS/A versions of the Harari-Mollerach-Roulet (HMR) model of the GMF. For each version of the model, a 2-pt autocorrelation analysis is applied to the backtracked events and to 105 isotropic Monte Carlo realizations weighted by the Auger exposure. Scans in energy, separation angular window and different model parameters reveal clustering at different angular scales. Small angle clustering at 2-3 deg is particularly interesting and it is compared between different field scenarios. The strength of the autocorrelation signal at those angular scales differs between the BSS and ASS versions of the HMR model. The BSS versions of the model tend to defocus protons as they arrive at Earth, whereas the ASS versions, by contrast, are more likely to focus them.
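
    The pair-counting core of a two-point autocorrelation analysis can be sketched as below. The sketch assumes arrival directions given as (longitude, latitude) in degrees and simply counts pairs closer than an angular window; backtracking through a magnetic field model and exposure weighting are not shown, and the coordinates are hypothetical toy values.

```python
import math
from itertools import combinations

def angular_separation_deg(a, b):
    """Great-circle angle in degrees between two (lon, lat) directions."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    cos_sep = (math.sin(lat1) * math.sin(lat2)
               + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))

def count_pairs(directions, window_deg):
    """Number of event pairs separated by at most window_deg."""
    return sum(1 for a, b in combinations(directions, 2)
               if angular_separation_deg(a, b) <= window_deg)

# Hypothetical toy arrival directions (lon, lat) in degrees.
events = [(10.0, 0.0), (12.0, 0.0), (100.0, 45.0), (101.0, 45.0), (200.0, -30.0)]
pairs_within_3_deg = count_pairs(events, 3.0)
```

    A significance estimate would compare this count with the distribution of counts from many simulated isotropic skies, as the abstract describes.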

  15. Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

    PubMed

    Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

    2015-03-01

    Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity also especially in the "fruits & vegetables" pattern. κ statistic revealed a fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
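
    The two agreement measures reported in this abstract, the share of concordant cluster memberships and the κ statistic, can be computed as in the following sketch. It assumes that cluster labels have already been matched between the two data sources, and the label lists are hypothetical toy data, not the study's participants.

```python
from collections import Counter

def concordance(labels_a, labels_b):
    """Fraction of participants assigned to the same cluster in both sources."""
    agree = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return agree / len(labels_a)

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = concordance(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    p_expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical toy memberships from two questionnaire rounds.
ffq1 = ["fv", "fv", "fv", "fv", "meat", "meat", "meat", "meat"]
ffq2 = ["fv", "fv", "fv", "meat", "meat", "meat", "meat", "fv"]

agreement = concordance(ffq1, ffq2)
kappa = cohen_kappa(ffq1, ffq2)
```

    With two equally sized clusters, chance agreement alone is 50%, which is why a raw concordance near two thirds corresponds to only fair agreement on the κ scale.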

  16. Predictability of Sleep in Patients with Insomnia

    PubMed Central

    Vallières, Annie; Ivers, Hans; Beaulieu-Bonneau, Simon; Morin, Charles M.

    2011-01-01

    Study Objectives: To evaluate whether the night-to-night variability in insomnia follows specific predictable patterns and to characterize sleep patterns using objective sleep and clinical variables. Design: Prospective observational study. Setting: University-affiliated sleep disorders center. Participants: 146 participants suffering from chronic and primary insomnia. Measurements and Results: Daily sleep diaries were completed for an average of 48 days and self-reported questionnaires once. Three nights were spent in the sleep laboratory for polysomnographic (PSG) assessment. Sleep efficiency, sleep onset latency, wake after sleep onset, and total sleep time were derived from sleep diaries and PSG. Time-series diary data were used to compute conditional probabilities of having an insomnia night after 1, 2, or 3 consecutive insomnia night(s). Conditional probabilities were submitted to a k-means cluster analysis. A 3-cluster solution was retained. One cluster included 38 participants exhibiting an unpredictable insomnia pattern. Another included 30 participants with a low and decreasing probability to have an insomnia night. The last cluster included 49 participants exhibiting a high probability to have insomnia every night. Clusters differed on age, insomnia severity, and mental fatigue, and on subjective sleep variables, but not on PSG sleep variables. Conclusion: These findings replicate our previous study and provide additional evidence that unpredictability is a less prevalent feature of insomnia than suggested previously in the literature. The presence of the 3 clusters is discussed in terms of sleep perception and sleep homeostasis dysregulation. Citation: Vallières A; Ivers H; Beaulieu-Bonneau S; Morin CM. Predictability of sleep in patients with insomnia. SLEEP 2011;34(5):609-617. PMID:21532954
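
    The conditional probabilities that feed the k-means step can be illustrated with a small sketch. It assumes a diary coded as 1 for an insomnia night and 0 otherwise, and it reads "after k consecutive insomnia nights" as "the k preceding nights were all insomnia nights"; the abstract does not state the exact definition, and the diary below is hypothetical.

```python
def conditional_insomnia_probability(nights, run_length):
    """P(insomnia tonight | the previous run_length nights were all insomnia).

    Returns None when the diary contains no such run to condition on.
    """
    opportunities = 0
    hits = 0
    for t in range(run_length, len(nights)):
        if all(nights[t - run_length:t]):
            opportunities += 1
            hits += nights[t]
    return hits / opportunities if opportunities else None

# Hypothetical 10-night diary: 1 = insomnia night, 0 = good night.
diary = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]

# One feature vector per participant; such vectors would then be clustered.
profile = [conditional_insomnia_probability(diary, k) for k in (1, 2, 3)]
```

    A profile whose values fall as k grows, as in this toy diary, matches the "low and decreasing probability" pattern described for one of the three clusters.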

  17. Superhydrophilic nanostructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mao, Samuel S; Zormpa, Vasileia; Chen, Xiaobo

    2015-05-12

    An embodiment of a superhydrophilic nanostructure includes nanoparticles. The nanoparticles are formed into porous clusters. The porous clusters are formed into aggregate clusters. An embodiment of an article of manufacture includes the superhydrophilic nanostructure on a substrate. An embodiment of a method of fabricating a superhydrophilic nanostructure includes applying a solution that includes nanoparticles to a substrate. The substrate is heated to form aggregate clusters of porous clusters of the nanoparticles.

  18. Interactive visual exploration and refinement of cluster assignments.

    PubMed

    Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R

    2017-09-12

    With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.

  19. Clusters of Occupations Based on Systematically Derived Work Dimensions: An Exploratory Study.

    ERIC Educational Resources Information Center

    Cunningham, J. W.; And Others

    The study explored the feasibility of deriving an educationally relevant occupational cluster structure based on Occupational Analysis Inventory (OAI) work dimensions. A hierarchical cluster analysis was applied to the factor score profiles of 814 occupations on 22 higher-order OAI work dimensions. From that analysis, 73 occupational clusters were…

  20. Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.

    PubMed

    Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren

    2014-12-01

    Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD: age, FEV(1) % predicted, BMI, history of severe exacerbations, mMRC, SpO(2), and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters, labeled cluster A to cluster E. Assessment of the predictive validity of these clusters of COPD showed that cluster E patients had higher all-cause mortality (HR 18.3, p < 0.0001) and respiratory-cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all-cause mortality (HR 14.3, p = 0.0002) and respiratory-cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD patients with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations formed a novel and distinct clinical phenotype predicting mortality in men with COPD.

  1. Clinical evaluation of a novel population-based regression analysis for detecting glaucomatous visual field progression.

    PubMed

    Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C

    2011-04-01

    The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. 
Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.
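
    The sensitivity and specificity figures in this record compare each statistical test against the investigators' reference classification. A minimal sketch of that comparison follows, assuming each VF series is reduced to a boolean "progressive" label; the function name and the toy labels are hypothetical and not taken from the study.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for two equal-length boolean sequences.

    reference: True where the investigators classified a series as progressive.
    predicted: True where the statistical test flagged progression.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    tn = sum(1 for r, p in zip(reference, predicted) if not r and not p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one progressive and one stable series")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for five hypothetical VF series (illustration only).
reference = [True, True, False, False, False]
predicted = [True, False, False, False, True]
sens, spec = sensitivity_specificity(reference, predicted)
```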

  2. Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-01-01

    Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10^14 M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties. Aims: We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. 
Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures show the same general behaviour as regular cluster galaxies; however, in substructures, there is a deficiency of both late type and old stellar population galaxies. Late type galaxies with recent bursts of star formation seem to be missing in the substructures close to the bottom of the host cluster potential well. However, our sample would need to be increased to allow a more robust analysis. Tables 1, 2, 4 and Appendices A-C are available in electronic form at http://www.aanda.org
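
    The X-ray step in this record (fit a β-model, subtract it, inspect the residuals) can be illustrated with the standard isothermal β-model for surface brightness, S(r) = S0 [1 + (r/rc)^2]^(-3β + 1/2). The sketch below is one-dimensional and assumes the parameters have already been fitted; the function names and the sample values are hypothetical, and the survey's actual fitting procedure is not described in the abstract.

```python
def beta_model(r, s0, r_core, beta):
    """Surface brightness of the isothermal beta-model at projected radius r."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (-3.0 * beta + 0.5)


def beta_model_residuals(radii, observed, s0, r_core, beta):
    """Observed minus model brightness; positive residuals hint at excess emission."""
    return [obs - beta_model(r, s0, r_core, beta) for r, obs in zip(radii, observed)]


# Hypothetical radial profile (arbitrary units) with an excess at r = 1.0.
radii = [0.0, 1.0, 2.0]
observed = [2.0, 1.4, 0.4]
residuals = beta_model_residuals(radii, observed, s0=2.0, r_core=1.0, beta=0.5)
```

    With β = 0.5 the exponent is -1, so the model falls to half its central value at r = rc, which makes the toy residuals easy to check by hand.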

  3. Concerted Changes in Gene Expression and Cell Physiology of the Cyanobacterium Synechocystis sp. Strain PCC 6803 during Transitions between Nitrogen and Light-Limited Growth

    PubMed Central

    Aguirre von Wobeser, Eneas; Ibelings, Bas W.; Bok, Jasper; Krasikov, Vladimir; Huisman, Jef; Matthijs, Hans C.P.

    2011-01-01

    Physiological adaptation and genome-wide expression profiles of the cyanobacterium Synechocystis sp. strain PCC 6803 in response to gradual transitions between nitrogen-limited and light-limited growth conditions were measured in continuous cultures. Transitions induced changes in pigment composition, light absorption coefficient, photosynthetic electron transport, and specific growth rate. Physiological changes were accompanied by reproducible changes in the expression of several hundred open reading frames, genes with functions in photosynthesis and respiration, carbon and nitrogen assimilation, protein synthesis, phosphorus metabolism, and overall regulation of cell function and proliferation. Cluster analysis of the nearly 1,600 regulated open reading frames identified eight clusters, each showing a different temporal response during the transitions. Two large clusters mirrored each other. One cluster included genes involved in photosynthesis, which were up-regulated during light-limited growth but down-regulated during nitrogen-limited growth. Conversely, genes in the other cluster were down-regulated during light-limited growth but up-regulated during nitrogen-limited growth; this cluster included several genes involved in nitrogen uptake and assimilation. These results demonstrate complementary regulation of gene expression for two major metabolic activities of cyanobacteria. Comparison with batch-culture experiments revealed interesting differences in gene expression between batch and continuous culture and illustrates that continuous-culture experiments can pick up subtle changes in cell physiology and gene expression. PMID:21205618

  4. Detailed analysis of the supermarket task included on the Japanese version of the Rapid Dementia Screening Test.

    PubMed

    Moriyama, Yasushi; Yoshino, Aihide; Muramatsu, Taro; Mimura, Masaru

    2017-05-01

    The supermarket task, which is included in the Japanese version of the Rapid Dementia Screening Test, requires the quick (1 min) generation of words for things that can be bought in a supermarket. Cluster size and switches are investigated during this task. We investigated how the severity of dementia related to cluster size and switches on the supermarket task in patients with Alzheimer's disease. We administered the Japanese version of the Rapid Dementia Screening Test to 250 patients with very mild to severe Alzheimer's disease and to 49 healthy volunteers. Patients had Mini-Mental State Examination scores from 12 to 26 and Clinical Dementia Rating scale scores from 0.5 to 3. Patients were divided into four groups based on their Clinical Dementia Rating score (0.5, 1, 2, 3). We performed statistical analyses between the four groups and control subjects based on cluster size and switch scores on the supermarket task. The score for cluster size and switches deteriorated according to the severity of dementia. Moreover, for subjects with a Clinical Dementia Rating score of 0.5, cluster size was impaired, but switches were intact. Our findings indicate that the scores for cluster size and switches on the supermarket task may be useful for detecting the severity of symptoms of dementia in patients with Alzheimer's disease. © 2016 The Authors. Psychogeriatrics © 2016 Japanese Psychogeriatric Society.

  5. Diffuse Optical Light in Galaxy Clusters. II. Correlations with Cluster Properties

    NASA Astrophysics Data System (ADS)

    Krick, J. E.; Bernstein, R. A.

    2007-08-01

    We have measured the flux, profile, color, and substructure in the diffuse intracluster light (ICL) in a sample of 10 galaxy clusters with a range of mass, morphology, redshift, and density. Deep, wide-field observations for this project were made in two bands at the 1 m Swope and 2.5 m du Pont telescopes at Las Campanas Observatory. Careful attention in reduction and analysis was paid to the illumination correction, background subtraction, point-spread function determination, and galaxy subtraction. ICL flux is detected in both bands in all 10 clusters ranging from 7.6×10^10 to 7.0×10^11 h_70^-1 L_solar in r and 1.4×10^10 to 1.2×10^11 h_70^-1 L_solar in the B band. These fluxes account for 6%-22% of the total cluster light within one-quarter of the virial radius in r and 4%-21% in the B band. Average ICL B-r colors range from 1.5 to 2.8 mag when k- and evolution corrected to the present epoch. In several clusters we also detect ICL in group environments near the cluster center and up to 1 h_70^-1 Mpc distant from the cluster center. Our sample, having been selected from the Abell sample, is incomplete in that it does not include high-redshift clusters with low density, low flux, or low mass, and it does not include low-redshift clusters with high flux, high mass, or high density. This bias makes it difficult to interpret correlations between ICL flux and cluster properties. Despite this selection bias, we do find that the presence of a cD galaxy corresponds to both centrally concentrated galaxy profiles and centrally concentrated ICL profiles. This is consistent with ICL either forming from galaxy interactions at the center or forming at earlier times in groups and later combining in the center.

  6. A Catalog of Galaxy Clusters Observed by XMM-Newton

    NASA Technical Reports Server (NTRS)

    Snowden, S. L.; Mushotzky, R. M.; Kuntz, K. D.; Davis, David S.

    2007-01-01

    Images and the radial profiles of the temperature, abundance, and brightness for 70 clusters of galaxies observed by XMM-Newton are presented along with a detailed discussion of the data reduction and analysis methods, including background modeling, which were used in the processing. Proper consideration of the various background components is vital to extend the reliable determination of cluster parameters to the largest possible cluster radii. The various components of the background including the quiescent particle background, cosmic diffuse emission, soft proton contamination, and solar wind charge exchange emission are discussed along with suggested means of their identification, filtering, and/or their modeling and subtraction. Every component is spectrally variable, sometimes significantly so, and all components except the cosmic background are temporally variable as well. The distributions of the events over the FOV vary between the components, and some distributions vary with energy. The scientific results from observations of low surface brightness objects and the diffuse background itself can be strongly affected by these background components and therefore great care should be taken in their consideration.

  7. The Swift AGN and Cluster Survey

    NASA Astrophysics Data System (ADS)

    Dai, Xinyu

    A key question in astrophysics is to constrain the evolution of the largest gravitationally bound structures in the universe. The serendipitous observations of Swift-XRT form an excellent medium-deep and wide soft X-ray survey, with a sky area of 160 square degrees at the flux limit of 5e-15 erg/s/cm^2. This survey is about an order of magnitude deeper than previous surveys of similar areas, and an order of magnitude wider than previous surveys of similar depth. It is comparable to the planned eROSITA deep survey, but already with the data several years ahead. The unique combination of the survey area and depth enables it to fill in the gap between the deep, pencil beam surveys (such as the Chandra Deep Fields) and the shallow, wide area surveys measured with ROSAT. With it, we will place independent and complementary measurements on the number counts and luminosity functions of X-ray sources. It has been proved that this survey is excellent for X-ray selected galaxy cluster surveys, based on our initial analysis of 1/4 of the fields and other independent studies. The highest priority goal is to produce the largest, uniformly selected catalog of X-ray selected clusters and increase the sample of intermediate to high redshift clusters (z > 0.5) by an order of magnitude. From this catalog, we will study the evolution of cluster number counts, luminosity function, scaling relations, and eventually the mass function. For example, various smaller scale surveys concluded divergently on the evolution of a key scaling relation, between temperature and luminosity of clusters. With the statistical power from this large sample, we will resolve the debate whether clusters evolve self-similarly. This is a crucial step in mapping cluster evolution and constraining cosmological models. First, we propose to extract the complete serendipitous extended source list for all Swift-XRT data to 2015. Second, we will use optical/IR observations to further identify galaxy clusters. 
These optical/IR observations include data from the SDSS, WISE, and deep optical follow-up observations from the APO, MDM, Magellan, and NOAO telescopes. WISE will confirm all z<0.5 clusters. We will use ground-based observations to measure redshifts for z>0.5 clusters, with a focus on measuring 1/10 of the spectroscopic redshifts of z>0.5 clusters within the budget period. Third, we will analyze our deep Suzaku X-ray follow-up observations of a sample of medium redshift clusters, and the 1/10 bright Swift clusters suitable for spectral analysis. We will also perform stacking analysis using the Swift data for clusters in different redshift bins to constrain the evolution of cluster properties.

  8. Symptom clusters and treatment time delay in Korean patients with ST-elevation myocardial infarction on admission.

    PubMed

    Kim, Hee-Sook; Eun, Sang Jun; Hwang, Jin Yong; Lee, Kun-Sei; Cho, Sung-Il

    2018-05-01

    Most patients with acute myocardial infarction (AMI) experience more than one symptom at onset. Although symptoms are an important early indicator, patients and physicians may have difficulty interpreting symptoms and detecting AMI at an early stage. This study aimed to identify symptom clusters among Korean patients with ST-elevation myocardial infarction (STEMI), to examine the relationship between symptom clusters and patient-related variables, and to investigate the influence of symptom clusters on treatment time delay (decision time [DT], onset-to-balloon time [OTB]). This was a prospective multicenter study with a descriptive design that used face-to-face interviews. A total of 342 patients with STEMI were included in this study. To identify symptom clusters, two-step cluster analysis was performed using SPSS software. Multinomial logistic regression to explore factors related to each cluster and multiple logistic regression to determine the effect of symptom clusters on treatment time delay were conducted. Three symptom clusters were identified: cluster 1 (classic MI; characterized by chest pain); cluster 2 (stress symptoms; sweating and chest pain); and cluster 3 (multiple symptoms; dizziness, sweating, chest pain, weakness, and dyspnea). Compared with patients in clusters 2 and 3, those in cluster 1 were more likely to have diabetes or prior MI. Patients in clusters 2 and 3, who predominantly showed other symptoms in addition to chest pain, had a significantly shorter DT and OTB than those in cluster 1. In conclusion, to decrease treatment time delay, it seems important that patients and clinicians recognize symptom clusters, rather than relying on chest pain alone. Further research is necessary to translate our findings into clinical practice and to improve patient education and public education campaigns.

  9. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    PubMed

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  10. Raman spectroscopy of normal oral buccal mucosa tissues: study on intact and incised biopsies

    NASA Astrophysics Data System (ADS)

    Deshmukh, Atul; Singh, S. P.; Chaturvedi, Pankaj; Krishna, C. Murali

    2011-12-01

    Oral squamous cell carcinoma is among the top 10 malignancies. Optical spectroscopy, including Raman, is being actively pursued as an alternative/adjunct for cancer diagnosis. Earlier studies have demonstrated the feasibility of classifying normal, premalignant, and malignant oral ex vivo tissues. Spectral features showed predominance of lipids and proteins in normal and cancer conditions, respectively, which were attributed to membrane lipids and surface proteins. In view of recent developments in deep tissue Raman spectroscopy, we have recorded Raman spectra from superior and inferior surfaces of 10 normal oral tissues on intact, as well as incised, biopsies after separation of epithelium from connective tissue. Spectral variations and similarities among different groups were explored by unsupervised (principal component analysis) and supervised (linear discriminant analysis, factorial discriminant analysis) methodologies. Clusters of spectra from superior and inferior surfaces of intact tissues show a high overlap, whereas spectra from separated epithelium and connective tissue sections yielded clear clusters, though they also overlap with clusters of intact tissues. Spectra of all four groups of normal tissues gave exclusive clusters when tested against malignant spectra. Thus, this study demonstrates that spectra recorded from the superior surface of an intact tissue may have contributions from deeper layers, but that this has no bearing on the classification of malignant tissues.

  11. Lagged segmented Poincaré plot analysis for risk stratification in patients with dilated cardiomyopathy.

    PubMed

    Voss, Andreas; Fischer, Claudia; Schroeder, Rico; Figulla, Hans R; Goernig, Matthias

    2012-07-01

    The objectives of this study were to introduce a new type of heart-rate variability analysis improving risk stratification in patients with idiopathic dilated cardiomyopathy (DCM) and to provide additional information about impaired heart beat generation in these patients. Beat-to-beat intervals (BBI) of 30-min ECGs recorded from 91 DCM patients and 21 healthy subjects were analyzed applying the lagged segmented Poincaré plot analysis (LSPPA) method. LSPPA includes the Poincaré plot reconstruction with lags of 1-100, rotating the cloud of points, its normalized segmentation adapted to their standard deviations, and finally, a frequency-dependent clustering. The lags were combined into eight different clusters representing specific frequency bands within 0.012-1.153 Hz. Statistical differences between low- and high-risk DCM could be found within the clusters II-VIII (e.g., cluster IV: 0.033-0.038 Hz; p = 0.0002; sensitivity = 85.7 %; specificity = 71.4 %). The multivariate statistics led to a sensitivity of 92.9 %, specificity of 85.7 % and an area under the curve of 92.1 % discriminating these patient groups. We introduced the LSPPA method to investigate time correlations in BBI time series. We found that LSPPA contributes considerably to risk stratification in DCM and yields the highest discriminant power in the low and very low-frequency bands.
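
    The first two LSPPA steps named in this record (Poincaré plot reconstruction at a given lag, then rotation of the point cloud) can be sketched as below. This is an illustration only: it computes the standard deviations along the two rotated axes and omits the normalized segmentation and the frequency-dependent clustering, whose details the abstract does not give. The function names and the sample intervals are hypothetical.

```python
import math
import statistics


def lagged_poincare(bbi, lag):
    """Pair each beat-to-beat interval with the one `lag` beats later."""
    if lag < 1 or lag >= len(bbi):
        raise ValueError("lag must satisfy 1 <= lag < len(bbi)")
    return list(zip(bbi[:-lag], bbi[lag:]))


def rotated_spread(points):
    """Rotate the cloud by 45 degrees and return the spread along each new axis.

    The first value is the spread across the identity line, the second the
    spread along it (often called SD1 and SD2 in Poincare plot analysis).
    """
    across = [(y - x) / math.sqrt(2.0) for x, y in points]
    along = [(y + x) / math.sqrt(2.0) for x, y in points]
    return statistics.pstdev(across), statistics.pstdev(along)


# Hypothetical beat-to-beat intervals in milliseconds.
bbi = [800, 810, 790, 805, 795, 815]
for lag in (1, 2, 3):
    sd1, sd2 = rotated_spread(lagged_poincare(bbi, lag))
```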

  12. Conceptions of Memorizing and Understanding in Learning, and Self-Efficacy Held by University Biology Majors

    NASA Astrophysics Data System (ADS)

    Lin, Tzu-Chiang; Liang, Jyh-Chong; Tsai, Chin-Chung

    2015-02-01

    This study aims to explore Taiwanese university students' conceptions of learning biology as memorizing or as understanding, and their self-efficacy. To this end, two questionnaires were utilized to survey 293 Taiwanese university students with biology-related majors. A questionnaire for measuring students' conceptions of memorizing and understanding was validated through an exploratory factor analysis of participants' responses. As for the questionnaire regarding the students' biology learning self-efficacy (BLSE), an exploratory factor analysis revealed a total of four factors including higher-order cognitive skills (BLSE-HC), everyday application (BLSE-EA), science communication (BLSE-SC), and practical works (BLSE-PW). The results of the cluster analysis according to the participants' conceptions of learning biology indicated that students in the two major clusters either viewed learning biology as understanding or possessed mixed-conceptions of memorizing and understanding. The students in the third cluster mainly focused on memorizing in their learning while the students in the fourth cluster showed less agreement with both conceptions of memorizing and understanding. This study further revealed that the conception of learning as understanding was positively associated with the BLSE of university students with biology-related majors. However, the conception of learning as memorizing may foster students' BLSE only when such a notion co-exists with the conception of learning with understanding.

  13. A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.

    The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph, job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis is discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
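
    One of the job relationships mentioned in this record, shared compute nodes between jobs, can be turned into weighted graph edges along the following lines. This is a minimal sketch under assumptions: the job identifiers, node names, and the choice of "number of shared nodes" as the edge weight are hypothetical and are not taken from the report.

```python
from itertools import combinations


def shared_node_edges(jobs):
    """Build weighted edges between jobs that ran on at least one common node.

    jobs: dict mapping a job id to the set of compute-node names it used.
    Returns a dict mapping (job_a, job_b) to the number of shared nodes.
    """
    edges = {}
    for a, b in combinations(sorted(jobs), 2):
        shared = jobs[a] & jobs[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical job records (illustration only).
jobs = {
    "job1": {"node01", "node02"},
    "job2": {"node02", "node03"},
    "job3": {"node04"},
}
edges = shared_node_edges(jobs)
```

    Other relationships named in the abstract, such as proximity of jobs in time, could be added as further edge types in the same dictionary-of-edges form.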

  14. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    PubMed

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  15. Joining X-Ray to Lensing: An Accurate Combined Analysis of MACS J0416.1-2403

    NASA Astrophysics Data System (ADS)

    Bonamigo, M.; Grillo, C.; Ettori, S.; Caminha, G. B.; Rosati, P.; Mercurio, A.; Annunziatella, M.; Balestra, I.; Lombardi, M.

    2017-06-01

    We present a novel approach for a combined analysis of X-ray and gravitational lensing data and apply this technique to the merging galaxy cluster MACS J0416.1-2403. The method exploits the information on the intracluster gas distribution that comes from a fit of the X-ray surface brightness and then includes the hot gas as a fixed mass component in the strong-lensing analysis. With our new technique, we can separate the collisional from the collision-less diffuse mass components, thus obtaining a more accurate reconstruction of the dark matter distribution in the core of a cluster. We introduce an analytical description of the X-ray emission coming from a set of dual pseudo-isothermal elliptical mass distributions, which can be directly used in most lensing softwares. By combining Chandra observations with Hubble Frontier Fields imaging and Multi Unit Spectroscopic Explorer spectroscopy in MACS J0416.1-2403, we measure a projected gas-to-total mass fraction of approximately 10% at 350 kpc from the cluster center. Compared to the results of a more traditional cluster mass model (diffuse halos plus member galaxies), we find a significant difference in the cumulative projected mass profile of the dark matter component and that the dark matter over total mass fraction is almost constant, out to more than 350 kpc. In the coming era of large surveys, these results show the need of multiprobe analyses for detailed dark matter studies in galaxy clusters.

  16. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality

    PubMed Central

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M

    2008-01-01

    Background: Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results: We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. Conclusion: The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. Method: We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit. PMID:18992163
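
    The Standardized Mortality Ratio visualized per county in this record is, in its simplest form, observed deaths divided by expected deaths. The sketch below assumes a single crude reference rate per 100,000 population; the abstract does not say how the expected counts were standardized (for example by age), so the function name, parameters, and numbers are hypothetical.

```python
def standardized_mortality_ratio(observed_deaths, population, reference_rate_per_100k):
    """Observed deaths divided by the deaths expected at the reference rate."""
    expected = population / 100000.0 * reference_rate_per_100k
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected


# Hypothetical county: 30 deaths observed where 20 would be expected.
smr = standardized_mortality_ratio(30, population=200000, reference_rate_per_100k=10.0)
```

    A value above 1 indicates more deaths than the reference rate predicts, and a value below 1 indicates fewer.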

  17. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality.

    PubMed

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M

    2008-11-07

    Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit.
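
    The abstract above does not define its reliability score, so the following Python sketch is only one plausible reading: it assumes a county's score is the fraction of scan runs in which that county fell inside a significant cluster, and it computes the Standardized Mortality Ratio as observed deaths divided by expected deaths. The function names, county IDs, and counts are hypothetical and are not taken from the study or from SaTScan.

```python
from typing import Dict, Iterable, List, Set


def standardized_mortality_ratio(observed: float, expected: float) -> float:
    """SMR = observed deaths / expected deaths (expected must be positive)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def reliability_scores(runs: Iterable[Set[str]], counties: Iterable[str]) -> Dict[str, float]:
    """Fraction of scan runs in which each county was inside a significant cluster.

    Assumed definition for illustration only:
    score = (number of runs containing the county) / (total number of runs).
    """
    run_list: List[Set[str]] = list(runs)
    if not run_list:
        raise ValueError("at least one run is required")
    return {c: sum(c in r for r in run_list) / len(run_list) for c in counties}


# Hypothetical results of three scans run with different scaling parameters;
# each set holds the IDs of counties reported inside significant clusters.
runs = [{"A", "B"}, {"A", "B", "C"}, {"A"}]
scores = reliability_scores(runs, ["A", "B", "C", "D"])
smr = standardized_mortality_ratio(observed=30, expected=20)
print(scores)
print(smr)
```

    Under this reading, a county with a score near 1 and a high SMR would be a candidate member of a stable, homogeneous high-risk cluster, while a county that appears in only a few runs would be treated as unstable.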

  18. Shifting forest value orientations in the United States, 1980-2001: A computer content analysis

    Treesearch

    David N. Bengston; Trevor J. Webb; David P. Fan

    2004-01-01

    This paper examines three forest value orientations - clusters of interrelated values and basic beliefs about forests - that emerged from an analysis of the public discourse about forest planning, management, and policy in the United States. The value orientations include anthropocentric, biocentric, and moral/spiritual/aesthetic orientations toward forests. Computer...

  19. Spatiotemporal analysis of indigenous and imported dengue fever cases in Guangdong province, China.

    PubMed

    Li, Zhongjie; Yin, Wenwu; Clements, Archie; Williams, Gail; Lai, Shengjie; Zhou, Hang; Zhao, Dan; Guo, Yansha; Zhang, Yonghui; Wang, Jinfeng; Hu, Wenbiao; Yang, Weizhong

    2012-06-12

    Dengue fever has been a major public health concern in China since it re-emerged in Guangdong province in 1978. This study aimed to explore spatiotemporal characteristics of dengue fever cases for both indigenous and imported cases during recent years in Guangdong province, so as to identify high-risk areas of the province and thereby help plan resource allocation for dengue interventions. Notifiable cases of dengue fever were collected from all 123 counties of Guangdong province from 2005 to 2010. Descriptive temporal and spatial analyses were conducted, including plotting of seasonal distribution of cases, and creating choropleth maps of cumulative incidence by county. The space-time scan statistic was used to determine space-time clusters of dengue fever cases at the county level, and a geographical information system was used to visualize the location of the clusters. Analyses were stratified by imported and indigenous origin. A total of 1658 dengue fever cases were recorded in Guangdong province during the study period, including 94 imported cases and 1564 indigenous cases. Both imported and indigenous cases occurred more frequently in autumn. The areas affected by the indigenous and imported cases presented a geographically expanding trend over the study period. The results showed that the most likely cluster of imported cases (relative risk = 7.52, p < 0.001) and indigenous cases (relative risk = 153.56, p < 0.001) occurred in the Pearl River Delta Area, while a secondary cluster of indigenous cases occurred in one district of the Chao Shan Area (relative risk = 471.25, p < 0.001). This study demonstrated that the geographic range of imported and indigenous dengue fever cases has expanded over recent years, and cases were significantly clustered in two heavily urbanised areas of Guangdong province. This provides the foundation for further investigation of risk factors and interventions in these high-risk areas.
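
    To make the scan-statistic quantities reported above more concrete, the Python sketch below evaluates a single candidate window under the Poisson model commonly used with Kulldorff's scan statistic: the log likelihood ratio for an excess of cases, and the relative risk as risk inside the window divided by risk outside. The abstract does not state which probability model the study used, so the Poisson model is an assumption, and the case counts are invented for illustration rather than taken from the dengue data.

```python
import math


def poisson_llr(c: float, e: float, total: float) -> float:
    """Log likelihood ratio for one candidate high-rate window (Poisson model).

    c: observed cases inside the window
    e: expected cases inside the window under the null hypothesis
    total: observed cases in the whole study region
    """
    if not (0 < e < total) or not (0 <= c <= total):
        raise ValueError("need 0 < e < total and 0 <= c <= total")
    if c <= e:
        return 0.0  # no excess of cases inside the window
    inside = c * math.log(c / e)
    # The outside term vanishes when every case lies inside the window.
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def relative_risk(c: float, e: float, total: float) -> float:
    """Risk inside the window divided by risk outside it."""
    if not (0 < e < total):
        raise ValueError("need 0 < e < total")
    if c >= total:
        return math.inf  # no cases outside the window
    return (c / e) / ((total - c) / (total - e))


# Hypothetical window: 20 observed vs 10 expected cases, 100 cases region-wide.
llr = poisson_llr(20, 10, 100)
rr = relative_risk(20, 10, 100)
print(llr, rr)
```

    A full scan would evaluate this ratio over many candidate windows (and, for space-time scans, many time intervals), report the window with the largest value as the most likely cluster, and obtain its p-value by Monte Carlo replication; none of that search is shown here.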

  20. Beverage consumption patterns of Canadian adults aged 19 to 65 years.

    PubMed

    Nikpartow, Nooshin; Danyliw, Adrienne D; Whiting, Susan J; Lim, Hyun J; Vatanparast, Hassanali

    2012-12-01

    To investigate the beverage intake patterns of Canadian adults and explore characteristics of participants in different beverage clusters. Analyses of nationally representative data with cross-sectional complex stratified design. Canadian Community Health Survey, Cycle 2.2 (2004). A total of 14 277 participants aged 19-65 years, in whom dietary intake was assessed using a single 24 h recall, were included in the study. After determining total intake and the contribution of beverages to total energy intake among age/sex groups, cluster analysis (K-means method) was used to classify males and females into distinct clusters based on the dominant pattern of beverage intakes. To test differences across clusters, χ² tests and 95% confidence intervals of the mean intakes were used. Six beverage clusters in women and seven beverage clusters in men were identified. 'Sugar-sweetened' beverage clusters - regular soft drinks and fruit drinks - as well as a 'beer' cluster, appeared for both men and women. No 'milk' cluster appeared among women. The mean consumption of the dominant beverage in each cluster was higher among men than women. The 'soft drink' cluster in men had the lowest proportion of the higher levels of education, and in women the highest proportion of inactivity, compared with other beverage clusters. Patterns of beverage intake in Canadian women indicate high consumption of sugar-sweetened beverages, particularly fruit drinks, low intake of milk and high intake of beer. These patterns in women have implications for poor bone health, risk of obesity and other morbidities.
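
    The K-means step mentioned above can be sketched in a few lines of Python. This is a minimal version of Lloyd's algorithm with caller-supplied starting centroids, not the study's procedure: the two variables (daily soft drink and milk intake), the six respondents, the starting centroids, and the choice of two clusters are all hypothetical, and the survey's complex sampling weights are ignored.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], start: List[Point],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid updates until stable."""
    centroids = [list(c) for c in start]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: _sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical intakes per respondent: [soft drinks, milk], in grams per day.
intakes = [[700, 50], [650, 80], [720, 40], [60, 500], [90, 450], [40, 520]]
labels, centroids = kmeans(intakes, start=[intakes[0], intakes[3]])
print(labels)
print(centroids)
```

    Each resulting cluster would then be named after its dominant beverage, which is how labels such as 'soft drink' or 'milk' arise; in practice the result depends on the number of clusters and the starting centroids, so several starts are usually compared.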
