Sample records for factor analysis cluster

  1. Common factor analysis versus principal component analysis: choice for symptom cluster research.

    PubMed

    Kim, Hee-Ju

    2008-03-01

    The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.
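
    The statement above that PCA tends to inflate loadings when there are few variables or low communality can be illustrated with a small closed-form sketch. It assumes an idealized case that is not taken from the paper: p standardized variables that all share the same pairwise correlation r (with 0 < r < 1) and are generated by a single common factor. Under that assumption the common factor loading is sqrt(r), while the loading on the first principal component is sqrt((1 + (p - 1) r) / p). The function names are hypothetical.

```python
import math


def pca_loading(p: int, r: float) -> float:
    """Loading of each variable on the first principal component.

    Assumes a p x p correlation matrix whose off-diagonal entries all equal r.
    Its largest eigenvalue is 1 + (p - 1) * r with eigenvector weights
    1 / sqrt(p), so the loading is sqrt(eigenvalue) * weight.
    """
    return math.sqrt((1 + (p - 1) * r) / p)


def common_factor_loading(r: float) -> float:
    """Loading under a one-factor model, where r equals loading squared."""
    return math.sqrt(r)


if __name__ == "__main__":
    # Compare the two loadings for few vs. many variables and for
    # low vs. high shared correlation (illustrative values only).
    for p in (3, 10, 30):
        for r in (0.25, 0.64):
            pca = pca_loading(p, r)
            cfa = common_factor_loading(r)
            print(f"p={p:2d} r={r:.2f} PCA={pca:.3f} CFA={cfa:.3f} gap={pca - cfa:.3f}")
```

    Under these assumptions the gap between the two loadings shrinks as p grows or as r approaches 1, which is consistent with the abstract's description of when the inflation is most pronounced.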

  2. Analysis of risk factors for cluster behavior of dental implant failures.

    PubMed

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  3. Cardiometabolic Risk Clustering in Spinal Cord Injury: Results of Exploratory Factor Analysis

    PubMed Central

    2013-01-01

    Background: Evidence suggests an elevated prevalence of cardiometabolic risks among persons with spinal cord injury (SCI); however, the unique clustering of risk factors in this population has not been fully explored. Objective: The purpose of this study was to describe unique clustering of cardiometabolic risk factors differentiated by level of injury. Methods: One hundred twenty-one subjects (mean 37 ± 12 years; range, 18–73) with chronic C5 to T12 motor complete SCI were studied. Assessments included medical histories, anthropometrics and blood pressure, and fasting serum lipids, glucose, insulin, and hemoglobin A1c (HbA1c). Results: The most common cardiometabolic risk factors were overweight/obesity, high levels of low-density lipoprotein (LDL-C), and low levels of high-density lipoprotein (HDL-C). Risk clustering was found in 76.9% of the population. Exploratory principal component factor analysis using varimax rotation revealed a 3–factor model in persons with paraplegia (65.4% variance) and a 4–factor solution in persons with tetraplegia (73.3% variance). The differences between groups were emphasized by the varied composition of the extracted factors: Lipid Profile A (total cholesterol [TC] and LDL-C), Body Mass-Hypertension Profile (body mass index [BMI], systolic blood pressure [SBP], and fasting insulin [FI]); Glycemic Profile (fasting glucose and HbA1c), and Lipid Profile B (TG and HDL-C). BMI and SBP formed a separate factor only in persons with tetraplegia. Conclusions: Although the majority of the population with SCI has risk clustering, the composition of the risk clusters may be dependent on level of injury, based on a factor analysis group comparison. This is clinically plausible and relevant as tetraplegics tend to be hypo- to normotensive and more sedentary, resulting in lower HDL-C and a greater propensity toward impaired carbohydrate metabolism. PMID:23960702

  4. Cardiometabolic risk clustering in spinal cord injury: results of exploratory factor analysis.

    PubMed

    Libin, Alexander; Tinsley, Emily A; Nash, Mark S; Mendez, Armando J; Burns, Patricia; Elrod, Matt; Hamm, Larry F; Groah, Suzanne L

    2013-01-01

    Evidence suggests an elevated prevalence of cardiometabolic risks among persons with spinal cord injury (SCI); however, the unique clustering of risk factors in this population has not been fully explored. The purpose of this study was to describe unique clustering of cardiometabolic risk factors differentiated by level of injury. One hundred twenty-one subjects (mean 37 ± 12 years; range, 18-73) with chronic C5 to T12 motor complete SCI were studied. Assessments included medical histories, anthropometrics and blood pressure, and fasting serum lipids, glucose, insulin, and hemoglobin A1c (HbA1c). The most common cardiometabolic risk factors were overweight/obesity, high levels of low-density lipoprotein (LDL-C), and low levels of high-density lipoprotein (HDL-C). Risk clustering was found in 76.9% of the population. Exploratory principal component factor analysis using varimax rotation revealed a 3-factor model in persons with paraplegia (65.4% variance) and a 4-factor solution in persons with tetraplegia (73.3% variance). The differences between groups were emphasized by the varied composition of the extracted factors: Lipid Profile A (total cholesterol [TC] and LDL-C), Body Mass-Hypertension Profile (body mass index [BMI], systolic blood pressure [SBP], and fasting insulin [FI]); Glycemic Profile (fasting glucose and HbA1c), and Lipid Profile B (TG and HDL-C). BMI and SBP formed a separate factor only in persons with tetraplegia. Although the majority of the population with SCI has risk clustering, the composition of the risk clusters may be dependent on level of injury, based on a factor analysis group comparison. This is clinically plausible and relevant as tetraplegics tend to be hypo- to normotensive and more sedentary, resulting in lower HDL-C and a greater propensity toward impaired carbohydrate metabolism.

  5. Groundwater source contamination mechanisms: Physicochemical profile clustering, risk factor analysis and multivariate modelling

    NASA Astrophysics Data System (ADS)

    Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.

  6. The Computation of Orthogonal Independent Cluster Solutions and Their Oblique Analogs in Factor Analysis.

    ERIC Educational Resources Information Center

    Hofmann, Richard J.

    A very general model for the computation of independent cluster solutions in factor analysis is presented. The model is discussed as being either orthogonal or oblique. Furthermore, it is demonstrated that for every orthogonal independent cluster solution there is an oblique analog. Using three illustrative examples, certain generalities are made…

  7. Understanding the Support Needs of People with Intellectual and Related Developmental Disabilities through Cluster Analysis and Factor Analysis of Statewide Data

    ERIC Educational Resources Information Center

    Viriyangkura, Yuwadee

    2014-01-01

    Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…

  8. Recurrent-neural-network-based Boolean factor analysis and its application to word clustering.

    PubMed

    Frolov, Alexander A; Husek, Dusan; Polyakov, Pavel Yu

    2009-07-01

    The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov, 2007). It is shown that this extended algorithm supports even the more complex model of signals that are supposed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words which coherently appear in documents dedicated to a given topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule, facilitating the search for other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classification of signals to determine whether they contain the factor. Since it is assumed that every word may possibly contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this attempt, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words are at a good level of agreement despite the fact that identical topics in Russian and English conferences contain different sets of keywords.

  9. Transcription factor clusters regulate genes in eukaryotic cells

    PubMed Central

    Hedlund, Erik G; Friemann, Rosmarie; Hohmann, Stefan

    2017-01-01

    Transcription is regulated through binding factors to gene promoters to activate or repress expression; however, the mechanisms by which factors find targets remain unclear. Using single-molecule fluorescence microscopy, we determined in vivo stoichiometry and spatiotemporal dynamics of a GFP-tagged repressor, Mig1, from a paradigm signaling pathway of Saccharomyces cerevisiae. We find the repressor operates in clusters, which upon extracellular signal detection, translocate from the cytoplasm, bind to nuclear targets and turn over. Simulations of Mig1 configuration within a 3D yeast genome model combined with a promoter-specific, fluorescent translation reporter confirmed clusters are the functional unit of gene regulation. In vitro and structural analysis on reconstituted Mig1 suggests that clusters are stabilized by depletion forces between intrinsically disordered sequences. We observed similar clusters of a co-regulatory activator from a different pathway, supporting a generalized cluster model for transcription factors that reduces promoter search times through intersegment transfer while stabilizing gene expression. PMID:28841133

  10. Using Multilevel Factor Analysis with Clustered Data: Investigating the Factor Structure of the Positive Values Scale

    ERIC Educational Resources Information Center

    Huang, Francis L.; Cornell, Dewey G.

    2016-01-01

    Advances in multilevel modeling techniques now make it possible to investigate the psychometric properties of instruments using clustered data. Factor models that overlook the clustering effect can lead to underestimated standard errors, incorrect parameter estimates, and model fit indices. In addition, factor structures may differ depending on…

  11. Characteristic and factors of competitive maritime industry clusters in Indonesia

    NASA Astrophysics Data System (ADS)

    Marlyana, N.; Tontowi, A. E.; Yuniarto, H. A.

    2017-12-01

    Indonesia is strategically situated between two oceans and is therefore identified as a maritime state. This fact opens a big opportunity to build a competitive maritime industry. However, the potential factors for boosting a competitive maritime industry still need to be explored. The objective of this paper is therefore to determine the main characteristics and potential factors of a competitive maritime industry cluster. Qualitative analysis based on a literature review was carried out in two aspects. First, a benchmarking analysis was conducted to distinguish the most relevant factors of maritime clusters in several countries in Europe (Norway, Spain, South West of England) and Asia (China, South Korea, Malaysia). Seven key dimensions were used for this benchmarking. Second, the competitiveness of maritime clusters in Indonesia was diagnosed through a reconceptualization of Porter’s Diamond model. There were four interlinked advanced factors in and between companies within clusters, which can be influenced proactively by government.

  12. The contribution of psychological factors to recovery after mild traumatic brain injury: is cluster analysis a useful approach?

    PubMed

    Snell, Deborah L; Surgenor, Lois J; Hay-Smith, E Jean C; Williman, Jonathan; Siegert, Richard J

    2015-01-01

    Outcomes after mild traumatic brain injury (MTBI) vary, with slow or incomplete recovery for a significant minority. This study examines whether groups of cases with shared psychological factors but with different injury outcomes could be identified using cluster analysis. This is a prospective observational study following 147 adults presenting to a hospital-based emergency department or concussion services in Christchurch, New Zealand. This study examined associations between baseline demographic, clinical, psychological variables (distress, injury beliefs and symptom burden) and outcome 6 months later. A two-step approach to cluster analysis was applied (Ward's method to identify clusters, K-means to refine results). Three meaningful clusters emerged (high-adapters, medium-adapters, low-adapters). Baseline cluster-group membership was significantly associated with outcomes over time. High-adapters appeared recovered by 6-weeks and medium-adapters revealed improvements by 6-months. The low-adapters continued to endorse many symptoms, negative recovery expectations and distress, being significantly at risk for poor outcome more than 6-months after injury (OR (good outcome) = 0.12; CI = 0.03-0.53; p < 0.01). Cluster analysis supported the notion that groups could be identified early post-injury based on psychological factors, with group membership associated with differing outcomes over time. Implications for clinical care providers regarding therapy targets and cases that may benefit from different intensities of intervention are discussed.
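
    The two-step approach described above (Ward's method to identify clusters, K-means to refine them) can be sketched as follows. This is a minimal standard-library illustration, not the study's implementation: the abstract does not state the software, distance measure, or variable scaling used, so Euclidean distance on already-scaled tuples is an assumption, and all function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def ward_clusters(points, k):
    """Step 1: agglomerate until k clusters remain.

    At each step, merge the pair of clusters whose union gives the smallest
    increase in within-cluster sum of squares (Ward's criterion).
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                gap = math.dist(centroid(a), centroid(b)) ** 2
                cost = len(a) * len(b) / (len(a) + len(b)) * gap
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans_refine(points, centers, max_iter=100):
    """Step 2: Lloyd's K-means iterations started from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def two_step(points, k):
    """Seed K-means with the centroids of the Ward solution."""
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans_refine(points, seeds)
```

    Seeding K-means with the Ward centroids removes the dependence on random starting points, which is one common motivation for combining the two methods; whether the study used this exact seeding is not stated in the abstract.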

  13. Psychological Factors Predict Local and Referred Experimental Muscle Pain: A Cluster Analysis in Healthy Adults

    PubMed Central

    Lee, Jennifer E.; Watson, David; Frey-Law, Laura A.

    2012-01-01

    Background: Recent studies suggest an underlying three- or four-factor structure explains the conceptual overlap and distinctiveness of several negative emotionality and pain-related constructs. However, the validity of these latent factors for predicting pain has not been examined. Methods: A cohort of 189 (99F; 90M) healthy volunteers completed eight self-report negative emotionality and pain-related measures (Eysenck Personality Questionnaire-Revised; Positive and Negative Affect Schedule; State-Trait Anxiety Inventory; Pain Catastrophizing Scale; Fear of Pain Questionnaire; Somatosensory Amplification Scale; Anxiety Sensitivity Index; Whiteley Index). Using principal axis factoring, three primary latent factors were extracted: General Distress; Catastrophic Thinking; and Pain-Related Fear. Using these factors, individuals clustered into three subgroups of high, moderate, and low negative emotionality responses. Experimental pain was induced via intramuscular acidic infusion into the anterior tibialis muscle, producing local (infusion site) and/or referred (anterior ankle) pain and hyperalgesia. Results: Pain outcomes differed between clusters (multivariate analysis of variance and multinomial regression), with individuals in the highest negative emotionality cluster reporting the greatest local pain (p = 0.05), mechanical hyperalgesia (pressure pain thresholds; p = 0.009) and greater odds (2.21 OR) of experiencing referred pain compared to the lowest negative emotionality cluster. Conclusion: Our results provide support for three latent psychological factors explaining the majority of the variance between several pain-related psychological measures, and that individuals in the high negative emotionality subgroup are at increased risk for (1) acute local muscle pain; (2) local hyperalgesia; and (3) referred pain using a standardized nociceptive input. PMID:23165778

  14. Application of Factor Analysis on the Financial Ratios of Indian Cement Industry and Validation of the Results by Cluster Analysis

    NASA Astrophysics Data System (ADS)

    De, Anupam; Bandyopadhyay, Gautam; Chakraborty, B. N.

    2010-10-01

    Financial ratio analysis is an important and commonly used tool for analyzing the financial health of a firm. A large number of financial ratios, which can be categorized into different groups, are used for this analysis. However, to reduce the number of ratios used for financial analysis and to regroup them on the basis of empirical evidence, the Factor Analysis technique has been used successfully by different researchers during the last three decades. In this study Factor Analysis has been applied to audited financial data of Indian cement companies for a period of 10 years. The sample companies are listed on Indian stock exchanges (BSE and NSE). Factor Analysis, conducted over 44 variables (financial ratios) grouped in 7 categories, resulted in 11 underlying categories (factors). Each factor is named in an appropriate manner considering the factor loadings and constituent variables (ratios). Representative ratios are identified for each such factor. To validate the results of Factor Analysis and to reach a final conclusion regarding the representative ratios, Cluster Analysis was performed.

  15. Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

    PubMed Central

    Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

    2016-01-01

    Background: The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230

  16. Subphenotypes of mild-to-moderate COPD by factor and cluster analysis of pulmonary function, CT imaging and breathomics in a population-based survey.

    PubMed

    Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J

    2013-06-01

    Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.

  17. Clinical Phenotype of Diabetic Peripheral Neuropathy and Relation to Symptom Patterns: Cluster and Factor Analysis in Patients with Type 2 Diabetes in Korea.

    PubMed

    Won, Jong Chul; Im, Yong-Jin; Lee, Ji-Hyun; Kim, Chong Hwa; Kwon, Hyuk Sang; Cha, Bong-Yun; Park, Tae Sun

    2017-01-01

    Diabetic peripheral neuropathy (DPN) is the most common complication in patients with diabetes. However, patients usually suffer not only from diverse sensory deficits but also from neuropathy-related discomfort. The aim of this study is to identify distinct groups of patients with DPN with respect to its clinical impacts on symptom patterns and comorbidities. A hierarchical cluster analysis and factor analysis were performed to identify relevant subgroups of patients with DPN (n = 1338) and symptom patterns. Patients with DPN were divided into three clusters: asymptomatic (cluster 1, n = 448, 33.5%), moderate symptoms with disturbed sleep (cluster 2, n = 562, 42.0%), and severe symptoms with decreased quality of life (cluster 3, n = 328, 24.5%). Patients in cluster 3, compared with clusters 1 and 2, were characterized by higher levels of HbA1c and more severe pain and physical impairments. Patients in cluster 2 had moderate pain levels but disturbed sleep patterns comparable to those in cluster 3. The frequency of symptoms on each item of MNSI by "painful" symptom pattern showed a similar distribution pattern with increasing intensities along the three clusters. Cluster and factor analysis endorsed the use of comprehensive and symptomatic subgrouping to individualize the evaluation of patients with DPN.

  18. Consanguinity and family clustering of male factor infertility in Lebanon.

    PubMed

    Inhorn, Marcia C; Kobeissi, Loulou; Nassar, Zaher; Lakkis, Da'ad; Fakih, Michael H

    2009-04-01

    Objective: To investigate the influence of consanguineous marriage on male factor infertility in Lebanon, where rates of consanguineous marriage remain high (29.6% among Muslims, 16.5% among Christians). Design: Clinic-based, case-control study, using reproductive history, risk factor interview, and laboratory-based semen analysis. Setting: Two IVF clinics in Beirut, Lebanon, during an 8-month period (January-August 2003). Patients: One hundred twenty infertile male patients and 100 fertile male controls, distinguished by semen analysis and reproductive history. Interventions: None. Main outcome measure: Standard clinical semen analysis. Results: The rates of consanguineous marriage were relatively high among the study sample. Patients (46%) were more likely than controls (37%) to report first-degree (parental) and second-degree (grandparental) consanguinity. The study demonstrated a clear pattern of family clustering of male factor infertility, with patients significantly more likely than controls to report infertility among close male relatives (odds ratio = 2.58). Men with azoospermia and severe oligospermia showed high rates of both consanguinity (50%) and family clustering (41%). Conclusions: Consanguineous marriage is a socially supported institution throughout the Muslim world, yet its relationship to infertility is poorly understood. This study demonstrated a significant association between consanguinity and family clustering of male factor infertility cases, suggesting a strong genetic component.

  19. FACTOR ANALYTIC MODELS OF CLUSTERED MULTIVARIATE DATA WITH INFORMATIVE CENSORING

    EPA Science Inventory

    This paper describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary outcomes and to the censorin...

  20. Factor Analysis and Counseling Research

    ERIC Educational Resources Information Center

    Weiss, David J.

    1970-01-01

    Topics discussed include factor analysis versus cluster analysis, analysis of Q correlation matrices, ipsativity and factor analysis, and tests for the significance of a correlation matrix prior to application of factor analytic techniques. Techniques for factor extraction discussed include principal components, canonical factor analysis, alpha…

  1. Exploring syndrome differentiation using non-negative matrix factorization and cluster analysis in patients with atopic dermatitis.

    PubMed

    Yun, Younghee; Jung, Wonmo; Kim, Hyunho; Jang, Bo-Hyoung; Kim, Min-Hee; Noh, Jiseong; Ko, Seong-Gyu; Choi, Inhwa

    2017-08-01

    Syndrome differentiation (SD) results in a diagnostic conclusion based on a cluster of concurrent symptoms and signs, including pulse form and tongue color. In Korea, there is a strong interest in the standardization of Traditional Medicine (TM). In order to standardize TM treatment, standardization of SD should be given priority. The aim of this study was to explore the SD, or symptom clusters, of patients with atopic dermatitis (AD) using non-negative factorization methods and k-means clustering analysis. We screened 80 patients and enrolled 73 eligible patients. One TM dermatologist evaluated the symptoms/signs using an existing clinical dataset from patients with AD. This dataset was designed to collect 15 dermatologic and 18 systemic symptoms/signs associated with AD. Non-negative matrix factorization was used to decompose the original data into a matrix with three features and a weight matrix. The point of intersection of the three coordinates from each patient was placed in three-dimensional space. With five clusters, the silhouette score reached 0.484, and this was the best silhouette score obtained from two to nine clusters. Patients were clustered according to the varying severity of concurrent symptoms/signs. Through the distribution of the null hypothesis generated by 10,000 permutation tests, we found significant cluster-specific symptoms/signs from the confidence intervals in the upper and lower 2.5% of the distribution. Patients in each cluster showed differences in symptoms/signs and severity. In a clinical situation, SD and treatment are based on the practitioners' observations and clinical experience. SD, identified through informatics, can contribute to development of standardized, objective, and consistent SD for each disease. Copyright © 2017. Published by Elsevier Ltd.
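
    The abstract above selects the number of clusters by comparing silhouette scores across candidate solutions. The sketch below shows how a mean silhouette score can be computed for one labelled solution. It is a standard-library illustration under stated assumptions: Euclidean distance is used, singleton clusters contribute a silhouette of 0, and the function name is hypothetical. The study's own software and distance measure are not given in the abstract.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to the members of any other
    cluster; the point's silhouette is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton cluster: silhouette taken as 0
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:  # guard against coincident points
            total += (b - a) / max(a, b)
    return total / len(points)
```

    To choose a cluster count in the way the abstract describes, the same function would be applied to the labels produced for each candidate number of clusters, keeping the solution with the highest score. Values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned points.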

  2. Spatial cluster for clustering the influence factor of birth and death child in Bogor Regency, West Java

    NASA Astrophysics Data System (ADS)

    Bekti, Rokhana Dwi; Rachmawati, Ro'fah

    2014-03-01

    The numbers of child births and deaths are benchmarks for determining and monitoring health and welfare in Indonesia. They can be used to identify groups of people who have a high mortality risk. Identifying such groups is important for comparing the characteristics of people at high and low risk. These characteristics can be seen from the factors that influence them. There are several factors that influence child births and deaths, such as economic conditions, health facilities, education, and others. The influencing factors differ for every individual, but there are similarities among individuals who live close together or in nearby locations, which indicates a spatial effect. To identify groups in this research, clustering was done with a spatial cluster method, which takes into account the influence of location or the relationship between locations. One spatial cluster method is Spatial 'K'luster Analysis by Tree Edge Removal (SKATER). The research was conducted in Bogor Regency, West Java. The goal was to obtain clusters of districts based on the factors that influence child births and deaths. SKATER built four clusters consisting of 26, 7, 2, and 5 districts, respectively. SKATER performed well for clustering that includes a spatial effect. In comparison with another clustering method, K-means performed well according to the MANOVA test.

  3. Research on the relationship between the elements and pharmacological activities in velvet antler using factor analysis and cluster analysis

    NASA Astrophysics Data System (ADS)

    Zhou, Libing

    2017-04-01

    Velvet antler has a certain effect on improving the body's immune cells and on regulating immune system function and the nervous system, and it shows anti-stress, anti-aging and anti-osteoporosis activity. It has medicinal applications in a wide range of conditions, such as tissue wound healing, tumors and cardiovascular disease. Therefore, research on the relationship between pharmacological activities and elements in velvet antler is of great significance. The objective of this study was to comprehensively evaluate 15 elements in different varieties of velvet antler and to study the relationship between the elements and traditional Chinese medicine efficacy in humans. Factor analysis and factor cluster analysis were used to analyze the element data for sika velvet antler, Cervus elaphus Linnaeus, flower horse hybrid velvet antler, wapiti (elk) velvet antler and male reindeer velvet antler, and to find the relationships among the 15 elements: Ca, P, Mg, Na, K, Fe, Cu, Mn, Al, Ba, Co, Sr, Cr, Zn and Ni. Using MATLAB2010 and SPSS software, chemometric methods were applied to the relationship between the elements in velvet antler and the pharmacological activities. The first common factor F1 had greater loadings on Ca, P, Mg, Co, Sr and Ni; the second common factor F2 had greater loadings on K, Mn, Zn and Cr; the third common factor F3 had greater loadings on Na, Cu and Ba; and the fourth common factor F4 had greater loadings on Fe and Al. Ranked on the 15 elements, the varieties were ordered elk velvet antler > flower horse hybrid velvet antler > Cervus elaphus Linnaeus > sika velvet antler > male reindeer velvet antler. Based on the factor analysis and the factor cluster analysis, a model for evaluating traditional Chinese medicine quality was constructed. These studies provide the scientific base and theoretical foundation for the future large-scale rational

  4. Cluster Correspondence Analysis.

    PubMed

    van de Velden, M; D'Enza, A Iodice; Palumbo, F

    2017-03-01

    A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.

  5. Clustering of risk factors for cardiometabolic diseases in low-income, female adolescents.

    PubMed

    Melo, Elza M F S de; Azevedo, George D; Silva, João B da; Lemos, Telma M A M; Maranhão, Técia M O; Freitas, Ana K M S O; Spyrides, Maria H; Costa, Eduardo C

    2016-02-16

    To assess the prevalence and clustering patterns of cardiometabolic risk factors among low-income, female adolescents. Cross-sectional study involving 196 students of public schools (11-19 years old). The following risk factors were considered in the analysis: excess weight, central obesity, dyslipidemia, high blood pressure, and high fasting glucose. The ratio between observed and expected prevalence and its confidence interval were used to identify clustering of risk factors that exceeded expected prevalence in the population. The most prevalent risk factors were dyslipidemia (70.9%) and central obesity (39.8%), followed by excess weight (29.6%) and high blood pressure (12.8%). A total of 42.9% of adolescents had two or more risk factors, and 24% had three or more. Excess weight, central obesity, and dyslipidemia were common risk factors in the clustering patterns that showed higher-than-expected prevalence. Clustering of risk factors (≥ two factors) among the adolescents showed considerable prevalence, and there was a non-random coexistence of excess weight, central obesity, and dyslipidemia (mainly low HDL-cholesterol).
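
    The observed/expected prevalence ratio mentioned in this abstract can be illustrated with a short sketch. The abstract does not give the formula, so the code assumes a common convention: the expected prevalence of a presence/absence pattern is the product of the individual prevalences (or their complements) under independence. The function name and the ten toy records are hypothetical, not study data, and the confidence interval is omitted.

```python
def observed_expected_ratio(records, pattern):
    """Return (observed, expected, ratio) for an exact presence/absence pattern.

    records: list of dicts mapping factor name -> bool
    pattern: dict mapping factor name -> bool (True = present, False = absent)
    The expected prevalence assumes the factors occur independently.
    """
    n = len(records)
    observed = sum(all(r[f] == v for f, v in pattern.items()) for r in records) / n
    expected = 1.0
    for factor, present in pattern.items():
        prevalence = sum(r[factor] for r in records) / n
        expected *= prevalence if present else (1.0 - prevalence)
    return observed, expected, observed / expected

# Hypothetical sample of 10 adolescents (illustrative only).
toy_records = (
    [{"excess_weight": True, "central_obesity": True}] * 3
    + [{"excess_weight": True, "central_obesity": False}] * 1
    + [{"excess_weight": False, "central_obesity": True}] * 1
    + [{"excess_weight": False, "central_obesity": False}] * 5
)
observed_p, expected_p, oe_ratio = observed_expected_ratio(
    toy_records, {"excess_weight": True, "central_obesity": True}
)
```

    A ratio above 1 whose confidence interval excludes 1 would indicate that the combination occurs more often than chance alone predicts.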

  6. Usage of K-cluster and factor analysis for grouping and evaluation the quality of olive oil in accordance with physico-chemical parameters

    NASA Astrophysics Data System (ADS)

    Milev, M.; Nikolova, Kr.; Ivanova, Ir.; Dobreva, M.

    2015-11-01

    Twenty-five olive oils, differing in origin and method of extraction, were studied with respect to 17 physico-chemical parameters: color parameters a and b, light, fluorescence peaks, pigments (chlorophyll and β-carotene), and fatty-acid content. The goals of the current study were: (1) to conduct correlation analysis to find the inner relations between the studied indices; (2) by applying factor analysis with the Principal Components (PCA) method, to reduce the large number of variables to a few factors that are of main importance for distinguishing the different types of olive oil; and (3) to use K-means clustering to compare and group the tested olive oils based on their similarity. The inner relations between the studied indices were found by applying correlation analysis. A factor analysis using PCA was applied on the basis of the resulting correlation matrix. The number of studied indices was thus reduced to 4 factors, which explained 79.3% of the entire variation. The first factor unified the color parameters, β-carotene, and the fluorescence peak at about 520 nm, which is related to oxidation products. The second was determined mainly by the chlorophyll content and the related fluorescence peak at about 670 nm. The third and fourth factors were determined by the fatty-acid content of the samples. The third unified the fatty acids that make it possible to distinguish olive oil from other plant oils: oleic, linoleic and stearic acids. The fourth factor included fatty acids with a much lower content in the studied samples. The K-Cluster analysis requires the number of clusters to be determined in advance. The variant K = 3 was chosen because there were three types of olive oil. The first cluster unified all salad and pomace olive oils; the second unified the samples of extra virgin oils taken as controls from producers, which were bought from the trade network. The third cluster unified samples from

  7. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    PubMed

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Factor analysis and cluster analysis applied to assess the water quality of middle and lower Han River in Central China

    NASA Astrophysics Data System (ADS)

    Kuo, Yi-Ming; Liu, Wen-Wen

    2015-04-01

    The Han River basin is one of the most important industrial and grain production bases in central China. Many factories and towns have been established along the river, with large farmlands located nearby. In the last few decades the water quality of the Han River, specifically in the middle and lower reaches, has gradually declined. Agricultural nonpoint pollution and municipal and industrial point pollution significantly degrade the water quality of the Han River. Factor analysis can be applied to reduce the dimensionality of a data set consisting of a large number of inter-related variables. Cluster analysis can classify samples according to their similar characteristics. In this study, factor analysis is used to identify major pollution indicators, and cluster analysis is employed to classify the samples based on the sample locations and hydrochemical variables. Water samples were collected at 12 sampling sites from Xiangyang City (middle Han River) to Wuhan City (lower Han River). Correlations among 25 hydrochemical variables are statistically examined. The important pollutants are determined by factor analysis. A three-factor model is determined and explains over 85% of the total river water quality variation. Factor 1, including SS, Chl-a, TN and TP, can be considered as nonpoint source pollution. Factor 2, including Cl⁻, Br⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Fe²⁺ and PO₄³⁻, can be treated as industrial pollution. Factor 3, including F⁻ and NO₃⁻, reflects the influence of the groundwater or the self-purification capability of the river water. The various land uses along the Han River correlate well with the pollution types. In addition, the results showed that the water quality of the Han River deteriorated gradually from the middle to the lower Han River. Some tributaries have been seriously polluted and significantly influence the mainstream water quality of the Han River. Finally, the result showed that the nonpoint pollution and the point

  9. Factor Analysis for Clustered Observations.

    ERIC Educational Resources Information Center

    Longford, N. T.; Muthen, B. O.

    1992-01-01

    A two-level model for factor analysis is defined, and formulas for a scoring algorithm for this model are derived. A simple noniterative method based on decomposition of total sums of the squares and cross-products is discussed and illustrated with simulated data and data from the Second International Mathematics Study. (SLD)

  10. Electrical Load Profile Analysis Using Clustering Techniques

    NASA Astrophysics Data System (ADS)

    Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.

    2017-03-01

    Data mining is a data processing technique for extracting information from a set of stored data. Every day, electricity load consumption is recorded by the electrical company, usually at intervals of 15 or 30 minutes. This paper uses clustering, one of the data mining techniques, to analyse the electrical load profiles during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The results show that KHM is the most appropriate method for classifying the electrical load profiles. The optimum number of clusters is determined using the Davies-Bouldin Index. By grouping the load profiles, demand variation analysis and estimation of energy loss can be carried out for each group of load profiles with a similar pattern. From the groups of electric load profiles, the cluster load factor and a range of cluster loss factors can be obtained, which help to find the range of coefficient values for estimating energy loss without performing load flow studies.
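
    The Davies-Bouldin Index used here to pick the number of clusters can be sketched in a few lines of plain Python. This is an illustrative sketch of the standard definition (mean over clusters of the worst ratio of summed within-cluster scatter to centroid separation, lower being better), assuming Euclidean distance; the function name and the two-value toy "load profiles" are hypothetical and are not the paper's data.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values indicate compact, well-separated clusters."""
    groups = {}
    for point, lab in zip(points, labels):
        groups.setdefault(lab, []).append(point)
    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in groups.items()
    }
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in groups.items()
    }
    labs = list(groups)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in labs
            if j != i
        )
        for i in labs
    ]
    return sum(worst_ratios) / len(labs)

# Hypothetical two-value load profiles (illustrative only).
toy_profiles = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]
db_good = davies_bouldin(toy_profiles, [0, 0, 1, 1])  # separated groups
db_bad = davies_bouldin(toy_profiles, [0, 1, 0, 1])   # overlapping groups
```

    Evaluating the index for each candidate number of clusters and keeping the minimum corresponds to the selection step the abstract mentions.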

  11. Factors influencing the quality of life of haemodialysis patients according to symptom cluster.

    PubMed

    Shim, Hye Yeung; Cho, Mi-Kyoung

    2018-05-01

    To identify the characteristics in each symptom cluster and factors influencing the quality of life of haemodialysis patients in Korea according to cluster. Despite developments in renal replacement therapy, haemodialysis still restricts the activities of daily living due to pain and impairs physical functioning induced by the disease and its complications. Descriptive survey. Two hundred and thirty dialysis patients aged >18 years. They completed self-administered questionnaires of Dialysis Symptom Index and Kidney Disease Quality of Life instrument-Short Form 1.3. To determine the optimal number of clusters, the collected data were analysed using polytomous variable latent class analysis in R software (poLCA) to estimate the latent class models and the latent class regression models for polytomous outcome variables. Differences in characteristics, symptoms and QOL according to the symptom cluster of haemodialysis patients were analysed using the independent t test and chi-square test. The factors influencing the QOL according to symptom cluster were identified using hierarchical multiple regression analysis. Physical and emotional symptoms were significantly more severe, and the QOL was significantly worse in Cluster 1 than in Cluster 2. The factors influencing the QOL were spouse, job, insurance type and physical and emotional symptoms in Cluster 1, with these variables having an explanatory power of 60.9%. Physical and emotional symptoms were the only influencing factors in Cluster 2, and they had an explanatory power of 37.4%. Mitigating the symptoms experienced by haemodialysis patients and improving their QOL require educational and therapeutic symptom management interventions that are tailored according to the characteristics and symptoms in each cluster. The findings of this study are expected to lead to practical guidelines for addressing the symptoms experienced by haemodialysis patients, and they provide basic information for developing nursing

  12. Prevalence and risk factors of seizure clusters in adult patients with epilepsy.

    PubMed

    Chen, Baibing; Choi, Hyunmi; Hirsch, Lawrence J; Katz, Austen; Legge, Alexander; Wong, Rebecca A; Jiang, Alfred; Kato, Kenneth; Buchsbaum, Richard; Detyniecki, Kamil

    2017-07-01

    In the current study, we explored the prevalence of physician-confirmed seizure clusters. We also investigated potential clinical factors associated with the occurrence of seizure clusters overall and by epilepsy type. We reviewed medical records of 4116 adult (≥16 years old) outpatients with epilepsy at our centers for documentation of seizure clusters. Variables including patient demographics, epilepsy details, medical and psychiatric history, AED history, and epilepsy risk factors were then tested against history of seizure clusters. Patients were then divided into focal epilepsy, idiopathic generalized epilepsy (IGE), or symptomatic generalized epilepsy (SGE), and the same analysis was run. Overall, seizure clusters were independently associated with earlier age of seizure onset, symptomatic generalized epilepsy (SGE), central nervous system (CNS) infection, cortical dysplasia, status epilepticus, absence of 1-year seizure freedom, and having failed 2 or more AEDs (P<0.0026). Patients with SGE (27.1%) were more likely to develop seizure clusters than patients with focal epilepsy (16.3%) and IGE (7.4%; all P<0.001). Analysis by epilepsy type showed that absence of 1-year seizure freedom since starting treatment at one of our centers was associated with seizure clustering in patients across all 3 epilepsy types. In patients with SGE, clusters were associated with perinatal/congenital brain injury. In patients with focal epilepsy, clusters were associated with younger age of seizure onset, complex partial seizures, cortical dysplasia, status epilepticus, CNS infection, and having failed 2 or more AEDs. In patients with IGE, clusters were associated with presence of an aura. Only 43.5% of patients with seizure clusters were prescribed rescue medications. Patients with intractable epilepsy are at a higher risk of developing seizure clusters. Factors such as having SGE, CNS infection, cortical dysplasia, status epilepticus or an early seizure onset, can also

  13. Comprehensive cluster analysis with Transitivity Clustering.

    PubMed

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  14. Suicide Clusters: A Review of Risk Factors and Mechanisms

    ERIC Educational Resources Information Center

    Haw, Camilla; Hawton, Keith; Niedzwiedz, Claire; Platt, Steve

    2013-01-01

    Suicide clusters, although uncommon, cause great concern in the communities in which they occur. We searched the world literature on suicide clusters and describe the risk factors and proposed psychological mechanisms underlying the spatio-temporal clustering of suicides (point clusters). Potential risk factors include male gender, being an…

  15. [Cluster analysis in biomedical researches].

    PubMed

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping separate observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
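
    Of the algorithms this review lists, k-means is the simplest to show concretely. The sketch below implements the standard Lloyd iteration (assign each point to its nearest centroid, then recompute centroids) in plain Python. Taking the first k points as initial centroids is an assumption made here only for reproducibility; real analyses usually use random or k-means++ initialisation. The function name and toy points are hypothetical.

```python
import math

def k_means(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    # Assumption for this sketch: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-parameter observations (illustrative only).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (1.0, 0.0)]
toy_labels, toy_centroids = k_means(toy_points, k=2)
```

    Because the result depends on the starting centroids, k-means is typically rerun from several initialisations and the solution with the lowest within-cluster sum of squares is kept.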

  16. Health in police officers: Role of risk factor clusters and police divisions.

    PubMed

    Habersaat, Stephanie A; Geiger, Ashley M; Abdellaoui, Sid; Wolf, Jutta M

    2015-10-01

    Law enforcement is a stressful occupation associated with significant health problems. To date, most studies have focused on one specific factor or one domain of risk factors (e.g., organizational, personal). However, it is more likely that specific combinations of risk factors are differentially health relevant and further, depend on the area of police work. A self-selected group of officers from the criminal, community, and emergency division (N = 84) of a Swiss state police department answered questionnaires assessing personal and organizational risk factors as well as mental and physical health indicators. In general, few differences were observed across divisions in terms of risk factors or health indicators. Cluster analysis of all risk factors established a high-risk and a low-risk cluster with significant links to all mental health outcomes. Risk cluster-by-division interactions revealed that, in the high-risk cluster, emergency officers reported fewer physical symptoms, while community officers reported more posttraumatic stress symptoms. Criminal officers in the high-risk cluster tended to perceive more stress. Finally, perceived stress did not mediate the relationship between risk clusters and posttraumatic stress symptoms. In summary, our results support the notion that police officers are a heterogeneous population in terms of processes linking risk factors and health indicators. This heterogeneity thereby appeared to be more dependent on personal factors and individuals' perception of their own work conditions than division-specific work environments. Our findings further suggest that stress-reduction interventions that do not target job-relevant sources of stress may only show limited effectiveness in reducing health risks associated with police work. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Health in police officers: Role of risk factor clusters and police divisions

    PubMed Central

    Habersaat, Stephanie A.; Geiger, Ashley M.; Abdellaoui, Sid; Wolf, Jutta M.

    2015-01-01

    Objective: Law enforcement is a stressful occupation associated with significant health problems. To date, most studies have focused on one specific factor or one domain of risk factors (e.g., organizational, personal). However, it is more likely that specific combinations of risk factors are differentially health relevant and further, depend on the area of police work. Methods: A self-selected group of officers from the criminal, community, and emergency division (N = 84) of a Swiss state police department answered questionnaires assessing personal and organizational risk factors as well as mental and physical health indicators. Results: In general, few differences were observed across divisions in terms of risk factors or health indicators. Cluster analysis of all risk factors established a high-risk and a low-risk cluster with significant links to all mental health outcomes. Risk cluster-by-division interactions revealed that, in the high-risk cluster, emergency officers reported fewer physical symptoms, while community officers reported more posttraumatic stress symptoms. Criminal officers in the high-risk cluster tended to perceive more stress. Finally, perceived stress did not mediate the relationship between risk clusters and posttraumatic stress symptoms. Conclusion: In summary, our results support the notion that police officers are a heterogeneous population in terms of processes linking risk factors and health indicators. This heterogeneity thereby appeared to be more dependent on personal factors and individuals' perception of their own work conditions than division-specific work environments. Our findings further suggest that stress-reduction interventions that do not target job-relevant sources of stress may only show limited effectiveness in reducing health risks associated with police work. PMID:26364008

  18. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    PubMed Central

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700

  19. Clustering of risk factors for noncommunicable diseases in Brazilian adolescents: prevalence and correlates.

    PubMed

    Cureau, Felipe Vogt; Duarte, Paola; dos Santos, Daniela Lopes; Reichert, Felipe Fossati

    2014-07-01

    Few studies have investigated the prevalence and correlates of risk factors for noncommunicable diseases among Brazilian adolescents. We evaluated the clustering of risk factors and their associations with sociodemographic variables. We used a cross-sectional study carried out in 2011 comprising 1132 students aged 14-19 years from Santa Maria, Brazil. The cluster index was created as the sum of the risk factors. For the correlates analysis, a multinomial logistic regression was used. Furthermore, the observed/expected ratio was calculated. Prevalence of the individual risk factors studied was as follows: 85.8% unhealthy diets, 53.5% physical inactivity, 31.3% elevated blood pressure, 23.9% overweight, 22.3% excessive alcohol drinking, and 8.6% smoking. Only 2.8% of the adolescents did not present any risk factor, while 21.7%, 40.9%, 23.1%, and 11.5% presented 1, 2, 3, and 4 or more risk factors, respectively. The most prevalent combination was between unhealthy diets and physical inactivity (observed/expected ratio = 1.32; 95% CI: 1.16-1.49). Clustering of risk factors was directly associated with age and inversely associated with socioeconomic status. Clustering of risk factors for noncommunicable diseases is high in Brazilian adolescents. Preventive strategies are more likely to be successful if focusing on multiple risk factors, instead of a single one.

  20. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the interface of ClusterViz, the clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering algorithms module adopts an abstract algorithm interface, more clustering algorithms can be included for future use. To illustrate the usability of ClusterViz, we provided three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of biological networks.

  1. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    PubMed

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    PubMed

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.

  3. Altitude as a risk factor for the development of hypospadias. Geographical cluster distribution analysis in South America.

    PubMed

    Fernández, Nicolas; Lorenzo, Armando; Bägli, Darius; Zarante, Ignacio

    2016-10-01

    Hypospadias is the most common congenital anomaly affecting the genitals. It has been established as a multifactorial disease with increasing prevalence. Many risk factors have been identified such as prematurity, birth weight, mother's age, and exposure to endocrine disruptors. In recent decades multiple authors using surveillance systems have described an increase in prevalence of hypospadias, but most of the published literature comes from developed countries in Europe and North America and few of the published studies have involved cluster analysis. Few large-scale studies have been performed addressing the effect of altitude and other geographical aspects on the development of hypospadias. Acknowledging this limitation, we present novel results of a multinational spatial scan statistical analysis over a 30-year period in South America and an altitude analysis of hypospadias distribution on a continent level. A retrospective review was performed of the Latin American collaborative study of congenital malformations (ECLAMC). A total of 4,020,384 newborns was surveyed between 1982 and December 2011 in all participating centers. We selected all patients with hypospadias. All degrees of clinical severity were included in the analysis. Each participating center was geographically identified with its coordinates and altitude above sea level. A spatial scan statistical analysis was performed using Kulldorff's methodology and a prevalence trend analysis over time in centers below and above 2000 m. During the study period we found 159 hospitals in six different countries (Colombia, Bolivia, Brazil, Argentina, Chile, and Uruguay) with 4,537 cases of hypospadias and a global prevalence rate of 11.3/10,000 newborns. Trend analysis showed that centers below 2000 m had an increasing trend with an average of 10/10,000 newborns as opposed to those centers above 2000 m that showed a reducing trend with an average prevalence of 7.8 (p = 0.1246). We identified clusters with

  4. Graph analysis of cell clusters forming vascular networks

    NASA Astrophysics Data System (ADS)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth was thresholded and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessel density, the number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics, such as the degree distribution, average clustering coefficient, average shortest path length and assortativity, among others. This analysis makes it possible to monitor how the largest connected cluster of the vascular network evolves in time and provides quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
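
    Three of the network metrics named in this abstract are easy to compute once the vascular network is stored as an adjacency structure. The sketch below is only an illustration on a hypothetical six-node graph. The node names, the function names, and the choice of a dictionary of neighbour sets are assumptions of this example and are not taken from the study.

```python
from collections import Counter


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(nbrs) for nbrs in adj.values())


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def largest_component_size(adj):
    """Number of nodes in the largest connected cluster (depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


# Hypothetical network: a triangle a-b-c, a branch c-d, and a separate pair e-f.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": {"f"},
    "f": {"e"},
}
```

    Tracking largest_component_size over successive thresholded frames is one way to follow how the largest connected cluster evolves in time.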

  5. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    PubMed Central

    2014-01-01

    Background: There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods: We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results: Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions: Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations
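
    The link between cluster size, ICC and power can be made concrete with the usual design-effect approximation, 1 + ((cv² + 1) · m − 1) · ICC, where m is the mean cluster size and cv the coefficient of variation of cluster size. The sketch below applies it to a hypothetical trial in which pairs of clusters merge. The numbers, the function names, and the assumption that the ICC stays fixed after merging are illustrative only and are not taken from the paper, which reports that the observed ICC can change.

```python
def design_effect(mean_size, icc, cv=0.0):
    """Variance inflation from clustering, allowing for unequal cluster sizes."""
    return 1.0 + ((cv ** 2 + 1.0) * mean_size - 1.0) * icc


def effective_sample_size(n_clusters, mean_size, icc, cv=0.0):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * mean_size / design_effect(mean_size, icc, cv)


# Hypothetical trial: 20 clusters of 30 participants, ICC = 0.05.
before = effective_sample_size(n_clusters=20, mean_size=30, icc=0.05)

# Same participants after every pair of clusters merges: 10 clusters of 60.
after = effective_sample_size(n_clusters=10, mean_size=60, icc=0.05)
```

    With the ICC held fixed, the merged design has a smaller effective sample size and therefore less power, and a larger cv inflates the design effect further.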

  6. Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

    PubMed

    Stefurak, Tres; Calhoun, Georgia B

    2007-01-01

    The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.

  7. Mixture modelling for cluster analysis.

    PubMed

    McLachlan, G J; Chang, S U

    2004-10-01

    Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
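
    The two steps described here, fitting a g-component normal mixture and then assigning each observation to the component with the highest posterior probability, can be sketched for univariate data with a basic EM loop. This is a minimal illustration under stated assumptions: one-dimensional data, a quantile-based starting point, a fixed number of iterations, and a small variance floor. The function names and the toy data are hypothetical and not taken from the paper.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(xs, pis, mus, variances):
    """Posterior probability of each component for each observation."""
    out = []
    for x in xs:
        w = [p * normal_pdf(x, m, v) for p, m, v in zip(pis, mus, variances)]
        total = sum(w)
        out.append([wi / total for wi in w])
    return out


def fit_mixture(xs, g, iters=100):
    """Fit a g-component normal mixture by EM (fixed iteration count)."""
    n = len(xs)
    ordered = sorted(xs)
    # Crude start: means at evenly spaced quantiles, common variance.
    mus = [ordered[int((i + 0.5) * n / g)] for i in range(g)]
    mean = sum(xs) / n
    variances = [sum((x - mean) ** 2 for x in xs) / n] * g
    pis = [1.0 / g] * g
    for _ in range(iters):
        post = posteriors(xs, pis, mus, variances)           # E-step
        for i in range(g):                                   # M-step
            n_i = sum(p[i] for p in post)
            pis[i] = n_i / n
            mus[i] = sum(p[i] * x for p, x in zip(post, xs)) / n_i
            var_i = sum(p[i] * (x - mus[i]) ** 2 for p, x in zip(post, xs)) / n_i
            variances[i] = max(var_i, 1e-6)                  # avoid collapse
    return pis, mus, variances


def cluster(xs, g):
    """Outright clustering: component with the highest posterior probability."""
    pis, mus, variances = fit_mixture(xs, g)
    post = posteriors(xs, pis, mus, variances)
    return [max(range(g), key=lambda i: p[i]) for p in post]


# Hypothetical continuous measurements forming two groups.
data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
labels = cluster(data, g=2)
```

    The posterior probabilities themselves can be kept alongside the hard labels when a measure of assignment uncertainty is wanted.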

  8. CLUSFAVOR 5.0: hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles

    PubMed Central

    Peterson, Leif E

    2002-01-01

    CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816

  9. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    PubMed Central

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  10. A proximity-based graph clustering method for the identification and application of transcription factor clusters.

    PubMed

    Spadafore, Maxwell; Najarian, Kayvan; Boyle, Alan P

    2017-11-29

    Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to establish where a TF binds to DNA are well established, these methods provide no information describing how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.
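
    The Markov Clustering Algorithm mentioned above alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing columns) until the flow concentrates within clusters. The sketch below runs that loop on a hypothetical six-node adjacency matrix. The self-loops, the inflation value of 2, the iteration count, and the cluster read-out threshold are assumptions of this example. The filtered, normalized co-occurrence matrix used in the paper is not reproduced.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def normalize_columns(M):
    """Rescale every column so that it sums to one."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, iters=50, tol=1e-6):
    """Markov clustering on a symmetric adjacency matrix (list of rows)."""
    n = len(adj)
    # Self-loops are a common stabilizing choice (an assumption here).
    M = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    M = normalize_columns(M)
    for _ in range(iters):
        M = matmul(M, M)                                        # expansion
        M = normalize_columns([[v ** inflation for v in row]    # inflation
                               for row in M])
    # Rows that keep mass are attractors; their nonzero columns are members.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > tol)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
```

    Higher inflation values tend to give smaller clusters, so the parameter acts as a granularity control.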

  11. Mismatch of Posttraumatic Stress Disorder (PTSD) Symptoms and DSM-IV Symptom Clusters in a Cancer Sample: Exploratory Factor Analysis of the PTSD Checklist-Civilian Version

    PubMed Central

    Shelby, Rebecca A.; Golden-Kreutz, Deanna M.; Andersen, Barbara L.

    2007-01-01

    The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV; American Psychiatric Association, 1994a) conceptualization of posttraumatic stress disorder (PTSD) includes three symptom clusters: reexperiencing, avoidance/numbing, and arousal. The PTSD Checklist-Civilian Version (PCL-C) corresponds to the DSM-IV PTSD symptoms. In the current study, we conducted exploratory factor analysis (EFA) of the PCL-C with two aims: (a) to examine whether the PCL-C evidenced the three-factor solution implied by the DSM-IV symptom clusters, and (b) to identify a factor solution for the PCL-C in a cancer sample. Women (N = 148) with Stage II or III breast cancer completed the PCL-C after completion of cancer treatment. We extracted two-, three-, four-, and five-factor solutions using EFA. Our data did not support the DSM-IV PTSD symptom clusters. Instead, EFA identified a four-factor solution including reexperiencing, avoidance, numbing, and arousal factors. Four symptom items, which may be confounded with illness and cancer treatment-related symptoms, exhibited poor factor loadings. Using these symptom items in cancer samples may lead to overdiagnosis of PTSD and inflated rates of PTSD symptoms. PMID:16281232

  12. Cluster analysis in phenotyping a Portuguese population.

    PubMed

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Patients were distributed into 5 clusters, described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with those of other larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  13. A cluster analysis on road traffic accidents using genetic algorithms

    NASA Astrophysics Data System (ADS)

    Saharan, Sabariah; Baragona, Roberto

    2017-04-01

    The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes the study of factors that affect the frequency and severity of accidents viable. However, the data are often highly unbalanced and overlapped. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000-2009, with a total of 26440 accidents. The data are binary, and there are 50 road traffic accident factors with four levels of severity. We used a genetic algorithm for the analysis because we are in the presence of a large unbalanced data set and standard clustering algorithms such as k-means may not be suitable for the task. The genetic algorithm based on clustering for unknown K (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accident severity level and suggest that the two main factors that contribute to fatal accidents are "Speed greater than 60 km/h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.

  14. Identification of chronic rhinosinusitis phenotypes using cluster analysis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J

    2015-05-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.

  15. Data depth based clustering analysis

    DOE PAGES

    Jeong, Myeong-Hun; Cai, Yaping; Sullivan, Clair J.; ...

    2016-01-01

    This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, based on Euclidean distance, can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noise because data depth can measure the centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters when the parameter is varied. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the robustness to noise of DBSCAN or HDBSCAN. The robustness to parameter selection is also demonstrated through the case study of clustering Twitter data.

  16. Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

    PubMed Central

    Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

    2012-01-01

    Objective: Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants: We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method: We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results: Cluster analysis yielded six distinct clusters. Three individual factors (gender, year in school, and fraternity/sorority membership) were the most strongly associated with cluster membership. Conclusions: In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360

  17. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor

  18. Symptom clusters and related factors in bladder cancer patients three months after radical cystectomy.

    PubMed

    Ren, Hongyan; Tang, Ping; Zhao, Qinghua; Ren, Guosheng

    2017-08-23

    To identify symptom distress and clusters in patients 3 months after radical cystectomy and to explore their potential predictors. A cross-sectional design was used to investigate 99 bladder cancer patients 3 months after radical cystectomy. Data were collected by demographic and disease characteristic questionnaires, the symptom experience scale of the M.D. Anderson symptom inventory, two additional symptoms specific to radical cystectomy, and the functional assessment of cancer therapy questionnaire. A factor analysis, stepwise regression, and correlation analysis were applied. Three symptom clusters were identified: fatigue-malaise, gastrointestinal, and psycho-urinary. Age, complication severity, albumin post-surgery (negative), orthotopic neobladder reconstruction, adjuvant chemotherapy and American Society of Anesthesiologists (ASA) scores were significant predictors of fatigue-malaise. Adjuvant chemotherapy, orthotopic neobladder reconstruction, female gender, ASA scores and albumin (negative) were significant predictors of gastrointestinal symptoms. Being unmarried, having a higher educational level and complication severity were significant predictors of psycho-urinary symptoms. The correlations between clusters and for each cluster with quality of life were significant, with the highest correlation observed between the psycho-urinary cluster and quality of life. Bladder cancer patients experience concurrent symptoms that appear to cluster and are significantly correlated with quality of life. Moreover, symptom clusters may be predicted by certain demographic and clinical characteristics.

  19. Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis

    PubMed Central

    Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.

    2013-01-01

    Background: Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective: To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods: Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results: Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion: This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation

  20. Using BMDP and SPSS for a Q factor analysis.

    PubMed

    Tanner, B A; Koning, S M

    1980-12-01

    While Euclidean distances and Q factor analysis may sometimes be preferred to correlation coefficients and cluster analysis for developing a typology, commercially available software does not always facilitate their use. Commands are provided for using BMDP and SPSS in a Q factor analysis with Euclidean distances.

  1. Stability-based validation of dietary patterns obtained by cluster analysis.

    PubMed

    Sauvageot, Nicolas; Schritz, Anna; Leite, Sonia; Alkerwi, Ala'a; Stranges, Saverio; Zannad, Faiez; Streel, Sylvie; Hoge, Axelle; Donneau, Anne-Françoise; Albert, Adelin; Guillaume, Michèle

    2017-01-14

    Cluster analysis is a data-driven method used to create clusters of individuals sharing similar dietary habits. However, this method requires specific choices from the user which have an influence on the results. Therefore, there is a need for an objective methodology to help researchers in their decisions during cluster analysis. The objective of this study was to use such a methodology, based on the stability of clustering solutions, to select the most appropriate clustering method and number of clusters for describing dietary patterns in the NESCAV study (Nutrition, Environment and Cardiovascular Health), a large population-based cross-sectional study in the Greater Region (N = 2298). Clustering solutions were obtained with K-means, K-medians and Ward's method and a number of clusters varying from 2 to 6. Their stability was assessed with three indices: adjusted Rand index, Cramer's V and misclassification rate. The most stable solution was obtained with the K-means method and a number of clusters equal to 3. The "Convenient" cluster, characterized by the consumption of convenient foods, was the most prevalent, with 46% of the population having this dietary behaviour. In addition, "Prudent" and "Non-Prudent" patterns, associated respectively with healthy and non-healthy dietary habits, were adopted by 25% and 29% of the population. The "Convenient" and "Non-Prudent" clusters were associated with higher cardiovascular risk whereas the "Prudent" pattern was associated with a decreased cardiovascular risk. Associations with other factors showed that the choice of a specific dietary pattern is part of a wider lifestyle profile. This study is of interest for both researchers and public health professionals. From a methodological standpoint, we showed that using the stability of clustering solutions could help researchers in their choices. From a public health perspective, this study showed the need for targeted health promotion campaigns describing the benefits of healthy
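
    Of the three stability indices named in this abstract, the adjusted Rand index is the simplest to compute from scratch: it counts pairs of individuals that two partitions group together and corrects that count for chance agreement. The sketch below is a minimal illustration. The function name and the toy label vectors are assumptions of this example, and the study's resampling scheme for generating the partitions to compare is not reproduced.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_a)
    # Pairs placed together in both partitions (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2.0
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical cluster labels from two runs on the same six individuals.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 2]   # same grouping, different label names
ari = adjusted_rand_index(run_1, run_2)
```

    The index does not depend on how clusters are named, so solutions from repeated runs can be compared directly. Values near 1 indicate a stable solution and values near 0 indicate agreement no better than chance.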

  2. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of

  3. Spatial clustering and risk factors of malaria infections in Bata district, Equatorial Guinea.

    PubMed

    Gómez-Barroso, Diana; García-Carrasco, Emely; Herrador, Zaida; Ncogo, Policarpo; Romay-Barja, María; Ondo Mangue, Martín Eka; Nseng, Gloria; Riloha, Matilde; Santana, Maria Angeles; Valladares, Basilio; Aparicio, Pilar; Benito, Agustín

    2017-04-12

    The transmission of malaria is intense in the majority of the countries of sub-Saharan Africa, particularly in those that are located along the Equatorial strip. The present study aimed to describe the current distribution of malaria prevalence among children and its environment-related factors as well as to detect malaria spatial clusters in the district of Bata, in Equatorial Guinea. From June to August 2013 a representative cross-sectional survey using a multistage, stratified, cluster-selected sample was carried out of children in urban and rural areas of Bata District. All children were tested for malaria using rapid diagnostic tests (RDTs). Results were linked to each household by global positioning system data. Two cluster analysis methods were used: hot spot analysis using the Getis-Ord Gi statistic, and the SaTScan™ spatial statistic estimates, based on the assumption of a Poisson distribution to detect spatial clusters. In addition, univariate associations and a Poisson regression model were used to explore the association between malaria prevalence at household level and different environmental factors. A total of 1416 children aged 2 months to 15 years living in 417 households were included in this study. Malaria prevalence by RDTs was 47.53%, being highest in the age group 6-15 years (63.24%, p < 0.001). Malaria risk was greater among children living in rural areas (65.81%, p < 0.001). Malaria prevalence was higher in those houses located <1 km from a river and <3 km from a forest (IRR: 1.31; 95% CI 1.13-1.51 and IRR: 1.44; 95% CI 1.25-1.66, respectively). Poisson regression analysis also showed a decrease in malaria prevalence with altitude (IRR: 0.73; 95% CI 0.62-0.86). A significant cluster was found inland in the district, in rural areas. This study reveals a high prevalence of RDT-based malaria among children in Bata district. Those households situated in inland rural areas, near to a river, a green area and/or at low altitude

  4. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  5. Using Cluster Analysis to Examine Husband-Wife Decision Making

    ERIC Educational Resources Information Center

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  6. DICON: interactive visual analysis of multidimensional clusters.

    PubMed

    Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin

    2011-12-01

    Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE

  7. Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis.

    PubMed

    Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar

    2015-03-01

    Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. To identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rate. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from that in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting the clusters represent clinically and biologically different subtypes of COPD.

  8. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    PubMed

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  9. Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks

    PubMed Central

    Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta

    2017-01-01

    Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic. PMID:28245222

  10. Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks.

    PubMed

    Wu, Jibing; Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta

    2017-01-01

    Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic.

  11. Variable Screening for Cluster Analysis.

    ERIC Educational Resources Information Center

    Donoghue, John R.

    Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are analytically shown often to possess negative kurtosis. Two related measures, "m" and…

  12. Somatotyping using 3D anthropometry: a cluster analysis.

    PubMed

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
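
    As an illustration of the k-means step mentioned in the abstract above, the following is a minimal sketch of Lloyd's algorithm in plain Python. The function names, the toy two-dimensional points and the hand-picked starting centroids are assumptions made for this example only. The study itself clustered 29 size-normalised body dimensions and chose the number of clusters by v-fold cross-validation, which this sketch does not attempt.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial, max_iterations=100):
    """Lloyd's algorithm. `initial` holds the starting centroids (hypothetical choice).

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    centroids = [list(c) for c in initial]
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy data (illustrative only): two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = k_means(points, initial=[points[0], points[3]])
print(labels)
```

    In practice the result depends on the starting centroids, so several restarts (or a seeding scheme such as k-means++) are commonly used rather than a single hand-picked start.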

  13. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

    PubMed Central

    Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

    2017-01-01

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211

  14. Fascioliasis risk factors and space-time clusters in domestic ruminants in Bangladesh.

    PubMed

    Rahman, A K M Anisur; Islam, S K Shaheenur; Talukder, Md Hasanuzzaman; Hassan, Md Kumrul; Dhand, Navneet K; Ward, Michael P

    2017-05-08

    A retrospective observational study was conducted to identify fascioliasis hotspots, clusters, potential risk factors and to map fascioliasis risk in domestic ruminants in Bangladesh. Cases of fascioliasis in cattle, buffalo, sheep and goats from all districts in Bangladesh between 2011 and 2013 were identified via secondary surveillance data from the Department of Livestock Services' Epidemiology Unit. From each case report, date of report, species affected and district data were extracted. The total number of domestic ruminants in each district was used to calculate fascioliasis cases per ten thousand animals at risk per district, and this was used for cluster and hotspot analysis. Clustering was assessed with Moran's spatial autocorrelation statistic, hotspots with the local indicator of spatial association (LISA) statistic and space-time clusters with the scan statistic (Poisson model). The association between district fascioliasis prevalence and climate (temperature, precipitation), elevation, land cover and water bodies was investigated using a spatial regression model. A total of 1,723,971 cases of fascioliasis were reported in the three-year study period in cattle (1,164,560), goats (424,314), buffalo (88,924) and sheep (46,173). A total of nine hotspots were identified; one of these persisted in each of the three years. Only two local clusters were found. Five space-time clusters located within 22 districts were also identified. Annual risk maps of fascioliasis cases correlated with the hotspots and clusters detected. Cultivated and managed (P < 0.001) and artificial surface (P = 0.04) land cover areas, and elevation (P = 0.003) were positively and negatively associated with fascioliasis in Bangladesh, respectively. Results indicate that due to land use characteristics some areas of Bangladesh are at greater risk of fascioliasis. The potential risk factors, hot spots and clusters identified in this study can be used to guide science

  15. Adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization

    NASA Astrophysics Data System (ADS)

    Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo

    2018-04-01

    Nowadays, many datasets are represented by multiple views, which usually include shared and complementary information. Multi-view clustering methods integrate the information of the multiple views to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering methods because of its interpretability. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views and neglect the fact that different views will have different contributions to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm can obtain the parts-based representation of multi-view data by nonnegative matrix factorization. Then, pairwise co-regularization is used to measure the disagreement between views. Only one parameter is needed to automatically learn the weight values according to the contribution of each view to the data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-art algorithms for multi-view clustering.
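
    To make the nonnegative matrix factorization building block in the abstract above concrete, here is a minimal single-view sketch in plain Python. It uses the standard Lee-Seung multiplicative updates for the Frobenius-norm objective, not the authors' adaptive multi-view method, and it omits the pairwise co-regularization and view weighting entirely. The function names, the toy matrix and the rule of assigning each sample to its largest coefficient are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (features x samples) as W (features x rank) times H (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


def cluster_labels(h):
    """Assign each sample (column of H) to the component with the largest coefficient."""
    return [max(range(len(h)), key=lambda a: h[a][j]) for j in range(len(h[0]))]


# Toy matrix (illustrative only): samples 0-1 and samples 2-3 use different features.
V = [[5.0, 4.0, 0.0, 0.0],
     [4.0, 5.0, 0.0, 0.0],
     [0.0, 0.0, 5.0, 4.0],
     [0.0, 0.0, 4.0, 5.0]]
W, H = nmf(V, rank=2)
print(cluster_labels(H), frobenius_error(V, W, H))
```

    The multiplicative form keeps W and H nonnegative as long as they start nonnegative, which is what gives the parts-based representation referred to in the abstract.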

  16. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    NASA Astrophysics Data System (ADS)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), and provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.

  17. Clustering of obesity and dental caries with lifestyle factors among Danish adolescents.

    PubMed

    Cinar, Ayse Basak; Christensen, Lisa Boge; Hede, Borge

    2011-01-01

    To assess any clustering between obesity, dental health, and lifestyle factors (dietary patterns, physical activity, smoking, and alcohol consumption) among adolescents. A cluster sample of 15-year-old Danish adolescents (DA) from eight municipalities was selected. Self-reported questionnaires for adolescents and their mothers to assess body-mass index (BMI), socioeconomic and lifestyle factors, and clinical examinations to examine adolescents' dental status (DMFT) were used. Descriptive statistics, chi-square tests, and factor analysis were applied. The mean DMFT was 2.03 and mean BMI was 21.30 among DA. Of the whole sample, 62% experienced caries (DMFT > 0) and 16% were classified as obese. No association appeared between obesity and DMFT (p > 0.05). Most adolescents were likely to have breakfast every day (76%), but their daily consumption of fruit was lower (38%). More than half of adolescents reported having physical exercise (66%) and no alcohol consumption (57%). Smokers were more likely to consume alcohol (80%) but less likely to exercise (44%) than nonsmokers (alcohol consumption, 55%; exercise, 68%) (P < 0.05). Principal component analysis revealed that DMFT and obesity were interrelated in DA. In line with earlier studies, obesity and dental caries share common lifestyle factors among adolescents, regardless of nationality and different health-care systems. Thus, it seems that dental health is a global health concern. There is a need for collaboration between dental and general health-care providers to manage both obesity and dental caries in adolescents by using a holistic approach.

  18. Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.

    PubMed

    Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R

    2014-06-01

    Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic

  19. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    ERIC Educational Resources Information Center

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  20. Cluster analysis of cardiovascular and metabolic risk factors in women of reproductive age.

    PubMed

    Tzeng, Chii-Ruey; Chang, Yuan-chin Ivan; Chang, Yu-chia; Wang, Chia-Woei; Chen, Chi-Huang; Hsu, Ming-I

    2014-05-01

    To study the association between endocrine disturbances and metabolic complications in women seeking gynecologic care. Retrospective study, cluster analysis. Outpatient clinic, university medical center. 573 women, including 384 at low risk and 189 at high risk of cardiometabolic disease. None. Cardiovascular and metabolic parameters and clinical and biochemical characteristics. Risk factors for metabolic disease are associated with a low age of menarche, high levels of high-sensitivity C-reactive protein and liver enzymes, and low levels of sex hormone-binding globulin. Overweight/obese status, polycystic ovary syndrome, oligo/amenorrhea, and hyperandrogenism were found to increase the risk of cardiometabolic disease. However, hyperprolactinemia and premature ovarian failure were not associated with the risk of cardiometabolic disease. In terms of androgens, the serum total testosterone level and free androgen index but not androstenedione or dehydroepiandrosterone sulfate (DHEAS) were associated with cardiometabolic risk. Although polycystic ovary syndrome is associated with metabolic risk, obesity was the major determinant of cardiometabolic disturbances in reproductive-aged women. Hyperprolactinemia and premature ovarian failure were not associated with the risk of cardiovascular and metabolic diseases. NCT01826357. Copyright © 2014 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  1. Micro-scale Spatial Clustering of Cholera Risk Factors in Urban Bangladesh.

    PubMed

    Bi, Qifang; Azman, Andrew S; Satter, Syed Moinuddin; Khan, Azharul Islam; Ahmed, Dilruba; Riaj, Altaf Ahmed; Gurley, Emily S; Lessler, Justin

    2016-02-01

    Close interpersonal contact likely drives spatial clustering of cases of cholera and diarrhea, but spatial clustering of risk factors may also drive this pattern. Few studies have focused specifically on how exposures for disease cluster at small spatial scales. Improving our understanding of the micro-scale clustering of risk factors for cholera may help to target interventions and power studies with cluster designs. We selected sets of spatially matched households (matched-sets) near cholera case households between April and October 2013 in a cholera endemic urban neighborhood of Tongi Township in Bangladesh. We collected data on exposures to suspected cholera risk factors at the household and individual level. We used intra-class correlation coefficients (ICCs) to characterize clustering of exposures within matched-sets and households, and assessed if clustering depended on the geographical extent of the matched-sets. Clustering over larger spatial scales was explored by assessing the relationship between matched-sets. We also explored whether different exposures tended to appear together in individuals, households, and matched-sets. Household level exposures, including: drinking municipal supplied water (ICC = 0.97, 95%CI = 0.96, 0.98), type of latrine (ICC = 0.88, 95%CI = 0.71, 1.00), and intermittent access to drinking water (ICC = 0.96, 95%CI = 0.87, 1.00) exhibited strong clustering within matched-sets. As the geographic extent of matched-sets increased, the concordance of exposures within matched-sets decreased. Concordance between matched-sets of exposures related to water supply was elevated at distances of up to approximately 400 meters. Household level hygiene practices were correlated with infrastructure shown to increase cholera risk. Co-occurrence of different individual level exposures appeared to mostly reflect the differing domestic roles of study participants. Strong spatial clustering of exposures at a small spatial scale in a cholera endemic

  2. Micro-scale Spatial Clustering of Cholera Risk Factors in Urban Bangladesh

    PubMed Central

    Bi, Qifang; Azman, Andrew S.; Satter, Syed Moinuddin; Khan, Azharul Islam; Ahmed, Dilruba; Riaj, Altaf Ahmed; Gurley, Emily S.; Lessler, Justin

    2016-01-01

    Close interpersonal contact likely drives spatial clustering of cases of cholera and diarrhea, but spatial clustering of risk factors may also drive this pattern. Few studies have focused specifically on how exposures for disease cluster at small spatial scales. Improving our understanding of the micro-scale clustering of risk factors for cholera may help to target interventions and power studies with cluster designs. We selected sets of spatially matched households (matched-sets) near cholera case households between April and October 2013 in a cholera endemic urban neighborhood of Tongi Township in Bangladesh. We collected data on exposures to suspected cholera risk factors at the household and individual level. We used intra-class correlation coefficients (ICCs) to characterize clustering of exposures within matched-sets and households, and assessed if clustering depended on the geographical extent of the matched-sets. Clustering over larger spatial scales was explored by assessing the relationship between matched-sets. We also explored whether different exposures tended to appear together in individuals, households, and matched-sets. Household level exposures, including: drinking municipal supplied water (ICC = 0.97, 95%CI = 0.96, 0.98), type of latrine (ICC = 0.88, 95%CI = 0.71, 1.00), and intermittent access to drinking water (ICC = 0.96, 95%CI = 0.87, 1.00) exhibited strong clustering within matched-sets. As the geographic extent of matched-sets increased, the concordance of exposures within matched-sets decreased. Concordance between matched-sets of exposures related to water supply was elevated at distances of up to approximately 400 meters. Household level hygiene practices were correlated with infrastructure shown to increase cholera risk. Co-occurrence of different individual level exposures appeared to mostly reflect the differing domestic roles of study participants. 
Strong spatial clustering of exposures at a small spatial scale in a cholera endemic

  3. A hybrid monkey search algorithm for clustering analysis.

    PubMed

    Chen, Xin; Zhou, Yongquan; Luo, Qifang

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm for clustering analysis, based on the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
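
    The sensitivity of k-means to its initial solution, which motivates the record above, can be illustrated with a minimal sketch. This is plain Lloyd-style k-means in one dimension, not the hybrid monkey algorithm of the paper; the data points, starting centers, and the function name `kmeans_1d` are hypothetical and chosen only for illustration.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain k-means (Lloyd's iterations) on 1-D data.

    Illustrative only: this is NOT the hybrid monkey algorithm from the
    record above, just the baseline whose weakness it addresses.
    """
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Within-cluster sum of squared errors for the final centers.
    sse = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, sse


# Hypothetical data: three well-separated pairs of points.
data = [0, 1, 10, 11, 20, 21]
good_centers, good_sse = kmeans_1d(data, [0, 10, 20])
poor_centers, poor_sse = kmeans_1d(data, [0, 1, 10.5])
print(good_sse, poor_sse)
```

    With the second set of starting centers the iterations stop at a solution that splits the first pair and merges the other two, so its error is far larger than that of the first run. Both runs have converged; only the starting point differs.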

  4. Clustering of four major lifestyle risk factors among Korean adults with metabolic syndrome.

    PubMed

    Ha, Shin; Choi, Hui Ran; Lee, Yo Han

    2017-01-01

    The purpose of this study was to investigate the clustering pattern of four major lifestyle risk factors (smoking, heavy drinking, poor diet, and physical inactivity) among people with metabolic syndrome in South Korea. There were 2,469 adults with metabolic syndrome aged 30 years or older available in the 5th Korean National Health and Nutrition Examination Survey dataset. We calculated the ratio of the observed to expected (O/E) prevalence for the 16 different combinations and the prevalence odds ratios (POR) of four lifestyle risk factors. The four lifestyle risk factors tended to cluster in specific multiple combinations. Smoking and heavy drinking were clustered (POR: 1.86 for male, 4.46 for female), heavy drinking and poor diet were clustered (POR: 1.38 for male, 1.74 for female), and smoking and physical inactivity were also clustered (POR: 1.48 for male). Those who were male, younger, low-educated and living alone were much more likely to have a higher number of lifestyle risk factors. Knowledge of the clustering pattern of lifestyle risk factors offers helpful implications for more effective intervention programs targeting metabolic syndrome.
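
    The observed-to-expected (O/E) prevalence ratio mentioned in the record above is commonly computed by comparing the observed share of each risk-factor combination with the share expected if the factors were independent (the product of the marginal prevalences). The sketch below follows that common definition and may differ in detail from the authors' procedure; the function name and the toy records are hypothetical.

```python
from itertools import product


def observed_expected_ratios(records):
    """Return the O/E prevalence ratio for every combination of binary factors.

    records: list of tuples of 0/1 flags, one flag per risk factor.
    Expected prevalence assumes the factors are statistically independent.
    """
    n = len(records)
    k = len(records[0])
    # Marginal prevalence of each individual factor.
    marginals = [sum(r[i] for r in records) / n for i in range(k)]
    ratios = {}
    for combo in product((0, 1), repeat=k):
        observed = sum(1 for r in records if tuple(r) == combo) / n
        expected = 1.0
        for flag, p in zip(combo, marginals):
            expected *= p if flag else (1.0 - p)
        ratios[combo] = observed / expected if expected > 0 else float("nan")
    return ratios


# Hypothetical toy data with two factors (not from the study).
toy = [(1, 1)] * 3 + [(0, 0)] * 3 + [(1, 0)] + [(0, 1)]
toy_ratios = observed_expected_ratios(toy)
print(toy_ratios[(1, 1)])
```

    A ratio above 1 means the combination occurs more often than independence would predict, which is what "clustering" of risk factors refers to here. With four binary factors the function returns the 16 combinations described in the abstract.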

  5. Applications of cluster analysis to satellite soundings

    NASA Technical Reports Server (NTRS)

    Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.

    1984-01-01

    The advantages of the use of cluster analysis in the improvement of satellite temperature retrievals were evaluated since the use of natural clusters, which are associated with atmospheric temperature soundings characteristic of different types of air masses, has the potential for improving stratified regression schemes in comparison with currently used methods which stratify soundings based on latitude, season, and land/ocean. The method of discriminatory analysis was used. The correct cluster of temperature profiles from satellite measurements was located in 85% of the cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the clusters of temperature (weighted and nonweighted) in comparison with the control experiment and with the regression retrievals derived in the clusters of brightness temperatures of 3 MSU and 5 IR channels.

  6. Clustering of modifiable biobehavioral risk factors for chronic disease in US adults: a latent class analysis.

    PubMed

    Leventhal, Adam M; Huh, Jimi; Dunton, Genevieve F

    2014-11-01

    Examining the co-occurrence patterns of modifiable biobehavioral risk factors for deadly chronic diseases (e.g. cancer, cardiovascular disease, diabetes) can elucidate the etiology of risk factors and guide disease-prevention programming. The aims of this study were to (1) identify latent classes based on the clustering of five key biobehavioral risk factors among US adults who reported at least one risk factor and (2) explore the demographic correlates of the identified latent classes. Participants were respondents of the National Epidemiologic Survey of Alcohol and Related Conditions (2004-2005) with at least one of the following disease risk factors in the past year (N = 22,789), which were also the latent class indicators: (1) alcohol abuse/dependence, (2) drug abuse/dependence, (3) nicotine dependence, (4) obesity, and (5) physical inactivity. Housing sample units were selected to match the US National Census in location and demographic characteristics, with young adults oversampled. Participants were administered surveys by trained interviewers. Five latent classes were yielded: 'obese, active non-substance abusers' (23%); 'nicotine-dependent, active, and non-obese' (19%); 'active, non-obese alcohol abusers' (6%); 'inactive, non-substance abusers' (50%); and 'active, polysubstance abusers' (3.7%). Four classes were characterized by a 100% likelihood of having one risk factor coupled with a low or moderate likelihood of having the other four risk factors. The five classes exhibited unique demographic profiles. Risk factors may cluster together in a non-monotonic fashion, with the majority of the at-risk population of US adults expected to have a high likelihood of endorsing only one of these five risk factors. © Royal Society for Public Health 2013.

  7. A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

    USGS Publications Warehouse

    Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

    2012-01-01

    Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p<0.0001). Conclusions: This study reveals that geographic hotspots of tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. © 2012 Saman et al.

  8. Cluster analysis of multiple planetary flow regimes

    NASA Technical Reports Server (NTRS)

    Mo, Kingtse; Ghil, Michael

    1987-01-01

    A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.

  9. Clustering of metabolic and cardiovascular risk factors in the polycystic ovary syndrome: a principal component analysis.

    PubMed

    Stuckey, Bronwyn G A; Opie, Nicole; Cussons, Andrea J; Watts, Gerald F; Burke, Valerie

    2014-08-01

    Polycystic ovary syndrome (PCOS) is a prevalent condition with heterogeneity of clinical features and cardiovascular risk factors that implies multiple aetiological factors and possible outcomes. To reduce a set of correlated variables to a smaller number of uncorrelated and interpretable factors that may delineate subgroups within PCOS or suggest pathogenetic mechanisms. We used principal component analysis (PCA) to examine the endocrine and cardiometabolic variables associated with PCOS defined by the National Institutes of Health (NIH) criteria. Data were retrieved from the database of a single clinical endocrinologist. We included women with PCOS (N = 378) who were not taking the oral contraceptive pill or other sex hormones, lipid lowering medication, metformin or other medication that could influence the variables of interest. PCA was performed retaining those factors with eigenvalues of at least 1.0. Varimax rotation was used to produce interpretable factors. We identified three principal components. In component 1, the dominant variables were homeostatic model assessment (HOMA) index, body mass index (BMI), high density lipoprotein (HDL) cholesterol and sex hormone binding globulin (SHBG); in component 2, systolic blood pressure, low density lipoprotein (LDL) cholesterol and triglycerides; in component 3, total testosterone and LH/FSH ratio. These components explained 37%, 13% and 11% of the variance in the PCOS cohort respectively. Multiple correlated variables from patients with PCOS can be reduced to three uncorrelated components characterised by insulin resistance, dyslipidaemia/hypertension or hyperandrogenaemia. Clustering of risk factors is consistent with different pathogenetic pathways within PCOS and/or differing cardiometabolic outcomes. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis.

    PubMed

    Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I

    2016-02-01

    Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at

  11. Geographic Clustering of Cardiometabolic Risk Factors in Metropolitan Centres in France and Australia

    PubMed Central

    Paquet, Catherine; Chaix, Basile; Howard, Natasha J.; Coffee, Neil T.; Adams, Robert J.; Taylor, Anne W.; Thomas, Frédérique; Daniel, Mark

    2016-01-01

    Understanding how health outcomes are spatially distributed represents a first step in investigating the scale and nature of environmental influences on health and has important implications for statistical power and analytic efficiency. Using Australian and French cohort data, this study aimed to describe and compare the extent of geographic variation, and the implications for analytic efficiency, across geographic units, countries and a range of cardiometabolic parameters (Body Mass Index (BMI) waist circumference, blood pressure, resting heart rate, triglycerides, cholesterol, glucose, HbA1c). Geographic clustering was assessed using Intra-Class Correlation (ICC) coefficients in biomedical cohorts from Adelaide (Australia, n = 3893) and Paris (France, n = 6430) for eight geographic administrative units. The median ICC was 0.01 suggesting 1% of risk factor variance attributable to variation between geographic units. Clustering differed by cardiometabolic parameters, administrative units and countries and was greatest for BMI and resting heart rate in the French sample, HbA1c in the Australian sample, and for smaller geographic units. Analytic inefficiency due to clustering was greatest for geographic units in which participants were nested in fewer, larger geographic units. Differences observed in geographic clustering across risk factors have implications for choice of geographic unit in sampling and analysis, and highlight potential cross-country differences in the distribution, or role, of environmental features related to cardiometabolic health. PMID:27213423
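
    The link between ICC and analytic efficiency described in the record above is often summarized with the standard design effect, DEFF = 1 + (m - 1) × ICC, where m is the mean number of participants per geographic unit. The sketch below uses that textbook formula, which may not be the exact measure the authors applied; the cluster size of 101 is hypothetical, while the ICC of 0.01 and the sample size of 3893 are taken from the abstract.

```python
def design_effect(icc, mean_cluster_size):
    """Standard design effect: variance inflation caused by clustering."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n, icc, mean_cluster_size):
    """Sample size of a simple random sample with equivalent precision."""
    return n / design_effect(icc, mean_cluster_size)


# ICC and n come from the abstract; the cluster size of 101 is an assumption.
deff = design_effect(0.01, 101)
n_eff = effective_sample_size(3893, 0.01, 101)
print(deff, n_eff)
```

    Even a small ICC can noticeably reduce the effective sample size when participants are nested in a few large units, which is consistent with the abstract's observation that inefficiency was greatest for fewer, larger geographic units.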

  12. Cluster analysis of multiple planetary flow regimes

    NASA Technical Reports Server (NTRS)

    Mo, Kingtse; Ghil, Michael

    1988-01-01

    A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.

  13. [Visual field progression in glaucoma: cluster analysis].

    PubMed

    Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

    2012-11-01

    Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best

  14. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    PubMed

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response of external influences on their disease. However, both cluster outcomes based on this dataset showed a poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
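
    The "silhouette measure" cited in the record above is usually defined per point as (b - a) / max(a, b), where a is the mean distance to the other members of the point's own cluster and b is the smallest mean distance to the members of any other cluster; the overall measure is the average over all points. The sketch below implements that usual definition with Euclidean distance. The statistical package used in the study may compute it differently, and the points and labels are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width using Euclidean distance.

    points: list of coordinate tuples; labels: cluster label per point.
    Points in singleton clusters contribute a silhouette of 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: silhouette defined as 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical points: two tight, well-separated pairs.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
well_separated = mean_silhouette(pts, [0, 0, 1, 1])
badly_assigned = mean_silhouette(pts, [0, 1, 0, 1])
print(well_separated, badly_assigned)
```

    Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster. A value of 0.2, as reported in the abstract, sits close to the overlapping end of that range.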

  15. A pyrosequencing assay for the quantitative methylation analysis of the PCDHB gene cluster, the major factor in neuroblastoma methylator phenotype.

    PubMed

    Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo

    2012-03-01

    Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay herein described is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in perspective, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.

  16. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    PubMed

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was carried out in SPSS version 20, with hierarchical cluster analysis using Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. © The Author 2015

  17. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda†

    PubMed Central

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-01-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was carried out in SPSS version 20, with hierarchical cluster analysis using Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882

  18. Canine parvovirus in Australia: the role of socio-economic factors in disease clusters.

    PubMed

    Brady, S; Norris, J M; Kelman, M; Ward, M P

    2012-08-01

    To identify clusters of canine parvoviral related disease occurring in Australia during 2010 and investigate the role of socio-economic factors contributing to these clusters, reported cases of canine parvovirus were extracted from an on-line disease surveillance system. Reported residential postcode was used to locate cases, and clusters were identified using a scan statistic. Cases included in clusters were compared to those not included in such clusters with respect to human socioeconomic factors (postcode area relative socioeconomic disadvantage, economic resources, education and occupation) and dog factors (neuter status, breed, age, gender, vaccination status). During 2010, there were 1187 cases of canine parvovirus reported. Nineteen significant (P<0.05) disease clusters were identified, most commonly located in New South Wales. Eleven (58%) clusters occurred between April and July, and the average cluster length was 5.7 days. All clusters occurred in postcodes with a significantly (P<0.05) greater level of relative socioeconomic disadvantage and a lower rank in education and occupation, and it was noted that clustered cases were less likely to have been neutered (P=0.004). No significant difference (P>0.05) was found between cases reported from cluster postcodes and those not within clusters for dog age, gender, breed or vaccination status (although the latter needs to be interpreted with caution, since vaccination was absent in most of the cases). Further research is required to investigate the apparent association between indicators of poor socioeconomic status and clusters of reported canine parvovirus diseases; however these initial findings may be useful for developing geographically- and temporally-targeted prevention and disease control programs. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Cluster subgroups based on overall pressure pain sensitivity and psychosocial factors in chronic musculoskeletal pain: Differences in clinical outcomes.

    PubMed

    Almeida, Suzana C; George, Steven Z; Leite, Raquel D V; Oliveira, Anamaria S; Chaves, Thais C

    2018-05-17

    We aimed to empirically derive psychosocial and pain sensitivity subgroups using cluster analysis within a sample of individuals with chronic musculoskeletal pain (CMP) and to investigate derived subgroups for differences in pain and disability outcomes. Eighty female participants with CMP answered psychosocial and disability scales and were assessed for pressure pain sensitivity. A cluster analysis was used to derive subgroups, and analysis of variance (ANOVA) was used to investigate differences between subgroups. Psychosocial factors (kinesiophobia, pain catastrophizing, anxiety, and depression) and overall pressure pain threshold (PPT) were entered into the cluster analysis. Three subgroups were empirically derived: cluster 1 (high pain sensitivity and high psychosocial distress; n = 12) characterized by low overall PPT and high psychosocial scores; cluster 2 (high pain sensitivity and intermediate psychosocial distress; n = 39) characterized by low overall PPT and intermediate psychosocial scores; and cluster 3 (low pain sensitivity and low psychosocial distress; n = 29) characterized by high overall PPT and low psychosocial scores compared to the other subgroups. Cluster 1 showed higher values for mean pain intensity (F(2,77) = 10.58, p < 0.001) compared with cluster 3, and cluster 1 showed higher values for disability (F(2,77) = 3.81, p = 0.03) compared with both clusters 2 and 3. Only cluster 1 was distinct from cluster 3 according to both pain and disability outcomes. Pain catastrophizing, depression, and anxiety were the psychosocial variables that best differentiated the subgroups. Overall, these results call attention to the importance of considering pain sensitivity and psychosocial variables to obtain a more comprehensive characterization of CMP patients' subtypes.

  20. Nursing home care quality: a cluster analysis.

    PubMed

    Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

    2017-02-13

    Purpose: The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach: A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n = 103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p < 0.05). Findings: Two clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications: Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications: Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value: Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.

  1. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    PubMed Central

    Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large

  2. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  3. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    PubMed Central

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing volume of large medical image data makes it possible to visualize the anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  4. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  5. Cluster Analysis of Adolescent Blogs

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  6. The applicability and effectiveness of cluster analysis

    NASA Technical Reports Server (NTRS)

    Ingram, D. S.; Actkinson, A. L.

    1973-01-01

    An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.

  7. Cross-scale analysis of cluster correspondence using different operational neighborhoods

    NASA Astrophysics Data System (ADS)

    Lu, Yongmei; Thill, Jean-Claude

    2008-09-01

    Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.

  8. Which modifiable health risk behaviours are related? A systematic review of the clustering of Smoking, Nutrition, Alcohol and Physical activity ('SNAP') health risk factors.

    PubMed

    Noble, Natasha; Paul, Christine; Turon, Heidi; Oldmeadow, Christopher

    2015-12-01

    There is a growing body of literature examining the clustering of health risk behaviours, but little consensus about which risk factors can be expected to cluster for which sub groups of people. This systematic review aimed to examine the international literature on the clustering of smoking, poor nutrition, excess alcohol and physical inactivity (SNAP) health behaviours among adults, including associated socio-demographic variables. A literature search was conducted in May 2014. Studies examining at least two SNAP risk factors, and using a cluster or factor analysis technique, or comparing observed to expected prevalence of risk factor combinations, were included. Fifty-six relevant studies were identified. A majority of studies (81%) reported a 'healthy' cluster characterised by the absence of any SNAP risk factors. More than half of the studies reported a clustering of alcohol with smoking, and half reported clustering of all four SNAP risk factors. The methodological quality of included studies was generally weak to moderate. Males and those with greater social disadvantage showed riskier patterns of behaviours; younger age was less clearly associated with riskier behaviours. Clustering patterns reported here reinforce the need for health promotion interventions to target multiple behaviours, and for such efforts to be specifically designed and accessible for males and those who are socially disadvantaged. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    PubMed

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize the female group more frequently than the male group.

  10. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    PubMed

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  11. Accident patterns for construction-related workers: a cluster analysis

    NASA Astrophysics Data System (ADS)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2012-01-01

    The construction industry has been identified as one of the most hazardous industries. The risk to construction-related workers is far greater than that in a manufacturing-based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.

  12. Accident patterns for construction-related workers: a cluster analysis

    NASA Astrophysics Data System (ADS)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2011-12-01

    The construction industry has been identified as one of the most hazardous industries. The risk to construction-related workers is far greater than that in a manufacturing-based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.

  13. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    PubMed

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
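
    The three stand-alone quality metrics named in this record (modularity, conductance, and coverage) can be illustrated on a toy graph. The sketch below is not the authors' code; it uses common textbook definitions for an undirected, unweighted graph, and the function names and the two-triangle example graph are assumptions made for illustration.

```python
def coverage(edges, membership):
    """Fraction of edges whose two endpoints lie in the same cluster."""
    intra = sum(1 for u, v in edges if membership[u] == membership[v])
    return intra / len(edges)


def conductance(edges, membership, cluster):
    """Edges leaving `cluster`, divided by the smaller of the two volumes
    (volume = sum of node degrees on one side of the cut)."""
    cut = 0
    vol_in = 0
    vol_out = 0
    for u, v in edges:
        in_u = membership[u] == cluster
        in_v = membership[v] == cluster
        if in_u != in_v:
            cut += 1
        # Each edge endpoint adds one unit of degree to its side.
        for inside in (in_u, in_v):
            if inside:
                vol_in += 1
            else:
                vol_out += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def modularity(edges, membership):
    """Sum over clusters of (intra-edge fraction) - (degree fraction) squared."""
    m = len(edges)
    intra = {}
    degree = {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Hypothetical example: two triangles joined by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
MEMBERSHIP = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
```

    On this toy graph, six of the seven edges are internal, so coverage is 6/7, while a low conductance for cluster "a" reflects the single bridge edge. Lower conductance is better, whereas higher coverage and modularity are better.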

  14. Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health

    PubMed Central

    Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S. Stanley

    2016-01-01

    Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data is normalized by subtracting the row/column means, it becomes of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that the epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability. PMID:27213413

  15. Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health.

    PubMed

    Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S Stanley

    2016-05-18

    Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data is normalized by subtracting the row/column means, it becomes of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that the epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability.
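
    The split-and-concatenate step described in this record can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' PosNegNMF implementation: centering is done on column means only, the positive block is placed before the negative block (the order is an assumption), and the NMF factorization itself is left out. The function names are hypothetical.

```python
def center_columns(matrix):
    """Subtract each column mean, producing a matrix of mixed signs."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def split_pos_neg(matrix):
    """Concatenate the positive part and the absolute value of the
    negative part column-wise, so every entry is non-negative again."""
    return [
        [max(x, 0.0) for x in row] + [max(-x, 0.0) for x in row]
        for row in matrix
    ]


# Hypothetical 2 x 2 data matrix (rows = observations, columns = variables).
DATA = [[1.0, 4.0], [3.0, 0.0]]
CENTERED = center_columns(DATA)        # [[-1.0, 2.0], [1.0, -2.0]]
NON_NEGATIVE = split_pos_neg(CENTERED)  # twice as many columns as DATA
```

    The resulting matrix has twice as many columns as the input and contains no negative entries, so any standard NMF routine could then be applied to it.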

  16. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation for the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.

  17. Clustering of risk factors for chronic diseases among adolescents from Southern Brazil

    PubMed Central

    Dumith, Samuel C.; Muniz, Ludmila C.; Tassitano, Rafael M.; Hallal, Pedro C.; Menezes, Ana M.B.

    2012-01-01

    Objective: To investigate the clustering of risk behaviors for chronic non-communicable diseases and their associated factors among adolescents from Southern Brazil. Methods: In 2008, a survey was conducted with 3990 adolescents aged 14–15 years (mean: 14.3; SD: 0.6) from the 1993 Pelotas Birth Cohort Study. Clustering was determined by comparing observed (O) and expected (E) prevalence of all possible combinations of the four risk factors investigated (smoking, alcohol intake, low fruit intake, and physical inactivity). We carried out Poisson regression to evaluate the effect of individual characteristics on the presence of at least three risk behaviors. Results: All risk factors tended to cluster together (O/E prevalence = 3.0), especially smoking and alcohol intake (odds ratio to present one behavior in the presence of the other > 5.0). Approximately 15% of adolescents displayed three or more risk behaviors. Females (adjusted OR = 1.55), people 15 years and older (OR = 1.47), with black skin color (OR = 1.23), and of low socioeconomic level (OR = 1.29) were more likely to display three or more risk factors. Conclusion: These findings suggest that lifestyle-related risk factors tend to cluster among adolescents. Identifying subgroups at greater risk of simultaneously engaging in multiple risk behaviors may aid in the planning of preventive strategies. PMID:22484392
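
    The observed-versus-expected comparison described in this record can be sketched as follows. This assumes that the expected prevalence of a combination is the product of the marginal prevalences of each factor, which is the usual independence baseline; the record does not spell out the calculation. The function name and the ten-person data set are invented for illustration and are not taken from the study.

```python
from itertools import product


def observed_expected(records):
    """For every combination of binary risk factors, return a tuple
    (observed prevalence, expected prevalence under independence, O/E)."""
    n = len(records)
    k = len(records[0])
    # Marginal prevalence of each risk factor on its own.
    marginals = [sum(r[j] for r in records) / n for j in range(k)]
    result = {}
    for combo in product((0, 1), repeat=k):
        observed = sum(1 for r in records if r == combo) / n
        expected = 1.0
        for flag, p in zip(combo, marginals):
            expected *= p if flag else (1.0 - p)
        ratio = observed / expected if expected > 0 else float("nan")
        result[combo] = (observed, expected, ratio)
    return result


# Hypothetical data: ten people, two risk factors (1 = present, 0 = absent).
RECORDS = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 5
TABLE = observed_expected(RECORDS)
```

    An O/E ratio above 1 for a combination means it occurs more often than independence would predict, which is the sense in which risk factors "cluster" here.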

  18. Advanced analysis of forest fire clustering

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
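
    The grid-based estimate described in this record can be written down directly from its verbal definition. With Q grid cells, N points and n_i points in cell i, the probability that m points drawn without replacement share a cell is the sum of n_i(n_i-1)...(n_i-m+1) divided by N(N-1)...(N-m+1), and under a uniform random pattern it is Q^(1-m). The sketch below takes their ratio. It assumes points in the unit square and a square grid, and it is not the authors' implementation; the function names are hypothetical.

```python
def falling_factorial(n, m):
    """n * (n - 1) * ... * (n - m + 1); zero whenever 0 <= n < m."""
    result = 1
    for i in range(m):
        result *= n - i
    return result


def morisita_index(points, cells_per_side, m=2):
    """m-point Morisita index for points in the unit square [0, 1) x [0, 1).
    Requires len(points) >= m."""
    counts = {}
    for x, y in points:
        cell = (
            min(int(x * cells_per_side), cells_per_side - 1),
            min(int(y * cells_per_side), cells_per_side - 1),
        )
        counts[cell] = counts.get(cell, 0) + 1
    n_total = len(points)
    q = cells_per_side ** 2  # number of grid cells
    same_cell = sum(falling_factorial(c, m) for c in counts.values())
    return q ** (m - 1) * same_cell / falling_factorial(n_total, m)


# Hypothetical patterns on a 2 x 2 grid.
CLUSTERED = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2)]  # all in one cell
SPREAD = [(0.1, 0.1), (0.6, 0.1), (0.1, 0.6), (0.6, 0.6)]     # one per cell
```

    Values well above 1 indicate clustering at the chosen cell size, and repeating the calculation for several values of `cells_per_side` gives the scaling behaviour the record refers to.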

  19. Cluster headache and the hypocretin receptor 2 reconsidered: a genetic association study and meta-analysis.

    PubMed

    Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje

    2015-08-01

    Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.

  20. Cluster analysis identifies three urodynamic patterns in patients with orthotopic neobladder reconstruction.

    PubMed

    Kim, Kwang Hyun; Yoon, Hyun Suk; Song, Wan; Choo, Hee Jung; Yoon, Hana; Chung, Woo Sik; Sim, Bong Suk; Lee, Dong Hyeon

    2017-01-01

    To classify patients with orthotopic neobladder based on urodynamic parameters using cluster analysis and to characterize the voiding function of each group. From January 2012 to November 2015, 142 patients with bladder cancer underwent radical cystectomy and Studer neobladder reconstruction at our institute. Of the 142 patients, 103 with complete urodynamic data and information on urinary functional outcomes were included in this study. K-means clustering was performed with urodynamic parameters which included maximal cystometric capacity, residual volume, maximal flow rate, compliance, and detrusor pressure at maximum flow rate. Three groups emerged by cluster analysis. Urodynamic parameters and urinary function outcomes were compared between three groups. Group 1 (n = 44) had ideal urodynamic parameters with a mean maximal bladder capacity of 513.3 ml and mean residual urine volume of 33.1 ml. Group 2 (n = 42) was characterized by small bladder capacity with low compliance. Patients in group 2 had higher rates of daytime incontinence and nighttime incontinence than patients in group 1. Group 3 (n = 17) was characterized by large residual urine volume with high compliance. When we examined gender differences in urodynamics and functional outcomes, residual urine volume and the rate of daytime incontinence were only marginally significant. However, females were significantly more likely to belong to group 2 or 3 (P = 0.003). In multivariate analysis to identify factors associated with group 1 which has the most ideal urodynamic pattern, age (OR 0.95, P = 0.017) and male gender (OR 7.57, P = 0.003) were identified as significant factors. While patients with ileal neobladder present with various voiding symptoms, three urodynamic patterns were identified by cluster analysis. Approximately half of patients had ideal urodynamic parameters. The other two groups were characterized by large residual urine and small capacity bladder with low compliance. 
Young age and male

  1. Cluster and constraint analysis in tetrahedron packings

    NASA Astrophysics Data System (ADS)

    Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

    2015-04-01

    The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.

  2. How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Hou, Huei-Tse

    2013-01-01

    The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…

  3. ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

    NASA Technical Reports Server (NTRS)

    Wharton, S. W.; Turner, B. J.

    1981-01-01

    An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.

  4. Alpha-cluster preformation factor within cluster-formation model for odd-A and odd-odd heavy nuclei

    NASA Astrophysics Data System (ADS)

    Saleh Ahmed, Saad M.

    2017-06-01

    The alpha-cluster probability that represents the preformation of alpha particle in alpha-decay nuclei was determined for high-intensity alpha-decay mode odd-A and odd-odd heavy nuclei, 82 < Z < 114, 111 < N < 174. This probability was calculated using the energy-dependent formula derived from the formulation of clusterisation states representation (CSR) and the hypothesised cluster-formation model (CFM) as in our previous work. Our previous successful determination of phenomenological values of alpha-cluster preformation factors for even-even nuclei motivated us to expand the work to cover other types of nuclei. The formation energy of interior alpha cluster needed to be derived for the different nuclear systems with considering the unpaired-nucleon effect. The results showed the phenomenological value of alpha preformation probability and reflected the unpaired nucleon effect and the magic and sub-magic effects in nuclei. These results and their analyses presented are very useful for future work concerning the calculation of the alpha decay constants and the progress of its theory.

  5. Clustering analysis strategies for electron energy loss spectroscopy (EELS).

    PubMed

    Torruella, Pau; Estrader, Marta; López-Ortega, Alberto; Baró, Maria Dolors; Varela, Maria; Peiró, Francesca; Estradé, Sònia

    2018-02-01

    In this work, the use of cluster analysis algorithms, widely applied in the field of big data, is proposed to explore and analyze electron energy loss spectroscopy (EELS) data sets. Three different data clustering approaches have been tested both with simulated and experimental data from Fe3O4/Mn3O4 core/shell nanoparticles. The first method consists of applying data clustering directly to the acquired spectra. A second approach is to analyze spectral variance with principal component analysis (PCA) within a given data cluster. Lastly, data clustering on PCA score maps is discussed. The advantages and requirements of each approach are studied. Results demonstrate how clustering is able to recover compositional and oxidation state information from EELS data with minimal user input, giving great prospects for its usage in EEL spectroscopy. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Characterizing the course of back pain after osteoporotic vertebral fracture: a hierarchical cluster analysis of a prospective cohort study.

    PubMed

    Toyoda, Hiromitsu; Takahashi, Shinji; Hoshino, Masatoshi; Takayama, Kazushi; Iseki, Kazumichi; Sasaoka, Ryuichi; Tsujio, Tadao; Yasuda, Hiroyuki; Sasaki, Takeharu; Kanematsu, Fumiaki; Kono, Hiroshi; Nakamura, Hiroaki

    2017-09-23

    This study demonstrated four distinct patterns in the course of back pain after osteoporotic vertebral fracture (OVF). Greater angular instability in the first 6 months after the baseline was one factor affecting back pain after OVF. Understanding the natural course of symptomatic acute OVF is important in deciding the optimal treatment strategy. We used latent class analysis to classify the course of back pain after OVF and identify the risk factors associated with persistent pain. This multicenter cohort study included 218 consecutive patients with ≤ 2-week-old OVFs who were enrolled at 11 institutions. Dynamic x-rays and back pain assessment with a visual analog scale (VAS) were obtained at enrollment and at 1-, 3-, and 6-month follow-ups. The VAS scores were used to characterize patient groups, using hierarchical cluster analysis. VAS for 128 patients was used for hierarchical cluster analysis. Analysis yielded four clusters representing different patterns of back pain progression. Cluster 1 patients (50.8%) had stable, mild pain. Cluster 2 patients (21.1%) started with moderate pain and progressed quickly to very low pain. Patients in cluster 3 (10.9%) had moderate pain that initially improved but worsened after 3 months. Cluster 4 patients (17.2%) had persistent severe pain. Patients in cluster 4 showed significant high baseline pain intensity, higher degree of angular instability, and higher number of previous OVFs, and tended to lack regular exercise. In contrast, patients in cluster 2 had significantly lower baseline VAS and less angular instability. We identified four distinct groups of OVF patients with different patterns of back pain progression. Understanding the course of back pain after OVF may help in its management and contribute to future treatment trials.
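
    The grouping step described above (hierarchical cluster analysis of VAS scores at four time points) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state the linkage or distance used, so average linkage with Euclidean distance is an assumption, and the VAS trajectories in `vas` are invented toy values.

```python
import math
from itertools import combinations


def average_linkage(a, b, data):
    """Mean Euclidean distance between all members of clusters a and b."""
    return sum(math.dist(data[i], data[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(data, n_clusters):
    """Bottom-up clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], data),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical VAS scores at enrollment, 1, 3 and 6 months (toy data only).
vas = [
    (30, 28, 27, 25), (32, 30, 28, 27),  # stable, mild pain
    (60, 20, 10, 5), (62, 22, 12, 6),    # moderate pain, rapid improvement
    (60, 40, 35, 60), (58, 38, 36, 62),  # improved, then worsened after 3 months
    (85, 80, 82, 80), (88, 84, 80, 83),  # persistent severe pain
]
groups = agglomerate(vas, 4)
```

    With these toy trajectories the four resulting groups pair up the rows that share a pain course; real data would also need a rule for choosing the number of clusters.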

  7. An effective fuzzy kernel clustering analysis approach for gene expression data.

    PubMed

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

    Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.

  8. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    PubMed

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians
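
    The scree-plot check mentioned above (within-cluster heterogeneity falling as the number of clusters grows) can be sketched as follows. The abstract does not name the clustering algorithm, so k-means is an assumption here, as is the deterministic farthest-point initialisation; the `(ED visits, hospitalizations)` pairs in `visits` are invented toy values.

```python
import math


def assign(points, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points, k, iters=50):
    # Farthest-point initialisation keeps the sketch deterministic (illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = assign(points, centers)
        updated = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def within_cluster_ss(centers, groups):
    """Total squared distance of points to their cluster center."""
    return sum(math.dist(p, c) ** 2 for c, g in zip(centers, groups) for p in g)


def elbow_curve(points, max_k):
    """Within-cluster sum of squares for k = 1..max_k (the values a scree plot shows)."""
    return [within_cluster_ss(*kmeans(points, k)) for k in range(1, max_k + 1)]


# Hypothetical (asthma ED visits, asthma hospitalizations) per patient.
visits = [(0, 0), (1, 0), (0, 1), (1, 1),
          (8, 0), (9, 1), (8, 1), (9, 0),
          (4, 6), (5, 7), (4, 7), (5, 6)]
curve = elbow_curve(visits, 4)
```

    Plotting `curve` against k gives the scree plot; a sharp flattening after some k is usually read as the point beyond which extra clusters explain little additional heterogeneity.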

  9. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292

  10. Atlas-guided cluster analysis of large tractography datasets.

    PubMed

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.

  11. Health and disease phenotyping in old age using a cluster network analysis.

    PubMed

    Valenzuela, Jesus Felix; Monterola, Christopher; Tong, Victor Joo Chuan; Ng, Tze Pin; Larbi, Anis

    2017-11-15

    Human ageing is a complex trait that involves the synergistic action of numerous biological processes that interact to form a complex network. Here we performed a network analysis to examine the interrelationships between physiological and psychological functions, disease, disability, quality of life, lifestyle and behavioural risk factors for ageing in a cohort of 3,270 subjects aged ≥55 years. We considered associations between numerical and categorical descriptors using effect-size measures for each variable pair and identified clusters of variables from the resulting pairwise effect-size network and minimum spanning tree. We show, by way of a correspondence analysis between the two sets of clusters, that they correspond to coarse-grained and fine-grained structure of the network relationships. The clusters obtained from the minimum spanning tree mapped to various conceptual domains and corresponded to physiological and syndromic states. Hierarchical ordering of these clusters identified six common themes based on interactions with physiological systems and common underlying substrates of age-associated morbidity and disease chronicity, functional disability, and quality of life. These findings provide a starting point for in-depth analyses of ageing that incorporate immunologic, metabolomic and proteomic biomarkers, and ultimately offer low-level-based typologies of healthy and unhealthy ageing.
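
    One way to derive variable clusters from a minimum spanning tree is sketched below. The abstract does not say how clusters were read off the tree, so the rule used here (build the tree with Kruskal's algorithm, then drop its heaviest edges) is an assumption, as is the conversion of effect sizes to distances via `1 - effect`; the edge list is invented toy data.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm; edges are (distance, u, v) tuples over nodes 0..n-1."""
    parent = list(range(n))
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # the edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def clusters_from_tree(n, tree, n_clusters):
    """Drop the heaviest tree edges so that n_clusters components remain."""
    kept = sorted(tree)[: n - n_clusters]
    parent = list(range(n))
    for _, u, v in kept:
        parent[find(parent, u)] = find(parent, v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


# Hypothetical distances (1 - effect size) between five variables.
edges = [(0.10, 0, 1), (0.20, 1, 2), (0.90, 2, 3),
         (0.15, 3, 4), (0.80, 0, 3), (0.95, 1, 4)]
tree = minimum_spanning_tree(5, edges)
variable_clusters = clusters_from_tree(5, tree, 2)
```

    Here the tree keeps the four shortest non-redundant links, and cutting the single longest one separates variables 0 to 2 from variables 3 and 4.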

  12. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    ERIC Educational Resources Information Center

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  13. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    PubMed

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

  14. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    PubMed

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of disease and personalized approach to the treatment of patients.

  15. Cluster analysis: a new approach for identification of underlying risk factors for coronary artery disease in essential hypertensive patients.

    PubMed

    Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang

    2017-03-07

    Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information of plaque morphology are necessary to further compare the predicative power between BP dipping pattern and carotid plaque.

  16. Cluster analysis: a new approach for identification of underlying risk factors for coronary artery disease in essential hypertensive patients

    PubMed Central

    Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang

    2017-01-01

    Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information of plaque morphology are necessary to further compare the predicative power between BP dipping pattern and carotid plaque. PMID:28266630

  17. Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?

    PubMed

    Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J

    2018-04-01

    Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90% in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger datasets, and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.
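
    The correlation matrix mentioned above can be illustrated with the mean square contingency (phi) coefficient for pairs of yes/no items. The study itself used R; the Python sketch below assumes the checklist items are binary, and the function names and the `checklist` responses are hypothetical.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient of two equal-length binary (0/1) sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant item makes phi undefined; returning 0.0 is a convention chosen here.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def phi_matrix(items):
    """Pairwise phi coefficients for a list of binary item vectors."""
    return [[phi_coefficient(a, b) for b in items] for a in items]


# Hypothetical responses of four individuals to three checklist items.
checklist = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
matrix = phi_matrix(checklist)
```

    A matrix of this kind could then be passed to a clustering or factor-analysis routine; how the coefficients were turned into dissimilarities for Ward's method is not stated in the abstract.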

  18. Spatiotemporal Clustering Analysis of Malaria Infection in Pakistan.

    PubMed

    Umer, Muhammad Farooq; Zofeen, Shumaila; Majeed, Abdul; Hu, Wenbiao; Qi, Xin; Zhuang, Guihua

    2018-06-07

    Despite tremendous progress, malaria remains a serious public health problem in Pakistan. Very few studies have been done on spatiotemporal evaluation of malaria infection in Pakistan. The study aimed to detect the spatiotemporal pattern of malaria infection at the district level in Pakistan, and to identify the clusters of high-risk disease areas in the country. Annual data on malaria for two dominant species (Plasmodium falciparum, Plasmodium vivax) and mixed infections from 2011 to 2016 were obtained from the Directorate of Malaria Control Program, Pakistan. Population data were collected from the Pakistan Bureau of Statistics. A geographical information system was used to display the spatial distribution of malaria at the district level throughout Pakistan. Purely spatiotemporal clustering analysis was performed to identify the high-risk areas of malaria infection in Pakistan. A total of 1,593,409 positive cases were included in this study over a period of 6 years (2011-2016). The maximum number of P. vivax cases (474,478) were reported in Khyber Pakhtunkhwa (KPK). The highest burden of P. falciparum (145,445) was in Balochistan, while the highest counts of mixed Plasmodium cases were reported in Sindh (22,421) and Balochistan (22,229), respectively. In Balochistan, incidence of all three types of malaria was very high. Cluster analysis showed that primary clusters of P. vivax malaria were in the same districts in 2014, 2015 and 2016 (total 24 districts, 12 in Federally Administered Tribal Areas (FATA), 9 in KPK, 2 in Punjab and 1 in Balochistan); those of P. falciparum malaria were unchanged in 2012 and 2013 (total 18 districts, all in Balochistan), and mixed infections remained the same in 2014 and 2015 (total 7 districts, 6 in Balochistan and 1 in FATA). This study indicated that the transmission cycles of malaria infection vary in different spatiotemporal settings in Pakistan. Efforts in controlling P. vivax malaria in particular need to be

  19. Cluster analysis and subgrouping to investigate inter-individual variability to non-invasive brain stimulation: a systematic review.

    PubMed

    Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour

    2018-01-12

    Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responder or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.

  20. Cluster analysis for determining distribution center location

    NASA Astrophysics Data System (ADS)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many facilities should be provided and where. This study examines a fast-food restaurant brand located in Greater Jakarta. This brand is among the top 5 fast-food restaurant chains based on retail sales. There were three stages in this study: compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process, reflected in shorter distances in the distribution process.

  1. Using cluster analysis for medical resource decision making.

    PubMed

    Dilts, D; Khamalah, J; Plotkin, A

    1995-01-01

    Escalating costs of health care delivery have in the recent past often made the health care industry investigate, adapt, and apply those management techniques relating to budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products" or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis in generating these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized selection methodologies for choosing a technique for solving a particular problem, hence they often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.

  2. Body Mass Index, Waist Circumference, and the Clustering of Cardiometabolic Risk Factors in Early Childhood.

    PubMed

    Anderson, Laura N; Lebovic, Gerald; Hamilton, Jill; Hanley, Anthony J; McCrindle, Brian W; Maguire, Jonathon L; Parkin, Patricia C; Birken, Catherine S

    2016-03-01

    Obesity has its origins in early childhood; however, there is limited evidence of the association between anthropometric indicators and cardiometabolic risk factors in young children. Our aim was to evaluate the associations between body mass index (BMI) and waist circumference (WC) in relation to cardiometabolic risk factors and to explore the clustering of these factors. A cross-sectional study was conducted in children aged 1-5 years through TARGet Kids! (n = 2917). Logistic regression was used to evaluate associations between BMI and WC z-scores and individual traditional and possible non-traditional cardiometabolic risk factors. The underlying clustering of these measures was evaluated using principal components analysis (PCA). Child obesity (BMI z-score >2) was associated with high (>90th percentile) leptin [odds ratio (OR) 8.15, 95% confidence interval (CI) 4.56, 14.58] and insulin (OR = 1.76; 95% CI 1.05, 2.94). WC z-score >1 was associated with high insulin (OR 1.59, 95% CI 1.11, 2.28), leptin (OR 5.48, 95% CI 3.48, 8.63) and 25-hydroxyvitamin D < 75 nmol/L (OR 1.39, 95% CI 1.08, 1.79). BMI and WC were not associated with other traditional cardiometabolic risk factors, including non-High Density Lipoprotein (HDL) cholesterol, and glucose. Among children 3-5 years (n = 1035) the PCA of traditional risk factors identified three components: adiposity/blood pressure, metabolic, and lipids. The inclusion of non-traditional risk factors identified four additional components but contributed minimally to the total variation explained. Anthropometric indicators are associated with selected cardiometabolic risk factors in early childhood, although the clustering of risk factors suggests that adiposity is only one distinct component of cardiometabolic risk. The measurement of other risk factors beyond BMI and WC may be important in defining cardiometabolic risk in early childhood. © 2015 John Wiley & Sons Ltd.
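
    As a reading aid for the odds ratios above, here is a minimal sketch that checks each reported estimate against its confidence interval. It assumes (the abstract does not say this) that the intervals are Wald intervals built on the log-odds scale. Under that assumption the estimate should sit at the geometric mean of the two bounds, and the standard error of the log odds ratio can be recovered from the interval width. The short labels and the helper names `implied_or` and `implied_se` are illustrative only.

```python
import math

# (label, reported OR, lower 95% bound, upper 95% bound), copied from the abstract above.
reported = [
    ("BMI z>2, high leptin", 8.15, 4.56, 14.58),
    ("BMI z>2, high insulin", 1.76, 1.05, 2.94),
    ("WC z>1, high insulin", 1.59, 1.11, 2.28),
    ("WC z>1, high leptin", 5.48, 3.48, 8.63),
    ("WC z>1, 25-hydroxyvitamin D < 75 nmol/L", 1.39, 1.08, 1.79),
]


def implied_or(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def implied_se(lower, upper, z=1.96):
    """Standard error of log(OR) implied by the CI width (assumes a Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


checks = [
    (label, odds_ratio, implied_or(lo, hi), implied_se(lo, hi))
    for label, odds_ratio, lo, hi in reported
]

for label, odds_ratio, geo_mean, se in checks:
    print(f"{label}: reported OR {odds_ratio:.2f}, "
          f"geometric mean of bounds {geo_mean:.2f}, implied SE {se:.3f}")
```

    If the assumption holds, the reported and implied estimates should agree up to rounding; a large gap would suggest a different interval method or a transcription error.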

  3. Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

    ERIC Educational Resources Information Center

    Mavridis, Dimitris; Moustaki, Irini

    2008-01-01

    In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…

  4. Cluster Analysis of Minnesota School Districts. A Research Report.

    ERIC Educational Resources Information Center

    Cleary, James

    The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…

  5. Life history factors, personality and the social clustering of sexual experience in adolescents.

    PubMed

    van Leeuwen, Abram J; Mace, Ruth

    2016-10-01

    Adolescent sexual behaviour may show clustering in neighbourhoods, schools and friendship networks. This study aims to assess how experience with sexual intercourse clusters across the social world of adolescents and whether predictors implicated by life history theory or personality traits can account for its between-individual variation and social patterning. Using data on 2877 adolescents from the Avon Longitudinal Study of Parents and Children, we ran logistic multiple classification models to assess the clustering of sexual experience by approximately 17.5 years in schools, neighbourhoods and friendship networks. We examined how much clustering at particular levels could be accounted for by life history predictors and Big Five personality factors. Sexual experience exhibited substantial clustering in friendship networks, while clustering at the level of schools and neighbourhoods was minimal, suggesting a limited role for socio-ecological influences at those levels. While life history predictors did account for some variation in sexual experience, they did not explain clustering in friendship networks. Personality, especially extraversion, explained about a quarter of friends' similarity. After accounting for life history factors and personality, substantial unexplained similarity among friends remained, which may reflect a tendency to associate with similar individuals or the social transmission of behavioural norms.

  6. Life history factors, personality and the social clustering of sexual experience in adolescents

    PubMed Central

    2016-01-01

    Adolescent sexual behaviour may show clustering in neighbourhoods, schools and friendship networks. This study aims to assess how experience with sexual intercourse clusters across the social world of adolescents and whether predictors implicated by life history theory or personality traits can account for its between-individual variation and social patterning. Using data on 2877 adolescents from the Avon Longitudinal Study of Parents and Children, we ran logistic multiple classification models to assess the clustering of sexual experience by approximately 17.5 years in schools, neighbourhoods and friendship networks. We examined how much clustering at particular levels could be accounted for by life history predictors and Big Five personality factors. Sexual experience exhibited substantial clustering in friendship networks, while clustering at the level of schools and neighbourhoods was minimal, suggesting a limited role for socio-ecological influences at those levels. While life history predictors did account for some variation in sexual experience, they did not explain clustering in friendship networks. Personality, especially extraversion, explained about a quarter of friends' similarity. After accounting for life history factors and personality, substantial unexplained similarity among friends remained, which may reflect a tendency to associate with similar individuals or the social transmission of behavioural norms. PMID:27853543

  7. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    PubMed

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  8. A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data

    PubMed Central

    Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio

    2010-01-01

    Background: The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and verify if three-dimensional gait data are able to distinguish diabetic gait patterns from those of the control subjects. Method: The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1 (4.4) years, mean body mass index (BMI) 24.0 (2.8) and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joints and segments (trunk, hip, knee, ankle) angles, and moments. Results: Cluster analysis classification led to definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions: Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432
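
    The K-means step mentioned above can be sketched as follows. This is a minimal illustration of Lloyd's algorithm, not the authors' pipeline: the abstract does not state the implementation, the initialization, or the distance measure, so the squared Euclidean distance, the "first k points" initialization, and the two-dimensional toy data are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns (labels, centroids).

    Initialization uses the first k points (an assumption chosen so the
    example is deterministic; real analyses usually use several random starts).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature vectors (e.g. two standardized gait variables per subject).
toy_subjects = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (7.9, 8.2), (8.1, 7.8)]
labels, centroids = kmeans(toy_subjects, k=2)
print(labels)
```

    With real gait data the variables would normally be standardized first, since K-means is sensitive to the scale of each feature.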

  9. Geographic atrophy phenotype identification by cluster analysis.

    PubMed

    Monés, Jordi; Biarnés, Marc

    2018-03-01

    To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm²/year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  10. Cluster analysis of the hot subdwarfs in the PG survey

    NASA Technical Reports Server (NTRS)

    Thejll, Peter; Charache, Darryl; Shipman, Harry L.

    1989-01-01

    Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters to subdivide the data into. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to it. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
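
    The combination described above (Euclidean distances, Ward's method, a chosen number of groups) can be sketched in a few lines. This is a naive illustration under stated assumptions, not the CLUSTAN procedure: the function names and the toy data are hypothetical, and the merge rule used is the standard Ward criterion, i.e. at each step merge the two clusters whose union gives the smallest increase in within-cluster sum of squares.

```python
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_euclidean = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_euclidean


def ward_clusters(points, k):
    """Agglomerate singletons with Ward's criterion until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Made-up two-dimensional data standing in for measured stellar parameters.
toy_objects = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0), (9.1, 0.2)]
groups = ward_clusters(toy_objects, k=3)
for group in groups:
    print(sorted(group))
```

    The pairwise search makes this version slow for large samples; it is meant only to show the merge rule, and the choice of k remains a judgment call of the kind the abstract emphasizes.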

  11. A Survey of Popular R Packages for Cluster Analysis

    ERIC Educational Resources Information Center

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  12. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    PubMed

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.

  13. A weak lensing analysis of the PLCK G100.2-30.4 cluster

    NASA Astrophysics Data System (ADS)

    Radovich, M.; Formicola, I.; Meneghetti, M.; Bartalucci, I.; Bourdin, H.; Mazzotta, P.; Moscardini, L.; Ettori, S.; Arnaud, M.; Pratt, G. W.; Aghanim, N.; Dahle, H.; Douspis, M.; Pointecouteau, E.; Grado, A.

    2015-07-01

    We present a mass estimate of the Planck-discovered cluster PLCK G100.2-30.4, derived from a weak lensing analysis of deep Subaru griz images. We perform a careful selection of the background galaxies using the multi-band imaging data, and undertake the weak lensing analysis on the deep (1 h) r-band image. The shape measurement is based on the Kaiser-Squires-Broadhurst algorithm; we adopt the PSFex software to model the point spread function (PSF) across the field and correct for this in the shape measurement. The weak lensing analysis is validated through extensive image simulations. We compare the resulting weak lensing mass profile and total mass estimate to those obtained from our re-analysis of XMM-Newton observations, derived under the hypothesis of hydrostatic equilibrium. The total integrated mass profiles agree remarkably well, within 1σ across their common radial range. A mass M500 ~ 7 × 10^14 M⊙ is derived for the cluster from our weak lensing analysis. Comparing this value to that obtained from our reanalysis of XMM-Newton data, we obtain a bias factor of (1-b) = 0.8 ± 0.1. This is compatible within 1σ with the value of (1-b) obtained in Planck 2015 from the calibration of the bias factor using newly available weak lensing reconstructed masses. Based on data collected at Subaru Telescope (University of Tokyo).

  14. [Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

    PubMed

    Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

    2016-09-01

    To study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expired volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). The subjects' clinical phenotypes were identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow curve (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar (DLCO)/(VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's respiratory questionnaire

  15. Differences in Pedaling Technique in Cycling: A Cluster Analysis.

    PubMed

    Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

    2016-10-01

    To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO VT2) compared with cycling at their maximal power output (PO MAX). Twenty athletes performed an incremental cycling test to determine their power output (PO MAX and PO VT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in PO MAX and PO VT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO VT2 and PO MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO VT2 vs PO MAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.

  16. Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

    PubMed

    Meghani, Salimah H; Knafl, George J

    2017-02-10

    To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-month prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For a subset, the main underlying concern was type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%) and type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis, to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only the belief, i.e., pain medications can mask changes in health or keep you from knowing what is going on in your body was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)]. Most patients appear to be driven by a single salient concern

  17. A Cluster Analysis Typology of Suicide in the United States Air Force

    DTIC Science & Technology

    2011-08-01

    theorists such as Sigmund Freud, Edwin Shneidman and other suicidologists. Typologies: Psychoanalytic. In contrast to Durkheim’s theory and its emphasis...on societal factors, many early psychological explanations of suicide were rooted in Sigmund Freud’s psychodynamic theory. Freud wrote of two...The basic writings of Sigmund Freud (pp. 35-178). New York: Random House. Garson, G. D. (2009). Cluster analysis. Retrieved November, 1, 2009 from

  18. Phylogenetic Factor Analysis.

    PubMed

    Tolkoff, Max R; Alfaro, Michael E; Baele, Guy; Lemey, Philippe; Suchard, Marc A

    2018-05-01

    Phylogenetic comparative methods explore the relationships between quantitative traits adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pair-wise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA) that assumes a small unknown number of independent evolutionary factors arise along the phylogeny and these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution, over 3-fold faster than for multivariate diffusion and a further order-of-magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data and find that PFA also provides a better fit than multivariate diffusion in evolutionary questions in columbine flower development, placental reproduction transitions and triggerfish fin morphometry.

  19. Network Analysis Tools: from biological networks to clusters and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.

  20. Performance analysis of clustering techniques over microarray data: A case study

    NASA Astrophysics Data System (ADS)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in statistical data analysis, and cluster analysis plays a vital role in dealing with large-scale data. Many clustering techniques exist, each taking a different approach, but it is difficult to predict which one suits a particular dataset. To address this problem, a grading approach is applied across several clustering techniques to identify a stable one. Because the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study, the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experiments are conducted on five microarray datasets with seven validity indices. The grading approach's finding that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.

  1. OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

    PubMed

    Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

    2014-10-16

    The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P <0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptoms levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P <0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a

  2. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    PubMed

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.

  3. Cluster analysis of Southeastern U.S. climate stations

    NASA Astrophysics Data System (ADS)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is to define objectively regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate division (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CDs, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity, which makes them more appropriate for the study of the impact of climate and climatic change.

  4. Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

    PubMed

    Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

    2015-03-01

    Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from a population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity also especially in the "fruits & vegetables" pattern. The κ statistic revealed a fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
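
    The concordance and κ measures mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes Cohen's κ (the abstract does not say which κ variant was used), assumes the cluster labels from the two sources have already been matched by pattern name, and uses made-up labels for six hypothetical participants. The function names `percent_agreement` and `cohens_kappa` are illustrative.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of participants assigned to the same cluster in both sources."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: product of the marginal proportions, summed over clusters.
    p_expected = sum(
        count_a[c] * count_b[c] for c in set(count_a) | set(count_b)
    ) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels (hypothetical): cluster membership from two questionnaires.
ffq1 = ["fv", "fv", "meat", "meat", "fv", "meat"]
ffq2 = ["fv", "meat", "meat", "meat", "fv", "fv"]
print(percent_agreement(ffq1, ffq2))  # 4 of 6 labels match
print(cohens_kappa(ffq1, ffq2))
```

    Raw concordance ignores the agreement that two random labelings would show anyway, which is why a κ value is usually reported alongside it.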

  5. Exploring the application of latent class cluster analysis for investigating pedestrian crash injury severities in Switzerland.

    PubMed

    Sasidharan, Lekshmi; Wu, Kun-Feng; Menendez, Monica

    2015-12-01

    One of the major challenges in traffic safety analyses is the heterogeneous nature of safety data, due to the sundry factors involved in it. This heterogeneity often leads to difficulties in interpreting results and conclusions due to unrevealed relationships. Understanding the underlying relationship between injury severities and influential factors is critical for the selection of appropriate safety countermeasures. A method commonly employed to address systematic heterogeneity is to focus on any subgroup of data based on the research purpose. However, this need not ensure homogeneity in the data. In this paper, latent class cluster analysis is applied to identify homogeneous subgroups for a specific crash type: pedestrian crashes. The manuscript employs data from police-reported pedestrian crashes (2009-2012) in Switzerland. The analyses demonstrate that dividing pedestrian severity data into seven clusters helps to reduce the systematic heterogeneity of the data and to understand the hidden relationships between crash severity levels and socio-demographic, environmental, vehicle, temporal, traffic factors, and main reason for the crash. The pedestrian crash injury severity models were developed for the whole data and individual clusters, and were compared using receiver operating characteristics curve, for which results favored clustering. Overall, the study suggests that latent class clustered regression approach is suitable for reducing heterogeneity and revealing important hidden relationships in traffic safety analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The

  7. The Psychology of Yoga Practitioners: A Cluster Analysis.

    PubMed

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-11-01

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  8. The Psychology of Yoga Practitioners: A Cluster Analysis.

    PubMed

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-03-30

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  9. Cluster-based exposure variation analysis

    PubMed Central

    2013-01-01

    Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate

  10. Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J

    2016-04-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All

  11. The Amish furniture cluster in Ohio: competitive factors and wood use estimates

    Treesearch

    Matthew Bumgardner; Robert Romig; William Luppold

    2008-01-01

    This paper is an assessment of wood use by the Amish furniture cluster located in northeastern Ohio. The paper also highlights the competitive and demographic factors that have enabled cluster growth and new business formation in a time of declining market share for the overall U.S. furniture industry. Several secondary information sources and discussions with local...

  12. Systematization of actinides using cluster analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kopyrin, A.A.; Terent`eva, T.N.; Khramov, N.N.

    1994-11-01

    A representation of the actinides in multidimensional property space is proposed for systematization of these elements using cluster analysis. Literature data for their atomic properties are used. Owing to the wide variation of published ionization potentials, medians are used to estimate them. Vertical dendrograms are used for classification on the basis of distances between the actinides in atomic-property space. The properties of actinium and lawrencium are furthest removed from the main group. Thorium and mendelevium exhibit individualized properties. A cluster based on the einsteinium-fermium pair is joined by californium.
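
    The two steps described in this abstract, taking a median over scattered published values and then grouping items by distance in property space, can be sketched as below. This is an illustration under stated assumptions only: the abstract does not name the linkage rule or the distance metric, so single linkage with Euclidean distance is assumed here, and the items "A" to "D" with their coordinates are made-up placeholders, not actinide data. The function name `single_linkage` is illustrative.

```python
import math
from statistics import median


def single_linkage(points):
    """Agglomerative clustering; returns the merge history of a dendrogram.

    points: dict mapping item name -> tuple of property values.
    Each merge is (members_of_cluster_1, members_of_cluster_2, distance).
    """
    clusters = [frozenset([name]) for name in sorted(points)]
    merges = []

    def cluster_distance(c1, c2):
        # Single linkage (an assumption): distance between closest members.
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# A median damps the influence of one outlying literature value (toy numbers).
reported_values = [5.9, 6.1, 6.4]
print(median(reported_values))

# Hypothetical items in a two-dimensional property space.
toy_points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (5.0, 0.5)}
for left, right, distance in single_linkage(toy_points):
    print(left, "+", right, "at distance", distance)
```

    Reading the merge history from first to last corresponds to reading a dendrogram from its leaves upward: items that join late, at large distances, are the ones furthest removed from the main group.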

  13. A Multivariate Analysis of Galaxy Cluster Properties

    NASA Astrophysics Data System (ADS)

    Ogle, P. M.; Djorgovski, S.

    1993-05-01

    We have assembled from the literature a data base on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges a partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.

  14. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    NASA Astrophysics Data System (ADS)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We then describe the system evaluation module and the cluster analysis module in detail, including how the two modules were implemented. Finally, we give the results of the system.

  15. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than

  16. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting

  17. Cluster-cluster clustering

    NASA Technical Reports Server (NTRS)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.

    1985-01-01

    The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.

  18. Multiscale visual quality assessment for cluster analysis with self-organizing maps

    NASA Astrophysics Data System (ADS)

    Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

    2011-01-01

    Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.

  19. Analyzing the Role of Community and Individual Factors in Food Insecurity: Identifying Diverse Barriers Across Clustered Community Members.

    PubMed

    Jablonski, Becca B R; McFadden, Dawn Thilmany; Colpaart, Ashley

    2016-10-01

    This paper uses the results from a community food security assessment survey of 684 residents and three focus groups in Pueblo County, Colorado to examine the question: what community and individual factors contribute to or alleviate food insecurity, and are these factors consistent throughout a sub-county population? Importantly, we use a technique called cluster analysis to endogenously determine the key factors pertinent to food access and fruit and vegetable consumption. Our results show significant heterogeneity among sub-population clusters in terms of the community and individual factors that would make it easier to get access to fruits and vegetables. We find two distinct clusters of food insecure populations: the first was significantly less likely to identify increased access to fruits and vegetables proximate to where they live or work as a way to improve their household's healthy food consumption despite being significantly less likely to utilize a personal vehicle to get to the store; the second group did not report significant challenges with access, rather with affordability. We conclude that though interventions focused on improving the local food retail environment may be important for some subsamples of the food insecure population, it is unclear that proximity to a store with healthy food will support enhanced food security for all. We recommend that future research recognizes that determinants of food insecurity may vary within county or zip code level regions, and that multiple interventions that target sub-population clusters may elicit better improvements in access to and consumption of fruits and vegetables.

  20. Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

    PubMed

    Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

    2017-11-01

    Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    PubMed Central

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  2. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.

  3. Patterns in longitudinal growth of refraction in Southern Chinese children: cluster and principal component analysis.

    PubMed

    Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang

    2016-11-22

    In the present study we attempt to use hypothesis-independent analysis in investigating the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7-15 years) were available in the analysis. Cluster 1 to 3 were classified after a partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: "Average refraction", "Acceleration" and the combination of "Myopia stabilization" and "Late onset of refraction progress". In regression models, younger children with more severe myopia were associated with larger "Acceleration". The risk factors of "Acceleration" included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with "Stabilization", and increased outdoor time was related to "Late onset of refraction progress". We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression.

  4. Patterns in longitudinal growth of refraction in Southern Chinese children: cluster and principal component analysis

    PubMed Central

    Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang

    2016-01-01

    In the present study we attempt to use hypothesis-independent analysis in investigating the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7–15 years) were available in the analysis. Cluster 1 to 3 were classified after a partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: “Average refraction”, “Acceleration” and the combination of “Myopia stabilization” and “Late onset of refraction progress”. In regression models, younger children with more severe myopia were associated with larger “Acceleration”. The risk factors of “Acceleration” included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with “Stabilization”, and increased outdoor time was related to “Late onset of refraction progress”. We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression. PMID:27874105

  5. Characterization of the diffusion of epidermal growth factor receptor clusters by single particle tracking.

    PubMed

    Boggara, Mohan; Athmakuri, Krishna; Srivastava, Sunit; Cole, Richard; Kane, Ravi S

    2013-02-01

    A number of studies have shown that receptors of the epidermal growth factor receptor family (ErbBs) exist as higher-order oligomers (clusters) in cell membranes in addition to their monomeric and dimeric forms. Characterizing the lateral diffusion of such clusters may provide insights into their dynamics and help elucidate their functional relevance. To that end, we used single particle tracking to study the diffusion of clusters of the epidermal growth factor (EGF) receptor (EGFR; ErbB1) containing bound fluorescently-labeled ligand, EGF. EGFR clusters had a median diffusivity of 6.8×10⁻¹¹ cm²/s and were found to exhibit different modes of transport (immobile, simple, confined, and directed) similar to that previously reported for single EGFR molecules. Disruption of actin filaments increased the median diffusivity of EGFR clusters to 10.3×10⁻¹¹ cm²/s, while preserving the different modes of diffusion. Interestingly, disruption of microtubules rendered EGFR clusters nearly immobile. Our data suggest that microtubules may play an important role in the diffusion of EGFR clusters either directly or perhaps indirectly via other mechanisms. To our knowledge, this is the first report probing the effect of the cytoskeleton on the diffusion of EGFR clusters in the membranes of live cells. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

    USGS Publications Warehouse

    McKenna, J.E.

    2003-01-01

    The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.

  7. Population changes in residential clusters in Japan.

    PubMed

    Sekiguchi, Takuya; Tamura, Kohei; Masuda, Naoki

    2018-01-01

    Population dynamics in urban and rural areas are different. Understanding factors that contribute to local population changes has various socioeconomic and political implications. In the present study, we use population census data in Japan to examine contributors to the population growth of residential clusters between years 2005 and 2010. The data set covers the entirety of Japan and has a high spatial resolution of 500 m × 500 m, enabling us to examine population dynamics in various parts of the country (urban and rural) using statistical analysis. We found that, in addition to the area, population density, and age, the shape of the cluster and the spatial distribution of inhabitants within the cluster are significantly related to the population growth rate of a residential cluster. Specifically, the population tends to grow if the cluster is "round" shaped (given the area) and the population is concentrated near the center rather than periphery of the cluster. Combination of the present results and analysis framework with other factors that have been omitted in the present study, such as migration, terrain, and transportation infrastructure, will be fruitful.

  8. Development of small scale cluster computer for numerical analysis

    NASA Astrophysics Data System (ADS)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two personal computers were successfully networked together to form a small-scale cluster. Each computer has a multicore processor with four cores, giving the cluster eight processors in total. The cluster runs Ubuntu 14.04 Linux with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test, which used a simple MPI Hello program written in C, verified that the computers could pass the required information without any problem. In addition, a performance test was done to show that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run four times, using a single node, 2 processors, 4 processors, and 8 processors. The results show that the time required to solve the problem decreases as processors are added: the calculation time is halved when the number of processors is doubled. To conclude, we successfully developed a small-scale cluster computer from common hardware that is capable of higher computing power than a single-CPU processor, which can benefit research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
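
    The scaling reported above (run time halving each time the processor count doubles) corresponds to ideal linear speedup, T(p) = T(1) / p. A minimal Python sketch of that arithmetic follows. The 120-second single-processor baseline and the function name are hypothetical, chosen only for illustration; they are not measurements or code from the study.

```python
def ideal_run_time(single_processor_seconds: float, processors: int) -> float:
    """Run time under ideal linear speedup: T(p) = T(1) / p."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return single_processor_seconds / processors


# Hypothetical baseline: 120 s on one processor (not a figure from the study).
BASELINE_SECONDS = 120.0

# Same processor counts as the four runs described in the abstract.
for p in (1, 2, 4, 8):
    t = ideal_run_time(BASELINE_SECONDS, p)
    print(f"{p} processor(s): {t:.1f} s, speedup {BASELINE_SECONDS / t:.1f}x")
```

    Real clusters usually fall short of this ideal because of communication overhead between nodes, so measured times are best compared against this curve rather than assumed to follow it.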

  9. Autism spectrum disorder in Down syndrome: cluster analysis of Aberrant Behaviour Checklist data supports diagnosis.

    PubMed

    Ji, N Y; Capone, G T; Kaufmann, W E

    2011-11-01

    The diagnostic validity of autism spectrum disorder (ASD) based on Diagnostic and Statistical Manual of Mental Disorders (DSM) has been challenged in Down syndrome (DS), because of the high prevalence of cognitive impairments in this population. Therefore, we attempted to validate DSM-based diagnoses via an unbiased categorisation of participants with a DSM-independent behavioural instrument. Based on scores on the Aberrant Behaviour Checklist - Community, we performed sequential factor (four DS-relevant factors: Autism-Like Behaviour, Disruptive Behaviour, Hyperactivity, Self-Injury) and cluster analyses on a 293-participant paediatric DS clinic cohort. The four resulting clusters were compared with DSM-delineated groups: DS + ASD, DS + None (no DSM diagnosis), DS + DBD (disruptive behaviour disorder) and DS + SMD (stereotypic movement disorder), the latter two as comparison groups. Two clusters were identified with DS + ASD: Cluster 1 (35.1%) with higher disruptive behaviour and Cluster 4 (48.2%) with more severe autistic behaviour and higher percentage of late onset ASD. The majority of participants in DS + None (71.9%) and DS + DBD (87.5%) were classified into Cluster 2 and 3, respectively, while participants in DS + SMD were relatively evenly distributed throughout the four clusters. Our unbiased, DSM-independent analyses, using a rating scale specifically designed for individuals with severe intellectual disability, demonstrated that DSM-based criteria of ASD are applicable to DS individuals despite their cognitive impairments. Two DS + ASD clusters were identified and supported the existence of at least two subtypes of ASD in DS, which deserve further characterisation. Despite the prominence of stereotypic behaviour in DS, the SMD diagnosis was not identified by cluster analysis, suggesting that high-level stereotypy is distributed throughout DS. Further supporting DSM diagnoses, typically behaving DS participants were easily distinguished as a group from

  10. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    PubMed

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

    PubMed

    Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

    2017-05-01

    Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.

  12. SUPERMODEL ANALYSIS OF GALAXY CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fusco-Femiano, R.; Cavaliere, A.; Lapi, A.

    2009-11-01

    We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core (CC) and Non Cool Core (NCC) classes, in terms of the Supermodel (SM) developed by Cavaliere et al. Based on the gravitational wells set by the dark matter (DM) halos, the SM straightforwardly expresses the equilibrium of the intracluster plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blast waves from mergers or from outbursts of active galactic nuclei. The cluster set analyzed here highlights not only how simply the SM represents the main dichotomy CC versus NCC clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For CC clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked profile of the temperature marked by a decline toward the center, without requiring currently strong radiative cooling and high mass deposition rates. NCC clusters like A1656 require instead a central entropy floor of a substantial level, and some like A2256 and even more A644 feature structured temperature profiles that also call for a definite floor extension; in such conditions the SM accurately fits the observations, and suggests that in these clusters the ICP has been just remolded by a merger event, in the way of a remnant cool core. The SM also predicts that DM halos with high concentration should correlate with flatter entropy profiles and steeper brightness in the outskirts; this is indeed the case with A1689, for which from X-rays we find concentration values c ≈ 10, the hallmark of an early halo formation. Thus, we show the SM to constitute a fast tool not only to provide wide libraries of accurate fits to X-ray temperature and density profiles, but also to retrieve from the

  13. A Comprehensive Careers Cluster Curriculum Model. Health Occupations Cluster Curriculum Project and Health-Care Aide Curriculum Project.

    ERIC Educational Resources Information Center

    Bortz, Richard F.

    To prepare learning materials for health careers programs at the secondary level, the developmental phase of two curriculum projects--the Health Occupations Cluster Curriculum Project and Health-Care Aide Curriculum Project--utilized a model which incorporated a key factor analysis technique. Entitled "A Comprehensive Careers Cluster Curriculum…

  14. Using Cluster Analysis for Data Mining in Educational Technology Research

    ERIC Educational Resources Information Center

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  15. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    PubMed

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.

  16. Who are the obese? A cluster analysis exploring subgroups of the obese.

    PubMed

    Green, M A; Strong, M; Razak, F; Subramanian, S V; Relton, C; Bissell, P

    2016-06-01

    Body mass index (BMI) can be used to group individuals in terms of their height and weight as obese. However, such a distinction fails to account for the variation within this group across other factors such as health, demographic and behavioural characteristics. The study aims to examine the existence of subgroups of obese individuals. Data were taken from the Yorkshire Health Study (2010-12) including information on demographic, health and behavioural characteristics. Individuals with a BMI of ≥30 were included. A two-step cluster analysis was used to define groups of individuals who shared common characteristics. The cluster analysis found six distinct groups of individuals whose BMI was ≥30. These subgroups were heavy drinking males; young healthy females; the affluent and healthy elderly; the physically sick but happy elderly; the unhappy and anxious middle aged; and a cluster with the poorest health. It is important to account for the substantial heterogeneity among individuals who are obese. Interventions introduced by clinicians and policymakers should not target obese individuals as a whole but tailor strategies depending upon the subgroups that individuals belong to. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Clusters of Occupations Based on Systematically Derived Work Dimensions: An Exploratory Study.

    ERIC Educational Resources Information Center

    Cunningham, J. W.; And Others

    The study explored the feasibility of deriving an educationally relevant occupational cluster structure based on Occupational Analysis Inventory (OAI) work dimensions. A hierarchical cluster analysis was applied to the factor score profiles of 814 occupations on 22 higher-order OAI work dimensions. From that analysis, 73 occupational clusters were…

  18. Low Back Pain Subgroups using Fear-Avoidance Model Measures: Results of a Cluster Analysis

    PubMed Central

    Beneciuk, Jason M.; Robinson, Michael E.; George, Steven Z.

    2012-01-01

    Objectives The purpose of this secondary analysis was to test the hypothesis that an empirically derived psychological subgrouping scheme based on multiple Fear-Avoidance Model (FAM) constructs would provide additional capabilities for clinical outcomes in comparison to a single FAM construct. Methods Patients (n = 108) with acute or sub-acute low back pain (LBP) enrolled in a clinical trial comparing behavioral physical therapy interventions to classification based physical therapy completed baseline questionnaires for pain catastrophizing (PCS), fear-avoidance beliefs (FABQ-PA, FABQ-W), and patient-specific fear (FDAQ). Clinical outcomes were pain intensity and disability measured at baseline, 4-weeks, and 6-months. A hierarchical agglomerative cluster analysis was used to create distinct cluster profiles among FAM measures and discriminant analysis was used to interpret clusters. Changes in clinical outcomes were investigated with repeated measures ANOVA and differences in results based on cluster membership were compared to FABQ-PA subgrouping used in the original trial. Results Three distinct FAM subgroups (Low Risk, High Specific Fear, and High Fear & Catastrophizing) emerged from cluster analysis. Subgroups differed on baseline pain and disability (p’s<.01) with the High Fear & Catastrophizing subgroup associated with greater pain than the Low Risk subgroup (p<.01) and the greatest disability (p’s<.05). Subgroup × time interactions were detected for both pain and disability (p’s<.05) with the High Fear & Catastrophizing subgroup reporting greater changes in pain and disability than other subgroups (p’s<.05). In contrast, FABQ-PA subgroups used in the original trial were not associated with interactions for clinical outcomes. Discussion These data suggest that subgrouping based on multiple FAM measures may provide additional information on clinical outcomes in comparison to determining subgroup status by FABQ-PA alone. Subgrouping methods for

  19. Identification and characterization of near-fatal asthma phenotypes by cluster analysis.

    PubMed

    Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V

    2015-09-01

    Near-fatal asthma (NFA) is a heterogeneous clinical entity and several profiles of patients have been described according to different clinical, pathophysiological and histological features. However, there are no previous studies that identify different phenotypes of NFA in an unbiased way, using statistical methods such as cluster analysis. Therefore, the aim of the present study was to identify and to characterize phenotypes of near-fatal asthma using a cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using a two-step algorithm was performed on data from 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with a high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by an insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, confirming in part previous findings observed in studies with a clinical approach. The identification of patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  20. Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis

    ERIC Educational Resources Information Center

    Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian

    2006-01-01

    Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…

  1. A scoping review of spatial cluster analysis techniques for point-event data.

    PubMed

    Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott

    2013-05-01

    Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend researchers consider: exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.

  2. Using cluster analysis to organize and explore regional GPS velocities

    USGS Publications Warehouse

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  3. Suicide in the oldest old: an observational study and cluster analysis.

    PubMed

    Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth

    2016-01-01

    The older population is at high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.
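
    Hierarchical agglomerative clustering with Ward's method, the technique named in the abstract above, starts with every observation in its own cluster and repeatedly merges the pair of clusters whose union least increases the total within-cluster sum of squares. The plain-Python sketch below illustrates that merge rule on made-up two-dimensional points. It is not the study's analysis: the function names and the data are hypothetical, and the naive pairwise search is meant for small examples only.

```python
def centroid(cluster):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clustering(points, n_clusters):
    """Merge clusters by Ward's criterion until n_clusters remain."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[tuple(p)] for p in points]  # every point starts alone
    while len(clusters) > n_clusters:
        # Find the pair (i, j), i < j, with the smallest merge cost.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: two tight pairs and one isolated point.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
toy_clusters = ward_clustering(toy_points, n_clusters=3)
for index, members in enumerate(toy_clusters, start=1):
    print(f"Cluster {index}: {members}")
```

    In practice an established library routine would normally be preferred over a hand-written loop; the sketch is only meant to make the merge criterion concrete.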

  4. See Change: Cosmology Analysis Update for the Supernova Cosmology Project High-z Cluster Supernova Survey

    NASA Astrophysics Data System (ADS)

    Hayden, Brian; Aldering, Gregory; Amanullah, Rahman; Barbary, Kyle; Bohringer, Hans; Boone, Kyle Robert; Brodwin, Mark; Cunha, Carlos; Currie, Miles; Deustua, Susana; Dixon, Samantha; Eisenhardt, Peter; Fassbender, Rene; Fruchter, Andrew; Gladders, Michael; Gonzalez, Anthony; Goobar, Ariel; Hildebrandt, Hendrik; Hilton, Matt; Hoekstra, Henk; Hook, Isobel; Huang, Xiaosheng; Huterer, Dragan; Jee, Myungkook James; Kim, Alex; Kowalski, Marek; Lidman, Chris; Linder, Eric; Luther, Kyle; Meyers, Joshua; Muzzin, Adam; Nordin, Jakob; Pain, Reynald; Perlmutter, Saul; Richard, Johan; Rosati, Piero; Rozo, Eduardo; Rubin, David; Ruiz-Lapuente, Pilar; Rykoff, Eli; Santos, Joana; Myers Saunders, Clare; Sofiatti, Caroline; Spadafora, Anthony L.; Stanford, Spencer; Stern, Daniel; Suzuki, Nao; Webb, Tracy; Wechsler, Risa; Williams, Steven; Willis, Jon; Wilson, Gillian; Yen, Mike

    2018-01-01

    The Supernova Cosmology Project has finished executing a large (174 orbits, cycles 22-23) Hubble Space Telescope program, which has measured ~30 type Ia Supernovae above z~1 in the highest-redshift, most massive galaxy clusters known to date. We present the status of the ongoing blinded cosmology analysis, demonstrating substantial improvement to the uncertainty on the Dark Energy density above z~1. Our extensive HST and ground-based campaign has already produced unique results; we have confirmed several of the highest redshift cluster members known to date, confirmed the redshift of one of the most massive galaxy clusters expected across the entire sky, and characterized one of the most extreme starburst environments yet known in a z~1.7 cluster. We have also discovered a lensed SN Ia at z=2.22 magnified by a factor of ~2.8, which is the highest spectroscopic redshift SN Ia currently known.

  5. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-means Cluster Analysis.

    PubMed

    Craen, Saskia de; Commandeur, Jacques J F; Frank, Laurence E; Heiser, Willem J

    2006-06-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these populations showed a significant effect of lack of sphericity and group size. This effect was, however, not as large as expected, with still a recovery index of more than 0.5 in the "worst case scenario." An interaction effect between the two data aspects was also found. The decreasing trend in the recovery of clusters for increasing departures from sphericity is different for equal and unequal group sizes.
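
    Cluster recovery in a simulation of this kind is scored by comparing the known population membership with the partition the algorithm returns. The abstract does not name the recovery index it used; the sketch below assumes the adjusted Rand index, one common choice, and the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, found_labels):
    """Agreement between two partitions, corrected for chance."""
    n = len(true_labels)
    # Contingency-table cells, row totals and column totals.
    cells = Counter(zip(true_labels, found_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(found_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


# Label names are arbitrary, so a relabelled but identical partition scores 1.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

    Values near 1 indicate near-perfect recovery and values near 0 indicate agreement no better than chance.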

  6. Regional heatwaves in China: a cluster analysis

    NASA Astrophysics Data System (ADS)

    Wang, Pinya; Tang, Jianping; Wang, Shuyu; Dong, Xinning; Fang, Juan

    2018-03-01

    Considering the spatial extent of heatwave events, two kinds of regional heatwaves defined by absolute and relative thresholds, namely RHWs-A and RHWs-R, are investigated for 1959-2013. The temperature data are derived from the daily maximum temperatures (DMTs) of 587 stations in China. In total, 298 RHWs-A and 374 RHWs-R are identified during the past 55 years, and both have grown more frequent since the mid-1980s. Using cluster analysis, several typical spatial distributions of RHWs-A/RHWs-R are obtained. For RHWs-A, there are three clusters covering southeastern China, northwestern China and the lower reaches of the Yangtze River, of which the southeastern cluster groups the most heatwaves. For RHWs-R, there are seven clusters distributed throughout China. The clusters in northwestern and northeastern China are more stable than the others for both RHWs-A and RHWs-R, and the northern clusters are more intense than the southern ones. All RHWs-A/RHWs-R are accompanied by anomalous high systems along with reduced soil moisture. The southern clusters are controlled by the Northwestern Pacific subtropical high (WPSH), and the northern ones are influenced by the mid-latitude high systems. The influences of atmospheric circulations and soil moisture on regional heatwaves are further demonstrated by two case analyses of the severe RHW-A in 2003 and the RHW-R in 2013.

  7. Local bladder cancer clusters in southeastern Michigan accounting for risk factors, covariates and residential mobility.

    PubMed

    Jacquez, Geoffrey M; Shi, Chen; Meliker, Jaymie R

    2015-01-01

    In case control studies disease risk not explained by the significant risk factors is the unexplained risk. Considering unexplained risk for specific populations, places and times can reveal the signature of unidentified risk factors and risk factors not fully accounted for in the case-control study. This potentially can lead to new hypotheses regarding disease causation. Global, local and focused Q-statistics are applied to data from a population-based case-control study of 11 southeast Michigan counties. Analyses were conducted using both year- and age-based measures of time. The analyses were adjusted for arsenic exposure, education, smoking, family history of bladder cancer, occupational exposure to bladder cancer carcinogens, age, gender, and race. Significant global clustering of cases was not found. Such a finding would indicate large-scale clustering of cases relative to controls through time. However, highly significant local clusters were found in Ingham County near Lansing, in Oakland County, and in the City of Jackson, Michigan. The Jackson City cluster was observed in working-ages and is thus consistent with occupational causes. The Ingham County cluster persists over time, suggesting a broad-based geographically defined exposure. Focused clusters were found for 20 industrial sites engaged in manufacturing activities associated with known or suspected bladder cancer carcinogens. Set-based tests that adjusted for multiple testing were not significant, although local clusters persisted through time and temporal trends in probability of local tests were observed. Q analyses provide a powerful tool for unpacking unexplained disease risk from case-control studies. This is particularly useful when the effect of risk factors varies spatially, through time, or through both space and time. For bladder cancer in Michigan, the next step is to investigate causal hypotheses that may explain the excess bladder cancer risk localized to areas of Oakland and Ingham

  8. Regression analysis of clustered failure time data with informative cluster size under the additive transformation models.

    PubMed

    Chen, Ling; Feng, Yanqin; Sun, Jianguo

    2017-10-01

    This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.
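
    The two ideas named above, inverse-cluster-size weights and within-cluster resampling, can be shown on a much simpler target than the paper's additive transformation models. The sketch below estimates only a mean, which is an assumption made for illustration; the function names and the toy data are hypothetical. In the toy data larger clusters hold larger values, so cluster size is informative and a pooled mean over-represents the big cluster.

```python
import random
from statistics import mean


def inverse_size_weighted_mean(clusters):
    """Weight each observation by 1 / (size of its cluster)."""
    # With these weights every cluster contributes its own mean exactly once.
    return sum(sum(values) / len(values) for values in clusters) / len(clusters)


def within_cluster_resampling_mean(clusters, n_resamples=2000, seed=0):
    """Average the estimates from many one-observation-per-cluster samples."""
    rng = random.Random(seed)
    estimates = [
        mean(rng.choice(values) for values in clusters)
        for _ in range(n_resamples)
    ]
    return mean(estimates)


# Hypothetical toy data with informative cluster size.
clusters = [[1.0], [1.0, 3.0], [8.0, 9.0, 10.0, 9.0]]
pooled = mean(v for values in clusters for v in values)
weighted = inverse_size_weighted_mean(clusters)
resampled = within_cluster_resampling_mean(clusters)
```

    Both adjusted estimates target the mean for a typical cluster, whereas the pooled mean targets a typical observation.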

  9. Dietary patterns in middle-aged Irish men and women defined by cluster analysis.

    PubMed

    Villegas, R; Salim, A; Collins, M M; Flynn, A; Perry, I J

    2004-12-01

    To identify and characterise dietary patterns in a middle-aged Irish population sample and study associations between these patterns, sociodemographic and anthropometric variables and major risk factors for cardiovascular disease. A cross-sectional study. A group of 1473 men and women were sampled from 17 general practice lists in the South of Ireland. A total of 1018 attended for screening, with a response rate of 69%. Participants completed a detailed health and lifestyle questionnaire and provided a fasting blood sample for glucose, lipids and homocysteine. Dietary intake was assessed using a standard food-frequency questionnaire adapted for use in the Irish population. The food-frequency questionnaire was a modification of that used in the UK arm of the European Prospective Investigation into Cancer study, which was based on that used in the US Nurses' Health Study. Dietary patterns were assessed primarily by K-means cluster analysis, following initial principal components analysis to identify the seeds. Three dietary patterns were identified. These clusters corresponded to a traditional Irish diet, a prudent diet and a diet characterised by high consumption of alcoholic drinks and convenience foods. Cluster 1 (Traditional Diet) had the highest intakes of saturated fat (SFA), monounsaturated fat (MUFA) and percentage of total energy from fat, and the lowest polyunsaturated fat (PUFA) intake and ratio of polyunsaturated to saturated fat (P:S). Cluster 2 (Prudent Diet) was characterised by significantly higher intakes of fibre, PUFA, P:S ratio and antioxidant vitamins (vitamins C and E), and lower intakes of total fat, MUFA, SFA and cholesterol. Cluster 3 (Alcohol & Convenience Foods) had the highest intakes of alcohol, protein, cholesterol, vitamin B(12), vitamin B(6), folate, iron, phosphorus, selenium and zinc, and the lowest intakes of PUFA, vitamin A and antioxidant vitamins (vitamins C and E). There were significant differences between clusters in gender

  10. The Productivity Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.
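
    For reference, the standard output-oriented Banker-Charnes-Cooper (BCC) envelopment program for a unit o among n units is written out below. This is the textbook formulation and not a transcription from the paper; the symbols are assumptions introduced here. With the variables named in the abstract, the inputs x are net worth, fixed assets and employment (m = 3) and the single output y is gross output (s = 1).

```latex
\begin{aligned}
\max_{\varphi,\;\lambda}\quad & \varphi \\
\text{s.t.}\quad
& \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \varphi\, y_{ro}, && r = 1,\dots,s \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
```

    In this formulation the technical efficiency of unit o is 1/φ*, the non-zero λ_j* are the peer weights on efficient units, and the input and output slacks are the gaps left in the first two sets of constraints at the optimum.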

  11. Micro-heterogeneity versus clustering in binary mixtures of ethanol with water or alkanes.

    PubMed

    Požar, Martina; Lovrinčević, Bernarda; Zoranić, Larisa; Primorać, Tomislav; Sokolić, Franjo; Perera, Aurélien

    2016-08-24

    Ethanol is a hydrogen bonding liquid. When mixed in small concentrations with water or alkanes, it forms aggregate structures reminiscent of, respectively, the direct and inverse micellar aggregates found in emulsions, albeit at much smaller sizes. At higher concentrations, micro-heterogeneous mixing with segregated domains is found. We examine how different statistical methods, namely correlation function analysis, structure factor analysis and cluster distribution analysis, can describe efficiently these morphological changes in these mixtures. In particular, we explain how the neat alcohol pre-peak of the structure factor evolves into the domain pre-peak under mixing conditions, and how this evolution differs whether the co-solvent is water or alkane. This study clearly establishes the heuristic superiority of the correlation function/structure factor analysis to study the micro-heterogeneity, since cluster distribution analysis is insensitive to domain segregation. Correlation functions detect the domains, with a clear structure factor pre-peak signature, while the cluster techniques detect the cluster hierarchy within domains. The main conclusion is that, in micro-segregated mixtures, the domain structure is a more fundamental statistical entity than the underlying cluster structures. These findings could help better understand comparatively the radiation scattering experiments, which are sensitive to domains, versus the spectroscopy-NMR experiments, which are sensitive to clusters.

  12. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

    NASA Astrophysics Data System (ADS)

    Žibert, Janez; Pražnikar, Jure

    2012-09-01

    The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 on 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia and has a very strong tradition of bonfire parties. The clustering of

  13. A Note on Cluster Effects in Latent Class Analysis

    ERIC Educational Resources Information Center

    Kaplan, David; Keller, Bryan

    2011-01-01

    This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…

  14. Classification and Clustering Methods for Multiple Environmental Factors in Gene-Environment Interaction: Application to the Multi-Ethnic Study of Atherosclerosis.

    PubMed

    Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V

    2016-11-01

    There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.

  15. Social and Behavioral Risk Marker Clustering Associated with Biological Risk Factors for Coronary Heart Disease: NHANES 2001–2004

    PubMed Central

    Everage, Nicholas J.; Linkletter, Crystal D.; Gjelsvik, Annie; McGarvey, Stephen T.; Loucks, Eric B.

    2014-01-01

    Background. Social and behavioral risk markers (e.g., physical activity, diet, smoking, and socioeconomic position) cluster; however, little is known whether clustering is associated with coronary heart disease (CHD) risk. Objectives were to determine if sociobehavioral clustering is associated with biological CHD risk factors (total cholesterol, HDL cholesterol, systolic blood pressure, body mass index, waist circumference, and diabetes) and whether associations are independent of individual clustering components. Methods. Participants included 4,305 males and 4,673 females aged ≥20 years from NHANES 2001–2004. Sociobehavioral Risk Marker Index (SRI) included a summary score of physical activity, fruit/vegetable consumption, smoking, and educational attainment. Regression analyses evaluated associations of SRI with aforementioned biological CHD risk factors. Receiver operator curve analyses assessed independent predictive ability of SRI. Results. Healthful clustering (SRI = 0) was associated with improved biological CHD risk factor levels in 5 of 6 risk factors in females and 2 of 6 risk factors in males. Adding SRI to models containing age, race, and individual SRI components did not improve C-statistics. Conclusions. Findings suggest that healthful sociobehavioral risk marker clustering is associated with favorable CHD risk factor levels, particularly in females. These findings should inform social ecological interventions that consider health impacts of addressing social and behavioral risk factors. PMID:24719858

  16. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    PubMed

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. The cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from coastal water of the Bohai Sea and North Yellow Sea of China, and apply the clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which is widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
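
    The difference between the two distances compared above shows up as soon as two variables are correlated. The sketch below is restricted to two variables so the covariance matrix can be inverted by hand; that restriction, the function names and the sample values are assumptions for illustration, not the paper's data or code.

```python
from math import sqrt


def covariance_2d(samples):
    """Sample covariance entries (sxx, sxy, syy) for two variables."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between p and q for a 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - q[0], p[1] - q[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case.
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


def euclidean(p, q):
    """Ordinary straight-line distance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical measurements of two strongly correlated variables.
samples = [(1, 2), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)]
cov = covariance_2d(samples)
origin, along, across = (0, 0), (1, 2), (1, -2)
d_along = mahalanobis_2d(origin, along, cov)    # step follows the correlation
d_across = mahalanobis_2d(origin, across, cov)  # step cuts against it
```

    The two steps have the same Euclidean length, but the Mahalanobis distance treats the step against the correlation as far larger. A hierarchical method fed these distances would therefore group samples differently from one fed Euclidean distances.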

  17. Input frequency and lexical variability in phonological development: a survival analysis of word-initial cluster production.

    PubMed

    Ota, Mitsuhiko; Green, Sam J

    2013-06-01

    Although it has been often hypothesized that children learn to produce new sound patterns first in frequently heard words, the available evidence in support of this claim is inconclusive. To re-examine this question, we conducted a survival analysis of word-initial consonant clusters produced by three children in the Providence Corpus (0;11-4;0). The analysis took account of several lexical factors in addition to lexical input frequency, including the age of first production, production frequency, neighborhood density and number of phonemes. The results showed that lexical input frequency was a significant predictor of the age at which the accuracy level of cluster production in each word first reached 80%. The magnitude of the frequency effect differed across cluster types. Our findings indicate that some of the between-word variance found in the development of sound production can indeed be attributed to the frequency of words in the child's ambient language.

  18. A Bayesian cluster analysis method for single-molecule localization microscopy data.

    PubMed

    Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick

    2016-12-01

    Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM), for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
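
    The generative model described above (Gaussian clusters on a completely spatially random background, with every point then displaced by its localization precision) can be simulated in a few lines. The sketch covers only this forward simulation, not the Bayesian scoring of cluster proposals. It also assumes one shared precision for all points, whereas the protocol associates an uncertainty with each localization. The function name and all parameter names are hypothetical.

```python
import random


def simulate_localizations(n_clusters, points_per_cluster, n_background,
                           cluster_sd, precision_sd, roi_size, seed=0):
    """Return (x, y, label) tuples; label -1 marks background points."""
    rng = random.Random(seed)
    points = []
    # Gaussian clusters around uniformly placed centres.
    for label in range(n_clusters):
        cx, cy = rng.uniform(0, roi_size), rng.uniform(0, roi_size)
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, cluster_sd),
                           rng.gauss(cy, cluster_sd), label))
    # Completely spatially random (CSR) background.
    for _ in range(n_background):
        points.append((rng.uniform(0, roi_size),
                       rng.uniform(0, roi_size), -1))
    # Scramble every point by the (assumed shared) localization precision.
    return [(x + rng.gauss(0, precision_sd), y + rng.gauss(0, precision_sd), lab)
            for x, y, lab in points]


# Hypothetical region of interest: 3 clusters of 20 points plus 40 background.
localizations = simulate_localizations(3, 20, 40, 0.5, 0.1, 100.0)
```

    Simulated data of this form, where the true labels are known, give a way to check whether a clustering method recovers the number of clusters and the points per cluster.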

  19. Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

    PubMed

    Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

    Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  20. The Quantitative Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India due to the presence of an automotive industry producing over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals that there is a high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units with respect to costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  1. Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis.

    PubMed

    El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane

    2018-01-01

    Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious) with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance to healthy eating. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.

  2. Cardiovascular Risk Factors in Cluster Headache.

    PubMed

    Lasaosa, S Santos; Diago, E Bellosta; Calzada, J Navarro; Benito, A Velázquez

    2017-06-01

    Patients with cluster headache tend to have a dysregulation of systemic blood pressure such as increased blood pressure variability and decreased nocturnal dipping. This pattern of nocturnal nondipping is associated with end-organ damage and increased risk of cardiovascular disease. To determine if cluster headache is associated with a higher risk of cardiovascular disease. Cross-sectional study of 33 cluster headache patients without evidence of cardiovascular disease and 30 age- and gender-matched healthy controls. Ambulatory blood pressure monitoring was performed in all subjects. We evaluate anthropometric, hematologic, and structural parameters (carotid intima-media thickness and ankle-brachial index). Of the 33 cluster headache patients, 16 (48.5%) were nondippers, a higher percentage than expected. Most of the cluster headache patients (69.7%) also presented a pathological ankle-brachial index. In terms of the carotid intima-media thickness values, 58.3% of the patients were in the 75th percentile, 25% were in the 90th percentile, and 20% were in the 95th percentile. In the control group, only five of the 30 subjects (16.7%) had a nondipper pattern (P = 0.004), with 4.54% in the 90th and 95th percentiles (P = 0.012 and 0.015). Compared with healthy controls, patients with cluster headache presented a high incidence (48.5%) of nondipper pattern, pathological ankle-brachial index (69.7%), and intima-media thickness values above the 75th percentile. These findings support the hypothesis that patients with cluster headache present increased risk of cardiovascular disease. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  3. Robust continuous clustering

    PubMed Central

    Shah, Sohil Atul

    2017-01-01

    Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838

  4. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

    PubMed Central

    Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

    2016-01-01

    Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas

  5. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

    PubMed

    Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

    2016-11-01

    To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.

  6. Cluster analysis of cognitive performance in elderly and demented subjects.

    PubMed

    Giaquinto, S; Nolfe, G; Calvani, M

    1985-06-01

    48 elderly normals, 14 demented subjects and 76 young controls were tested for basic cognitive functions. All the tests were quantified and could therefore be subjected to statistical analysis. The results show a difference in the speed of information processing and in memory load between the young controls and elderly normals but the age groups differed in quantitative terms only. Cluster analysis showed that the elderly and the demented formed two distinctly separate groups at the qualitative level, the basic cognitive processes being damaged in the demented group. Age thus appears to be only a risk factor for dementia and not its cause. It is concluded that batteries based on precise and measurable tasks are the most appropriate not only for the study of dementia but for rehabilitation purposes too.

  7. Cluster analysis of obesity and asthma phenotypes.

    PubMed

    Sutherland, E Rand; Goleva, Elena; King, Tonya S; Lehman, Erik; Stevens, Allen D; Jackson, Leisa P; Stream, Amanda R; Fahy, John V; Leung, Donald Y M

    2012-01-01

    Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p < 0.0001 and F = 44.8, p < 0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FeNO) and airway hyperresponsiveness (methacholine PC20) but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals

  8. Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

    PubMed

    Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

    2017-12-01

    Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the
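    The randomization idea in the record above (compare the observed mean inter-cluster distance with values from randomly simulated maps) can be sketched as follows. This is an illustrative sketch only, not the Cluster application's code: it assumes a rectangular frame, ignores cluster size and orientation, and replaces the t test named in the abstract with an empirical two-sided Monte Carlo p-value. The function names and example coordinates are hypothetical.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of centroids."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def randomization_test(observed, width, height, n_sims=1000, seed=0):
    """Compare the observed mean inter-centroid distance with random maps.

    Assumption: centroids are placed uniformly at random in a
    width x height rectangle (the real tool supports arbitrary frames).
    Returns (observed mean, mean of simulated means, empirical p-value).
    """
    rng = random.Random(seed)
    obs = mean_pairwise_distance(observed)
    sims = []
    for _ in range(n_sims):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in observed]
        sims.append(mean_pairwise_distance(pts))
    center = sum(sims) / len(sims)
    # Two-sided: count simulated maps at least as far from the simulated mean.
    extreme = sum(1 for s in sims if abs(s - center) >= abs(obs - center))
    p_value = (extreme + 1) / (n_sims + 1)
    return obs, center, p_value


# Hypothetical centroids packed tightly in the middle of a unit square.
centroids = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51),
             (0.52, 0.52), (0.49, 0.50), (0.50, 0.48)]
obs, sim_mean, p = randomization_test(centroids, 1.0, 1.0)
```

    A small p-value here would argue against the null hypothesis that the centroids are randomly distributed within the frame.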

  9. Graph-Theoretic Analysis of Monomethyl Phosphate Clustering in Ionic Solutions.

    PubMed

    Han, Kyungreem; Venable, Richard M; Bryant, Anne-Marie; Legacy, Christopher J; Shen, Rong; Li, Hui; Roux, Benoît; Gericke, Arne; Pastor, Richard W

    2018-02-01

    All-atom molecular dynamics simulations combined with graph-theoretic analysis reveal that clustering of monomethyl phosphate dianion (MMP²⁻) is strongly influenced by the types and combinations of cations in the aqueous solution. Although Ca²⁺ promotes the formation of stable and large MMP²⁻ clusters, K⁺ alone does not. Nonetheless, clusters are larger and their link lifetimes are longer in mixtures of K⁺ and Ca²⁺. This "synergistic" effect depends sensitively on the Lennard-Jones interaction parameters between Ca²⁺ and the phosphorus oxygen and correlates with the hydration of the clusters. The pronounced MMP²⁻ clustering effect of Ca²⁺ in the presence of K⁺ is confirmed by Fourier transform infrared spectroscopy. The characterization of the cation-dependent clustering of MMP²⁻ provides a starting point for understanding cation-dependent clustering of phosphoinositides in cell membranes.

  10. Linking Strengths: Identifying and Exploring Protective Factor Clusters in Academically Resilient Low-Socioeconomic Urban Students of Color

    ERIC Educational Resources Information Center

    Morales, Erik E.

    2010-01-01

    Based on data from qualitative interviews with 50 high-achieving low-socioeconomic students of color, two "clusters" of important and symbiotic protective factors are identified and explored. Each cluster consists of a series of interrelated protective factors identified by the participants as crucial to their statistically exceptional academic…

  11. Factor Structure of the PTSD Checklist for DSM-5: Relationships Among Symptom Clusters, Anger, and Impulsivity.

    PubMed

    Armour, Cherie; Contractor, Ateka; Shea, Tracie; Elhai, Jon D; Pietrzak, Robert H

    2016-02-01

    Scarce data are available regarding the dimensional structure of Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) posttraumatic stress disorder (PTSD) symptoms and how factors relate to external constructs. We evaluated six competing models of DSM-5 PTSD symptoms, including Anhedonia, Externalizing Behaviors, and Hybrid models, using confirmatory factor analyses in a sample of 412 trauma-exposed college students. We then examined whether PTSD symptom clusters were differentially related to measures of anger and impulsivity using Wald chi-square tests. The seven-factor Hybrid model was deemed optimal compared with the alternatives. All symptom clusters were associated with anger; the strongest association was between externalizing behaviors and anger (r = 0.54). All symptom clusters, except re-experiencing and avoidance, were associated with impulsivity, with the strongest association between externalizing behaviors and impulsivity (r = 0.49). A seven-factor Hybrid model provides superior fit to DSM-5 PTSD symptom data, with the externalizing behaviors factor being most strongly related to anger and impulsivity.

  12. Cluster analysis as a prediction tool for pregnancy outcomes.

    PubMed

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Given the specific physiological changes during gestation, and viewing pregnancy as a "critical window", classifying women early in pregnancy can be considered crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women can be classified in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. A total of 222 pregnant women from two general obstetric offices were recruited. The main focus was on characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering a baby of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and a higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  13. Clustering analysis of line indices for LAMOST spectra with AstroStat

    NASA Astrophysics Data System (ADS)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering could help astronomers find the associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering for the line index of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features for low resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A type stars is analyzed through calculating their intra and inter distances between pairs of stars. For intra distance, we use the definition of Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers, a spectrum with abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
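    For readers unfamiliar with the k-means step mentioned in the record above, a minimal sketch of Lloyd's algorithm is given below. It is not AstroStat's implementation: it uses plain Euclidean distance, omits the Mahalanobis intra-distance and local outlier factor described in the abstract, and runs on made-up two-dimensional points standing in for line-index vectors. All names are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (labels, centers). Initial centers are k points drawn at
    random from the data (an assumption; other initializations exist).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Two well-separated hypothetical groups of "line index" vectors.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = kmeans(data, k=2)
```

    On data this cleanly separated the two groups are recovered whichever initial centers are drawn; real spectra would need feature scaling and a considered choice of k.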

  14. Identifying Peer Institutions Using Cluster Analysis

    ERIC Educational Resources Information Center

    Boronico, Jess; Choksi, Shail S.

    2012-01-01

    The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…

  15. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second, included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
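    The mutual information (MI) between a pair of residue positions, the covariance measure named in the record above, can be computed from two alignment columns as in the sketch below. This is an assumption-laden illustration: it uses raw frequency estimates with no correction for small samples or phylogenetic background, it does not perform the spectral clustering step, and the toy columns are invented rather than taken from HIV-1 protease data.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """MI in bits between two equally long columns of residue letters.

    MI = sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b))),
    with probabilities estimated as observed frequencies.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical columns: position 1 covaries perfectly with position 2
# and is independent of position 3.
pos1 = "LLVV"
pos2 = "IIMM"
pos3 = "IMIM"
mi_covarying = mutual_information(pos1, pos2)
mi_independent = mutual_information(pos1, pos3)
```

    Evaluating this for every residue pair would give a symmetric matrix of the kind that the abstract describes as the input to spectral clustering.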

  16. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    PubMed

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  17. Classification of microvascular patterns via cluster analysis reveals their prognostic significance in glioblastoma.

    PubMed

    Chen, Long; Lin, Zhi-Xiong; Lin, Guo-Shi; Zhou, Chang-Fu; Chen, Yu-Peng; Wang, Xing-Fu; Zheng, Zong-Qing

    2015-01-01

    Few studies have focused on microvascular patterns (MVPs) in human glioblastoma and their prognostic impact. We evaluated MVPs of 78 glioblastomas by CD34/periodic acid-Schiff dual staining and by cluster analysis of the percentage of microvascular area for distinct microvascular formations. The distribution of 5 types of basic microvascular formations, that is, microvascular sprouting (MS), vascular cluster (VC), vascular garland (VG), glomeruloid vascular proliferation (GVP), and vasculogenic mimicry (VM), was variable. Accordingly, cluster analysis classified MVPs into 2 types: type I MVP displayed prominent MSs and VCs, whereas type II MVP had numerous VGs, GVPs, and VMs. By analyzing the proportion of microvascular area for each type of formation, we determined that glioblastomas with few MSs and VCs had many GVPs and VMs, and vice versa. VG seemed to be a transitional type of formation. In type I MVP, expression of Ki-67 and p53, but not MGMT, was significantly higher than in type II MVP (P < .05). Survival analysis showed that the type of MVP was an independent prognostic factor of progression-free survival (PFS) and overall survival (OS) (both P < .001). Type II MVP had a more negative influence on PFS and OS than did type I MVP. We conclude that the heterogeneous MVPs in glioblastoma can be categorized properly by certain histopathologic and statistical analyses and may influence clinical outcome. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  18. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    PubMed

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    PubMed

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-10-01

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P < .0001 for all comparisons). We identified 3 clinical clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc

  20. Surface Analysis Cluster Tool | Materials Science | NREL

    Science.gov Websites

    spectroscopic ellipsometry during film deposition. The cluster tool can be used to study the effect of various prior to analysis. Here we illustrate the surface cleaning effect of an aqueous ammonia treatment on a

  1. A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.

    The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph, job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis are discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
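    One of the job relationships named in the record above, shared compute nodes, can be turned into weighted graph edges as in the sketch below. This is not the report's framework: the job identifiers, node names, and the use of connected components as a stand-in for graph-based clustering are all assumptions made for illustration.

```python
from itertools import combinations


def build_job_graph(jobs):
    """Build weighted edges between jobs that share compute nodes.

    jobs: dict mapping a job id to the set of node names it ran on.
    Returns a dict {(job_a, job_b): number_of_shared_nodes}.
    """
    edges = {}
    for a, b in combinations(sorted(jobs), 2):
        shared = jobs[a] & jobs[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def connected_components(job_ids, edges):
    """Group jobs that are linked, directly or indirectly, by any edge."""
    adjacency = {j: set() for j in job_ids}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(job_ids):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start job
            job = stack.pop()
            if job in component:
                continue
            component.add(job)
            stack.extend(adjacency[job] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical job records: job id -> compute nodes used.
jobs = {
    "job1": {"n1", "n2"},
    "job2": {"n2", "n3"},
    "job3": {"n9"},
}
edges = build_job_graph(jobs)
groups = connected_components(jobs, edges)
```

    Other relationships mentioned in the abstract, such as proximity of jobs in time, could be added as further edge types with their own weights.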

  2. Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.

    PubMed

    Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren

    2014-12-01

    Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD: age, FEV1 % predicted, BMI, history of severe exacerbations, mMRC, SpO2, and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters, labeled cluster A to cluster E. Assessment of the predictive validity of these clusters of COPD showed that cluster E patients had higher all-cause mortality (HR 18.3, p < 0.0001) and respiratory-cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all-cause mortality (HR 14.3, p = 0.0002) and respiratory-cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD patients with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations formed a novel and distinct clinical phenotype predicting mortality in men with COPD.

  3. Genomic and Metabolomic Profile Associated to Clustering of Cardio-Metabolic Risk Factors

    PubMed Central

    Marrachelli, Vannina G.; Rentero, Pilar; Mansego, María L.; Morales, Jose Manuel; Galan, Inma; Pardo-Tendero, Mercedes; Martinez, Fernando; Martin-Escudero, Juan Carlos; Briongos, Laisa; Chaves, Felipe Javier; Redon, Josep; Monleon, Daniel

    2016-01-01

    Background: To identify metabolomic and genomic markers associated with the presence of clustering of cardiometabolic risk factors (CMRFs) from a general population. Methods and Findings: One thousand five hundred and two subjects, Caucasian, > 18 years, representative of the general population, were included. Blood pressure measurement, anthropometric parameters and metabolic markers were measured. Subjects were grouped according to the number of CMRFs (Group 1: <2; Group 2: 2; Group 3: 3 or more CMRFs). Using SNPlex, 1251 SNPs potentially associated with clustering of three or more CMRFs were analyzed. Serum metabolomic profile was assessed by 1H NMR spectra using a Bruker Avance DRX 600 spectrometer. From the total population, 1217 (mean age 54±19, 50.6% men) with high genotyping call rate were analysed. A differential metabolomic profile, which included products from mitochondrial metabolism, extramitochondrial metabolism, branched amino acids and fatty acid signals, was observed among the three groups. The comparison of metabolomic patterns between subjects of Groups 1 to 3 for each of the genotypes associated with those subjects with three or more CMRFs revealed two SNPs, the rs174577_AA of FADS2 gene and the rs3803_TT of GATA2 transcription factor gene, with minimal or no statistically significant differences. Subjects with and without three or more CMRFs who shared the same genotype and metabolomic profile differed in the pattern of CMRF clustering. Subjects of Group 3 and the AA genotype of the rs174577 had a lower prevalence of hypertension compared to the CC and CT genotype. In contrast, subjects of Group 3 and the TT genotype of the rs3803 polymorphism had a lower prevalence of T2DM, although they were predominantly males and had higher values of plasma creatinine. Conclusions: The results of the present study add information to the metabolomics profile and to the potential impact of genetic factors on the variants of clustering of cardiometabolic risk factors

  4. Genomic and Metabolomic Profile Associated to Clustering of Cardio-Metabolic Risk Factors.

    PubMed

    Marrachelli, Vannina G; Rentero, Pilar; Mansego, María L; Morales, Jose Manuel; Galan, Inma; Pardo-Tendero, Mercedes; Martinez, Fernando; Martin-Escudero, Juan Carlos; Briongos, Laisa; Chaves, Felipe Javier; Redon, Josep; Monleon, Daniel

    2016-01-01

    To identify metabolomic and genomic markers associated with the presence of clustering of cardiometabolic risk factors (CMRFs) from a general population. One thousand five hundred and two subjects, Caucasian, > 18 years, representative of the general population, were included. Blood pressure measurement, anthropometric parameters and metabolic markers were measured. Subjects were grouped according to the number of CMRFs (Group 1: <2; Group 2: 2; Group 3: 3 or more CMRFs). Using SNPlex, 1251 SNPs potentially associated to clustering of three or more CMRFs were analyzed. Serum metabolomic profile was assessed by 1H NMR spectra using a Bruker Avance DRX 600 spectrometer. From the total population, 1217 (mean age 54±19, 50.6% men) with high genotyping call rate were analysed. A differential metabolomic profile, which included products from mitochondrial metabolism, extra mitochondrial metabolism, branched amino acids and fatty acid signals, was observed among the three groups. The comparison of metabolomic patterns between subjects of Groups 1 to 3 for each of the genotypes associated to those subjects with three or more CMRFs revealed two SNPs, the rs174577_AA of FADS2 gene and the rs3803_TT of GATA2 transcription factor gene, with minimal or no statistically significant differences. Subjects with and without three or more CMRFs who shared the same genotype and metabolomic profile differed in the pattern of CMRFs cluster. Subjects of Group 3 and the AA genotype of the rs174577 had a lower prevalence of hypertension compared to the CC and CT genotype. In contrast, subjects of Group 3 and the TT genotype of the rs3803 polymorphism had a lower prevalence of T2DM, although they were predominantly males and had higher values of plasma creatinine. The results of the present study add information to the metabolomics profile and to the potential impact of genetic factors on the variants of clustering of cardiometabolic risk factors.

  5. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    PubMed

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
This study shows that SCC is an alternative to the Pearson

  6. Application of microarray analysis on computer cluster and cloud platforms.

    PubMed

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
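
    As a minimal sketch of why such resampling procedures parallelize easily, the Python example below splits a two-sample permutation test into independent batches. The function names, the mean-difference statistic and the batch layout are illustrative assumptions, not taken from the article. Threads are used only to keep the example self-contained; a real workload would more likely dispatch the same batches to separate processes, cluster jobs or cloud instances.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mean_diff(a, b):
    """Test statistic: absolute difference of the two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_chunk(args):
    """Run one independent batch of permutations.

    Returns how many permuted statistics are at least as extreme as the
    observed one. Batches share no state, so they can run anywhere.
    """
    pooled, n_a, observed, n_perm, seed = args
    rng = random.Random(seed)      # separate random stream per batch
    pooled = list(pooled)          # private copy, shuffled in place
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return hits


def permutation_p_value(a, b, n_perm=2000, n_workers=4, seed=0):
    """Two-sample permutation test with the permutations split into batches."""
    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    per_worker = n_perm // n_workers
    tasks = [(pooled, len(a), observed, per_worker, seed + i)
             for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = sum(pool.map(permutation_chunk, tasks))
    total = per_worker * n_workers
    return (hits + 1) / (total + 1)   # add-one correction avoids p = 0
```

    Because each batch only needs the pooled data, the observed statistic and its own seed, the only communication is one integer per batch, which is the property that makes cluster and cloud execution comparable in this setting.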

  7. A Cluster Analysis of Bronchial Asthma Patients with Depressive Symptoms.

    PubMed

    Seino, Yo; Hasegawa, Takashi; Koya, Toshiyuki; Sakagami, Takuro; Mashima, Ichiro; Shimizu, Natsue; Muramatsu, Yoshiyuki; Muramatsu, Kumiko; Suzuki, Eiichi; Kikuchi, Toshiaki

    2018-03-09

    Objective Whether or not depression affects the control or severity of asthma is unclear. We performed a cluster analysis of asthma patients with depressive symptoms to clarify their characteristics. Methods and subjects Multiple medical institutions in Niigata Prefecture, Japan, were surveyed in 2014. We recorded the age, disease duration, body mass index (BMI), medications, and surveyed asthma control status and severity, as well as depressive symptoms and adherence to treatment using questionnaires. A hierarchical cluster analysis was performed on the group of patients assessed as having depression. Results Of 2,273 patients, 128 were assessed as being positive for depressive symptoms (DS[+]). Thirty-three were excluded because of missing data, and the remaining 95 DS[+] patients were classified into 3 clusters (A, B, and C). The patients in cluster A (n=19) were elderly, had severe, poorly controlled asthma, and demonstrated possible adherence barriers; those in cluster B (n=26) were elderly with a low BMI and had no significant adherence barriers but had severe, poorly controlled asthma; and those in cluster C (n=50) were younger, with a high BMI, no significant adherence barriers, well-controlled asthma, and few were severely affected. The scores for depressive symptoms were not significantly different between clusters. Conclusion About half of the patients in the DS[+] group had severe, poorly controlled asthma, and these clusters were able to be distinguished by their ASK-12 score, which reflects adherence barriers. The control status and severity of asthma may also be related to the age, disease duration, and BMI in the DS[+] group.

  8. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glascoe, L G; Glaser, R E; Chin, H S

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goal of this study is to use a clustering method to classify the winds of a gridded data set, i.e., from meteorological simulations generated by a forecast model.

  9. Bayesian network meta-analysis for cluster randomized trials with binary outcomes.

    PubMed

    Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard

    2017-06-01

    Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Fuzzy cluster analysis of high-field functional MRI data.

    PubMed

    Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald

    2003-11-01

    Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast today is an established brain research method and quickly gains acceptance for complementary clinical diagnosis. However, neither the basic mechanisms like coupling between neuronal activation and haemodynamic response are known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts as well as quantification of expected, i.e. stimulus correlated, and novel information on brain activity is important for both, new insights in neuroscience and future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiate temporal patterns in MRI using (a) a test object with static and dynamic parts, (b) artifacts due to gross head motion artifacts. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) a fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. 
Exploratory data analysis methods in general and fuzzy cluster analysis in particular may
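
    To make the fuzzy clustering idea in this record concrete, the following is a minimal fuzzy c-means sketch in plain Python. It is not the authors' FCA implementation: the function names, the farthest-first initialisation and the default fuzziness exponent m = 2 are assumptions chosen for illustration. Each point (for example one voxel time series) receives a graded membership in every cluster instead of a single hard label.

```python
import math


def farthest_first_centers(points, n_clusters):
    """Deterministic initialisation (an assumption made for this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def update_memberships(points, centers, m):
    """u_ik is proportional to d_ik ** (-2 / (m - 1)); each row sums to 1."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if min(dists) == 0.0:
            # The point sits exactly on a center: give it crisp membership.
            zero = [d == 0.0 for d in dists]
            rows.append([1.0 / sum(zero) if z else 0.0 for z in zero])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers settle."""
    centers = farthest_first_centers(points, n_clusters)
    dim = len(points[0])
    for _ in range(n_iter):
        u = update_memberships(points, centers, m)
        new_centers = []
        for i in range(n_clusters):
            w = [row[i] ** m for row in u]          # fuzzified weights
            wsum = sum(w)
            new_centers.append(
                [sum(wk * p[d] for wk, p in zip(w, points)) / wsum
                 for d in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m)
```

    Larger values of m give softer memberships, and values close to 1 approach hard k-means assignments. The abstract reports examining the influence of FCA parameters, so the exponent is a natural one to vary in a sketch like this.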

  11. First CCD UBVI photometric analysis of six open cluster candidates

    NASA Astrophysics Data System (ADS)

    Piatti, A. E.; Clariá, J. J.; Ahumada, A. V.

    2011-04-01

    We have obtained CCD UBVI_KC photometry down to V ~ 22 for the open cluster candidates Haffner 3, Haffner 5, NGC 2368, Haffner 25, Hogg 3 and Hogg 4 and their surrounding fields. None of these objects have been photometrically studied so far. Our analysis shows that these stellar groups are not genuine open clusters since no clear main sequences or other meaningful features can be seen in their colour-magnitude and colour-colour diagrams. We checked for possible differential reddening across the studied fields that could be hiding the characteristics of real open clusters. However, the dust in the directions to these objects appears to be uniformly distributed. Moreover, star counts carried out within and outside the open cluster candidate fields do not support the hypothesis that these objects are real open clusters or even open cluster remnants.

  12. Human factors analysis of workstation design: Earth Radiation Budget Satellite Mission Operations Room

    NASA Technical Reports Server (NTRS)

    Stewart, L. J.; Murphy, E. D.; Mitchell, C. M.

    1982-01-01

    A human factors analysis addressed three related yet distinct issues within the area of workstation design for the Earth Radiation Budget Satellite (ERBS) mission operation room (MOR). The first issue, physical layout of the MOR, received the most intensive effort. It involved the positioning of clusters of equipment within the physical dimensions of the ERBS MOR. The second issue for analysis was comprised of several environmental concerns, such as lighting, furniture, and heating and ventilation systems. The third issue was component arrangement, involving the physical arrangement of individual components within clusters of consoles, e.g., a communications panel.

  13. Density-cluster NMA: A new protein decomposition technique for coarse-grained normal mode analysis.

    PubMed

    Demerdash, Omar N A; Mitchell, Julie C

    2012-07-01

    Normal mode analysis has emerged as a useful technique for investigating protein motions on long time scales. This is largely due to the advent of coarse-graining techniques, particularly Hooke's Law-based potentials and the rotational-translational blocking (RTB) method for reducing the size of the force-constant matrix, the Hessian. Here we present a new method for domain decomposition for use in RTB that is based on hierarchical clustering of atomic density gradients, which we call Density-Cluster RTB (DCRTB). The method reduces the number of degrees of freedom by 85-90% compared with the standard blocking approaches. We compared the normal modes from DCRTB against standard RTB using 1-4 residues in sequence in a single block, with good agreement between the two methods. We also show that Density-Cluster RTB and standard RTB perform well in capturing the experimentally determined direction of conformational change. Significantly, we report superior correlation of DCRTB with B-factors compared with 1-4 residue per block RTB. Finally, we show significant reduction in computational cost for Density-Cluster RTB that is nearly 100-fold for many examples. Copyright © 2012 Wiley Periodicals, Inc.

  14. Body Composition Indices and Single and Clustered Cardiovascular Disease Risk Factors in Adolescents: Providing Clinical-Based Cut-Points.

    PubMed

    Gracia-Marco, Luis; Moreno, Luis A; Ruiz, Jonatan R; Ortega, Francisco B; de Moraes, Augusto César Ferreira; Gottrand, Frederic; Roccaldo, Romana; Marcos, Ascensión; Gómez-Martínez, Sonia; Dallongeville, Jean; Kafatos, Anthony; Molnar, Denes; Bueno, Gloria; de Henauw, Stefaan; Widhalm, Kurt; Wells, Jonathan C

    2016-01-01

    The aims of the present study in adolescents were 1) to examine how various body composition-screening tests relate to single and clustered cardiovascular disease (CVD) risk factors, 2) to examine how lean mass and body fatness (independently of each other) relate to clustered CVD risk factors, and 3) to calculate specific thresholds for body composition indices associated with an unhealthier clustered CVD risk. We measured 1089 European adolescents (46.7% boys, 12.5-17.49 years) in 2006-2007. CVD risk factors included: systolic blood pressure, maximum oxygen uptake, homeostasis model assessment, C-reactive protein (n=748), total cholesterol/high density lipoprotein cholesterol and triglycerides. Body composition indices included: height, body mass index (BMI), lean mass, the sum of four skinfolds, central/peripheral skinfolds, waist circumference (WC), waist-to-height ratio (WHtR) and waist-to-hip ratio (WHR). Most body composition indices are associated with single CVD risk factors. The sum of four skinfolds, WHtR, BMI, WC and lean mass are strongly and positively associated with clustered CVD risk. Interestingly, lean mass is positively associated with clustered CVD risk independently of body fatness in girls. Moderate and highly accurate thresholds for the sum of four skinfolds, WHtR, BMI, WC and lean mass are associated with an unhealthier clustered CVD risk (all AUC>0.773). In conclusion, our results support an association between most of the assessed body composition indices and single and clustered CVD risk factors. In addition, lean mass (independent of body fatness) is positively associated with clustered CVD risk in girls, which is a novel finding that helps to understand why an index such as BMI is a good index of CVD risk but a bad index of adiposity. Moderate to highly accurate thresholds for body composition indices associated with a healthier clustered CVD risk were found. Further studies with a longitudinal design are needed to confirm these findings.

  15. Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

    PubMed

    Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

    2012-10-01

    The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  16. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    PubMed

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  17. Who are the healthy active seniors? A cluster analysis.

    PubMed

    Lai, Claudia K Y; Chan, Engle Angela; Chin, Kenny C W

    2014-12-01

    This paper reports a cluster analysis of a sample recruited from a randomized controlled trial that explored the effect of using a life story work approach to improve the psychological outcomes of older people in the community. 238 subjects from community centers were included in this analysis. After statistical testing, 169 seniors were assigned to the active ageing (AG) cluster and 69 to the inactive ageing (IG) cluster. Those in the AG were younger and healthier, with fewer chronic diseases and fewer depressive symptoms than those in the IG. They were more satisfied with their lives, and had higher self-esteem. They met with their family members more frequently, they engaged in more leisure activities and were more likely to have the ability to move freely. In summary, active ageing was observed in people with better health and functional performance. Our results echoed the limited findings reported in the literature.

  18. Tweets clustering using latent semantic analysis

    NASA Astrophysics Data System (ADS)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter allows users to broadcast a short message called a "tweet". In this study, we extract tweets related to MH370 over a certain period of time. In this paper, we present an overview of our approach to tweet clustering for analyzing users' responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets responding to the MH370 tragedy: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.

  19. The composite sequential clustering technique for analysis of multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
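
    The two-stage structure described above can be sketched in Python as follows. This is only an illustration of the composite idea: stage 1 here is a simple one-pass distance-threshold rule standing in for the report's sequential variance analysis, whose details are not given in the abstract, and stage 2 is ordinary (not generalized) k-means. The function names and the threshold parameter are assumptions.

```python
import math


def sequential_seed(points, threshold):
    """Stage 1 (stand-in): a single pass over the data. A point joins the
    nearest existing cluster if it lies within `threshold` of that cluster's
    running mean; otherwise it starts a new cluster."""
    means, counts = [], []
    for p in points:
        if means:
            j = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            if math.dist(p, means[j]) <= threshold:
                counts[j] += 1
                # Incremental update of the running mean.
                means[j] = [mu + (x - mu) / counts[j]
                            for mu, x in zip(means[j], p)]
                continue
        means.append(list(p))
        counts.append(1)
    return means


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]


def k_means_refine(points, centers, n_iter=50):
    """Stage 2: iterate assignment and mean updates from the stage-1 seeds."""
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(c)   # leave an empty cluster's center as is
        if new_centers == centers:      # converged: nothing moved
            break
        centers = new_centers
    return centers, assign(points, centers)
```

    A typical call would be `centers, labels = k_means_refine(pixels, sequential_seed(pixels, threshold))`, where `pixels` is a hypothetical list of multispectral measurement vectors. The point of the composition is that stage 1 fixes both the number of clusters and their starting positions, so stage 2 needs no random initialisation.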

  20. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    PubMed

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.

  1. Cluster-based analysis of multi-model climate ensembles

    NASA Astrophysics Data System (ADS)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ~20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ~62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and

  2. Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

    PubMed

    Lukashin, A V; Fuchs, R

    2001-05-01

    Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm guarantees to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places more than 90% of genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
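
    A minimal sketch of clustering by simulated annealing, in the spirit of this record, is given below. It is not the authors' algorithm: the within-cluster sum of squares as cost, the single-profile reassignment move, the geometric cooling schedule and all names and default parameters are assumptions made for illustration. The number of clusters k is fixed here, whereas the paper searches for it simultaneously, and a finite run like this one carries no guarantee of reaching the global optimum.

```python
import math
import random


def within_cluster_cost(profiles, labels, k):
    """Sum of squared distances from each profile to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        cost += sum(math.dist(p, centroid) ** 2 for p in members)
    return cost


def anneal_clusters(profiles, k, t_start=1.0, t_end=1e-3, cooling=0.95,
                    moves_per_temp=200, seed=0):
    """Assign profiles to k >= 2 clusters by Metropolis-style annealing."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]     # random starting state
    cost = within_cluster_cost(profiles, labels, k)
    best_labels, best_cost = list(labels), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Proposed move: put one random profile into a different cluster.
            i = rng.randrange(len(profiles))
            old = labels[i]
            labels[i] = rng.choice([c for c in range(k) if c != old])
            new_cost = within_cluster_cost(profiles, labels, k)
            delta = new_cost - cost
            # Always accept improvements; accept worse states with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = list(labels), cost
            else:
                labels[i] = old                       # reject: undo the move
        t *= cooling                                  # geometric cooling
    return best_labels, best_cost
```

    Recomputing the full cost after every move keeps the sketch short but is wasteful; for realistic numbers of genes one would update only the two affected clusters.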

  3. Cohort study on clustering of lifestyle risk factors and understanding its association with stress on health and wellbeing among school teachers in Malaysia (CLUSTer)--a study protocol.

    PubMed

    Moy, Foong Ming; Hoe, Victor Chee Wai; Hairi, Noran Naqiah; Buckley, Brian; Wark, Petra A; Koh, David; Bueno-de-Mesquita, H Bas; Bulgiba, Awang M

    2014-06-17

    The study on Clustering of Lifestyle risk factors and Understanding its association with Stress on health and wellbeing among school Teachers in Malaysia (CLUSTer) is a prospective cohort study which aims to extensively study teachers in Malaysia with respect to clustering of lifestyle risk factors and stress, and subsequently, to follow-up the population for important health outcomes. This study is being conducted in six states within Peninsular Malaysia. From each state, schools from each district are randomly selected and invited to participate in the study. Once the schools agree to participate, all teachers who fulfilled the inclusion criteria are invited to participate. Data collection includes a questionnaire survey and health assessment. Information collected in the questionnaire includes socio-demographic characteristics, participants' medical history and family history of chronic diseases, teaching characteristics and burden, questions on smoking, alcohol consumption and physical activities (IPAQ); a food frequency questionnaire, the job content questionnaire (JCQ); depression, anxiety and stress scale (DASS21); health related quality of life (SF12-V2); Voice Handicap Index 10 on voice disorder, questions on chronic pain, sleep duration and obstetric history for female participants. Following blood drawn for predefined clinical tests, additional blood and urine specimens are collected and stored for future analysis. Active follow up of exposure and health outcomes will be carried out every two years via telephone or face to face contact. Data collection started in March 2013 and as of the end of March 2014 has been completed for four states: Kuala Lumpur, Selangor, Melaka and Penang. Approximately 6580 participants have been recruited. The first round of data collection and blood sampling is expected to be completed by the end of 2014 with an expected 10,000 participants recruited. Our study will provide a good basis for exploring the clustering of

  4. Globular Cluster Abundances from High-resolution, Integrated-light Spectroscopy. II. Expanding the Metallicity Range for Old Clusters and Updated Analysis Techniques

    NASA Astrophysics Data System (ADS)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

    2017-01-01

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ~0.1 dex for GCs with metallicities as high as [Fe/H] = -0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.

  5. [Styles of interpersonal conflict in patients with panic disorder, alcoholism, rheumatoid arthritis and healthy controls: a cluster analysis study].

    PubMed

    Eher, R; Windhaber, J; Rau, H; Schmitt, M; Kellner, E

    2000-05-01

    Conflict and conflict resolution in intimate relationships are not only among the most important factors influencing relationship satisfaction but are also seen in association with clinical symptoms. Styles of conflict were assessed in patients suffering from panic disorder with and without agoraphobia, in alcoholics and in patients suffering from rheumatoid arthritis. 176 patients and healthy controls filled out the Styles of Conflict Inventory and questionnaires concerning severity of clinical symptoms. A cluster analysis revealed 5 types of conflict management. Healthy controls showed predominantly assertive and constructive styles; patients with panic disorder showed high levels of cognitive and/or behavioral aggression. Alcoholics showed high levels of repressed aggression, and patients with rheumatoid arthritis often did not exhibit any aggression during conflict. Five clusters of conflict patterns were identified by cluster analysis. Each patient group showed considerably different patterns of conflict management.

  6. Analysis of local bond-orientational order for liquid gallium at ambient pressure: Two types of cluster structures.

    PubMed

    Chen, Lin-Yuan; Tang, Ping-Han; Wu, Ten-Ming

    2016-07-14

    In terms of the local bond-orientational order (LBOO) parameters, a cluster approach to analyze local structures of simple liquids was developed. In this approach, a cluster is defined as a combination of neighboring seeds having at least nb local-orientational bonds and their nearest neighbors, and a cluster ensemble is a collection of clusters with a specified nb and number of seeds ns. This cluster analysis was applied to investigate the microscopic structures of liquid Ga at ambient pressure (AP). The liquid structures studied were generated through ab initio molecular dynamics simulations. By scrutinizing the static structure factors (SSFs) of cluster ensembles with different combinations of nb and ns, we found that liquid Ga at AP contained two types of cluster structures, one characterized by sixfold orientational symmetry and the other showing fourfold orientational symmetry. The SSFs of cluster structures with sixfold orientational symmetry were akin to the SSF of a hard-sphere fluid. In contrast, the SSFs of cluster structures showing fourfold orientational symmetry behaved similarly to the anomalous SSF of liquid Ga at AP, which is well known for exhibiting a high-q shoulder. The local structures of a highly LBOO cluster whose SSF displayed a high-q shoulder were found to be more similar to the structure of β-Ga than to those of other solid phases of Ga. More generally, the cluster structures showing fourfold orientational symmetry tend to resemble β-Ga more closely.

  7. Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steenbergen, K. G.; Gaston, N.

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.

  8. Two worlds collide: image analysis methods for quantifying structural variation in cluster molecular dynamics.

    PubMed

    Steenbergen, K G; Gaston, N

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
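
    The abstract above does not say which quantities enter the Pearson correlation, so the following is only an illustrative sketch under a stated assumption: each MD frame is summarized by its sorted list of interatomic distances, and two frames are compared by the PCC of those lists. The function names and the 4-atom coordinates are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Sorted list of all interatomic distances for one snapshot."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 4-atom cluster; the second frame is the first one expanded by 10%.
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
frame_b = [(1.1 * x, 1.1 * y, 1.1 * z) for x, y, z in frame_a]

# Assumed descriptor: sorted distance lists, compared frame to frame.
pcc = pearson(pairwise_distances(frame_a), pairwise_distances(frame_b))
print(round(pcc, 6))
```

    A uniform expansion rescales every distance by the same factor, so the correlation stays essentially at 1; a genuine rearrangement of the bond pattern would lower it.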

  9. Molecular analysis of SCARECROW genes expressed in white lupin cluster roots

    PubMed Central

    Sbabou, Laila; Bucciarelli, Bruna; Miller, Susan; Liu, Junqi; Berhada, Fatiha; Filali-Maltouf, Abdelkarim; Allan, Deborah; Vance, Carroll

    2010-01-01

    The Scarecrow (SCR) transcription factor plays a crucial role in root cell radial patterning and is required for maintenance of the quiescent centre and differentiation of the endodermis. In response to phosphorus (P) deficiency, white lupin (Lupinus albus L.) root surface area increases some 50-fold to 70-fold due to the development of cluster (proteoid) roots. Previously it was reported that SCR-like expressed sequence tags (ESTs) were expressed during early cluster root development. Here the cloning of two white lupin SCR genes, LaSCR1 and LaSCR2, is reported. The predicted amino acid sequences of both LaSCR gene products are highly similar to AtSCR and contain C-terminal conserved GRAS family domains. LaSCR1 and LaSCR2 transcript accumulation localized to the endodermis of both normal and cluster roots as shown by in situ hybridization and gene promoter::reporter staining. Transcript analysis as evaluated by quantitative real-time-PCR (qRT-PCR) and RNA gel hybridization indicated that the two LaSCR genes are expressed predominantly in roots. Expression of LaSCR genes was not directly responsive to the P status of the plant but was a function of cluster root development. Suppression of LaSCR1 in transformed roots of lupin and Medicago via RNAi (RNA interference) delivered through Agrobacterium rhizogenes resulted in decreased root numbers, reflecting the potential role of LaSCR1 in maintaining root growth in these species. The results suggest that the functional orthologues of AtSCR have been characterized. PMID:20167612

  10. Focused maternity care in Ghana: results of a cluster analysis.

    PubMed

    Ayanore, Martin Amogre; Pavlova, Milena; Groot, Wim

    2016-08-17

    Ghana missed out in attaining Millennium Development Goal 5 in 2015. The provision of adequate prenatal and postnatal care remains problematic, with poor evidence on women's views on met and unmet maternity care needs across all regions in Ghana. This paper examines maternal care utilization in Ghana by applying WHO indicators for focused maternal care utilization. Two-step cluster analysis segregated women into groups based on the components of the maternity care used. Using cluster membership variables as dependent variables, we applied multinomial and binary regression to examine associations of care use with individual, household and regional characteristics. We identified three patterns of care use: adequate, less and least adequate care. The presence of a female and skilled provider is an indicator of adequate care. Women in Volta, Upper West, Northern and Western regions received less adequate care compared with other regions. Supply-related factors (drugs availability, distance/transport, health insurance ownership, rural residence) were associated with adequacy of care. The lack of female autonomy, widowed/divorced women, age and parity were associated with less adequate care. Care patterns were distinctively associated with the quality of health care support (skilled and female attendant) instead of with the number of visits made to the facility. Across regions and within rural settings, disparities exist, often compounded by supply-related factors. Efforts to address skilled workforce shortages, ensure greater accountability for quality and equity, and improve women's motivation for care seeking and active participation are important for maternity care in Ghana.

  11. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    PubMed

    Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

    2015-01-01

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  12. Clustering of Risk Factors for Non-Communicable Diseases among Adolescents from Southern Brazil.

    PubMed

    Nunes, Heloyse Elaine Gimenes; Gonçalves, Eliane Cristina de Andrade; Vieira, Jéssika Aparecida Jesus; Silva, Diego Augusto Santos

    2016-01-01

    The aim of this study was to investigate the simultaneous presence of risk factors for non-communicable diseases and the association of these risk factors with demographic and economic factors among adolescents from southern Brazil. The study included 916 students (14-19 years old) enrolled in the 2014 school year at state schools in São José, Santa Catarina, Brazil. Risk factors related to lifestyle (i.e., physical inactivity, excessive alcohol consumption, smoking, sedentary behaviour and unhealthy diet), demographic variables (sex, age and skin colour) and economic variables (school shift and economic level) were assessed through a questionnaire. Simultaneous behaviours were assessed by the ratio between observed and expected prevalences of risk factors for non-communicable diseases. The clustering of risk factors was analysed by multinomial logistic regression. The clusters of risk factors that showed a higher prevalence were analysed by binary logistic regression. The clustering of two, three, four, and five risk factors were found in 22.2%, 49.3%, 21.7% and 3.1% of adolescents, respectively. Subgroups that were more likely to have both behaviours of physical inactivity and unhealthy diet simultaneously were mostly composed of girls (OR = 3.03, 95% CI = 1.57-5.85) and those with lower socioeconomic status (OR = 1.83, 95% CI = 1.05-3.21); simultaneous physical inactivity, excessive alcohol consumption, sedentary behaviour and unhealthy diet were mainly observed among older adolescents (OR = 1.49, 95% CI = 1.05-2.12). Subgroups less likely to have both behaviours of sedentary behaviour and unhealthy diet were mostly composed of girls (OR = 0.58, 95% CI = 0.38-0.89); simultaneous physical inactivity, sedentary behaviour and unhealthy diet were mainly observed among older individuals (OR = 0.66, 95% CI = 0.49-0.87) and those of the night shift (OR = 0.59, 95% CI = 0.43-0.82). Adolescents had a high prevalence of simultaneous risk factors for NCDs. Demographic
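
    The observed/expected prevalence ratio mentioned in the record above can be sketched as follows. The abstract does not state how the expected prevalence was obtained; this sketch assumes the common convention that the expected prevalence of a combination is the product of the individual factor probabilities, treating the factors as independent. All numbers and function names are made up for illustration.

```python
from math import prod

def expected_prevalence(pattern, prevalence):
    """Expected share with exactly this pattern if the factors were independent.

    pattern:    dict factor -> True (present) / False (absent)
    prevalence: dict factor -> marginal prevalence in [0, 1]
    """
    return prod(prevalence[f] if present else 1.0 - prevalence[f]
                for f, present in pattern.items())

def observed_expected_ratio(observed, pattern, prevalence):
    """Ratio above 1 suggests the factors co-occur more often than chance predicts."""
    return observed / expected_prevalence(pattern, prevalence)

# Hypothetical marginal prevalences and observed joint prevalence.
prevalence = {"physical_inactivity": 0.5, "unhealthy_diet": 0.4}
pattern = {"physical_inactivity": True, "unhealthy_diet": True}
ratio = observed_expected_ratio(0.3, pattern, prevalence)
print(round(ratio, 2))
```

    With these invented inputs the expected joint prevalence is 0.5 × 0.4 = 0.2, so an observed 0.3 gives a ratio of about 1.5.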

  13. Using factor analysis to identify neuromuscular synergies during treadmill walking

    NASA Technical Reports Server (NTRS)

    Merkle, L. A.; Layne, C. S.; Bloomberg, J. J.; Zhang, J. J.

    1998-01-01

    Neuroscientists are often interested in grouping variables to facilitate understanding of a particular phenomenon. Factor analysis is a powerful statistical technique that groups variables into conceptually meaningful clusters, but remains underutilized by neuroscience researchers presumably due to its complicated concepts and procedures. This paper illustrates an application of factor analysis to identify coordinated patterns of whole-body muscle activation during treadmill walking. Ten male subjects walked on a treadmill (6.4 km/h) for 20 s during which surface electromyographic (EMG) activity was obtained from the left side sternocleidomastoid, neck extensors, erector spinae, and right side biceps femoris, rectus femoris, tibialis anterior, and medial gastrocnemius. Factor analysis revealed 65% of the variance of seven muscles sampled aligned with two orthogonal factors, labeled 'transition control' and 'loading'. These two factors describe coordinated patterns of muscular activity across body segments that would not be evident by evaluating individual muscle patterns. The results show that factor analysis can be effectively used to explore relationships among muscle patterns across all body segments to increase understanding of the complex coordination necessary for smooth and efficient locomotion. We encourage neuroscientists to consider using factor analysis to identify coordinated patterns of neuromuscular activation that would be obscured using more traditional EMG analyses.

  14. Proteomics to predict the response to tumour necrosis factor-α inhibitors in rheumatoid arthritis using a supervised cluster-analysis based protein score.

    PubMed

    Cuppen, Bvj; Fritsch-Stork, Rde; Eekhout, I; de Jager, W; Marijnissen, A C; Bijlsma, Jwj; Custers, M; van Laar, J M; Lafeber, Fpjg; Welsing, Pmj

    2018-01-01

    In rheumatoid arthritis (RA), it is of major importance to identify non-responders to tumour necrosis factor-α inhibitors (TNFi) before starting treatment, to prevent a delay in effective treatment. We developed a protein score for the response to TNFi treatment in RA and investigated its predictive value. In RA patients eligible for biological treatment included in the BiOCURA registry, 53 inflammatory proteins were measured using xMAP® technology. A supervised cluster analysis method, partial least squares (PLS), was used to select the best combination of proteins. Using logistic regression, a predictive model containing readily available clinical parameters was developed and the potential of this model with and without the protein score to predict European League Against Rheumatism (EULAR) response was assessed using the area under the receiver operating characteristic curve (AUC-ROC) and the net reclassification index (NRI). For the development step (n = 65 patients), PLS revealed 12 important proteins: CCL3 (macrophage inflammatory protein, MIP1a), CCL17 (thymus and activation-regulated chemokine), CCL19 (MIP3b), CCL22 (macrophage-derived chemokine), interleukin-4 (IL-4), IL-6, IL-7, IL-15, soluble cluster of differentiation 14 (sCD14), sCD74 (macrophage migration inhibitory factor), soluble IL-1 receptor I, and soluble tumour necrosis factor receptor II. The protein score scarcely improved the AUC-ROC (0.72 to 0.77) or the reclassification (NRI = 0.05). In validation (n = 185), the model including protein score did not improve the AUC-ROC (0.71 to 0.67) or the reclassification (NRI = -0.11). No proteomic predictors were identified that were more suitable than clinical parameters in distinguishing TNFi non-responders from responders before the start of treatment. As the results of previous studies and this study are disparate, we currently have no proteomic predictors for the response to TNFi.
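
    For readers unfamiliar with the AUC-ROC used in the record above, a minimal sketch of one standard way to compute it: the AUC equals the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counted as one half. The scores and labels below are invented and have no connection to the BiOCURA data.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison (1 = positive, 0 = negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of response and true outcomes.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(auc_roc(scores, labels))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

    An AUC of 0.5 corresponds to chance-level discrimination, which is the baseline against which values such as 0.67 to 0.77 are judged.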

  15. Autoantibodies in pediatric systemic lupus erythematosus: ethnic grouping, cluster analysis, and clinical correlations.

    PubMed

    Jurencák, Roman; Fritzler, Marvin; Tyrrell, Pascal; Hiraki, Linda; Benseler, Susanne; Silverman, Earl

    2009-02-01

    (1) To evaluate the spectrum of serum autoantibodies in pediatric-onset systemic lupus erythematosus (pSLE) with a focus on ethnic differences; (2) using cluster analysis, to identify patients with similar autoantibody patterns and to determine their clinical associations. A single-center cohort study of all patients with newly diagnosed pSLE seen over an 8-year period was performed. Ethnicity, clinical, and serological data were prospectively collected from 156/169 patients (92%). The frequencies of 10 selected autoantibodies among ethnic groups were compared. Cluster analysis identified groups of patients with similar autoantibody profiles. Associations of these groups with clinical and laboratory features of pSLE were examined. Among our 5 ethnic groups, there were differences only in the prevalence of anti-U1RNP and anti-Sm antibodies, which occurred more frequently in non-Caucasian patients (p < 0.0001, p < 0.01, respectively). Cluster analysis revealed 3 autoantibody clusters. Cluster 1 consisted of anti-dsDNA antibodies. Cluster 2 consisted of anti-dsDNA, antichromatin, antiribosomal P, anti-U1RNP, anti-Sm, anti-Ro and anti-La autoantibody. Cluster 3 consisted of anti-dsDNA, anti-RNP, and anti-Sm autoantibody. The highest proportion of Caucasians was in cluster 1 (p < 0.05), which was characterized by a mild disease with infrequent major organ involvement compared to cluster 2, which had the highest frequency of nephritis, renal failure, serositis, and hemolytic anemia, or cluster 3, which was characterized by frequent neuropsychiatric disease and nephritis. We observed ethnic differences in autoantibody profiles in pSLE. Autoantibodies tended to cluster together and these clusters were associated with different clinical courses.

  16. Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.

    PubMed

    Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah

    2014-06-01

    Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ² analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, and these are related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights

  17. Patient clusters in acute, work-related back pain based on patterns of disability risk factors.

    PubMed

    Shaw, William S; Pransky, Glenn; Patterson, William; Linton, Steven J; Winters, Thomas

    2007-02-01

    To identify subgroups of patients with work-related back pain based on disability risk factors. Patients with work-related back pain (N = 528) completed a 16-item questionnaire of potential disability risk factors before their initial medical evaluation. Outcomes of pain, functional limitation, and work disability were assessed 1 and 3 months later. A K-Means cluster analysis of 5 disability risk factors (pain, depressed mood, fear avoidant beliefs, work inflexibility, and poor expectations for recovery) resulted in 4 sub-groups: low risk (n = 182); emotional distress (n = 103); severe pain/fear avoidant (n = 102); and concerns about job accommodation (n = 141). Pain and disability outcomes at follow-up were superior in the low-risk group and poorest in the severe pain/fear avoidant group. Patients with acute back pain can be discriminated into subgroups depending on whether disability is related to pain beliefs, emotional distress, or workplace concerns.
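
    The K-means procedure named in the record above can be sketched with Lloyd's algorithm. This is a toy illustration, not the authors' analysis: it uses two made-up risk-factor scores per patient instead of five, two clusters instead of four, and explicit starting centroids so that the run is deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical (pain, fear-avoidance) scores for six patients.
patients = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(patients, [patients[0], patients[3]])
print(labels)
```

    In practice the starting centroids are usually chosen at random and the algorithm is rerun several times, because K-means can settle in different local optima.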

  18. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome.

    PubMed

    Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard

    2014-07-01

    Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were

  19. Calibrating the Planck cluster mass scale with cluster velocity dispersions

    NASA Astrophysics Data System (ADS)

    Amodeo, S.; Mei, S.; Stanford, S. A.; Bartlett, J. G.; Lawrence, C. L.; Chary, R. R.; Shim, H.; Marleau, F.; Stern, D.

    2017-12-01

    The potential of galaxy clusters as cosmological probes critically depends on the capability to obtain accurate estimates of their mass. This will be a key measurement for the next generation of cosmological surveys, such as Euclid. The discrepancy between the cosmological parameters determined from anisotropies in the cosmic microwave background and those derived from cluster abundance measurements from the Planck satellite calls for careful evaluation of systematic biases in cluster mass estimates. For this purpose, it is crucial to use independent techniques, like analysis of the thermal emission of the intracluster medium (ICM), observed either in the X-rays or through the Sunyaev-Zeldovich (SZ) effect, dynamics of member galaxies or gravitational lensing. We discuss possible bias in the Planck SZ mass proxy, which is based on X-ray observations. Using optical spectroscopy from the Gemini Multi-Object Spectrograph of 17 Planck-selected clusters, we present new estimates of the cluster mass based on the velocity dispersion of the member galaxies and independently of the ICM properties. We show how the difference between the velocity dispersion of galaxy and dark matter particles in simulations is the primary factor limiting interpretation of dynamical cluster mass measurements at this time, and we give the first observational constraints on the velocity bias.

  20. Cluster Analysis of the Luria-Nebraska Neuropsychological Battery with Learning Disabled Adults.

    ERIC Educational Resources Information Center

    McCue, Michael; And Others

    The study reports a cluster analysis of Luria-Nebraska Neuropsychological Battery scores of 25 learning disabled adults. The cluster analysis suggested the presence of three subgroups within this sample, one having high elevations on the Rhythm, Writing, Reading, and Arithmetic Rhythm scales, the second having an extremely high elevation on the…

  1. Multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    In this paper, we address the feasibility of partitioning rule-based systems into a number of meaningful units to enhance the comprehensibility, maintainability and reliability of expert systems software. Preliminary results have shown that no single structuring principle or abstraction hierarchy is sufficient to understand complex knowledge bases. We therefore propose the Multi View Point - Clustering Analysis (MVP-CA) methodology to provide multiple views of the same expert system. We present the results of using this approach to partition a deployed knowledge-based system that navigates the Space Shuttle's entry. We also discuss the impact of this approach on verification and validation of knowledge-based systems.

  2. Cluster Analysis of Vulnerable Groups in Acute Traumatic Brain Injury Rehabilitation.

    PubMed

    Kucukboyaci, N Erkut; Long, Coralynn; Smith, Michelle; Rath, Joseph F; Bushnik, Tamara

    2018-01-06

    To analyze the complex relation between various social indicators that contribute to socioeconomic status and health care barriers. Cluster analysis of historical patient data obtained from inpatient visits. Inpatient rehabilitation unit in a large urban university hospital. Adult patients (N=148) receiving acute inpatient care, predominantly for closed head injury. Not applicable. We examined the membership of patients with traumatic brain injury in various "vulnerable group" clusters (eg, homeless, unemployed, racial/ethnic minority) and characterized the rehabilitation outcomes of patients (eg, duration of stay, changes in FIM scores between admission to inpatient stay and discharge). The cluster analysis revealed 4 major clusters (ie, clusters A-D) separated by vulnerable group memberships, with distinct durations of stay and FIM gains during their stay. Cluster B, the largest cluster and also consisting of mostly racial/ethnic minorities, had the shortest duration of hospital stay and one of the lowest FIM improvements among the 4 clusters despite higher FIM scores at admission. In cluster C, also consisting of mostly ethnic minorities with multiple socioeconomic status vulnerabilities, patients were characterized by low cognitive FIM scores at admission and the longest duration of stay, and they showed good improvement in FIM scores. Application of clustering techniques to inpatient data identified distinct clusters of patients who may experience differences in their rehabilitation outcome due to their membership in various "at-risk" groups. The results identified patients (ie, cluster B, with minority patients; and cluster D, with elderly patients) who attain below-average gains in brain injury rehabilitation. 
    The results also suggested that systemic (eg, duration of stay) or clinical service improvements (eg, staff's language skills, ability to offer substance abuse therapy, provide appropriate referrals, liaise with intensive social work services, or plan

  3. Clustering analysis for muon tomography data elaboration in the Muon Portal project

    NASA Astrophysics Data System (ADS)

    Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

    2015-05-01

    Clustering analysis is a multivariate data analysis technique that gathers statistical data units into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.

  4. A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.

    PubMed

    Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve

    2015-01-01

    Symptom clusters are gaining importance given HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, P<0.001). CD4 count group below 200 was associated with the all high occurrence rate symptom cluster (Fisher's exact=41, P<0.001). 
    Symptom clusters have a differential effect on HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high

  5. [Prognostic differences of phenotypes in pT1-2N0 invasive breast cancer: a large cohort study with cluster analysis].

    PubMed

    Wang, Z; Wang, W H; Wang, S L; Jin, J; Song, Y W; Liu, Y P; Ren, H; Fang, H; Tang, Y; Chen, B; Qi, S N; Lu, N N; Li, N; Tang, Y; Liu, X F; Yu, Z H; Li, Y X

    2016-06-23

    To find phenotypic subgroups of patients with pT1-2N0 invasive breast cancer by means of cluster analysis and estimate the prognosis and clinicopathological features of these subgroups. From 1999 to 2013, 4979 patients with pT1-2N0 invasive breast cancer were recruited for hierarchical clustering analysis. Age (≤40, 41-70, 70+ years), size of primary tumor, pathological type, grade of differentiation, microvascular invasion, estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER-2) were chosen as the variables for the distance metric between patients. Hierarchical cluster analysis was performed using Ward's method. Cophenetic correlation coefficient (CPCC) and Spearman correlation coefficient were used to validate clustering structures. The CPCC was 0.603. The Spearman correlation coefficient was 0.617 (P<0.001), which indicated a good fit of hierarchy to the data. A twelve-cluster model seemed to best illustrate our patient cohort. Patients in clusters 5, 9 and 12 had the best prognosis and were characterized by age >40 years, smaller primary tumor, lower histologic grade, positive ER and PR status, and mainly negative HER-2. Patients in clusters 1 and 11 had the worst prognosis. Cluster 1 was characterized by a larger tumor, higher grade and negative ER and PR status, while cluster 11 was characterized by positive microvascular invasion. Patients in the other 7 clusters had a moderate prognosis, and patients in each cluster had distinctive clinicopathological features and recurrence patterns. This study identified distinctive clinicopathologic phenotypes in a large cohort of patients with pT1-2N0 breast cancer through hierarchical clustering and revealed differences in prognosis. This integrative model may help physicians to make more personalized decisions regarding adjuvant therapy.
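
    The validation step in this record (Ward's hierarchical clustering checked with a cophenetic correlation coefficient) can be illustrated with a minimal sketch. It assumes plain Euclidean distances on five made-up 2-D points; the study's own distance on mixed clinical variables is not specified here, and the function names are hypothetical. Ward merge heights are obtained with the Lance-Williams update on squared distances, and the CPCC is the Pearson correlation between the original pairwise distances and the heights at which each pair is first merged.

```python
import math
from itertools import combinations

def ward_cophenetic(points):
    """Return {frozenset((a, b)): merge height} for all pairs of points."""
    n = len(points)
    clusters = {i: [i] for i in range(n)}        # cluster id -> member indices
    # Squared Euclidean distances between the initial singleton clusters.
    d = {frozenset((i, j)): math.dist(points[i], points[j]) ** 2
         for i, j in combinations(range(n), 2)}
    coph, next_id = {}, n
    while len(clusters) > 1:
        # Pick the closest pair among the currently active clusters.
        pair = min((frozenset(p) for p in combinations(clusters, 2)), key=d.get)
        i, j = tuple(pair)
        dij = d[pair]
        # Every cross pair of original points first meets at this height.
        for a in clusters[i]:
            for b in clusters[j]:
                coph[frozenset((a, b))] = math.sqrt(dij)
        ni, nj = len(clusters[i]), len(clusters[j])
        # Lance-Williams update for Ward's method (squared distances).
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d[frozenset((k, next_id))] = (
                (ni + nk) * d[frozenset((k, i))]
                + (nj + nk) * d[frozenset((k, j))]
                - nk * dij
            ) / (ni + nj + nk)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return coph

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)

# Made-up illustration data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (10.0, 0.5)]
pairs = list(combinations(range(len(points)), 2))
original = [math.dist(points[a], points[b]) for a, b in pairs]
coph = ward_cophenetic(points)
cpcc = pearson(original, [coph[frozenset(p)] for p in pairs])
print(round(cpcc, 3))
```

    A CPCC near 1 means the dendrogram preserves the original pairwise distances well. For real data, SciPy's `scipy.cluster.hierarchy.linkage` and `cophenet` provide the same quantities.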

  6. Cultural, social and intrapersonal factors associated with clusters of co-occurring health-related behaviours among adolescents.

    PubMed

    Klein Velderman, Mariska; Dusseldorp, Elise; van Nieuwenhuijzen, Maroesjka; Junger, Marianne; Paulussen, Theo G W M; Reijneveld, Sijmen A

    2015-02-01

    Adverse health-related behaviours (HRBs) have been shown to co-occur in adolescents. Evidence is lacking on factors associated with these co-occurring HRBs. The Theory of Triadic Influence (TTI) offers a route to categorize these determinants according to type (social, cultural and intrapersonal) and distance in the causal pathway (ultimate or distal). Our aims were to identify cultural, social and intrapersonal factors associated with co-occurring HRBs and to assess the relative importance of ultimate and distal factors for each cluster of co-occurring HRBs. Respondents were a random sample of 898 adolescents aged 12-18 years, stratified by age, sex and educational level of head of household. Data were collected via face-to-face computer-assisted interviewing and internet questionnaires. Analyses were performed for young (12-15 years) and late (16-18 years) adolescents regarding two and three clusters of HRB, respectively. For each cluster of HRBs (e.g. smoking, delinquency), associated factors were found. These accounted for 27 to 57% of the total variance per cluster. Factors came in particular from the intrapersonal stream of the TTI at the ultimate level and the social stream at the distal level. Associations were strongest for parenting practices, risk behaviours of friends and parents and self-control. Results of this study confirm that it is possible to identify a selection of cultural, social and intrapersonal factors associated with co-occurring HRBs among adolescents. © The Author 2014. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.

  7. Co-variations and Clustering of Chronic Disease Behavioral Risk Factors in China: China Chronic Disease and Risk Factor Surveillance, 2007

    PubMed Central

    Li, Yichong; Zhang, Mei; Jiang, Yong; Wu, Fan

    2012-01-01

    Background: Chronic diseases have become the leading causes of mortality in China, and related behavioral risk factors (BRFs) have changed dramatically in past decades. We aimed to examine the prevalence, co-variations, clustering and the independent correlates of five BRFs at the national level. Methodology/Principal Findings: We used data from the 2007 China Chronic Disease and Risk Factor Surveillance, in which multistage cluster sampling was adopted to collect a nationally representative sample of 49,247 Chinese aged 15 to 69 years. We estimated the prevalence and clustering (mean number of BRFs) of five BRFs: tobacco use, excessive alcohol drinking, insufficient intake of vegetable and fruit, physical inactivity, and overweight or obesity. We fitted binary logistic regression models to examine the co-variations among the five BRFs with adjustment for demographic and socioeconomic factors, chronic conditions and other BRFs. An ordinal logistic regression model was constructed to investigate the independent associations between each covariate and the clustering of BRFs within individuals. Overall, 57.0% of the Chinese population had at least two BRFs and the mean number of BRFs was 1.80 (95% confidence interval: 1.78–1.83). Eight of the ten pairs of bivariate associations between the five BRFs were statistically significant. Older age, male sex, rural residence, lower education level and lower yearly household income were associated with an increased likelihood of having more BRFs. Conclusions/Significance: Current BRFs place the majority of Chinese aged 15 to 69 years at risk for the future development of chronic disease, which calls for urgent public health programs to reduce these risk factors. Prominent correlations between BRFs imply that a combined package of interventions targeting multiple BRFs might be appropriate. These interventions should target the elderly population, men, and rural residents, especially those with lower SES. PMID:22439010
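
    The clustering measure used in this record (the number of BRFs per person, summarized as a mean and as the share with at least two) is simple arithmetic, sketched here on six made-up respondents. The records and variable names are hypothetical, and the confidence interval uses a plain normal approximation that ignores the survey's multistage design and weights, so it only illustrates the calculation rather than reproducing the published estimates.

```python
import math

# Hypothetical records: one row per respondent, 1 = risk factor present.
# Columns: tobacco, alcohol, low fruit/vegetable intake, inactivity, overweight.
records = [
    (1, 0, 1, 0, 0),
    (0, 0, 1, 1, 1),
    (0, 0, 0, 0, 0),
    (1, 1, 1, 0, 1),
    (0, 0, 1, 0, 0),
    (0, 1, 1, 1, 0),
]

counts = [sum(r) for r in records]          # number of BRFs per person
n = len(counts)
mean_brf = sum(counts) / n                  # "clustering" as mean number of BRFs
share_two_plus = sum(c >= 2 for c in counts) / n

# Normal-approximation 95% CI for the mean (unweighted, simple random sample).
variance = sum((c - mean_brf) ** 2 for c in counts) / (n - 1)
half_width = 1.96 * math.sqrt(variance / n)
ci = (mean_brf - half_width, mean_brf + half_width)

print(counts, round(mean_brf, 2), round(share_two_plus, 2))
```

    The per-person count is also the natural outcome variable for an ordinal regression of the kind the abstract describes.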

  8. Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

    PubMed

    Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

    2017-01-01

    Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4 and increased in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.

  9. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

    PubMed

    Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

    2018-01-01

    Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.

  10. The Use of Cluster Analysis in Typological Research on Community College Students

    ERIC Educational Resources Information Center

    Bahr, Peter Riley; Bielby, Rob; House, Emily

    2011-01-01

    One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…

  11. Cluster folding analysis of 20Ne+16O elastic transfer

    NASA Astrophysics Data System (ADS)

    Hamada, Sh.; Keeley, N.; Kemper, K. W.; Rusek, K.

    2018-05-01

    The available experimental data for the 20Ne+16O system in the energy range where the effect of α-cluster transfer is well observed are reanalyzed using the cluster folding model. The cluster folding potential, which includes both real and imaginary terms, reproduces the data at forward angles and the inclusion of the 16O(20Ne,16O)20Ne elastic transfer process provides a satisfactory description of the backward angles. The spectroscopic factor for the 20Ne→16O+α overlap was extracted and compared with other values from the literature. The present results suggest that the (20Ne,16O) reaction might be an alternative means of exploring the α-particle structure of nuclei.

  12. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

    PubMed Central

    Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

    2015-01-01

    Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888

  13. Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

    PubMed

    Yokoyama, Eiji; Uchimura, Masako

    2007-11-01

    Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.

  14. Clustering of cardiovascular risk factors in a middle-income country: a call for urgency.

    PubMed

    Selvarajah, Sharmini; Haniff, Jamaiyah; Kaur, Gurpreet; Hiong, Tee Guat; Cheong, Kee Chee; Lim, Chiao Mei; Bots, Michiel L

    2013-04-01

    This study aimed to estimate the prevalence of cardiovascular risk factors and their clustering. The findings are to help shape future Malaysian healthcare planning for cardiovascular disease prevention and management. Data from a nationally representative cross-sectional survey were used. The survey was conducted via a face-to-face interview using a standardised questionnaire. A total of 37,906 eligible participants aged 18 years and older was identified, of whom 34,505 (91%) participated. Focus was on hypertension, hyperglycaemia (diabetes and impaired fasting glucose), hypercholesterolaemia and central obesity. Overall, 63% (95% confidence limits 62, 65%) of the participants had at least one cardiovascular risk factor, 33% (32, 35%) had two or more and 14% (12, 15%) had three risk factors or more. The prevalence of hypertension, hyperglycaemia, hypercholesterolaemia and central obesity were 38%, 15%, 24% and 37%, respectively. Women were more likely to have a higher number of cardiovascular risk factors for most age groups; adjusted odds ratios ranged from 1.1 (0.91, 1.32) to 1.26 (1.12, 1.43) for the presence of one risk factor and 1.07 (0.91, 1.32) to 2.00 (1.78, 2.25) for two or more risk factors. Cardiovascular risk-factor clustering provides a clear impression of the true burden of cardiovascular disease risk in the population. Women displayed higher prevalence, and a shift in clustering towards younger ages was seen. These findings signal the presence of a cardiovascular epidemic in an upcoming middle-income country and provide evidence that drastic measures have to be taken to safeguard the health of the nation.

  15. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    PubMed

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  16. [Raman spectroscopy fluorescence background correction and its application in clustering analysis of medicines].

    PubMed

    Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei

    2010-08-01

    During Raman spectroscopy analysis, fluorescence from organic molecules and contaminants can obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tablets, acquired with the BWTek i-Raman spectrometer. The background is corrected with the R package baselineWavelet. Principal component analysis and random forests are then used to perform clustering analysis. By analyzing the Raman spectra of the two medicines, the accuracy and validity of this background-correction algorithm are checked and the influence of fluorescence background on Raman spectra clustering analysis is discussed. It is concluded that correcting the fluorescence background is important for further analysis, and an effective background correction solution is provided for clustering or other analysis.

  17. Equivalent damage validation by variable cluster analysis

    NASA Astrophysics Data System (ADS)

    Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

    2016-06-01

    The main aim of this work is to perform a clustering analysis on the damage surveyed in the old center of L'Aquila after the earthquake that occurred on April 6, 2009, and to validate an Indicator of Equivalent Damage (ED) that summarizes the information reported on the AeDES card regarding the level of damage and its extent on the surface of the buildings. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the surveyed damage data and in the computed ED data.

  18. Cohort study on clustering of lifestyle risk factors and understanding its association with stress on health and wellbeing among school teachers in Malaysia (CLUSTer) – a study protocol

    PubMed Central

    2014-01-01

    Background: The study on Clustering of Lifestyle risk factors and Understanding its association with Stress on health and wellbeing among school Teachers in Malaysia (CLUSTer) is a prospective cohort study which aims to extensively study teachers in Malaysia with respect to clustering of lifestyle risk factors and stress, and subsequently, to follow up the population for important health outcomes. Method/design: This study is being conducted in six states within Peninsular Malaysia. From each state, schools from each district are randomly selected and invited to participate in the study. Once the schools agree to participate, all teachers who fulfil the inclusion criteria are invited to participate. Data collection includes a questionnaire survey and health assessment. Information collected in the questionnaire includes socio-demographic characteristics, participants’ medical history and family history of chronic diseases, teaching characteristics and burden, questions on smoking, alcohol consumption and physical activities (IPAQ); a food frequency questionnaire, the job content questionnaire (JCQ); depression, anxiety and stress scale (DASS21); health related quality of life (SF12-V2); Voice Handicap Index 10 on voice disorder, questions on chronic pain, sleep duration and obstetric history for female participants. After blood is drawn for predefined clinical tests, additional blood and urine specimens are collected and stored for future analysis. Active follow up of exposure and health outcomes will be carried out every two years via telephone or face to face contact. Data collection started in March 2013 and as of the end of March 2014 has been completed for four states: Kuala Lumpur, Selangor, Melaka and Penang. Approximately 6580 participants have been recruited. The first round of data collection and blood sampling is expected to be completed by the end of 2014 with an expected 10,000 participants recruited. Discussion: Our study will provide a good basis

  19. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    PubMed

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    The present study investigates the feasibility of multi-element analysis for determining the geographical origin of sea cucumber Apostichopus japonicus, and selects effective tracers for geographical origin assessment. The contents of elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin were determined by ICP-MS. The results were used to develop an element database. Cluster analysis (CA) and principal component analysis (PCA) were applied to differentiate the geographical origins. Three principal components, which accounted for over 89% of the total variance, were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups; the classification results were significantly associated with the marine distribution of the samples. CA and PCA were effective methods for element analysis of sea cucumber Apostichopus japonicus samples. The contents of the mineral elements in the samples were good chemical descriptors for differentiating their geographical origins.

  20. Searching for a Gulf War syndrome using cluster analysis.

    PubMed

    Everitt, B; Ismail, K; David, A S; Wessely, S

    2002-11-01

    Gulf veterans report medically unexplained symptoms more frequently than non-Gulf veterans do. We examined whether Gulf and non-Gulf veterans could be distinguished by their patterns of symptom reporting. A k-means cluster analysis was applied to 500 randomly sampled veterans from each of three United Kingdom military cohorts: those deployed to the Gulf conflict between 1990 and 1991; those deployed to the Bosnia peacekeeping mission between 1992 and 1997; and military personnel who were in active service but not deployed to the Gulf (Era). Sociodemographic, health variables and scores for ten symptom groups were calculated. The gap statistic indicated the five-group solution as one that provided a particularly informative description of the structure in the data. Cluster 1 consisted of low scores for all symptom groups. Cluster 2 had veterans with highest symptom scores for musculoskeletal symptoms and high scores for psychiatric symptoms. Cluster 3 had high scores for psychiatric symptoms and marginally elevated scores for the remaining nine symptom groups. Cluster 4 had elevated scores for musculoskeletal symptoms only, and cluster 5 was distinguishable from the other clusters in having high scores in all symptom groups, especially psychiatric and musculoskeletal. The findings do not support the existence of a unique syndrome affecting a subgroup of Gulf veterans but emphasize the excess of non-specific self-reported ill health in this group.
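
    The k-means procedure named in this record can be sketched in a few lines. This is a minimal illustration of Lloyd's algorithm, not the authors' analysis: the six two-dimensional "symptom score" points are made up, the starting centroids are passed in by hand (a hypothetical `init` argument) to keep the run deterministic, and the choice of the number of clusters (done with the gap statistic in the study) is not shown.

```python
import math

def kmeans(points, init, iters=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = [list(c) for c in init]
    k = len(centroids)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Made-up scores on two symptom groups for six respondents.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
labels, centroids = kmeans(points, init=[points[0], points[3]])
print(labels)
```

    In practice the algorithm is usually restarted from several random initializations, since a single run can stop in a local minimum.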

  1. Subspace K-means clustering.

    PubMed

    Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

    2013-12-01

    To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

  2. Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study.

    PubMed

    Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

    2018-03-01

    A large body of literature suggests that adolescents' health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students' data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: 'Alcohol drinking', 'Smoking', 'Physical activity', 'Screen time', 'Signs & symptoms', 'Healthy eating', 'Violence' and 'Sweet tooth'. These factors explained 67% of variance and underwent cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany.

  3. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity
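
    The ROC quantities quoted in this record (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. It assumes each patient has a single numeric dyssynchrony score and a binary response label; the function names and the toy data in the usage note are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen responder (label 1) has a higher score than a randomly chosen
    non-responder (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.
    Assumes both classes are present in labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

    For example, with the toy scores [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2] and labels [1, 1, 0, 1, 0, 1, 0, 0], roc_auc gives 13/16 = 0.8125, and a threshold of 0.5 gives 75% sensitivity and 50% specificity. An AUC of 0.50 corresponds to the "equal chance" baseline mentioned in the record.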

  4. Analysis of RXTE data on Clusters of Galaxies

    NASA Technical Reports Server (NTRS)

    Petrosian, Vahe

    2004-01-01

    This grant provided support for the reduction, analysis and interpretation of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJ0658-5557 scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. The planned observation of a relatively distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the universe and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons which are known to be present in the clusters and be responsible for the observed radio emission. Based on this picture we estimated that this cluster, in spite of its relatively large distance, will have HXR signal comparable to the other nearby ones. The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthened the case for the Compton scattering model. We intend the data obtained via this observation to be a part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high quality data where we can search for and measure (or at least put meaningful limits on) the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument

  5. The human RHOX gene cluster: target genes and functional analysis of gene variants in infertile men.

    PubMed

    Borgmann, Jennifer; Tüttelmann, Frank; Dworniczak, Bernd; Röpke, Albrecht; Song, Hye-Won; Kliesch, Sabine; Wilkinson, Miles F; Laurentino, Sandra; Gromoll, Jörg

    2016-11-15

    The X-linked reproductive homeobox (RHOX) gene cluster encodes transcription factors preferentially expressed in reproductive tissues. This gene cluster has important roles in male fertility based on phenotypic defects of Rhox-mutant mice and the finding that aberrant RHOX promoter methylation is strongly associated with abnormal human sperm parameters. However, little is known about the molecular mechanism of RHOX function in humans. Using gene expression profiling, we identified genes regulated by members of the human RHOX gene cluster. Some genes were uniquely regulated by RHOXF1 or RHOXF2/2B, while others were regulated by both of these transcription factors. Several of these regulated genes encode proteins involved in processes relevant to spermatogenesis; e.g. stress protection and cell survival. One of the target genes of RHOXF2/2B is RHOXF1, suggesting cross-regulation to enhance transcriptional responses. The potential role of RHOX in human infertility was addressed by sequencing all RHOX exons in a group of 250 patients with severe oligozoospermia. This revealed two mutations in RHOXF1 (c.515G > A and c.522C > T) and four in RHOXF2/2B (-73C > G, c.202G > A, c.411C > T and c.679G > A), of which only one (c.202G > A) was found in a control group of men with normal sperm concentration. Functional analysis demonstrated that c.202G > A and c.679G > A significantly impaired the ability of RHOXF2/2B to regulate downstream genes. Molecular modelling suggested that these mutations alter RHOXF2/F2B protein conformation. By combining clinical data with in vitro functional analysis, we demonstrate how the X-linked RHOX gene cluster may function in normal human spermatogenesis and we provide evidence that it is impaired in human male fertility.

  6. Adherence Determinants in Cystic Fibrosis: Cluster Analysis of Parental Psychosocial, Religious, and/or Spiritual Factors.

    PubMed

    Grossoehme, Daniel H; Szczesniak, Rhonda D; Britton, LaCrecia L; Siracusa, Christopher M; Quittner, Alexandra L; Chini, Barbara A; Dimitriou, Sophia M; Seid, Michael

    2015-06-01

    Cystic fibrosis is a progressive disease requiring a complex, time-consuming treatment regimen. Nonadherence may contribute to an acceleration of the disease process. Spirituality influences some parental healthcare behaviors and medical decision-making. We hypothesized that parents of children with cystic fibrosis, when classified into groups based on adherence rates, would share certain psychosocial and religious and/or spiritual variables distinguishing them from other adherence groups. We conducted a multisite, prospective, observational study focused on parents of children younger than 13 years old at two cystic fibrosis center sites (Site 1, n = 83; Site 2, n = 59). Religious and/or spiritual constructs, depression, and marital adjustment were measured by using previously validated questionnaires. Determinants of adherence included parental attitude toward treatment, perceived behavioral norms, motivation, and self-efficacy. Adherence patterns were measured with the Daily Phone Diary, a validated instrument used to collect adherence data. Cluster analysis identified discrete adherence patterns, including parents' completion of more treatments than prescribed. For airway clearance therapy, four adherence groups were identified: median adherence rates of 23%, 52%, 77%, and 120%. These four groups differed significantly for parental depression, sanctification of their child's body, and self-efficacy. Three adherence groups were identified for nebulized medications: median adherence rates of 35%, 82%, and 130%. These three groups differed significantly for sanctification of their child's body and self-efficacy. Our results indicated that parents in each group shared psychosocial and religious and/or spiritual factors that differentiated them. Therefore, conversations about adherence likely should be tailored to baseline adherence patterns. Development of efficacious religious and/or spiritual interventions that promote adherence by caregivers of children with

  7. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.

    PubMed

    Gorzalczany, Marian B; Rudzinski, Filip

    2017-06-07

    This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.

  8. A formal concept analysis approach to consensus clustering of multi-experiment expression data

    PubMed Central

    2014-01-01

    Background: Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results: We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological

  9. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques

    2008-07-01

    The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, cliques discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, making it possible to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.

  10. Clustering eating habits: frequent consumption of different dietary patterns among the Italian general population in the association with obesity, physical activity, sociocultural characteristics and psychological factors.

    PubMed

    Denoth, Francesca; Scalese, Marco; Siciliano, Valeria; Di Renzo, Laura; De Lorenzo, Antonino; Molinaro, Sabrina

    2016-06-01

    (a) To identify clusters of eating patterns among the Italian population aged 15-64 years, focusing on typical Mediterranean diet (Med-diet) items consumption; (b) to examine the distribution of eating habits, as identified clusters, among age classes and genders; (c) to evaluate the impact of: belonging to a specific eating cluster, level of physical activity (PA), sociocultural and psychological factors, as elements determining weight abnormalities. Data for this cross-sectional study were collected using self-reporting questionnaires administered to a sample of 33,127 subjects participating in the Italian population survey on alcohol and other drugs (IPSAD®2011). The cluster analysis was performed on a subsample (n = 5278 subjects) which provided information on eating habits, and adapted to identify categories of eating patterns. Stepwise multinomial regression analysis was performed to evaluate the associations between weight categories and eating clusters, adjusted for the following background variables: PA levels, sociocultural and psychological factors. Three clusters were identified: "Mediterranean-like", "Western-like" and "low fruit/vegetables". Frequent consumption of Med-diet patterns was more common among females and elderly. The relationship between overweight/obesity and male gender, educational level, PA, depression and eating disorders (p < 0.05) was confirmed. Belonging to a cluster other than "Mediterranean-like" was significantly associated with obesity. The low consumption of Med-diet patterns among youth, and the frequent association of sociocultural, psychological issues and inappropriate lifestyle with overweight/obesity, highlight the need for an interdisciplinary approach including market policies, to promote a wider awareness of the Mediterranean eating habit benefits in combination with an appropriate lifestyle.

  11. Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

    PubMed

    Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

    2017-01-01

    Clustering algorithms are widely used in analysis systems as a basis for data analysis. However, when the data are high-dimensional, a clustering algorithm may overlook the business relations between the dimensions, especially in the medical field. As a result, the clustering result often fails to meet the users' business goals. If the clustering process can incorporate the users' knowledge, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the user's satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize that result. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, so that they reflect the user's business preferences as far as possible. After this parameter optimization and adjustment, the clustering result can be closer to the user's requirements. Finally, we use a breast cancer example to test our method. The experiments show the improved performance of our algorithm.

  12. Minor digestive symptoms and their impact in the general population: a cluster analysis approach.

    PubMed

    L'Heureux-Bouron, Diane; Legrain-Raspaud, Sophie; Carruthers, Helen R; Whorwell, P J

    2018-01-01

    The classification and treatment of patients who do not meet the criteria for a functional gastrointestinal (GI) disorder has not been well established. This study aimed to record the prevalence of minor digestive symptoms (MDSs) in the general population attempting to divide them into symptom clusters as well as trying to assess their impact and the way sufferers cope with them. Following face-to-face interviews, a web-based, self-administered questionnaire was designed to capture a range of GI sensations using 34 questions and 12 images depicting abdominal symptoms. A randomly selected sample of 1515 women and 409 men representing the general population in France was studied. Cluster analysis was used to identify groups of respondents with naturally co-occurring symptoms. Data were also collected on other factors such as exacerbating and relieving strategies. MDSs were reported at least every 2 months in 66.5% of women and 47.7% of men. A total of 11 symptom clusters were identified: constipation-like, flatulence, abdominal pressure, abdominal swelling, acid reflux, diarrhoea-like, intestinal heaviness, intestinal pain, gurgling, burning and gastric pain. Despite being minor, these problems had a major impact on vitality and self-image as well as emotional, social and physical well-being. Respondents considered lifestyle, food and disordered function as the main factors responsible for MDSs. Physical measures and dietary modification were the most frequent strategies adopted to obtain relief. MDSs are common and improved methods of recognition are needed so that better management strategies can be developed for individuals with these symptoms. The definition of symptom clusters may offer one way of achieving this goal.

  13. Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.

    PubMed

    Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T

    2010-01-01

    Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new

  14. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    NASA Astrophysics Data System (ADS)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events is limited and the events are not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
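
    The grouping-then-fitting workflow described in this record can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the record only says "established models", so the Bell-Evans relation F = (kBT/xβ) ln(r xβ / (koff kBT)) is assumed here; events are grouped by a plain one-dimensional K-means on the logarithm of the loading rate r; and the function names, the quantile-based initialisation, and the temperature are hypothetical choices.

```python
import math

KBT = 4.114  # thermal energy in pN*nm near 298 K (assumed temperature)


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's K-means on scalars, initialised at evenly spaced quantiles."""
    srt = sorted(values)
    centers = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels


def fit_bell_evans(forces, loading_rates, k):
    """Group rupture events by K-means on ln(loading rate), then fit the
    group means to F = a*ln(r) + b, where a = kBT/x_beta and
    b = -a*ln(k_off*a). Forces in pN, loading rates in pN/s.
    Returns (k_off in 1/s, x_beta in nm)."""
    log_r = [math.log(r) for r in loading_rates]
    labels = kmeans_1d(log_r, k)
    xs, ys = [], []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if idx:
            xs.append(sum(log_r[i] for i in idx) / len(idx))
            ys.append(sum(forces[i] for i in idx) / len(idx))
    if len(xs) < 2:
        raise ValueError("need at least two non-empty groups to fit a line")
    # Ordinary least-squares line through the group means.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    x_beta = KBT / slope
    k_off = math.exp(-intercept / slope) / slope
    return k_off, x_beta
```

    Averaging ln(r) rather than r within each group keeps noiseless data exactly on the fitted line, which makes the sketch easy to check against synthetic events generated from known koff and xβ. With noisy data, the quality of the grouping step determines how far the group means drift from the model, which is the effect the record reports on.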

  15. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

    PubMed Central

    Jaeger, Sébastien; Thieffry, Denis

    2017-01-01

    Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. PMID:28591841

  16. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    PubMed

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to
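
    Ward's method with squared Euclidean distance, as named in this record, can be sketched in a few lines. The version below is a naive illustration on generic numeric points, assuming the usual Ward criterion in which the cost of merging clusters A and B is |A||B|/(|A|+|B|) times the squared Euclidean distance between their centroids. The function name is hypothetical, and the study's actual input (symptom data from the 32-item scale) and software are not reproduced.

```python
def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion.

    points: list of equal-length numeric tuples.
    k: number of clusters to stop at.
    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > k:
        # Pick the pair whose merge increases the total variance the least.
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

    Centroids are recomputed at every step for clarity, so this sketch is only suitable for small inputs. Production implementations update distances incrementally and also return the full merge tree, which is what a dendrogram-based choice of the number of clusters would rely on.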

  17. Distinct Phenotypes of Cigarette Smokers Identified by Cluster Analysis of Patients with Severe Asthma.

    PubMed

    Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu

    2015-12-01

    Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis has been applied to explore novel phenotypes, which are not based on any a priori hypotheses. To explore novel severe asthma phenotypes by cluster analysis when including cigarette smokers. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Twelve clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters were identified, including two characterized by high pack-year exposure to cigarette smoking and low FEV1/FVC. There were marked differences between the two clusters of cigarette smokers. One had high levels of circulating eosinophils, high IgE levels, and a high sinus disease score. The other was characterized by low levels of the same parameters. Sputum analysis revealed increased levels of IL-5 in the former cluster and increased levels of IL-6 and osteopontin in the latter. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 1 year later. This study reveals two distinct phenotypes of severe asthma in current and former cigarette smokers with potentially different biological pathways contributing to fixed airflow limitation. Clinical trial registered with www.umin.ac.jp (000003254).

  18. The dynamics of cyclone clustering in re-analysis and a high-resolution climate model

    NASA Astrophysics Data System (ADS)

    Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len

    2017-04-01

    Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45˚ N, 55˚ N, 65˚ N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55˚ N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65(45)˚ N is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.

  19. Sun Protection Belief Clusters: Analysis of Amazon Mechanical Turk Data.

    PubMed

    Santiago-Rivas, Marimer; Schnur, Julie B; Jandorf, Lina

    2016-12-01

    This study aimed (i) to determine whether people could be differentiated on the basis of their sun protection belief profiles and individual characteristics and (ii) explore the use of a crowdsourcing web service for the assessment of sun protection beliefs. A sample of 500 adults completed an online survey of sun protection belief items using Amazon Mechanical Turk. A two-phased cluster analysis (i.e., hierarchical and non-hierarchical K-means) was utilized to determine clusters of sun protection barriers and facilitators. Results yielded three distinct clusters of sun protection barriers and three distinct clusters of sun protection facilitators. Significant associations between gender, age, sun sensitivity, and cluster membership were identified. Results also showed an association between barrier and facilitator cluster membership. The results of this study provided a potential alternative approach to developing future sun protection promotion initiatives in the population. Findings add to our knowledge regarding individuals who support, oppose, or are ambivalent toward sun protection and inform intervention research by identifying distinct subtypes that may best benefit from (or have a higher need for) skin cancer prevention efforts.

  20. Clustering of Risk Factors for Non-Communicable Diseases among Adolescents from Southern Brazil

    PubMed Central

    2016-01-01

    Introduction: The aim of this study was to investigate the simultaneous presence of risk factors for non-communicable diseases and the association of these risk factors with demographic and economic factors among adolescents from southern Brazil. Methods: The study included 916 students (14–19 years old) enrolled in the 2014 school year at state schools in São José, Santa Catarina, Brazil. Risk factors related to lifestyle (i.e., physical inactivity, excessive alcohol consumption, smoking, sedentary behaviour and unhealthy diet), demographic variables (sex, age and skin colour) and economic variables (school shift and economic level) were assessed through a questionnaire. Simultaneous behaviours were assessed by the ratio between observed and expected prevalences of risk factors for non-communicable diseases. The clustering of risk factors was analysed by multinomial logistic regression. The clusters of risk factors that showed a higher prevalence were analysed by binary logistic regression. Results: Clustering of two, three, four, and five risk factors was found in 22.2%, 49.3%, 21.7% and 3.1% of adolescents, respectively. Subgroups that were more likely to have both behaviours of physical inactivity and unhealthy diet simultaneously were mostly composed of girls (OR = 3.03, 95% CI = 1.57–5.85) and those with lower socioeconomic status (OR = 1.83, 95% CI = 1.05–3.21); simultaneous physical inactivity, excessive alcohol consumption, sedentary behaviour and unhealthy diet were mainly observed among older adolescents (OR = 1.49, 95% CI = 1.05–2.12). Subgroups less likely to have both behaviours of sedentary behaviour and unhealthy diet were mostly composed of girls (OR = 0.58, 95% CI = 0.38–0.89); simultaneous physical inactivity, sedentary behaviour and unhealthy diet were mainly observed among older individuals (OR = 0.66, 95% CI = 0.49–0.87) and those of the night shift (OR = 0.59, 95% CI = 0.43–0.82). Conclusion: Adolescents had a high prevalence

  1. Gathering Real World Evidence with Cluster Analysis for Clinical Decision Support.

    PubMed

    Xia, Eryu; Liu, Haifeng; Li, Jing; Mei, Jing; Li, Xuejun; Xu, Enliang; Li, Xiang; Hu, Gang; Xie, Guotong; Xu, Meilin

    2017-01-01

    Clinical decision support systems are information technology systems that assist clinical decision-making tasks, which have been shown to enhance clinical performance. Cluster analysis, which groups similar patients together, aims to separate patient cases into phenotypically heterogeneous groups and to define therapeutically homogeneous patient subclasses. Useful as it is, the application of cluster analysis in clinical decision support systems has been less often reported. Here, we describe the usage of cluster analysis in clinical decision support systems, by first dividing patient cases into similar groups and then providing diagnosis or treatment suggestions based on the group profiles. This integration provides data for clinical decisions and compiles a wide range of clinical practices to inform the performance of individual clinicians. We also include an example usage of the system under the scenario of blood lipid management in type 2 diabetes. These efforts represent a step toward promoting patient-centered care and enabling precision medicine.

  2. GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

    2017-01-10

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.

  3. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

    PubMed Central

    2016-01-01

    Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons. PMID:27698666

  4. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites.

    PubMed

    Irizarry, Kristopher J L; Bryden, Randall L

    2016-01-01

    Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.

  5. Chaotic map clustering algorithm for EEG analysis

    NASA Astrophysics Data System (ADS)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.

    2004-03-01

    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.

  6. Instability of Hierarchical Cluster Analysis Due to Input Order of the Data: The PermuCLUSTER Solution

    ERIC Educational Resources Information Center

    van der Kloot, Willem A.; Spaans, Alexander M. J.; Heiser, Willem J.

    2005-01-01

    Hierarchical agglomerative cluster analysis (HACA) may yield different solutions under permutations of the input order of the data. This instability is caused by ties, either in the initial proximity matrix or arising during agglomeration. The authors recommend repeating the analysis on a large number of random permutations of the rows and columns…
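
    The tie-driven instability described in this abstract can be illustrated with a minimal Python sketch. It assumes a naive complete-linkage agglomeration that resolves ties in favour of the first pair scanned; the function names and the three-point data sets are hypothetical illustrations and are not the PermuCLUSTER implementation.

```python
from itertools import combinations, permutations

def complete_linkage(points, k):
    """Naive agglomerative clustering of 1-D points down to k clusters.

    Uses complete linkage; when two candidate merges are tied, the pair
    scanned first wins, so the result can depend on input order.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:  # strict '<': earlier pair wins a tie
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return frozenset(frozenset(c) for c in clusters)

def solutions_over_permutations(points, k):
    """Collect the distinct partitions obtained over every input order."""
    return {complete_linkage(list(order), k) for order in permutations(points)}

# Hypothetical data: d(0,1) == d(1,2) is a tie in the proximity matrix.
tied = solutions_over_permutations([0.0, 1.0, 2.0], k=2)
# Moving the third point removes the tie.
untied = solutions_over_permutations([0.0, 1.0, 2.5], k=2)

print(len(tied))    # 2 distinct two-cluster solutions
print(len(untied))  # 1 solution
```

    In this toy case the tied data give two different two-cluster solutions depending on input order, while the untied data give one. Repeating the analysis over many permutations, as the authors recommend, exposes this kind of dependence.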

  7. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    PubMed

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  8. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    NASA Astrophysics Data System (ADS)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
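
    As an illustration of the centered log-ratio transformation (the CLT in this abstract's abbreviations), the following Python sketch transforms a composition and measures distance in the transformed space. It is a sketch under stated assumptions rather than the authors' code: the function names and the sample values are hypothetical, and all parts are assumed to be strictly positive.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean log (parts must be > 0)."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def clr_distance(a, b):
    """Euclidean distance between two compositions after the CLR transform."""
    return math.dist(clr(a), clr(b))

# Hypothetical three-element compositions (arbitrary units).
sample = [1.0, 2.0, 7.0]
sample_rescaled = [10.0, 20.0, 70.0]  # same proportions, different total
other = [5.0, 3.0, 2.0]

print(clr(sample))                            # components sum to ~0
print(clr_distance(sample, sample_rescaled))  # ~0: only proportions matter
print(clr_distance(sample, other))            # clearly non-zero
```

    Because the transform depends only on ratios between parts, two samples with the same proportions coincide after transformation, whatever their totals. A distance-based algorithm such as k-means could then be run on the transformed rows.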

  9. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    PubMed Central

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, who were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in a hierarchical cluster analysis. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  10. Investigating the effects of climate variations on bacillary dysentery incidence in northeast China using ridge regression and hierarchical cluster analysis

    PubMed Central

    Huang, Desheng; Guan, Peng; Guo, Junqiao; Wang, Ping; Zhou, Baosen

    2008-01-01

    Background: The effects of climate variations on bacillary dysentery incidence have recently attracted more concern. However, the multi-collinearity among meteorological factors affects the accuracy of correlation with bacillary dysentery incidence. Methods: As a remedy, a modified method to combine ridge regression and hierarchical cluster analysis was proposed for investigating the effects of climate variations on bacillary dysentery incidence in northeast China. Results: Among the weather indicators, temperatures, precipitation, evaporation and relative humidity showed positive correlation with the monthly incidence of bacillary dysentery, while air pressure had a negative correlation with the incidence. Ridge regression and hierarchical cluster analysis showed that during 1987–1996, relative humidity, temperatures and air pressure affected the transmission of the bacillary dysentery. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class. Conclusion: Meteorological factors have affected the transmission of bacillary dysentery in northeast China. Bacillary dysentery prevention and control would benefit from giving more consideration to local climate variations. PMID:18816415

  11. Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

    PubMed

    Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

    2016-08-31

    Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed

  12. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    PubMed

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.

  13. Case-control geographic clustering for residential histories accounting for risk factors and covariates.

    PubMed

    Jacquez, Geoffrey M; Meliker, Jaymie R; Avruskin, Gillian A; Goovaerts, Pierre; Kaufmann, Andy; Wilson, Mark L; Nriagu, Jerome

    2006-08-03

    Methods for analyzing space-time variation in risk in case-control studies typically ignore residential mobility. We develop an approach for analyzing case-control data for mobile individuals and apply it to study bladder cancer in 11 counties in southeastern Michigan. At this time data collection is incomplete and no inferences should be drawn - we analyze these data to demonstrate the novel methods. Global, local and focused clustering of residential histories for 219 cases and 437 controls is quantified using time-dependent nearest neighbor relationships. Business address histories for 268 industries that release known or suspected bladder cancer carcinogens are analyzed. A logistic model accounting for smoking, gender, age, race and education specifies the probability of being a case, and is incorporated into the cluster randomization procedures. Sensitivity of clustering to definition of the proximity metric is assessed for 1 to 75 k nearest neighbors. Global clustering is partly explained by the covariates but remains statistically significant at 12 of the 14 levels of k considered. After accounting for the covariates, 26 local clusters are found in Lapeer, Ingham, Oakland and Jackson counties, with the clusters in Ingham and Oakland counties appearing in 1950 and persisting to the present. Statistically significant focused clusters are found about the business address histories of 22 industries located in Oakland (19 clusters), Ingham (2) and Jackson (1) counties. Clusters in central and southeastern Oakland County appear in the 1930s and persist to the present day. These methods provide a systematic approach for evaluating a series of increasingly realistic alternative hypotheses regarding the sources of excess risk. So long as selection of cases and controls is population-based and not geographically biased, these tools can provide insights into geographic risk factors that were not specifically assessed in the case-control study design.

  14. Case-control geographic clustering for residential histories accounting for risk factors and covariates

    PubMed Central

    2006-01-01

    Background: Methods for analyzing space-time variation in risk in case-control studies typically ignore residential mobility. We develop an approach for analyzing case-control data for mobile individuals and apply it to study bladder cancer in 11 counties in southeastern Michigan. At this time data collection is incomplete and no inferences should be drawn – we analyze these data to demonstrate the novel methods. Global, local and focused clustering of residential histories for 219 cases and 437 controls is quantified using time-dependent nearest neighbor relationships. Business address histories for 268 industries that release known or suspected bladder cancer carcinogens are analyzed. A logistic model accounting for smoking, gender, age, race and education specifies the probability of being a case, and is incorporated into the cluster randomization procedures. Sensitivity of clustering to definition of the proximity metric is assessed for 1 to 75 k nearest neighbors. Results: Global clustering is partly explained by the covariates but remains statistically significant at 12 of the 14 levels of k considered. After accounting for the covariates, 26 local clusters are found in Lapeer, Ingham, Oakland and Jackson counties, with the clusters in Ingham and Oakland counties appearing in 1950 and persisting to the present. Statistically significant focused clusters are found about the business address histories of 22 industries located in Oakland (19 clusters), Ingham (2) and Jackson (1) counties. Clusters in central and southeastern Oakland County appear in the 1930s and persist to the present day. Conclusion: These methods provide a systematic approach for evaluating a series of increasingly realistic alternative hypotheses regarding the sources of excess risk. So long as selection of cases and controls is population-based and not geographically biased, these tools can provide insights into geographic risk factors that were not specifically assessed in the case

  15. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    ERIC Educational Resources Information Center

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish the students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  16. Using Cluster Analysis to Compartmentalize a Large Managed Wetland Based on Physical, Biological, and Climatic Geospatial Attributes.

    PubMed

    Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael

    2018-04-27

    Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.
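
    The four linkages named in this abstract differ only in how the distance between two clusters is scored. The Python sketch below computes each score for one pair of clusters. It is a minimal illustration under assumptions, not the study's workflow: the function names and the four sample points are hypothetical, distances are Euclidean, and Ward's criterion is taken as the increase in within-cluster sum of squares caused by a merge.

```python
import math

def single_link(a, b):
    """Smallest distance between any point in a and any point in b."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_link(a, b):
    """Largest distance between any point in a and any point in b."""
    return max(math.dist(p, q) for p in a for q in b)

def average_link(a, b):
    """Mean of all pairwise distances between a and b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * math.dist(centroid(a), centroid(b)) ** 2

# Hypothetical clusters of grid-cell coordinates.
west = [(0.0, 0.0), (0.0, 1.0)]
east = [(3.0, 0.0), (3.0, 1.0)]

for name, score in [("single", single_link), ("complete", complete_link),
                    ("average", average_link), ("ward", ward_cost)]:
    print(name, round(score(west, east), 4))
```

    An agglomerative procedure merges, at each step, the pair of clusters with the lowest score under the chosen linkage, which is why different linkages can produce different hierarchies from the same data.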

  17. A cluster analysis investigation of workaholism as a syndrome.

    PubMed

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  18. Sejong Open Cluster Survey (SOS). 0. Target Selection and Data Analysis

    NASA Astrophysics Data System (ADS)

    Sung, Hwankyung; Lim, Beomdu; Bessell, Michael S.; Kim, Jinyoung S.; Hur, Hyeonoh; Chun, Moo-Young; Park, Byeong-Gon

    2013-06-01

    Star clusters are superb astrophysical laboratories containing cospatial and coeval samples of stars with similar chemical composition. We initiate the Sejong Open cluster Survey (SOS) - a project dedicated to providing homogeneous photometry of a large number of open clusters in the SAAO Johnson-Cousins' UBVI system. To achieve our main goal, we pay much attention to the observation of standard stars in order to reproduce the SAAO standard system. Many of our targets are relatively small sparse clusters that escaped previous observations. As clusters are considered building blocks of the Galactic disk, their physical properties such as the initial mass function, the pattern of mass segregation, etc. give valuable information on the formation and evolution of the Galactic disk. The spatial distribution of young open clusters will be used to revise the local spiral arm structure of the Galaxy. In addition, the homogeneous data can also be used to test stellar evolutionary theory, especially concerning rare massive stars. In this paper we present the target selection criteria, the observational strategy for accurate photometry, and the adopted calibrations for data analysis such as color-color relations, zero-age main sequence relations, Sp - M_V relations, Sp - T_{eff} relations, Sp - color relations, and T_{eff} - BC relations. Finally we provide some data analysis such as the determination of the reddening law, the membership selection criteria, and distance determination.

  19. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    PubMed

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  20. Text grouping in patent analysis using adaptive K-means clustering algorithm

    NASA Astrophysics Data System (ADS)

    Shanie, Tiara; Suprijadi, Jadi; Zulhanif

    2017-03-01

    Patents are a form of intellectual property. Analyzing patents is one requirement for understanding the development of technology in each country and in the world. This study uses patent documents about green tea retrieved from the Espacenet server. Patent documents related to tea technology are still widely scattered, which makes information retrieval (IR) difficult for users. It is therefore necessary to categorize documents into specific groups based on the related terms they contain. This study applies statistical text mining to the title text of green tea patents in two phases: data preparation and data analysis. The data preparation phase uses text mining methods, and the data analysis phase uses statistics. The statistical analysis uses a cluster analysis algorithm, the Adaptive K-Means Clustering Algorithm. Based on the maximum silhouette value, the analysis generated 87 clusters associated with fifteen terms, which can be used to support information retrieval.
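
    The abstract does not spell out the adaptive K-means procedure, so the following is only a minimal sketch of the general idea of choosing the number of clusters by the maximum mean silhouette value. It runs plain Lloyd's k-means on one-dimensional toy values (hypothetical, not patent data), and the names `kmeans_1d`, `mean_silhouette` and `best_k` are illustrative assumptions rather than anything from the paper.

```python
def kmeans_1d(points, k, iters=50):
    """Plain Lloyd's k-means on 1-D data with deterministic, evenly spread starts."""
    ordered = sorted(points)
    # Illustrative initialization: pick k centers spread across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = sum(members) / len(members)
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)  # mean distance within own cluster
        b = min(sum(abs(p - q) for q in other) / len(other)  # nearest other cluster
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Return the k with the largest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans_1d(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical toy values with three visible groups.
toy = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
chosen_k, silhouette_scores = best_k(toy, [2, 3, 4])
```

    In a text-mining setting the points would be document-term vectors rather than scalars, and the distance would change accordingly; the selection rule itself stays the same.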

  1. Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study

    PubMed Central

    Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

    2018-01-01

    Background: A huge amount of literature suggests that adolescents’ health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. Methods: The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students’ data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Results: Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: ‘Alcohol drinking’, ‘Smoking’, ‘Physical activity’, ‘Screen time’, ‘Signs & symptoms’, ‘Healthy eating’, ‘Violence’ and ‘Sweet tooth’. These factors explained 67% of variance and underwent cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Conclusions: Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany. PMID:27908972

  2. Visual verification and analysis of cluster detection for molecular dynamics.

    PubMed

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This makes it possible to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples of the effective and efficient usage of our tool are presented.

  3. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    PubMed

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  4. Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

    PubMed

    Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

    2017-02-01

    Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients were in the chronic phase and out of mood episode at the time of assessment, and most of the assessments were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Low cluster suggests that relatively preserved social cognition may be important to identify disease process distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. The X-ray luminosity functions of Abell clusters from the Einstein Cluster Survey

    NASA Technical Reports Server (NTRS)

    Burg, R.; Giacconi, R.; Forman, W.; Jones, C.

    1994-01-01

    We have derived the present epoch X-ray luminosity function of northern Abell clusters using luminosities from the Einstein Cluster Survey. The sample is sufficiently large that we can determine the luminosity function for each richness class separately with sufficient precision to study and compare the different luminosity functions. We find that, within each richness class, the range of X-ray luminosity is quite large and spans nearly a factor of 25. Characterizing the luminosity function for each richness class with a Schechter function, we find that the characteristic X-ray luminosity, L_{*}, scales with richness class as L_{*} ∝ N_{*}^{γ}, where N_{*} is the corrected, mean number of galaxies in a richness class, and the best-fitting exponent is γ = 1.3 ± 0.4. Finally, our analysis suggests that there is a lower limit to the X-ray luminosity of clusters which is determined by the integrated emission of the cluster member galaxies, and this also scales with richness class. The present sample forms a baseline for testing cosmological evolution of Abell-like clusters when an appropriate high-redshift cluster sample becomes available.
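
    For reference, the scaling relation quoted in the abstract can be written next to the usual form of the Schechter function. The normalization φ^{*} and faint-end slope α are standard Schechter notation assumed here for illustration; the abstract does not report their values.

```latex
\phi(L)\,dL = \phi^{*}\left(\frac{L}{L_{*}}\right)^{\alpha}
              \exp\!\left(-\frac{L}{L_{*}}\right)\frac{dL}{L_{*}},
\qquad
L_{*} \propto N_{*}^{\gamma}, \quad \gamma = 1.3 \pm 0.4 .
```

    As a worked reading of the exponent: doubling N_{*} multiplies L_{*} by 2^{1.3} ≈ 2.5 at the best-fitting value, and by roughly 1.9 to 3.2 across the quoted uncertainty (γ from 0.9 to 1.7).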

  6. Effects of cluster location and cluster distribution on performance on the traveling salesman problem.

    PubMed

    MacGregor, James N

    2015-10-01

    Research on human performance in solving traveling salesman problems typically uses point sets as stimuli, and most models have proposed a processing stage at which stimulus dots are clustered. However, few empirical studies have investigated the effects of clustering on performance. In one recent study, researchers compared the effects of clustered, random, and regular stimuli, and concluded that clustering facilitates performance (Dry, Preiss, & Wagemans, 2012). Another study suggested that these results may have been influenced by the location rather than the degree of clustering (MacGregor, 2013). Two experiments are reported that mark an attempt to disentangle these factors. The first experiment tested several combinations of degree of clustering and cluster location, and revealed mixed evidence that clustering influences performance. In a second experiment, both factors were varied independently, showing that they interact. The results are discussed in terms of the importance of clustering effects, in particular, and perceptual factors, in general, during performance of the traveling salesman problem.

  7. How do components of evidence-based psychological treatment cluster in practice? A survey and cluster analysis.

    PubMed

    Gifford, Elizabeth V; Tavakoli, Sara; Weingardt, Kenneth R; Finney, John W; Pierson, Heather M; Rosen, Craig S; Hagedorn, Hildi J; Cook, Joan M; Curran, Geoff M

    2012-01-01

    Evidence-based psychological treatments (EBPTs) are clusters of interventions, but it is unclear how providers actually implement these clusters in practice. A disaggregated measure of EBPTs was developed to characterize clinicians' component-level evidence-based practices and to examine relationships among these practices. Survey items captured components of evidence-based treatments based on treatment integrity measures. The Web-based survey was conducted with 75 U.S. Department of Veterans Affairs (VA) substance use disorder (SUD) practitioners and 149 non-VA community-based SUD practitioners. Clinician's self-designated treatment orientations were positively related to their endorsement of those EBPT components; however, clinicians used components from a variety of EBPTs. Hierarchical cluster analysis indicated that clinicians combined and organized interventions from cognitive-behavioral therapy, the community reinforcement approach, motivational interviewing, structured family and couples therapy, 12-step facilitation, and contingency management into clusters including empathy and support, treatment engagement and activation, abstinence initiation, and recovery maintenance. Understanding how clinicians use EBPT components may lead to improved evidence-based practice dissemination and implementation. Published by Elsevier Inc.

  8. Socioeconomic risk factors for cholera in different transmission settings: An analysis of the data of a cluster randomized trial in Bangladesh.

    PubMed

    Saha, Amit; Hayen, Andrew; Ali, Mohammad; Rosewell, Alexander; Clemens, John D; Raina MacIntyre, C; Qadri, Firdausi

    2017-09-05

    Cholera remains a threat globally, and socioeconomic factors play an important role in transmission of the disease. We assessed socioeconomic risk factors for cholera in vaccinated and non-vaccinated communities to understand whether the socioeconomic risk factors differ by transmission patterns for cholera. We used data from a cluster randomized control trial conducted in Dhaka, Bangladesh. There were 90 geographic clusters; 30 in each of the three arms of the study: vaccine (VAC), vaccine plus behavioural change (VBC), and non-intervention. The data were analysed for the three populations: (1) vaccinees in the vaccinated communities (VAC and VBC arms), (2) non-vaccinated individuals in the vaccinated communities and (3) all individuals in the non-vaccinated communities (non-intervention arm). A generalized estimating equation with logit link function was used to evaluate the risk factors for cholera among these different populations adjusting for household level correlation in the data. A total of 528 cholera and 226 cholera with severe dehydration (CSD) in 268,896 persons were observed during the two-year follow-up. For population 1, the cholera risk was not associated with any socioeconomic factors; however CSD was less likely to occur among individuals living in a household having ≤4 members (aOR=0.55, 95% CI=0.32-0.96). Among population 2, younger participants and individuals reporting diarrhoea during registration were more likely to have cholera. Females and individuals reporting diarrhoea during registration were at increased risk of CSD. Among population 3, individuals living in a household without a concrete floor, in an area with high population density, closer to the study hospital, or not treating drinking water were at significantly higher risk for both cholera and CSD. The profile of socioeconomic factors associated with cholera varies by individuals' vaccination status as well as the transmission setting. In a vaccinated community where

  9. Descriptive Statistics and Cluster Analysis for Extreme Rainfall in Java Island

    NASA Astrophysics Data System (ADS)

    E Komalasari, K.; Pawitan, H.; Faqih, A.

    2017-03-01

    This study aims to describe the regional pattern of extreme rainfall based on maximum daily rainfall for the period 1983 to 2012 on Java Island. Descriptive statistical analysis was performed to obtain the central tendency, variation and distribution of the maximum precipitation data. Mean and median are used to measure the central tendency of the data, while the interquartile range (IQR) and standard deviation are used to measure its variation. In addition, skewness and kurtosis are used to describe the shape of the rainfall distribution. Cluster analysis using squared Euclidean distance and Ward's method is applied to perform regional grouping. The results show that the mean of maximum daily rainfall in the Java region during 1983-2012 is around 80-181 mm, with a median between 75 and 160 mm and a standard deviation between 17 and 82. Cluster analysis produces four clusters and shows that the western area of Java tends to have higher annual maxima of daily rainfall than the northern area, and more variation in annual maximum values.
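
    The descriptive measures named in the abstract can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the function name `describe` and the sample values are hypothetical, and the population standard deviation and moment-based skewness and kurtosis are choices made here, since the abstract does not say which estimators were used.

```python
import statistics


def describe(values):
    """Summarize a sample with mean, median, IQR, SD, skewness and kurtosis."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption of this sketch)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    # Moment-based skewness and (non-excess) kurtosis.
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "sd": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical annual maxima of daily rainfall (mm) for one station.
station_summary = describe([82.0, 95.5, 110.0, 76.0, 143.5, 101.0, 181.0, 88.5])
```

    Per-station summaries of this kind could then serve as the feature vectors for the regional grouping step; the Ward clustering itself is not shown here.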

  10. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

    PubMed

    Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2017-07-27

    Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Predicting healthcare outcomes in prematurely born infants using cluster analysis.

    PubMed

    MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne

    2018-05-23

    Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified according to birth weight (BW) and duration of neonatal invasive mechanical ventilation (MV) into three clusters. Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, with BW ≥882 g and <882 g, respectively) had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001). Conclusions: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.

  12. Seizure clustering.

    PubMed

    Haut, Sheryl R

    2006-02-01

    Seizure clusters, also known as repetitive or serial seizures, occur commonly in epilepsy. Clustering implies that the occurrence of one seizure may influence the probability of a subsequent seizure; thus, the investigation of the clustering phenomenon yields insights into both specific mechanisms of seizure clustering and more general concepts of seizure occurrence. Seizure clustering has been defined clinically as a number of seizures per unit time and, statistically, as a deviation from a random distribution, or interseizure interval dependence. This review explores the pathophysiology, epidemiology, and clinical implications of clustering, as well as other periodic patterns of seizure occurrence. Risk factors for experiencing clusters and potential precipitants of clustering are also addressed.
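
    One simple way to illustrate the statistical definition above (deviation from a random distribution) is the index of dispersion of seizure counts per unit time: for a Poisson, that is random, process the variance-to-mean ratio is close to 1, and values well above 1 suggest clustering. This is a generic sketch, not a method attributed to the review; the function name and the diary counts are hypothetical.

```python
import statistics


def dispersion_index(counts_per_day):
    """Variance-to-mean ratio of event counts; about 1 for a Poisson process."""
    mean = statistics.mean(counts_per_day)
    if mean == 0:
        raise ValueError("no events recorded, index is undefined")
    return statistics.variance(counts_per_day) / mean  # sample variance


# Hypothetical seizure diaries with the same total count (18 seizures in 12 days).
clustered_diary = [0, 0, 0, 6, 5, 0, 0, 0, 0, 7, 0, 0]
regular_diary = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

clustered_ratio = dispersion_index(clustered_diary)  # well above 1
regular_ratio = dispersion_index(regular_diary)      # below 1
```

    A ratio alone is only descriptive; a formal test would compare it against its sampling distribution, and interseizure-interval dependence would need a separate analysis.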

  13. Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases

    NASA Astrophysics Data System (ADS)

    Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.

    2017-09-01

    Fine-grained city management is an important route to realizing the smart city. Data mining that applies spatial clustering analysis to urban management cases can be used to evaluate the deployment of urban public facilities, support policy decisions, and provide technical support for fine-grained city management. To address the problem that the density-based DBSCAN algorithm cannot determine its parameters adaptively, this paper proposes an optimized method for adaptive parameter determination based on spatial analysis. First, Ripley's K function is analyzed for the data set to adaptively determine the global parameter Eps, that is, the maximum aggregation scale is set as the range of data clustering. Then, using a K-D tree, the highest-frequency K value within the range of Eps is calculated for every point object and set as the clustering density, which adaptively determines the global parameter MinPts. The R language was then used to optimize this process and accomplish precise clustering of typical urban management cases. Experimental results based on typical urban management cases in the XiCheng District of Beijing show that the new DBSCAN clustering algorithm takes full account of the spatial and statistical characteristics of data with obvious clustering features, and has better applicability and high quality. The results are helpful not only for formulating urban management policies and allocating urban management supervisors in the XiCheng District of Beijing, but also for other cities and related fields.
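
    To make the roles of the two parameters concrete, here is a minimal sketch of standard DBSCAN in plain Python. It does not reproduce the paper's adaptive parameter selection (Ripley's K function, K-D tree) or its R implementation: `eps` and `min_pts` are simply passed in by hand, neighbors are found by brute force, and the coordinates are hypothetical.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_neighbors)
    return labels


# Hypothetical case locations: two dense groups and one isolated report.
cases = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (50, 50)]
case_labels = dbscan(cases, eps=1.5, min_pts=3)
```

    The result is sensitive to both parameters, which is the motivation for choosing them from the data rather than by hand.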

  14. Cluster analysis of S. Cerevisiae nucleosome binding sites

    NASA Astrophysics Data System (ADS)

    Suvorova, Y.; Korotkov, E.

    2017-12-01

    It is well known that a major part of a eukaryotic genome is wrapped around histone proteins, forming nucleosomes. It has also been demonstrated that the DNA sequence itself plays an important role in the nucleosome positioning process. In this work, a cluster analysis of 67,517 nucleosome binding sites from the S. cerevisiae genome was carried out. The classification method is based on a self-adjusting dinucleotide position weight matrix. As a result, 135 significant clusters were discovered, containing 43,225 sequences (64% of the initial set). The meaning of the classes found is discussed, as well as possibilities for their further use.

  15. Analyzing Protein Clusters on the Plasma Membrane: Application of Spatial Statistical Analysis Methods on Super-Resolution Microscopy Images.

    PubMed

    Paparelli, Laura; Corthout, Nikky; Pavie, Benjamin; Annaert, Wim; Munck, Sebastian

    2016-01-01

    The spatial distribution of proteins within the cell affects their capability to interact with other molecules and directly influences cellular processes and signaling. At the plasma membrane, multiple factors drive protein compartmentalization into specialized functional domains, leading to the formation of clusters in which intermolecule interactions are facilitated. Therefore, quantifying protein distributions is a necessity for understanding their regulation and function. The recent advent of super-resolution microscopy has opened up the possibility of imaging protein distributions at the nanometer scale. In parallel, new spatial analysis methods have been developed to quantify distribution patterns in super-resolution images. In this chapter, we provide an overview of super-resolution microscopy and summarize the factors influencing protein arrangements on the plasma membrane. Finally, we highlight methods for analyzing clusterization of plasma membrane proteins, including examples of their applications.

  16. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

  17. Cluster analysis of Pinus taiwanensis for its ex situ conservation in China.

    PubMed

    Gao, X; Shi, L; Wu, Z

    2015-06-01

    Pinus taiwanensis Hayata is one of the most famous sights in the Huangshan Scenic Resort, China, because of its strong adaptability and ability to survive; however, this endemic species is currently under threat in China. Relationships between different P. taiwanensis populations have been well-documented; however, few studies have been conducted on how to protect this rare pine. In the present study, we propose the ex situ conservation of this species using geographical information system (GIS) cluster and genetic diversity analyses. The GIS cluster method was conducted as a preliminary analysis for establishing a sampling site category based on climatic factors. Genetic diversity was analyzed using morphological and genetic traits. By combining geographical information with genetic data, we demonstrate that growing conditions, morphological traits, and the genetic make-up of the population in the Huangshan Scenic Resort were most similar to conditions on Tianmu Mountain. Therefore, we suggest that Tianmu Mountain is the best choice for the ex situ conservation of P. taiwanensis. Our results provide a molecular basis for the sustainable management, utilization, and conservation of this species in Huangshan Scenic Resort.

  18. Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data.

    PubMed

    McParland, D; Phillips, C M; Brennan, L; Roche, H M; Gormley, I C

    2017-12-10

    The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. Copyright © 2017 John Wiley & Sons, Ltd.

  19. A hierarchical clustering scheme approach to assessment of IP-network traffic using detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Takuma, Takehisa; Masugi, Masao

    2009-03-01

    This paper presents an approach to the assessment of IP-network traffic in terms of the time variation of self-similarity. To get a comprehensive view in analyzing the degree of long-range dependence (LRD) of IP-network traffic, we use a hierarchical clustering scheme, which provides a way to classify high-dimensional data with a tree-like structure. Also, in the LRD-based analysis, we employ detrended fluctuation analysis (DFA), which is applicable to the analysis of long-range power-law correlations or LRD in non-stationary time-series signals. Based on sequential measurements of IP-network traffic at two locations, this paper derives corresponding values for the LRD-related parameter α that reflects the degree of LRD of measured data. In performing the hierarchical clustering scheme, we use three parameters: the α value, average throughput, and the proportion of network traffic that exceeds 80% of network bandwidth for each measured data set. We visually confirm that the traffic data can be classified in accordance with the network traffic properties, so that the combined depiction of the LRD and other factors can give us an effective assessment of network conditions at different times.
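
    As a rough illustration of how an LRD-related exponent α can be estimated by DFA, the sketch below implements a minimal first-order DFA in plain Python. It is not the authors' implementation: the function names (`linear_fit`, `dfa_alpha`), the window sizes, and the synthetic white-noise input are assumptions made for illustration only. For uncorrelated noise the estimate is expected to lie near 0.5, with larger values indicating stronger long-range dependence.

```python
import math
import random


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_alpha(series, window_sizes):
    """Estimate the DFA scaling exponent of a 1-D series (hypothetical helper).

    Every window size must be at least 2 and no larger than len(series),
    and the series must not be constant (otherwise log(0) is attempted).
    """
    # Step 1: integrate the mean-centred series to obtain the profile.
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        squared_residuals = 0.0
        # Step 2: remove a linear trend inside each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            t = list(range(n))
            slope, intercept = linear_fit(t, segment)
            squared_residuals += sum(
                (y - (slope * i + intercept)) ** 2 for i, y in zip(t, segment)
            )
        # Step 3: root-mean-square fluctuation F(n) at this window size.
        fluctuation = math.sqrt(squared_residuals / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))

    # Step 4: alpha is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


# Synthetic white noise stands in for a measured traffic series (assumption).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
alpha_noise = dfa_alpha(noise, [16, 32, 64, 128, 256])
print("estimated alpha for white noise:", round(alpha_noise, 3))
```

    In a workflow like the one described above, one such α value per measured data set could be combined with the other two parameters (average throughput and the proportion of traffic above 80% of bandwidth) to form the feature vector passed to a hierarchical clustering routine.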

  20. Risk factors associated with cluster size of Mycobacterium tuberculosis (Mtb) of different RFLP lineages in Brazil.

    PubMed

    Peres, Renata Lyrio; Vinhas, Solange Alves; Ribeiro, Fabíola Karla Correa; Palaci, Moisés; do Prado, Thiago Nascimento; Reis-Santos, Bárbara; Zandonade, Eliana; Suffys, Philip Noel; Golub, Jonathan E; Riley, Lee W; Maciel, Ethel Leonor

    2018-02-08

    Tuberculosis (TB) transmission is influenced by patient-related risk, environment and bacteriological factors. We determined the risk factors associated with cluster size of IS6110 RFLP based genotypes of Mycobacterium tuberculosis (Mtb) isolates from Vitoria, Espirito Santo, Brazil. Cross-sectional study of new TB cases identified in the metropolitan area of Vitoria, Brazil between 2000 and 2010. Mtb isolates were genotyped by the IS6110 RFLP, spoligotyping and RD Rio. The isolates were classified according to genotype cluster sizes by three genotyping methods and associated patient epidemiologic characteristics. A regression model was used to identify factors associated with cluster size. Among 959 Mtb isolates, 461 (48%) cases had an isolate that belonged to an RFLP cluster, and six clusters with ten or more isolates were identified. Of the isolates spoligotyped, 448 (52%) were classified as LAM and 412 (48%) as non-LAM. Our regression model found that isolates in the 6-9 isolates/RFLP cluster were more likely to belong to the LAM family, to have the RD Rio genotype and to be smear-positive (adjusted OR = 1.17, 95% CI 1.08-1.26; adjusted OR = 1.25, 95% CI 1.14-1.37; crude OR = 2.68, 95% CI 1.13-6.34; respectively), and that living in a Serra city neighborhood decreased the risk of being in the 6-9 isolates/RFLP cluster (adjusted OR = 0.29, 95% CI 0.10-0.84) compared with the other groups. Individuals aged 21 to 30, 31 to 40 and > 50 years were less likely to belong to the 2-5 isolates/RFLP cluster than to unique patterns compared to individuals < 20 years of age (adjusted OR = 0.49, 95% CI 0.28-0.85, OR = 0.43, 95% CI 0.24-0.77 and OR = 0.49, 95% CI 0.26-0.91, respectively). Extrapulmonary disease was less likely to occur in those infected with strains in the 2-5 isolates/cluster group (adjusted OR = 0.45, 95% CI 0.24-0.85) than in those with unique patterns. We found that a large proportion of new TB infections in Vitoria is caused by prevalent Mtb genotypes

  1. Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

    PubMed

    Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

    2017-07-01

    The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.

  2. Arrangement of the Clostridium baratii F7 Toxin Gene Cluster with Identification of a σ Factor That Recognizes the Botulinum Toxin Gene Cluster Promoters

    DOE PAGES

    Dover, Nir; Barash, Jason R.; Burke, Julianne N.; ...

    2014-05-22

    Botulinum neurotoxin (BoNT) is the most poisonous substance known and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene that is part of a toxin gene cluster that includes several accessory genes. In this paper, we sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding-domain homologues to the DNA-binding domains both of BotR and of other members of the TcdR-related group 5 of the σ 70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein in association with RNA polymerase core enzyme specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. Finally, this TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  3. KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

    PubMed

    Laetsch, Dominik R; Blaxter, Mark L

    2017-10-05

    The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined, groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

  4. Clustering of risk factors and the risk of incident cardiovascular disease in Asian and Caucasian populations: results from the Asia Pacific Cohort Studies Collaboration

    PubMed Central

    Peters, Sanne A E; Wang, Xin; Lam, Tai-Hing; Kim, Hyeon Chang; Ho, Suzanne; Ninomiya, Toshiharu; Knuiman, Matthew; Vaartjes, Ilonca; Bots, Michael L; Woodward, Mark

    2018-01-01

    Objective To assess the relationship between risk factor clusters and cardiovascular disease (CVD) incidence in Asian and Caucasian populations and to estimate the burden of CVD attributable to each cluster. Setting Asia Pacific Cohort Studies Collaboration. Participants Individual participant data from 34 population-based cohorts, involving 314 024 participants without a history of CVD at baseline. Outcome measures Clusters were 11 possible combinations of four individual risk factors (current smoking, overweight, blood pressure (BP) and total cholesterol). Cox regression models were used to obtain adjusted HRs and 95% CIs for CVD associated with individual risk factors and risk factor clusters. Population-attributable fractions (PAFs) were calculated. Results During a mean follow-up of 7 years, 6203 CVD events were recorded. The ranking of HRs and PAFs was similar for Australia and New Zealand (ANZ) and Asia; clusters including BP consistently showed the highest HRs and PAFs. The BP–smoking cluster had the highest HR for people with two risk factors: 4.13 (3.56 to 4.80) for Asia and 3.07 (2.23 to 4.23) for ANZ. Corresponding PAFs were 24% and 11%, respectively. For individuals with three risk factors, the BP–smoking–cholesterol cluster had the highest HR (4.67 (3.92 to 5.57) for Asia and 3.49 (2.69 to 4.53) for ANZ). Corresponding PAFs were 13% and 10%. Conclusions Risk factor clusters act similarly on CVD risk in Asian and Caucasian populations. Clusters including elevated BP were associated with the highest excess risk of CVD. PMID:29511013

  5. Bladder Carcinoma Data with Clinical Risk Factors and Molecular Markers: A Cluster Analysis

    PubMed Central

    Redondo-Gonzalez, Enrique; de Castro, Leandro Nunes; Moreno-Sierra, Jesús; Maestro de las Casas, María Luisa; Vera-Gonzalez, Vicente; Ferrari, Daniel Gomes; Corchado, Juan Manuel

    2015-01-01

    Bladder cancer occurs in the epithelial lining of the urinary bladder and is amongst the most common types of cancer in humans, killing thousands of people a year. This paper is based on the hypothesis that the use of clinical and histopathological data together with information about the concentration of various molecular markers in patients is useful for the prediction of outcomes and the design of treatments of nonmuscle invasive bladder carcinoma (NMIBC). A population of 45 patients with a new diagnosis of NMIBC was selected. Patients with benign prostatic hyperplasia (BPH), muscle invasive bladder carcinoma (MIBC), carcinoma in situ (CIS), and NMIBC recurrent tumors were not included due to their different clinical behavior. Clinical history was obtained by means of anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients. Then, patients underwent conventional transurethral resection (TURBT) and some proteomic analyses quantified the biomarkers (p53, neu, and EGFR). A postoperative follow-up was performed to detect relapse and progression. Clusterings were performed to find groups with clinical, molecular markers, histopathological prognostic factors, and statistics about recurrence, progression, and overall survival of patients with NMIBC. Four groups were found according to tumor sizes, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characters and biological behavior. PMID:25866762

  6. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  7. A Test for Cluster Bias: Detecting Violations of Measurement Invariance across Clusters in Multilevel Data

    ERIC Educational Resources Information Center

    Jak, Suzanne; Oort, Frans J.; Dolan, Conor V.

    2013-01-01

    We present a test for cluster bias, which can be used to detect violations of measurement invariance across clusters in 2-level data. We show how measurement invariance assumptions across clusters imply measurement invariance across levels in a 2-level factor model. Cluster bias is investigated by testing whether the within-level factor loadings…

  8. Near real-time space-time cluster analysis for detection of enteric disease outbreaks in a community setting.

    PubMed

    Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy

    2016-08-01

    To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data of Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly by ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). During eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known by the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutes (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.

  9. The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

    NASA Astrophysics Data System (ADS)

    Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

    2018-04-01

    The gas cluster ion beam (GCIB) technique was used for smoothing the silicon carbide crystal surface. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters have not yet been reported. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to an average roughness of about 1 nm for both elements. It was shown that xenon as the working gas is more effective: the sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.

  10. Dynamic multifactor clustering of financial networks

    NASA Astrophysics Data System (ADS)

    Ross, Gordon J.

    2014-02-01

    We investigate the tendency for financial instruments to form clusters when there are multiple factors influencing the correlation structure. Specifically, we consider a stock portfolio which contains companies from different industrial sectors, located in several different countries. Both sector membership and geography combine to create a complex clustering structure where companies seem to first be divided based on sector, with geographical subclusters emerging within each industrial sector. We argue that standard techniques for detecting overlapping clusters and communities are not able to capture this type of structure and show how robust regression techniques can instead be used to remove the influence of both sector and geography from the correlation matrix separately. Our analysis reveals that prior to the 2008 financial crisis, companies did not tend to form clusters based on geography. This changed immediately following the crisis, with geography becoming a more important determinant of clustering structure.

  11. Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

    PubMed

    Ben-Sasson, Ayelet; Podoly, Tamar Yonit

    2017-02-01

    Several studies have examined the sensory component in Obsessive Compulsive Disorder (OCD) and described an OCD subtype which has a unique profile, and that Sensory Phenomena (SP) is a significant component of this subtype. SP has some commonalities with Sensory Over Responsivity (SOR) and might be in part a characteristic of this subtype. Although there are some studies that have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), literature lacks sufficient data on this interplay. First to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (i.e. smell, touch, etc.) and OCS subscales (i.e. washing, ordering, etc.). Second, to investigate the cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to find whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires for measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately significantly correlated (n=274); significant correlations between all SOR modalities and OCS subscales were found, with no specific higher correlation between one modality and one OCS subscale. Cluster analysis revealed four distinct clusters: (1) No OC and SOR symptoms (NONE; n=100), (2) High OC and SOR symptoms (BOTH; n=28), (3) Moderate OC symptoms (OCS; n=63), (4) Moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters, and shared OC subscales scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across tactile, vision, taste and olfactory modalities. The SPQ was found reliable and suitable to detect SOR; the sample SPQ scores were normally distributed (n=350).
SOR is a

  12. Development of innovative methods for risk assessment in high-rise construction based on clustering of risk factors

    NASA Astrophysics Data System (ADS)

    Okolelova, Ella; Shibaeva, Marina; Shalnev, Oleg

    2018-03-01

    The article analyses risks in high-rise construction in terms of investment value with account of the maximum probable loss in case of risk event. The authors scrutinized the risks of high-rise construction in regions with various geographic, climatic and socio-economic conditions that may influence the project environment. Risk classification is presented in general terms and includes aggregated characteristics of risks that are common to many regions. Cluster analysis tools, which allow considering generalized groups of risk depending on their qualitative and quantitative features, were used in order to model the influence of the risk factors on the implementation of an investment project. For convenience of further calculations, each type of risk is assigned a separate code with the number of the cluster and the subtype of risk. This approach and the coding of risk factors make it possible to build a risk matrix, which greatly facilitates the task of determining the degree of impact of risks. The authors clarified and expanded the concept of the price risk, which is defined as the expected value of the event, which extends the capabilities of the model, allows estimating an interval of the probability of occurrence and also using other probabilistic methods of calculation.

  13. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis

    PubMed Central

    Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.

    2017-01-01

    morphological change upon neuraminidase induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor. PMID:28848398

  14. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis.

    PubMed

    Fernández-Arjona, María Del Mar; Grondona, Jesús M; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D

    2017-01-01

    morphological change upon neuraminidase induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor.

  15. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    PubMed

    Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

    2015-01-01

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.

  16. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    PubMed

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-05

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. When these two groups were compared to PFP runners, one cluster exhibited greater while the other exhibited reduced peak knee abduction angles (P<0.05). The variability observed in running patterns across this sample could be the result of different gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    PubMed Central

    Liu, Jingxian; Wu, Kefeng

    2017-01-01

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, and data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. 
Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with
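
    The first two steps described in this record (DTW distances between trajectories, collected into a distance matrix) can be sketched as follows. This is a minimal textbook DTW in plain Python, not the authors' code: the function names `dtw_distance` and `distance_matrix`, the use of Euclidean point-to-point cost, and the toy trajectories are all assumptions for illustration. The PCA and center-selection steps are not shown.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic dynamic-programming DTW between two lists of (x, y) points.

    The local cost is the Euclidean distance between points (an assumption;
    the record does not state which point-to-point cost is used).
    """
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i points
    # of traj_a with the first j points of traj_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        ax, ay = traj_a[i - 1]
        for j in range(1, m + 1):
            bx, by = traj_b[j - 1]
            local = math.hypot(ax - bx, ay - by)
            cost[i][j] = local + min(
                cost[i - 1][j],      # traj_b point repeated
                cost[i][j - 1],      # traj_a point repeated
                cost[i - 1][j - 1],  # both trajectories advance
            )
    return cost[n][m]


def distance_matrix(trajectories):
    """Symmetric matrix of pairwise DTW distances (zeros on the diagonal)."""
    k = len(trajectories)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(trajectories[i], trajectories[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy trajectories (hypothetical, not AIS data).
route_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
route_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # same path, resampled
route_c = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]              # parallel path
dtw_matrix = distance_matrix([route_a, route_b, route_c])
for row in dtw_matrix:
    print(row)
```

    Because DTW lets one point align with several points of the other trajectory, two trajectories that follow the same path but are sampled at different rates can still have a small distance, which is one reason it is commonly preferred over point-by-point comparison for trajectories of unequal length.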

  18. Person mobility in the design and analysis of cluster-randomized cohort prevention trials.

    PubMed

    Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard

    2012-06-01

    Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinics, cities, states) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.

  19. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  20. FLOCK cluster analysis of mast cell event clustering by high-sensitivity flow cytometry predicts systemic mastocytosis.

    PubMed

    Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty

    2015-11-01

    In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.
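The diagnostic figures quoted in the FLOCK record above can be approximately recovered from the two case counts it gives (56 of 75 SM cases and 17 of 124 non-SM cases with a FLOCK-identified population). The sketch below is illustrative only: the function name is hypothetical, and treating "FLOCK identified a discrete mast cell population" as the positive test result is an assumption read off the abstract.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts implied by the abstract (assumed 2x2 table):
# 56 of 75 SM cases positive, 17 of 124 non-SM cases positive.
metrics = diagnostic_metrics(tp=56, fn=75 - 56, fp=17, tn=124 - 17)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The computed values fall within about one percentage point of the percentages reported in the abstract; small differences may reflect rounding in the original.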

  1. Cluster Analysis of International Information and Social Development.

    ERIC Educational Resources Information Center

    Lau, Jesus

    1990-01-01

    Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…

  2. Breast cancer and symptom clusters during radiotherapy.

    PubMed

    Matthews, Ellyn E; Schmiege, Sarah J; Cook, Paul F; Sousa, Karen H

    2012-01-01

    Symptom clusters assessment shifts the clinical focus from a specific symptom to the patient's experience as a whole. Few studies have examined breast cancer symptom clusters during treatment, and fewer studies have addressed symptom clusters during radiation therapy (RT). The theoretical underpinning of this study is the Symptoms Experience Model. Research is needed to identify antecedents and consequences of cancer-related symptom clusters. The present study was intended to determine the clustering of symptoms during RT in women with breast cancer and significant correlations among the symptoms, individual characteristics, and mood. In a secondary analysis of data from a descriptive correlational study, 93 women at weeks 3 to 7 of RT at centers in the mid-Atlantic region of the United States completed the Symptom Distress Scale, the subscales of the Positive and Negative Affect Scale, the Life Orientation Test, and the Self-transcendence Scale. Confirmatory factor analysis revealed symptoms grouped into 3 distinct clusters: pain-insomnia-fatigue, cognitive disturbance-outlook, and gastrointestinal. The pain-insomnia-fatigue and cognitive disturbance-outlook clusters were associated with individual characteristics, optimism, self-transcendence, and positive and negative mood. The gastrointestinal cluster correlated significantly only with positive mood. This study provides insight into symptoms that group together and the relationship of symptom clusters to antecedents and mood. These findings underscore the need to define and standardize the measurement of symptom clusters and understand variability in concurrent symptoms. Attention to symptom clusters shifts the clinical focus from a specific symptom to the patient's experience as a whole and helps identify the most effective interventions.

  3. Identifying Patient Attitudinal Clusters Associated with Asthma Control: The European REALISE Survey.

    PubMed

    van der Molen, Thys; Fletcher, Monica; Price, David

    Asthma is a highly heterogeneous disease that can be classified into different clinical phenotypes, and treatment may be tailored accordingly. However, factors beyond purely clinical traits, such as patient attitudes and behaviors, can also have a marked impact on treatment outcomes. The objective of this study was to further analyze data from the REcognise Asthma and LInk to Symptoms and Experience (REALISE) Europe survey, to identify distinct patient groups sharing common attitudes toward asthma and its management. Factor analysis of respondent data (N = 7,930) from the REALISE Europe survey consolidated the 34 attitudinal variables provided by the study population into a set of 8 summary factors. Cluster analyses were used to identify patient clusters that showed similar attitudes and behaviors toward each of the 8 summary factors. Five distinct patient clusters were identified and named according to the key characteristics comprising that cluster: "Confident and self-managing," "Confident and accepting of their asthma," "Confident but dependent on others," "Concerned but confident in their health care professional (HCP)," and "Not confident in themselves or their HCP." Clusters showed clear variability in attributes such as degree of confidence in managing their asthma, use of reliever and preventer medication, and level of asthma control. The 5 patient clusters identified in this analysis displayed distinctly different personal attitudes that would require different approaches in the consultation room certainly for asthma but probably also for other chronic diseases. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Independent association of clustered metabolic risk factors with cardiorespiratory fitness in youth aged 11-17 years.

    PubMed

    Machado-Rodrigues, Aristides M; Leite, Neiva; Coelho-e-Silva, Manuel J; Martins, Raul A; Valente-dos-Santos, João; Mascarenhas, Luís P G; Boguszewski, Margaret C S; Padez, Cristina; Malina, Robert M

    2014-01-01

    Although the prevalence of metabolic syndrome (MetS) has increased in youth, the potential independent contribution of cardiorespiratory fitness (CRF) to the clustering of metabolic risk factors has received relatively little attention. This study evaluated associations between the clustering of metabolic risk factors and CRF in a sample of youth. Height, weight, BMI, fasting glucose, insulin, HDL-cholesterol, triglycerides and blood pressures were measured in a cross-sectional sample of 924 youth (402 males, 522 females) of 11-17 years. CRF was assessed using the 20-metre shuttle run test. Physical activity (PA) was measured with a 3-day diary. Outcome variables were statistically normalized and expressed as Z-scores. A MetS risk score was computed as the mean of the Z-scores. Multiple linear regression was used to test associations between CRF and metabolic risk, adjusted for age, sex, BMI, PA and parental education. CRF was inversely associated with MetS after adjustment for potential confounders. After adjusting for BMI, the relationship between CRF and metabolic risk has substantially improved. CRF was independently associated with the clustering of metabolic risk factors in youth of 11-17 years of age.
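The risk score described in the record above (variables expressed as Z-scores, then averaged) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and the sample values are hypothetical, and the example uses only variables for which a higher value means higher risk, because the abstract does not say how favourable variables such as HDL-cholesterol were signed.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one variable to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def risk_scores(table):
    """table maps variable name -> list of values, one per participant.
    Returns one score per participant: the mean of that person's Z-scores."""
    standardized = [z_scores(column) for column in table.values()]
    return [mean(person) for person in zip(*standardized)]

# Hypothetical measurements for four participants.
sample = {
    "glucose": [85, 90, 100, 110],
    "triglycerides": [60, 80, 120, 150],
    "systolic_bp": [105, 110, 120, 130],
}
scores = risk_scores(sample)
print([round(s, 2) for s in scores])
```

Because each Z-scored variable has mean zero, the scores average to zero across the sample, so a positive score marks a participant with above-average clustered risk relative to this sample only.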

  5. Factors driving stable growth of He clusters in W: first-principles study

    NASA Astrophysics Data System (ADS)

    Feng, Y. J.; Xin, T. Y.; Xu, Q.; Wang, Y. X.

    2018-07-01

    The evolution of helium (He) bubbles is responsible for the surface morphology variation and subsequent degradation of the properties of plasma-facing materials (PFMs) in nuclear fusion reactors. These severe problems unquestionably trace back to the behavior of He in PFMs, which is closely associated with the interaction between He and the matrix. In this paper, we decomposed the binding energy of the He cluster into three parts, those from W–W, W–He, and He–He interactions, using density functional theory. As a result, we clearly identified the main factors that determine a steplike decrease in the binding energy with increasing number of He atoms, which explains the process of self-trapping and athermal vacancy generation during He cluster growth in the PFM tungsten. The three interactions were found to synergetically shape the features of the steplike decrease in the binding energy. Fairly strong He–He repulsive forces at a short distance, which stem from antibonding states between He atoms, need to be released when additional He atoms are continuously bonded to the He cluster. This causes the steplike feature in the binding energy. The bonding states between W and He atoms in principle facilitate the decreasing trend of the binding energy. The decrease in binding energy with increasing number of He atoms implies that He clusters can grow stably.

  6. Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

    PubMed

    Anandakrishnan, Ramu; Onufriev, Alexey

    2008-03-01

    In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application, the computation of the average charge of ionizable amino acids in proteins, is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
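The idea in the record above (treat interactions inside each cluster exactly and ignore interactions between clusters) can be illustrated on a toy system. The sketch below is not the authors' algorithm: it assumes binary sites with a hypothetical energy function E = sum(h_i x_i) + sum(J_ij x_i x_j) in units of kT, and all names and numbers are made up for illustration.

```python
from itertools import product
from math import exp

def boltzmann_averages(h, J, sites):
    """Exact Boltzmann average of x_i for the listed sites, enumerating all
    2**len(sites) microstates and keeping only interactions among them."""
    z = 0.0
    acc = {i: 0.0 for i in sites}
    for state in product((0, 1), repeat=len(sites)):
        x = dict(zip(sites, state))
        energy = sum(h[i] * x[i] for i in sites)
        energy += sum(J[i][j] * x[i] * x[j] for i in sites for j in sites if i < j)
        weight = exp(-energy)
        z += weight
        for i in sites:
            acc[i] += weight * x[i]
    return {i: acc[i] / z for i in sites}

def clustered_averages(h, J, clusters):
    """Approximate averages: each cluster is solved exactly on its own,
    and every interaction between different clusters is dropped."""
    result = {}
    for cluster in clusters:
        result.update(boltzmann_averages(h, J, cluster))
    return result

# Hypothetical 4-site system: strong coupling inside {0,1} and {2,3},
# weak coupling (0.1) between sites 1 and 2.
h = [-1.0, 0.5, 0.2, -0.3]
J = [[0.0, 1.5, 0.0, 0.0],
     [1.5, 0.0, 0.1, 0.0],
     [0.0, 0.1, 0.0, 1.2],
     [0.0, 0.0, 1.2, 0.0]]
exact = boltzmann_averages(h, J, [0, 1, 2, 3])
approx = clustered_averages(h, J, [[0, 1], [2, 3]])
for i in range(4):
    print(i, round(exact[i], 4), round(approx[i], 4))
```

The full enumeration visits 2^4 microstates, while the clustered version visits 2^2 per cluster. When the coupling between clusters is exactly zero, the two results coincide.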

  7. Impact of comprehensive cardiovascular risk reduction programme on risk factor clustering associated with elevated blood pressure in an Indian industrial population.

    PubMed

    Jeemon, Panniyammakal; Prabhakaran, Dorairaj; Goenka, Shifalika; Ramakrishnan, Lakshmy; Padmanabhan, Sandosh; Huffman, Mark; Joshi, Prashant; Sivasankaran, Sivasubramonian; Mohan, B V M; Ahmed, F; Ramanathan, Meera; Ahuja, R; Sinha, Nakul; Thankappan, K R; Reddy, K S

    2012-04-01

    Cardiovascular risk factors clustering associated with blood pressure (BP) has not been studied in the Indian population. This study was aimed at assessing the clustering effect of cardiovascular risk factors with suboptimal BP in the Indian population as well as the impact of risk reduction interventions. Data from 10543 individuals collected in a nation-wide surveillance programme in India were analysed. The burden of risk factors clustering with blood pressure and coronary heart disease (CHD) was assessed. The impact of a risk reduction programme on risk factors clustering was prospectively studied in a sub-group. Mean age of participants was 40.9 ± 11.0 yr. A significant linear increase in number of risk factors with increasing blood pressure, irrespective of stratifying using different risk factor thresholds, was observed. While hypertension occurred in isolation in 2.6 per cent of the total population, co-existence of hypertension and >3 risk factors was observed in 12.3 per cent of the population. A comprehensive risk reduction programme significantly reduced the mean number of additional risk factors in the intervention population across the blood pressure groups, while it continued to be high in the control arm without interventions (both within group and between group P<0.001). The proportion of 'low risk phenotype' increased from 13.4 to 19.9 per cent in the intervention population and it was decreased from 27.8 to 10.6 per cent in the control population (P<0.001). The proportion of individuals with hypertension and three more risk factors decreased from 10.6 to 4.7 per cent in the intervention arm while it was increased from 13.3 to 17.8 per cent in the control arm (P<0.001). Our findings showed that cardiovascular risk factors clustered together with elevated blood pressure and a risk reduction programme significantly reduced the risk factors burden.

  8. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    PubMed Central

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
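For orientation, the objective function that convex clustering minimizes is commonly written in the form below. The abstract above does not state it, so the notation is an assumption: x_i are the n data points, u_i is the cluster centre attached to point i, w_ij are nonnegative pairwise weights, and γ ≥ 0 is the regularization strength.

```latex
\min_{u_1,\dots,u_n}\;
\frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
\;+\; \gamma \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert
```

At γ = 0 every point is its own centre; as γ grows, the penalty pulls centres together until they coalesce, and following the centres over a range of γ values gives the solution path mentioned in the abstract.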

  9. Convex clustering: an attractive alternative to hierarchical clustering.

    PubMed

    Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth

    2015-05-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.

  10. The application of cluster analysis in the intercomparison of loop structures in RNA.

    PubMed

    Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

    2005-04-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
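The clustering step named in the record above, UPGMA applied to a matrix of pairwise RMSD values, can be sketched in plain Python. This is an illustration only: the function name, the loop labels and the RMSD values are hypothetical and are not taken from the paper.

```python
def upgma(dist, labels):
    """Average-linkage (UPGMA) agglomeration on a symmetric distance matrix.
    Returns the merges in order as (members_a, members_b, height)."""
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), height = min(d.items(), key=lambda item: item[1])
        members_a = clusters.pop(a)
        members_b = clusters.pop(b)
        na, nb = len(members_a), len(members_b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances (arithmetic mean of
        # all pairwise member distances).
        updated = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            updated[c] = (na * dac + nb * dbc) / (na + nb)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for c, v in updated.items():
            d[(c, next_id)] = v  # new ids are always larger than old ones
        clusters[next_id] = members_a + members_b
        merges.append((members_a, members_b, height))
        next_id += 1
    return merges

# Hypothetical RMSD values (in angstroms) between four tetraloops.
labels = ["GNRA-1", "GNRA-2", "UNCG-1", "UNCG-2"]
rmsd = [[0.0, 0.4, 2.0, 2.2],
        [0.4, 0.0, 2.1, 2.3],
        [2.0, 2.1, 0.0, 0.5],
        [2.2, 2.3, 0.5, 0.0]]
merges = upgma(rmsd, labels)
for members_a, members_b, height in merges:
    print(members_a, members_b, round(height, 3))
```

With these made-up values the two GNRA-like loops merge first, then the two UNCG-like loops, and the final merge height is the mean of the four cross-group RMSD values.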

  11. The application of cluster analysis in the intercomparison of loop structures in RNA

    PubMed Central

    HUANG, HUNG-CHUNG; NAGASWAMY, UMA; FOX, GEORGE E.

    2005-01-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence. PMID:15769871

  12. Minimum number of clusters and comparison of analysis methods for cross sectional stepped wedge cluster randomised trials with binary outcomes: A simulation study.

    PubMed

    Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick

    2017-03-09

    Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
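The design recommended at the end of the record above (three steps, at least two clusters randomised at each step) can be written out as a treatment-indicator matrix. The sketch below assumes one all-control baseline period before the first step, so a trial with three steps has four periods; the function name and the seed are hypothetical.

```python
import random

def stepped_wedge_design(n_steps, clusters_per_step, seed=0):
    """Return a list of rows, one per cluster, with one entry per period
    (n_steps + 1 periods, period 0 being the baseline).
    An entry of 1 means the cluster is under the intervention in that period."""
    n_periods = n_steps + 1
    rows = []
    for step in range(1, n_steps + 1):
        for _ in range(clusters_per_step):
            rows.append([1 if period >= step else 0 for period in range(n_periods)])
    # Randomise which cluster (row index) receives which crossover sequence.
    random.Random(seed).shuffle(rows)
    return rows

design = stepped_wedge_design(n_steps=3, clusters_per_step=2)
for cluster_id, row in enumerate(design):
    print(cluster_id, row)
```

Every cluster starts in the control condition and ends under the intervention, and the number of treated clusters rises by two at each step, which is why time and intervention are partly confounded and a time-trend adjustment is needed.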

  13. Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

    PubMed

    Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

    2018-06-07

    Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
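The selection step described in the record above (keep only the clusters whose centres are nearest to a query point) can be sketched as follows. This covers the neighbourhood selection only, not PLS-DA itself; it assumes cluster labels have already been produced by some hierarchical clustering, and the function names, the parameter k and the sample points are hypothetical.

```python
from math import dist

def cluster_centres(samples, labels):
    """Mean vector of each cluster."""
    groups = {}
    for point, label in zip(samples, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in groups.items()
    }

def nearest_cluster_samples(samples, labels, query, k=1):
    """Return the training samples belonging to the k clusters whose
    centres are closest (Euclidean distance) to the query point."""
    centres = cluster_centres(samples, labels)
    nearest = sorted(centres, key=lambda label: dist(centres[label], query))[:k]
    return [point for point, label in zip(samples, labels) if label in nearest]

# Hypothetical two-dimensional samples with precomputed cluster labels.
samples = [(0, 0), (0, 1), (5, 5), (6, 5), (10, 0)]
labels = [0, 0, 1, 1, 2]
local_training_set = nearest_cluster_samples(samples, labels, query=(5.4, 4.8), k=1)
print(local_training_set)
```

A local classifier would then be fitted on `local_training_set` only, which is how the method keeps each fitted model close to locally linear, unimodal data.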

  14. Subgroups of physically abusive parents based on cluster analysis of parenting behavior and affect.

    PubMed

    Haskett, Mary E; Smith Scott, Susan; Sabourin Ward, Caryn

    2004-10-01

    Cluster analysis of observed parenting and self-reported discipline was used to categorize 83 abusive parents into subgroups. A 2-cluster solution received support for validity. Cluster 1 parents were relatively warm, positive, sensitive, and engaged during interactions with their children, whereas Cluster 2 parents were relatively negative, disengaged or intrusive, and insensitive. Further, clusters differed in emotional health, parenting stress, perceptions of children, and problem solving. Children of parents in the 2 clusters differed on several indexes of social adjustment. Cluster 1 parents were similar to nonabusive parents (n = 66) on parenting and related constructs, but Cluster 2 parents differed from nonabusive parents on all clustering variables and many validation variables. Results highlight clinically relevant diversity in parenting practices and functioning among abusive parents. ((c) 2004 APA, all rights reserved).

  15. Coping Patterns of African American Adolescents: A Confirmatory Factor Analysis and Cluster Analysis of the Children's Coping Strategies Checklist

    ERIC Educational Resources Information Center

    Gaylord-Harden, Noni K.; Gipson, Polly; Mance, GiShawn; Grant, Kathryn E.

    2008-01-01

    The current study examined patterns of coping strategies in a sample of 497 low-income urban African American adolescents (mean age = 12.61 years). Results of confirmatory factor analysis indicated that the 4-factor structure of the Children's Coping Strategies Checklist (T. S. Ayers, I. N. Sandler, S. G. West, & M. W. Roosa, 1996) was not…

  16. Stream gradient Hotspot and Cluster Analysis (SL-HCA) for improving the longitudinal profiles metrics

    NASA Astrophysics Data System (ADS)

    Troiani, Francesco; Piacentini, Daniela; Della Seta, Marta

    2016-04-01

    Many studies have successfully focused on the analysis of stream longitudinal profiles through the Stream Length-gradient (SL) index for detecting, at different spatial scales, either tectonic structures or hillslope processes. The analysis and interpretation of the spatial variability of SL values, both at a regional and local scale, is often complicated by the concomitance of different factors generating SL anomalies, including the bedrock composition. The creation of lithologically-filtered SL maps is often problematic in areas where homogeneously surveyed geological maps with sufficient resolution are unavailable. Moreover, both the SL map classification and the unbiased anomaly detection are rather difficult. For instance, what is the best threshold to define the anomalous SL values? Further, is there a minimum along-channel extent of anomalous SL values for objectively defining over-steepened segments on long-profiles? This research investigates the relevance and potential of a new approach based on Hotspot and Cluster Analysis of SL values (SL-HCA) for detecting knickzones on long-profiles at a regional scale and for fine-tuning the interpretation of their geological-geomorphological meaning. We developed this procedure within a 2800 km²-wide area located in the mountainous sector of the Northern Apennines of Italy. The Getis-Ord Gi* statistic is applied for the SL-HCA approach. The value of SL, calculated starting from a 5 × 5 m Digital Elevation Model, is used as weighting factor and the Gi* index is calculated for each 50 m-long channel segment for the whole fluvial system. The outcomes indicate that high positive Gi* values imply the clustering of SL anomalies, thus the occurrence of knickzones on the stream long-profiles. Results show that high and very high Gi* values (i.e. values beyond two standard deviations from the mean) correlate well with the principal knickzones detected with existing lithologically-filtered SL maps. Field checks and remote sensing
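The Getis-Ord Gi* statistic used in the record above can be sketched for a single channel treated as a one-dimensional sequence of segments. This is a simplified illustration, not the study's workflow: it assumes binary weights over a fixed window of neighbouring segments (the segment itself included), and the function name, the window radius and the SL values are hypothetical.

```python
from math import sqrt

def getis_ord_gi_star(values, radius):
    """Gi* z-score for each position of a 1-D sequence, using binary weights:
    weight 1 for segments within `radius` positions (self included), else 0."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        w_sum = hi - lo  # with binary weights, sum(w) equals sum(w**2)
        local_sum = sum(values[lo:hi])
        numerator = local_sum - mean * w_sum
        denominator = s * sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical SL values for 45 consecutive segments: a short run of high
# values (a candidate knickzone) inside an otherwise uniform profile.
sl_values = [1.0] * 20 + [10.0] * 5 + [1.0] * 20
gi = getis_ord_gi_star(sl_values, radius=2)
hotspots = [i for i, z in enumerate(gi) if z > 2.0]
print(hotspots)
```

Segments whose score exceeds 2 correspond to the "beyond two standard deviations" criterion quoted in the abstract; here they fall on and immediately around the run of high SL values.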

  17. High-dimensional cluster analysis with the Masked EM Algorithm

    PubMed Central

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  18. Preliminary Cluster Analysis For Several Representatives Of Genus Kerivoula (Chiroptera: Vespertilionidae) in Borneo

    NASA Astrophysics Data System (ADS)

    Hasan, Noor Haliza; Abdullah, M. T.

    2008-01-01

    The aim of the study is to use cluster analysis on morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method to describe the relationship among species within this genus. A total of 15 adult male individuals from genus Kerivoula taken from sampling trips around Borneo and specimens kept at the zoological museum of Universiti Malaysia Sarawak were examined. A total of 27 characters using dental, skull and external body measurements were recorded. Clustering analysis illustrated the grouping and morphometric relationships between the species of this genus. It has clearly separated each species from each other despite the overlapping of measurements of some species within the genus. Cluster analysis provides an alternative approach to make a preliminary identification of a species.

  19. Phenotypes of asthma in low-income children and adolescents: cluster analysis.

    PubMed

    Cabral, Anna Lucia Barros; Sousa, Andrey Wirgues; Mendes, Felipe Augusto Rodrigues; Carvalho, Celso Ricardo Fernandes de

    2017-01-01

    Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.

  20. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.

    PubMed

    Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

    2017-09-27

    A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster to forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analytical model for WPSN with cluster cooperation. Through the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the motivation for this design and the correctness of the theoretical analysis, and show how parameters affect system performance. Moreover, it is also shown that the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  1. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis

    PubMed Central

    Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

    2017-01-01

    A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster to forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analytical model for WPSN with cluster cooperation. Through the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the motivation for this design and the correctness of the theoretical analysis, and show how parameters affect system performance. Moreover, it is also shown that the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation. PMID:28953231

  2. Language Learner Motivational Types: A Cluster Analysis Study

    ERIC Educational Resources Information Center

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  3. Clusters and Factors Associated with Complementary Basic Education in Tanzania Mainland

    ERIC Educational Resources Information Center

    Edwin, Paul; Amina, Msengwa S.; Godwin, Naimani M.

    2017-01-01

    Complementary Basic Education in Tanzania (COBET) is a community-based programme initiated in 1999 to provide formal education opportunities to over-aged children, that is, children above school age. The COBET programme was analyzed using secondary data collected from 21 regions from 2008 to 2012. Cluster analysis was applied to classify the 21…

  4. Clustering Financial Time Series by Network Community Analysis

    NASA Astrophysics Data System (ADS)

    Piccardi, Carlo; Calatroni, Lisa; Bertoni, Fabio

    In this paper, we describe a method for clustering financial time series which is based on community analysis, a recently developed approach for partitioning the nodes of a network (graph). A network with N nodes is associated to the set of N time series. The weight of the link (i, j), which quantifies the similarity between the two corresponding time series, is defined according to a metric based on symbolic time series analysis, which has recently proved effective in the context of financial time series. Then, searching for network communities allows one to identify groups of nodes (and then time series) with strong similarity. A quantitative assessment of the significance of the obtained partition is also provided. The method is applied to two distinct case-studies concerning the US and Italy Stock Exchange, respectively. In the US case, the stability of the partitions over time is also thoroughly investigated. The results favorably compare with those obtained with the standard tools typically used for clustering financial time series, such as the minimal spanning tree and the hierarchical tree.

  5. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    ERIC Educational Resources Information Center

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  6. Fuzzy cluster analysis of air quality in Beijing district

    NASA Astrophysics Data System (ADS)

    Liu, Hongkai

    2018-02-01

    The principle of fuzzy cluster analysis is applied in this article: using the transitive closure method, the main air pollutants in 17 districts of Beijing from 2014 to 2016 were classified. The results of the analysis reflect the changes in the main air pollutants in Beijing over nearly three years. This can provide scientific and data support for atmospheric governance in the Beijing area.
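
    The transitive closure method named in this abstract can be illustrated with a minimal sketch. It assumes a fuzzy similarity matrix that is already reflexive and symmetric; the 4x4 matrix, the function names, and the threshold values below are hypothetical and are not taken from the article. The closure is computed by repeated max-min composition, and a lambda-cut of the closure then yields crisp clusters.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation repeatedly until it stops changing."""
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:  # exact: max/min only select existing entries
            return r
        r = r2


def lambda_cut_clusters(closure, lam):
    """Group items whose closure similarity is at least lam."""
    n = len(closure)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if closure[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels


# Hypothetical similarity matrix for four districts (illustration only).
similarity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
closure = transitive_closure(similarity)
print(lambda_cut_clusters(closure, 0.5))
```

    Lowering the threshold merges clusters and raising it splits them, so sweeping the threshold traces out a hierarchy of groupings.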

  7. Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis.

    PubMed

    Brindle, R C; Ginty, A T; Jones, A; Phillips, A C; Roseboom, T J; Carroll, D; Painter, R C; de Rooij, S R

    2016-12-01

    Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure reactivity patterns and hypertension in a large prospective cohort (age range 55-60 years). Four clusters emerged with statistically different systolic and diastolic blood pressure and heart rate reactivity patterns. Cluster 1 was characterised by a relatively exaggerated blood pressure and heart rate response while the blood pressure and heart rate responses of cluster 2 were relatively modest and in line with the sample mean. Cluster 3 was characterised by blunted cardiovascular stress reactivity across all variables and cluster 4, by an exaggerated blood pressure response and modest heart rate response. Membership to cluster 4 conferred an increased risk of hypertension at 5-year follow-up (hazard ratio=2.98 (95% CI: 1.50-5.90), P<0.01) that survived adjustment for a host of potential confounding variables. These results suggest that the cardiac reactivity plays a potentially important role in the link between blood pressure reactivity and hypertension and support the use of multivariate approaches to stress psychophysiology.

  8. Application of Geostatistical Methods and Machine Learning for spatio-temporal Earthquake Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Schaefer, A. M.; Daniell, J. E.; Wenzel, F.

    2014-12-01

    Earthquake clustering tends to be an increasingly important part of general earthquake research especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is taken into account using a point based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have been mostly applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are mitigated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data of New Zealand and the United States is used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.

  9. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    NASA Astrophysics Data System (ADS)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context: Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence makes it possible to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aims: It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze this type of system in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.

  10. Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

    PubMed Central

    Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

    2015-01-01

    The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694

  11. Customized recommendations for production management clusters of North American automatic milking systems.

    PubMed

    Tremblay, Marlène; Hess, Justin P; Christenson, Brock M; McIntyre, Kolby K; Smink, Ben; van der Kamp, Arjen J; de Jong, Lisanne G; Döpfer, Dörte

    2016-07-01

    Automatic milking systems (AMS) are implemented in a variety of situations and environments. Consequently, there is a need to characterize individual farming practices and regional challenges to streamline management advice and objectives for producers. Benchmarking is often used in the dairy industry to compare farms by computing percentile ranks of the production values of groups of farms. Grouping for conventional benchmarking is commonly limited to the use of a few factors such as farms' geographic region or breed of cattle. We hypothesized that herds' production data and management information could be clustered in a meaningful way using cluster analysis and that this clustering approach would yield better peer groups of farms than benchmarking methods based on criteria such as country, region, breed, or breed and region. By applying mixed latent-class model-based cluster analysis to 529 North American AMS dairy farms with respect to 18 significant risk factors, 6 clusters were identified. Each cluster (i.e., peer group) represented unique management styles, challenges, and production patterns. When compared with peer groups based on criteria similar to the conventional benchmarking standards, the 6 clusters better predicted milk produced (kilograms) per robot per day. Each cluster represented a unique management and production pattern that requires specialized advice. For example, cluster 1 farms were those that recently installed AMS robots, whereas cluster 3 farms (the most northern farms) fed high amounts of concentrates through the robot to compensate for low-energy feed in the bunk. In addition to general recommendations for farms within a cluster, individual farms can generate their own specific goals by comparing themselves to farms within their cluster. This is very comparable to benchmarking but adds the specific characteristics of the peer group, resulting in better farm management advice. The improvement that cluster analysis allows for is

  12. Stressful jobs and non-stressful jobs: a cluster analysis of office jobs.

    PubMed

    Carayon, P

    1994-02-01

    The purpose of the study was to determine if office jobs could be characterized by a small number of combinations of stressors that could be related to job-title information and self-report of psychological strain. Two-hundred-and-sixty-two office workers from three public service organizations provided data on nine job stressors and seven indicators of psychological strain. Using cluster analysis on the nine stressors, office jobs were classified into three clusters. The first cluster included jobs with high skill utilization, task clarity, job control and social support and low future ambiguity, but also high on job demands such as quantitative work-load, attention and work pressure. The second cluster included jobs with high demands and future ambiguity and low skill utilization, task clarity, job control and social support. The third cluster was intermediary between the first two clusters. The three clusters were related to job-title information. The second cluster was the highest on a range of psychological strain indicators, while the other two clusters were high on certain strain indicators but low on others. The study showed that office jobs could be characterized by a small number of combinations of stressors that were related to job-title information and psychological strain.

  13. Clustering analysis of moving target signatures

    NASA Astrophysics Data System (ADS)

    Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

    2010-04-01

    Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
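
    As a rough illustration of the knee-point idea mentioned in this abstract, the sketch below applies one common generic heuristic: pick the number of clusters whose point on a dispersion curve lies farthest from the straight line joining the curve's endpoints. This is not the authors' adaptive KP algorithm, whose details the abstract does not give; the function name and the dispersion values are hypothetical.

```python
def knee_point(ks, values):
    """Return the k whose (k, value) point is farthest from the chord
    joining the first and last points of the curve."""
    x0, y0 = ks[0], values[0]
    x1, y1 = ks[-1], values[-1]
    best_k, best_d = ks[0], -1.0
    for x, y in zip(ks, values):
        # Numerator of the point-to-line distance; the denominator is the
        # same for every point, so it can be dropped when taking the argmax.
        d = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
        if d > best_d:
            best_k, best_d = x, d
    return best_k


# Hypothetical within-cluster dispersion for k = 1..6 (illustration only).
candidate_ks = [1, 2, 3, 4, 5, 6]
dispersion = [100, 40, 15, 12, 10, 9]
print(knee_point(candidate_ks, dispersion))
```

    In practice the dispersion values would come from running the clustering algorithm once per candidate k, which is the manual step this kind of heuristic is meant to replace.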

  14. Understanding clusters of risk factors across different environmental and social contexts for the prediction of injuries among Canadian youth.

    PubMed

    Russell, K; Davison, C; King, N; Pike, I; Pickett, W

    2016-05-01

    Among Canadian youth, injury is the most common reason for presentation to the emergency department. Youth who commonly engage in multiple risk-taking behaviours are at greater risk for injury, but it is unknown whether this phenomenon is more pronounced in different contexts. We aimed to study relationships between risk-taking behaviours and injury, and variations in such relationships between different environmental and social contexts, among youth in Canada. Risk-taking behaviour and injury outcome data were collected from grade 9 to 10 students using the 2009-2010 (Cycle 6) of the Health Behaviour in School-Aged Children Survey (n=10,429). Principal components analysis was used to identify clusters of risk-taking behaviours. Within each identified cluster, the degree of risk-taking was categorized into quartiles from lowest to highest engagement in the behaviours. Risk ratios with 95% confidence intervals were calculated to determine the association between the risk of any injury and the degree of risk-taking behaviour specific to the cluster. Clusters were then examined across home, school, neighbourhood and sport contexts. Four clusters of risk-taking behaviour were identified which were labelled as "gateway substance use", "hard drugs and weapons", "overt risk-taking", and "physical activity". Each cluster was related to injury occurrence in a graded fashion. Clusters of risk behaviour were most strongly associated with injuries sustained in neighbourhood settings, and expectedly, increasing physical activity behaviours were associated with increased risk of sport injuries and injuries occurring at school. This study furthers understanding of clustered risk-taking phenomena that put youth at increasing levels of injury risk. Higher risks for injury and associated gradients were observed in less structured contexts such as neighbourhoods. In contrast, clustered physical activity behaviours were most related to school injury or sport injury and were more likely to

  15. Nationwide analysis on the impact of socioeconomic land use factors and incidence of urothelial carcinoma.

    PubMed

    Brandt, Maximilian P; Gust, Kilian M; Mani, Jens; Vallo, Stefan; Höfner, Thomas; Borgmann, Hendrik; Tsaur, Igor; Thomas, Christian; Haferkamp, Axel; Herrmann, Eva; Bartsch, Georg

    2018-02-01

    Incidence rates for urothelial carcinoma (UC) have been reported to differ between countries within the European Union (EU). Besides occupational exposure to chemicals, other substances such as tobacco and nitrite in groundwater have been identified as risk factors for UC. We investigated if regional differences in UC incidence rates are associated with agricultural, industrial and residential land use. Newly diagnosed cases of UC between 2003 and 2010 were included. Information within 364 administrative districts of Germany from 2004 for land use factors were obtained and calculated as a proportion of the total area of the respective administrative district and as a smoothed proportion. Furthermore, information on smoking habits was included in our analysis. Kulldorff spatial clustering was used to detect different clusters. A negative binomial model was used to test the spatial association between UC incidence as a ratio of observed versus expected incidence rates, land use and smoking habits. We identified 437,847,834 person years with 171,086 cases of UC. Cluster analysis revealed areas with higher incidence of UC than others (p=0.0002). Multivariate analysis including significant pairwise interactions showed that the environmental factors were independently associated with UC (p<0.001). The RR was 1.066 (95% CI 1.052-1.080), 1.066 (95% CI 1.042-1.089) and 1.067 (95% CI 1.045-1.093) for agricultural, industrial and residential areas, respectively, and 0.996 (95% CI 0.869-0.999) for the proportion of never smokers. This study displays regional differences in incidence of UC in Germany. Additionally, results suggest that socioeconomic factors based on agricultural, industrial and residential land use may be associated with UC incidence rates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

    PubMed

    Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

    2017-12-01

    To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  17. Characterization and virulence clustering analysis of extraintestinal pathogenic Escherichia coli isolated from swine in China.

    PubMed

    Zhu, Yinchu; Dong, Wenyang; Ma, Jiale; Yuan, Lvfeng; Hejair, Hassan M A; Pan, Zihao; Liu, Guangjin; Yao, Huochun

    2017-04-08

    Swine extraintestinal pathogenic Escherichia coli (ExPEC) is an important pathogen that leads to economic and welfare costs in the swine industry worldwide, and is occurring with increasing frequency in China. So far, various virulence factors have been recognized in ExPEC. Here, we investigated the virulence genotypes and clonal structure of collected strains to improve the knowledge of phylogenetic traits of porcine ExPECs in China. We isolated 64 Chinese porcine ExPEC strains from 2013 to 2014 in China. By multiplex PCR, the distribution of isolates belonging to phylogenetic groups B1, B2, A and D was 9.4%, 10.9%, 57.8% and 21.9%, respectively. Nineteen virulence-related genes were detected by PCR assay; ompA, fimH, vat, traT and iutA were highly prevalent. Virulence-related genes were remarkably more prevalent in group B2 than in groups A, B1 and D; notably, usp, cnf1, hlyD, papA and ibeA were only found in group B2 strains. Genotyping analysis was performed and four clusters of strains (named I to IV) were identified. Cluster IV contained all isolates from group B2 and Cluster IV isolates had the strongest pathogenicity in a mouse infection model. As phylogenetic group B2 and D ExPEC isolates are generally considered virulent, multilocus sequence typing (MLST) analysis was performed for these isolates to further investigate genetic relationships. Two novel sequence types, ST5170 and ST5171, were discovered. Among the nine clonal complexes identified among our group B2 and D isolates, CC12 and CC95 have been indicated to have high zoonotic pathogenicity. The distinction between group B2 and non-B2 isolates in virulence and genotype accorded with MLST analysis. This study reveals significant genetic diversity among ExPEC isolates and helps us to better understand their pathogenesis. Importantly, our data suggest group B2 (Cluster IV) strains have the highest risk of causing animal disease and illustrate the correlation between genotype and virulence.

  18. [Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

    PubMed

    Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

    2015-12-01

    We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests, with the cut-off distance chosen to yield a similar classification capability, showed favorable results. With the MIC method, in which minimum inhibitory concentration (MIC) values were used directly, the level of agreement with PFGE was 74.2% when 15 drugs were tested; the Unweighted Pair Group Method with Arithmetic mean (UPGMA) was effective when the cut-off distance was 16. Using the SIR method, in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis of drug susceptibility tests is useful for the epidemiological analysis of MRSA.
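
    The SIR variant described in this abstract can be sketched as follows: each susceptibility profile is coded with S = 0, I = 2, R = 3 (the coding stated in the abstract), and strains are merged by UPGMA (average linkage) until the closest pair of clusters is farther apart than a cut-off distance. The Euclidean metric, the function names, the four-drug profiles, and the cut-off used here are assumptions for illustration; the abstract does not state the distance measure.

```python
import math
from itertools import combinations

SIR_CODE = {"S": 0, "I": 2, "R": 3}  # coding given in the abstract


def encode(profile):
    """Turn a string such as 'SSRI' into a numeric vector."""
    return [SIR_CODE[c] for c in profile]


def euclidean(a, b):
    # Assumed metric; the abstract does not say which distance was used.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_clusters(profiles, cutoff):
    """Average-linkage clustering, stopped at the given cut-off distance."""
    vectors = [encode(p) for p in profiles]
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(c1, c2):
        # UPGMA distance: mean over all pairs of members of the two clusters
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), avg_dist(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > cutoff:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical four-drug profiles for four strains (illustration only).
strains = ["SSSR", "SSSI", "RRRS", "RRIS"]
print(upgma_clusters(strains, cutoff=2.0))
```

    Agreement with a reference typing method such as PFGE would then be assessed by comparing these cluster memberships with the reference groups, a step not shown here.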

  19. clusterProfiler: an R package for comparing biological themes among gene clusters.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

    2012-05-01

    Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.

  20. A QUANTITATIVE ANALYSIS OF DISTANT OPEN CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Janes, Kenneth A.; Hoq, Sadia

    2011-03-15

    The oldest open star clusters are important for tracing the history of the Galactic disk, but many of the more distant clusters are heavily reddened and projected against the rich stellar background of the Galaxy. We have undertaken an investigation of several distant clusters (Berkeley 19, Berkeley 44, King 25, NGC 6802, NGC 6827, Berkeley 52, Berkeley 56, NGC 7142, NGC 7245, and King 9) to develop procedures for separating probable cluster members from the background field. We next created a simple quantitative approach for finding approximate cluster distances, reddenings, and ages. We first conclude that with the possible exception of King 25 they are probably all physical clusters. We also find that for these distant clusters our typical errors are about ±0.07 in E(B - V), ±0.15 in log(age), and ±0.25 in (m - M)_0. The clusters range in age from 470 Myr to 7 Gyr and range from 7.1 to 16.4 kpc from the Galactic center.

  1. Multilevel Analysis of Trachomatous Trichiasis and Corneal Opacity in Nigeria: The Role of Environmental and Climatic Risk Factors on the Distribution of Disease.

    PubMed

    Smith, Jennifer L; Sivasubramaniam, Selvaraj; Rabiu, Mansur M; Kyari, Fatima; Solomon, Anthony W; Gilbert, Clare

    2015-01-01

    The distribution of trachoma in Nigeria is spatially heterogeneous, with large-scale trends observed across the country and more local variation within areas. Relative contributions of individual and cluster-level risk factors to the geographic distribution of disease remain largely unknown. The primary aim of this analysis is to assess the relationship between climatic factors and trachomatous trichiasis (TT) and/or corneal opacity (CO) due to trachoma in Nigeria, while accounting for the effects of individual risk factors and spatial correlation. In addition, we explore the relative importance of variation in the risk of trichiasis and/or corneal opacity (TT/CO) at different levels. Data from the 2007 National Blindness and Visual Impairment Survey were used for this analysis, which included a nationally representative sample of adults aged 40 years and above. Complete data were available from 304 clusters selected using a multi-stage stratified cluster-random sampling strategy. All participants (13,543 individuals) were interviewed and examined by an ophthalmologist for the presence or absence of TT and CO. In addition to field-collected data, remotely sensed climatic data were extracted for each cluster and used to fit Bayesian hierarchical logistic models to disease outcome. The risk of TT/CO was associated with factors at both the individual and cluster levels, with approximately 14% of the total variation attributed to the cluster level. Beyond established individual risk factors (age, gender and occupation), there was strong evidence that environmental/climatic factors at the cluster-level (lower precipitation, higher land surface temperature, higher mean annual temperature and rural classification) were also associated with a greater risk of TT/CO. 
This study establishes the importance of large-scale risk factors in the geographical distribution of TT/CO in Nigeria, supporting anecdotal evidence that environmental conditions are associated with increased
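
    To make the statement that roughly 14% of the total variation sits at the cluster level more concrete, the sketch below uses the common latent-variable definition of the variance partition coefficient for a two-level logistic model, in which the individual-level residual variance is fixed at pi^2/3. The abstract does not say how the authors derived their figure, so this formula and the function names are assumptions for illustration only.

```python
from math import pi

# Variance of the standard logistic distribution, used as the level-1
# residual variance under the latent-variable approach (an assumption here).
LOGISTIC_RESIDUAL_VARIANCE = pi ** 2 / 3


def variance_partition_coefficient(cluster_variance: float) -> float:
    """Share of total latent variance attributable to the cluster level."""
    if cluster_variance < 0:
        raise ValueError("cluster_variance must be non-negative")
    return cluster_variance / (cluster_variance + LOGISTIC_RESIDUAL_VARIANCE)


def cluster_variance_for(vpc: float) -> float:
    """Cluster-level variance that would yield a given coefficient."""
    if not 0 <= vpc < 1:
        raise ValueError("vpc must lie in [0, 1)")
    return vpc * LOGISTIC_RESIDUAL_VARIANCE / (1 - vpc)


if __name__ == "__main__":
    sigma2 = cluster_variance_for(0.14)
    print(round(sigma2, 3), round(variance_partition_coefficient(sigma2), 3))
```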

  2. Severe or life-threatening asthma exacerbation: patient heterogeneity identified by cluster analysis.

    PubMed

    Sekiya, K; Nakatani, E; Fukutomi, Y; Kaneda, H; Iikura, M; Yoshida, M; Takahashi, K; Tomii, K; Nishikawa, M; Kaneko, N; Sugino, Y; Shinkai, M; Ueda, T; Tanikawa, Y; Shirai, T; Hirabayashi, M; Aoki, T; Kato, T; Iizuka, K; Homma, S; Taniguchi, M; Tanaka, H

    2016-08-01

    Severe or life-threatening asthma exacerbation is one of the worst outcomes of asthma because of the risk of death. To date, few studies have explored the potential heterogeneity of this condition. To examine the clinical characteristics and heterogeneity of patients with severe or life-threatening asthma exacerbation. This was a multicentre, prospective study of patients with severe or life-threatening asthma exacerbation and pulse oxygen saturation < 90% who were admitted to 17 institutions across Japan. Cluster analysis was performed using variables from patient- and physician-orientated structured questionnaires. Analysis of data from 175 patients with severe or life-threatening asthma exacerbation revealed five distinct clusters. Cluster 1 (n = 27) was younger-onset asthma with severe symptoms at baseline, including limitation of activities, a higher frequency of treatment with oral corticosteroids and short-acting beta-agonists, and a higher frequency of asthma hospitalizations in the past year. Cluster 2 (n = 35) was predominantly composed of elderly females, with the highest frequency of comorbid, chronic hyperplastic rhinosinusitis/nasal polyposis, and a long disease duration. Cluster 3 (n = 40) was allergic asthma without inhaled corticosteroid use at baseline. Patients in this cluster had a higher frequency of atopy, including allergic rhinitis and furred pet hypersensitivity, and a better prognosis during hospitalization compared with the other clusters. Cluster 4 (n = 34) was characterized by elderly males with concomitant chronic obstructive pulmonary disease (COPD). Although cluster 5 (n = 39) had very mild symptoms at baseline according to the patient questionnaires, 41% had previously been hospitalized for asthma. This study demonstrated that significant heterogeneity exists among patients with severe or life-threatening asthma exacerbation. 
Differences were observed in the severity of asthma symptoms and use of inhaled corticosteroids at baseline

  3. A Model for Protostellar Cluster Luminosities and the Impact on the CO–H2 Conversion Factor

    NASA Astrophysics Data System (ADS)

    Gaches, Brandt A. L.; Offner, Stella S. R.

    2018-02-01

    We construct a semianalytic model to study the effect of far-ultraviolet (FUV) radiation on gas chemistry from embedded protostars. We use the protostellar luminosity function (PLF) formalism of Offner & McKee to calculate the total, FUV, and ionizing cluster luminosity for various protostellar accretion histories and cluster sizes. We compare the model predictions with surveys of Gould Belt star-forming regions and find that the tapered turbulent core model matches best the mean luminosities and the spread in the data. We combine the cluster model with the photodissociation region astrochemistry code, 3D-PDR, to compute the impact of the FUV luminosity from embedded protostars on the CO-to-H2 conversion factor, X_CO, as a function of cluster size, gas mass, and star formation efficiency. We find that X_CO has a weak dependence on the FUV radiation from embedded sources for large clusters owing to high cloud optical depths. In smaller and more efficient clusters the embedded FUV increases X_CO to levels consistent with the average Milky Way values. The internal physical and chemical structures of the cloud are significantly altered, and X_CO depends strongly on the protostellar cluster mass for small efficient clouds.

  4. Clustering ENTLN sferics to improve TGF temporal analysis

    NASA Astrophysics Data System (ADS)

    Pradhan, E.; Briggs, M. S.; Stanbro, M.; Cramer, E.; Heckman, S.; Roberts, O.

    2017-12-01

    Using TGFs detected with the Fermi Gamma-ray Burst Monitor (GBM) and simultaneous radio sferics detected by the Earth Networks Total Lightning Network (ENTLN), we establish a temporal correlation between them. The first step is to find ENTLN strokes that are closely associated with GBM TGFs. We then identify all the related strokes in the lightning flash to which the TGF-associated stroke belongs. After trying several algorithms, we found that the DBSCAN clustering algorithm was best for clustering related ENTLN strokes into flashes. The operation of DBSCAN was optimized using a single separation measure that combined time and distance separation. Previous analysis found that these strokes show three timescales with respect to the gamma-ray time. We will use the improved identification of flashes to investigate this.
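
    The idea of running DBSCAN on a single separation measure that combines time and distance can be sketched as follows. The abstract does not give the actual measure or its parameters, so the scaled Euclidean combination, the scale values, the `eps` and `min_pts` settings, and the stroke tuples below are all hypothetical choices for illustration.

```python
from math import hypot

NOISE = -1


def separation(a, b, time_scale_s=0.5, dist_scale_km=10.0):
    """Combine time and distance separation into one dimensionless number.

    Each stroke is a tuple (time_s, x_km, y_km). The scales are assumed values.
    """
    dt = (a[0] - b[0]) / time_scale_s
    ds = hypot(a[1] - b[1], a[2] - b[2]) / dist_scale_km
    return hypot(dt, ds)


def dbscan(points, eps=1.0, min_pts=2):
    """Plain DBSCAN over `separation`; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; includes the point itself.
        return [j for j in range(len(points))
                if separation(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the flash
    return labels


if __name__ == "__main__":
    strokes = [(0.0, 0.0, 0.0), (0.1, 1.0, 0.0), (0.2, 2.0, 1.0),
               (5.0, 0.0, 0.0), (5.1, 0.5, 0.5), (20.0, 100.0, 100.0)]
    print(dbscan(strokes))
```

    Dividing each separation by its own scale before combining them lets one `eps` threshold act on both time and distance, which is one plausible reading of a single separation measure.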

  5. Cluster: Mission Overview and End-of-Life Analysis

    NASA Technical Reports Server (NTRS)

    Pallaschke, S.; Munoz, I.; Rodriquez-Canabal, J.; Sieg, D.; Yde, J. J.

    2007-01-01

    The Cluster mission is part of the scientific programme of the European Space Agency (ESA) and its purpose is the analysis of the Earth's magnetosphere. The Cluster project consists of four satellites. The selected polar orbit has a shape of 4.0 and 19.2 Re which is required for performing measurements near the cusp and the tail of the magnetosphere. When crossing these regions the satellites form a constellation which in most of the cases so far has been a regular tetrahedron. The satellite operations are carried out by the European Space Operations Centre (ESOC) at Darmstadt, Germany. The paper outlines the future orbit evolution and the envisaged operations from a Flight Dynamics point of view. In addition a brief summary of the LEOP and routine operations is included beforehand.

  6. A novel exploratory chemometric approach to environmental monitoring by combining block clustering with Partial Least Square (PLS) analysis

    PubMed Central

    2013-01-01

    Background Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining block clustering with Partial Least Square (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Results Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads, lead is the most threatening metal for terrestrial ecosystems. Conclusion There were three major benefits from applying block clustering with PLS for processing the obtained data: firstly, it helped in grouping sites depending on the type of contamination. Secondly, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic

  7. Chemical factor analysis of skin cancer FTIR-FEW spectroscopic data

    NASA Astrophysics Data System (ADS)

    Bruch, Reinhard F.; Sukuta, Sydney

    2002-03-01

    Chemical Factor Analysis (CFA) algorithms were applied to transform complex Fourier transform infrared fiberoptical evanescent wave (FTIR-FEW) normal and malignant skin tissue spectra into factor spaces for analysis and classification. The factor space approach classified melanoma beyond prior pathological classifications related to specific biochemical alterations to health states in cluster diagrams allowing diagnosis with more biochemical specificity, resolving biochemical component spectra and employing health state eigenvector angular configurations as disease state sensors. This study demonstrated a wealth of new information from in vivo FTIR-FEW spectral tissue data, without extensive a priori information or clinically invasive procedures. In particular, we employed a variety of methods used in CFA to select the rank of spectroscopic data sets of normal benign and cancerous skin tissue. We used the Malinowski indicator function (IND), significance level and F-Tests to rank our data matrices. Normal skin tissue, melanoma and benign tumors were modeled by four, two and seven principal abstract factors, respectively. We also showed that the spectrum of the first eigenvalue was equivalent to the mean spectrum. The graphical depiction of angular disparities between the first abstract factors can be adopted as a new way to characterize and diagnose melanoma cancer.

  8. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu; Hammond, Karl D.

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He_n, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He_4 and He_5 clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.

  9. Cluster analysis to estimate the risk of preeclampsia in the high-risk Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study.

    PubMed

    Villa, Pia M; Marttinen, Pekka; Gillberg, Jussi; Lokki, A Inkeri; Majander, Kerttu; Ordén, Maija-Riitta; Taipale, Pekka; Pesonen, Anukatriina; Räikkönen, Katri; Hämäläinen, Esa; Kajantie, Eero; Laivuori, Hannele

    2017-01-01

    Preeclampsia is divided into early-onset (delivery before 34 weeks of gestation) and late-onset (delivery at or after 34 weeks) subtypes, which may arise from different etiopathogenic backgrounds. Early-onset disease is associated with placental dysfunction. Late-onset disease develops predominantly due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation, which affect endothelial function. Our aim was to use cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors. We recruited 903 pregnant women with risk factors for preeclampsia at gestational weeks 12+0-13+6. Each individual outcome diagnosis was independently verified from medical records. We applied a Bayesian clustering algorithm to classify the study participants into clusters based on their particular risk factor combination. For each cluster, we computed the risk ratio of each disease outcome, relative to the risk in the general population. The risk of preeclampsia increased exponentially with respect to the number of risk factors. Our analysis revealed 25 clusters. Preeclampsia in a previous pregnancy (n = 138) increased the risk of preeclampsia 8.1-fold (95% confidence interval (CI) 5.7-11.2) compared to a general population of pregnant women. Having a small for gestational age infant (n = 57) in a previous pregnancy increased the risk of early-onset preeclampsia 17.5-fold (95% CI 2.1-60.5). The cluster with those two risk factors together (n = 21) increased the risk of severe preeclampsia 23.8-fold (95% CI 5.1-60.6), of intermediate-onset preeclampsia (delivery between 34+0-36+6 weeks of gestation) 25.1-fold (95% CI 3.1-79.9) and of preterm preeclampsia (delivery before 37+0 weeks of gestation) 16.4-fold (95% CI 2.0-52.4). Body mass index over 30 kg/m2 (n = 228) as a sole risk factor increased the risk of preeclampsia 2.1-fold (95% CI 1.1-3.6). Together with preeclampsia in an earlier
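
    The per-cluster risk ratio relative to the general population can be illustrated with the minimal sketch below. It computes only the point estimate; the confidence intervals in the abstract come from the authors' own analysis and are not reproduced here. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def risk_ratio(cases_in_cluster: int, cluster_size: int,
               baseline_risk: float) -> float:
    """Risk in a cluster divided by the risk in a reference population."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 < baseline_risk <= 1:
        raise ValueError("baseline_risk must lie in (0, 1]")
    return (cases_in_cluster / cluster_size) / baseline_risk


if __name__ == "__main__":
    # Made-up numbers: 5 cases among 20 women, against a 5% reference risk.
    print(risk_ratio(5, 20, 0.05))
```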

  10. Space-Time Analysis of Testicular Cancer Clusters Using Residential Histories: A Case-Control Study in Denmark

    PubMed Central

    Sloan, Chantel D.; Nordsborg, Rikke B.; Jacquez, Geoffrey M.; Raaschou-Nielsen, Ole; Meliker, Jaymie R.

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. The most common cancer in males under age 40, age period cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk of testicular cancer. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N=3297) were diagnosed between 1991 and 2003, and two sets of controls (N=3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N= 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at their center, and were not replicated in SaTScan spatial-only analyses which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population.

  11. Space-time analysis of testicular cancer clusters using residential histories: a case-control study in Denmark.

    PubMed

    Sloan, Chantel D; Nordsborg, Rikke B; Jacquez, Geoffrey M; Raaschou-Nielsen, Ole; Meliker, Jaymie R

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. The most common cancer in males under age 40, age period cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk of testicular cancer. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N=3297) were diagnosed between 1991 and 2003, and two sets of controls (N=3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N= 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at their center, and were not replicated in SaTScan spatial-only analyses which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population.

  12. Speeding up the Consensus Clustering methodology for microarray data analysis

    PubMed Central

    2011-01-01

    Background The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The stated problem is quite difficult, in particular for microarrays, since the inferred prediction must be sensible enough to capture the inherent biological structure in a dataset, e.g., functionally related genes. Despite the rich literature present in that area, the identification of an internal validation measure that is both fast and precise has proved to be elusive. In order to partially fill this gap, we propose a speed-up of Consensus (Consensus Clustering), a methodology whose purpose is the provision of a prediction of the number of clusters in a dataset, together with a dissimilarity matrix (the consensus matrix) that can be used by clustering algorithms. As detailed in the remainder of the paper, Consensus is a natural candidate for a speed-up. Results Since the time-precision performance of Consensus depends on two parameters, our first task is to show that a simple adjustment of the parameters is not enough to obtain a good precision-time trade-off. Our second task is to provide a fast approximation algorithm for Consensus. That is, the closely related algorithm FC (Fast Consensus) that would have the same precision as Consensus with a substantially better time performance. The performance of FC has been assessed via extensive experiments on twelve benchmark datasets that summarize key features of microarray applications, such as cancer studies, gene expression with up and down patterns, and a full spectrum of dimensionality up to over a thousand. Based on their outcome, compared with previous benchmarking results available in the literature, FC turns out to be among the fastest internal validation methods, while retaining the same outstanding precision of Consensus. Moreover, it also provides a consensus matrix that can be used as a dissimilarity matrix
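
    The consensus matrix mentioned in this abstract can be sketched in a few lines. Under the usual definition, entry (i, j) is the fraction of clustering runs in which items i and j land in the same cluster, counted over the runs in which both were sampled. The function name and the input layout below are assumptions for this example, and the sketch does not implement the FC speed-up described in the paper.

```python
from itertools import combinations


def consensus_matrix(n_items, runs):
    """Build a consensus matrix from repeated clusterings of subsamples.

    runs -- list of dicts mapping item index -> cluster label; each dict
            holds only the items sampled in that run (hypothetical layout).
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        matrix[i][i] = 1.0
        for j in range(n_items):
            if i != j and sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix


if __name__ == "__main__":
    runs = [{0: 0, 1: 0, 2: 1}, {0: 0, 1: 1, 2: 1}, {0: 5, 1: 5}]
    for row in consensus_matrix(3, runs):
        print(row)
```

    Taking one minus each entry turns the matrix into a dissimilarity matrix, which is the use the abstract describes.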

  13. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    NASA Astrophysics Data System (ADS)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and 2-dimensional principal component analysis (PCA) were used to represent the genetic relations among Amomum tsao-ko samples by using simple sequence repeat (SSR) markers. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) consisted of 10 individuals. Cluster I, as the main group, contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals, PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas, and also to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  14. Going beyond Clustering in MD Trajectory Analysis: An Application to Villin Headpiece Folding

    PubMed Central

    Rajan, Aruna; Freddolino, Peter L.; Schulten, Klaus

    2010-01-01

    Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous. PMID:20419160

  15. Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.

    PubMed

    Rajan, Aruna; Freddolino, Peter L; Schulten, Klaus

    2010-04-15

    Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.

  16. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    ERIC Educational Resources Information Center

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  17. [Clustering patterns of behavioral risk factors linked to chronic disease among young adults in two localities in Bogota, Colombia: importance of sex differences].

    PubMed

    Gómez Gutiérrez, Luis Fernando; Lucumí Cuesta, Diego Iván; Girón Vargas, Sandra Lorena; Espinosa García, Gladys

    2004-01-01

    The characterization of clustering behavioral risk factors may be used as a guideline for interventions aimed at preventing chronic diseases. This study determined the clustering patterns of some behavioral risk factors in young adults aged 18 to 29 years and established the factors associated with having two or more of them. Patterns of clustering by gender were established for four behavioral risk factors (low consumption of fruits and vegetables, physical inactivity in leisure time, current tobacco consumption and acute alcohol consumption) in 1465 young adult participants selected through a multistage probabilistic sample. Regression models identified the sociodemographic variables associated with having two or more of the aforementioned behavioral risk factors. Having one, 32.9% two and 17.7% three or four. Acute alcohol consumption was the most frequent risk factor in the combined risk factor patterns among males, and physical inactivity during leisure time was the most frequent among females. Among females, having two or more behavioral risk factors was linked to being separated or divorced; among males, it was linked to work having been the main activity over the past 30 days. The combinations of behavioral risk factors studied and the factors associated with clustering show different patterns among males and females. These findings stress the need to design interventions sensitive to gender differences.

  18. Weighing the Giants - I. Weak-lensing masses for 51 massive galaxy clusters: project overview, data analysis methods and cluster images

    NASA Astrophysics Data System (ADS)

    von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2014-03-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ z_Cl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the `blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic

  19. The X-ray cluster survey with eRosita: forecasts for cosmology, cluster physics and primordial non-Gaussianity

    NASA Astrophysics Data System (ADS)

    Pillepich, Annalisa; Porciani, Cristiano; Reiprich, Thomas H.

    2012-05-01

    Starting in late 2013, the eRosita telescope will survey the X-ray sky with unprecedented sensitivity. Assuming a detection limit of 50 photons in the (0.5-2.0) keV energy band with a typical exposure time of 1.6 ks, we predict that eRosita will detect ~9.3 × 10^4 clusters of galaxies more massive than 5 × 10^13 h^-1 M⊙, with the currently planned all-sky survey. Their median redshift will be z ≃ 0.35. We perform a Fisher-matrix analysis to forecast the constraining power of eRosita on the Λ cold dark matter (ΛCDM) cosmology and, simultaneously, on the X-ray scaling relations for galaxy clusters. Special attention is devoted to the possibility of detecting primordial non-Gaussianity. We consider two experimental probes: the number counts and the angular clustering of a photon-count limited sample of clusters. We discuss how the cluster sample should be split to optimize the analysis and we show that redshift information of the individual clusters is vital to break the strong degeneracies among the model parameters. For example, performing a 'tomographic' analysis based on photometric-redshift estimates and combining one- and two-point statistics will give marginal 1σ errors of Δσ_8 ≃ 0.036 and ΔΩ_m ≃ 0.012 without priors, and improve the current estimates on the slope of the luminosity-mass relation by a factor of 3. Regarding primordial non-Gaussianity, eRosita clusters alone will give Δf_NL ≃ 9, 36 and 144 for the local, orthogonal and equilateral model, respectively. Measuring redshifts with spectroscopic accuracy would further tighten the constraints by nearly 40 per cent (barring f_NL which displays smaller improvements). Finally, combining eRosita data with the analysis of temperature anisotropies in the cosmic microwave background by the Planck satellite should give sensational constraints on both the cosmology and the properties of the intracluster medium.

  20. [The users of centers for AIDS information and prevention in the Comunidad Valenciana, Spain: a study based on cluster analysis].

    PubMed

    González Aracil, J; Ruiz Pérez, I; Aviñó Rico, M J; Hernández Aguado, I

    1999-01-01

    To measure the usefulness of multiple correspondence analysis (MCA) and cluster analysis applied to the epidemiological research of HIV infection. The specific aims are to explore the relationships between the different variables that characterize the users of the AIDS Information and Prevention Center (CIPS) and to identify clusters of characteristics which, in terms of attendance at these centers, could be considered similar. The clinical history of the CIPS in the Valencian region in Spain was used as the data source. The target population was intravenous drug users (IDUs) attending these centers between 1987 and 1994 (n = 6211). Information about socio-demographic and HIV type I infection-related variables (drug use and sexual behaviour) was collected by means of a semistructured questionnaire. An MCA was carried out to obtain a group of quantitative factors that were used in a cluster analysis. A 44.8% HIV type I prevalence was found. Five factors were detected by MCA that explain 51.14% of the total variability, of which sex, age and the usual sexual partner were the variables best explained. Cluster analysis made it possible to describe 5 different subgroups of CIPS users according to their socio-demographic characteristics, risk behaviours and serologic status. Categories 1 and 2 deserve emphasis, as they capture the serologic status and the most relevant characteristics of HIV infection. Category 1 contains users with a negative serology, characterized by being mainly single adolescent men with a low educational level; they stated that they have no steady sexual partner, do not share syringes and have been intravenous drug users for between 3 and 10 years. They mainly come from the city of Alicante. Category 2 contains mainly people who are HIV positive and older. They also share syringes and have been intravenous drug users for a longer time; they have a higher education level and most of them come from the city of Valencia. 
The proposed method of

  1. The identification of credit card encoders by hierarchical cluster analysis of the jitters of magnetic stripes.

    PubMed

    Leung, S C; Fung, W K; Wong, K H

    1999-01-01

    The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.

  2. Classification of aquifer vulnerability using K-means cluster analysis

    NASA Astrophysics Data System (ADS)

    Javadi, S.; Hashemy, S. M.; Mohammadi, K.; Howard, K. W. F.; Neshat, A.

    2017-06-01

    Groundwater is one of the main sources of drinking and agricultural water in arid and semi-arid regions but is becoming increasingly threatened by contamination. Vulnerability mapping has been used for many years as an effective tool for assessing the potential for aquifer pollution and the most common method of intrinsic vulnerability assessment is DRASTIC (Depth to water table, net Recharge, Aquifer media, Soil media, Topography, Impact of vadose zone and hydraulic Conductivity). An underlying problem with the DRASTIC approach relates to the subjectivity involved in selecting relative weightings for each of the DRASTIC factors and assigning rating values to ranges or media types within each factor. In this study, a clustering technique is introduced that removes some of the subjectivity associated with the indexing method. It creates a vulnerability map that does not rely on fixed weights and ratings and, thereby provides a more objective representation of the system's physical characteristics. This methodology was applied to an aquifer in Iran and compared with the standard DRASTIC approach using the water quality parameters nitrate, chloride and total dissolved solids (TDS) as surrogate indicators of aquifer vulnerability. The proposed method required only four of DRASTIC's seven factors - depth to groundwater, hydraulic conductivity, recharge value and the nature of the vadose zone, to produce a superior result. For nitrate, chloride, and TDS, respectively, the clustering approach delivered Pearson correlation coefficients that were 15, 22 and 5 percentage points higher than those obtained for the DRASTIC method.
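
    The K-means step named in this record can be sketched in a few lines. This is a generic illustration of Lloyd's algorithm, not the authors' implementation: the two-feature points and starting centroids are invented stand-ins for whatever (probably standardized) hydrogeological factors a real study would use, and `kmeans` and `nearest` are hypothetical helper names.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - q) ** 2 for p, q in zip(point, centroids[j])),
    )

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = [nearest(pt, centroids) for pt in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the per-feature mean of the members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    labels = [nearest(pt, centroids) for pt in points]
    return labels, centroids

# Invented data: two well-separated groups in a 2-D feature space.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

    No fixed weights or ratings enter the calculation; the grouping depends only on distances in feature space, which is the property the abstract contrasts with the DRASTIC index.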

  3. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    ERIC Educational Resources Information Center

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  4. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    ERIC Educational Resources Information Center

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

    Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…

  5. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    PubMed

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
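
    The contrast between hard and fuzzy assignment described in this abstract can be made concrete with the membership formula commonly used in fuzzy c-means, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)), where d_i is the distance to center i and m > 1 is the fuzzifier. The sketch below is an illustration under that assumption, not the authors' analysis; the centers, the score, and the name `fuzzy_memberships` are invented.

```python
def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means style memberships of scalar x in each cluster (m > 1)."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x sits exactly on a center: give it full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]

centers = [0.0, 10.0]   # invented cluster centers on a one-dimensional score
x = 4.0                 # a case lying between the two centers

soft = fuzzy_memberships(x, centers, m=2.0)    # more overlap allowed
crisp = fuzzy_memberships(x, centers, m=1.2)   # close to a hard assignment
hard = min(range(len(centers)), key=lambda j: abs(x - centers[j]))  # K-means style

print(soft, crisp, hard)
```

    A K-means style rule places the case wholly in the nearer cluster, whereas the fuzzy memberships split it across both. Lowering m toward 1 pushes the memberships toward the hard assignment, which matches the abstract's remark that agreement between the two solutions depends on how much overlap is allowed.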

  6. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches

    PubMed Central

    Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683

  7. Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

    NASA Astrophysics Data System (ADS)

    Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

    2017-01-01

    This study aimed to identify the population of countries that are similar to Indonesia on three characteristics: democratic atmosphere, rice consumption, and purchasing power for rice. This population is useful as reference material for research testing the strength and predictability of the rice crisis indicator Unprecedented Restlessness (UR). Similarity to Indonesia was assessed with a multivariate method, non-hierarchical k-means cluster analysis, using 38 countries as the data population. The analysis was repeated until it yielded a number of clusters that shows the discriminating power of the three characteristics and high similarity within clusters. The results show that 6 clusters can describe the discriminating power of the characteristics of the clusters formed. However, to serve the purpose of the study, only one cluster was selected, according to the following success criteria for the population of countries similar to Indonesia: the cluster contains Indonesia, it includes countries that did and did not experience a rice crisis in 2008, and it has the largest membership among the clusters. These criteria are met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, South Korea, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.

  8. AMOEBA clustering revisited. [cluster analysis, classification, and image display program]

    NASA Technical Reports Server (NTRS)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  9. Optimizing disinfection by-product monitoring points in a distribution system using cluster analysis.

    PubMed

    Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J

    2018-06-04

    Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBPs monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results show that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THMs and HAAs monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Identification of different nutritional status groups in institutionalized elderly people by cluster analysis.

    PubMed

    López-Contreras, María José; López, Maria Ángeles; Canteras, Manuel; Candela, María Emilia; Zamora, Salvador; Pérez-Llamas, Francisca

    2014-03-01

    To apply a cluster analysis to groups of individuals of similar characteristics in an attempt to identify undernutrition or the risk of undernutrition in this population. A cross-sectional study. Seven public nursing homes in the province of Murcia, on the Mediterranean coast of Spain. 205 subjects aged 65 and older (131 women and 74 men). Dietary intake (energy and nutrients), anthropometric (body mass index, skinfold thickness, mid-arm muscle circumference, mid-arm muscle area, corrected arm muscle area, waist to hip ratio) and biochemical and haematological (serum albumin, transferrin, total cholesterol, total lymphocyte count). Variables were analyzed by cluster analysis. The results of the cluster analysis, including intake, anthropometric and analytical data showed that, of the 205 elderly subjects, 66 (32.2%) were overweight/obese, 72 (35.1%) had an adequate nutritional status and 67 (32.7%) were undernourished or at risk of undernutrition. The undernourished or at risk of undernutrition group showed the lowest values for dietary intake and the anthropometric and analytical parameters measured. Our study shows that cluster analysis is a useful statistical method for assessing the nutritional status of institutionalized elderly populations. In contrast, use of the specific reference values frequently described in the literature might fail to detect real cases of undernourishment or those at risk of undernutrition. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.

  11. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    PubMed

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Cluster signal-to-noise analysis for evaluation of the information content in an image.

    PubMed

    Weerawanich, Warangkana; Shimizu, Mayumi; Takeshita, Yohei; Okamura, Kazutoshi; Yoshida, Shoko; Yoshiura, Kazunori

    2018-01-01

    (1) To develop an observer-free method of analysing image quality related to the observer performance in the detection task and (2) to analyse observer behaviour patterns in the detection of small mass changes in cone-beam CT images. 13 observers detected holes in a Teflon phantom in cone-beam CT images. Using the same images, we developed a new method, cluster signal-to-noise analysis, to detect the holes by applying various cut-off values using ImageJ and reconstructing cluster signal-to-noise curves. We then evaluated the correlation between cluster signal-to-noise analysis and the observer performance test. We measured the background noise in each image to evaluate the relationship with false positive rates (FPRs) of the observers. Correlations between mean FPRs and intra- and interobserver variations were also evaluated. Moreover, we calculated true positive rates (TPRs) and accuracies from background noise and evaluated their correlations with TPRs from observers. Cluster signal-to-noise curves were derived in cluster signal-to-noise analysis. They yield the detection of signals (true holes) related to noise (false holes). This method correlated highly with the observer performance test (R² = 0.9296). In noisy images, increasing background noise resulted in higher FPRs and larger intra- and interobserver variations. TPRs and accuracies calculated from background noise had high correlation with actual TPRs from observers; R² was 0.9244 and 0.9338, respectively. Cluster signal-to-noise analysis can simulate the detection performance of observers and thus replace the observer performance test in the evaluation of image quality. Erroneous decision-making increased with increasing background noise.

  13. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    ERIC Educational Resources Information Center

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  14. Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

    NASA Astrophysics Data System (ADS)

    Takahashi, A.; Hashimoto, M.

    2015-12-01

    Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity field in the San Francisco Bay Area and the Mojave Desert, respectively. They successfully found velocity discontinuities and showed the advantage of cluster analysis for classifying a GNSS velocity field. Because strike-slip events are dominant in the western United States, the geometry there is simple. The Japanese Islands, however, are tectonically complicated due to the subduction of oceanic plates, and exhibit many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan to separate time-variant and static crustal deformation. Our modification is to perform cluster analysis every several months or years and then assess the similarity of cluster members. If a GNSS station moves differently from its neighboring GNSS stations, it will not belong to the cluster that includes its surrounding stations. With this method, time-variant phenomena were distinguished. We applied our method to GNSS data from Japan from 1996 to 2015. The analyses led to the following conclusions. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the detectability of time-variable phenomena is improved, such as a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). Third, postseismic deformation caused by large earthquakes can be classified. The result suggests velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake, which implies that postseismic deformation does not decay continuously in proportion to distance from the epicenter.

  15. Steven's orbital reduction factor in ionic clusters

    NASA Astrophysics Data System (ADS)

    Gajek, Z.; Mulak, J.

    1985-11-01

    General expressions for reduction coefficients of matrix elements of angular momentum operator in ionic clusters or molecular systems have been derived. The reduction in this approach results from overlap and covalency effects and plays an important role in the reconciling of magnetic and spectroscopic experimental data. The formulated expressions make possible a phenomenological description of the effect with two independent parameters for typical equidistant clusters. Some detailed calculations also suggest the possibility of a one-parameter description. The results of these calculations for some ionic uranium compounds are presented as an example.

  16. Molecular clustering of patients with diabetes and pulmonary tuberculosis: A systematic review and meta-analysis.

    PubMed

    Blanco-Guillot, Francles; Delgado-Sánchez, Guadalupe; Mongua-Rodríguez, Norma; Cruz-Hervert, Pablo; Ferreyra-Reyes, Leticia; Ferreira-Guerrero, Elizabeth; Yanes-Lane, Mercedes; Montero-Campos, Rogelio; Bobadilla-Del-Valle, Miriam; Torres-González, Pedro; Ponce-de-León, Alfredo; Sifuentes-Osornio, José; Garcia-Garcia, Lourdes

    2017-01-01

    Many studies have explored the relationship between diabetes mellitus (DM) and tuberculosis (TB) demonstrating increased risk of TB among patients with DM and poor prognosis of patients suffering from the association of DM/TB. Owing to a paucity of studies addressing this question, it remains unclear whether patients with DM and TB are more likely than TB patients without DM to be grouped into molecular clusters defined according to the genotype of the infecting Mycobacterium tuberculosis bacillus. That is, whether there is convincing molecular epidemiological evidence for TB transmission among DM patients. Objective: We performed a systematic review and meta-analysis to quantitatively evaluate the propensity for patients with DM and pulmonary TB (PTB) to cluster according to the genotype of the infecting M. tuberculosis bacillus. We conducted a systematic search in MEDLINE and LILACS from 1990 to June, 2016 with the following combinations of key words "tuberculosis AND transmission" OR "tuberculosis diabetes mellitus" OR "Mycobacterium tuberculosis molecular epidemiology" OR "RFLP-IS6110" OR "Spoligotyping" OR "MIRU-VNTR". Studies were included if they met the following criteria: (i) studies based on populations from defined geographical areas; (ii) use of genotyping by IS6110- restriction fragment length polymorphism (RFLP) analysis and spoligotyping or mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) or other amplification methods to identify molecular clustering; (iii) genotyping and analysis of 50 or more cases of PTB; (iv) study duration of 11 months or more; (v) identification of quantitative risk factors for molecular clustering including DM; (vi) > 60% coverage of the study population; and (vii) patients with PTB confirmed bacteriologically. 
The exclusion criteria were: (i) Extrapulmonary TB; (ii) TB caused by nontuberculous mycobacteria; (iii) patients with PTB and HIV; (iv) pediatric PTB patients; (v) TB in closed

  17. Molecular clustering of patients with diabetes and pulmonary tuberculosis: A systematic review and meta-analysis

    PubMed Central

    Blanco-Guillot, Francles; Delgado-Sánchez, Guadalupe; Mongua-Rodríguez, Norma; Cruz-Hervert, Pablo; Ferreyra-Reyes, Leticia; Ferreira-Guerrero, Elizabeth; Yanes-Lane, Mercedes; Montero-Campos, Rogelio; Bobadilla-del-Valle, Miriam; Torres-González, Pedro; Ponce-de-León, Alfredo; Sifuentes-Osornio, José; Garcia-Garcia, Lourdes

    2017-01-01

    Introduction Many studies have explored the relationship between diabetes mellitus (DM) and tuberculosis (TB) demonstrating increased risk of TB among patients with DM and poor prognosis of patients suffering from the association of DM/TB. Owing to a paucity of studies addressing this question, it remains unclear whether patients with DM and TB are more likely than TB patients without DM to be grouped into molecular clusters defined according to the genotype of the infecting Mycobacterium tuberculosis bacillus. That is, whether there is convincing molecular epidemiological evidence for TB transmission among DM patients. Objective: We performed a systematic review and meta-analysis to quantitatively evaluate the propensity for patients with DM and pulmonary TB (PTB) to cluster according to the genotype of the infecting M. tuberculosis bacillus. Materials and methods We conducted a systematic search in MEDLINE and LILACS from 1990 to June, 2016 with the following combinations of key words “tuberculosis AND transmission” OR “tuberculosis diabetes mellitus” OR “Mycobacterium tuberculosis molecular epidemiology” OR “RFLP-IS6110” OR “Spoligotyping” OR “MIRU-VNTR”. Studies were included if they met the following criteria: (i) studies based on populations from defined geographical areas; (ii) use of genotyping by IS6110- restriction fragment length polymorphism (RFLP) analysis and spoligotyping or mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) or other amplification methods to identify molecular clustering; (iii) genotyping and analysis of 50 or more cases of PTB; (iv) study duration of 11 months or more; (v) identification of quantitative risk factors for molecular clustering including DM; (vi) > 60% coverage of the study population; and (vii) patients with PTB confirmed bacteriologically. The exclusion criteria were: (i) Extrapulmonary TB; (ii) TB caused by nontuberculous mycobacteria; (iii) patients with

  18. Data Clustering

    NASA Astrophysics Data System (ADS)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  19. RNA-seq analysis identifies an intricate regulatory network controlling cluster root development in white lupin

    PubMed Central

    2014-01-01

    Background Highly adapted plant species are able to alter their root architecture to improve nutrient uptake and thrive in environments with limited nutrient supply. Cluster roots (CRs) are specialised structures of dense lateral roots formed by several plant species for the effective mining of nutrient rich soil patches through a combination of increased surface area and exudation of carboxylates. White lupin is becoming a model-species allowing for the discovery of gene networks involved in CR development. A greater understanding of the underlying molecular mechanisms driving these developmental processes is important for the generation of smarter plants for a world with diminishing resources to improve food security. Results RNA-seq analyses for three developmental stages of the CR formed under phosphorus-limited conditions and two of non-cluster roots have been performed for white lupin. In total 133,045,174 high-quality paired-end reads were used for a de novo assembly of the root transcriptome and merged with LAGI01 (Lupinus albus gene index) to generate an improved LAGI02 with 65,097 functionally annotated contigs. This was followed by comparative gene expression analysis. We show marked differences in the transcriptional response across the various cluster root stages to adjust to phosphate limitation by increasing uptake capacity and adjusting metabolic pathways. Several transcription factors such as PLT, SCR, PHB, PHV or AUX/IAA with a known role in the control of meristem activity and developmental processes show an increased expression in the tip of the CR. Genes involved in hormonal responses (PIN, LAX, YUC) and cell cycle control (CYCA/B, CDK) are also differentially expressed. In addition, we identify primary transcripts of miRNAs with established function in the root meristem. Conclusions Our gene expression analysis shows an intricate network of transcription factors and plant hormones controlling CR initiation and formation. In addition

  20. Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

    PubMed

    Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

    2016-07-01

    Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that Sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trends analysis showed that the prevalence was higher in years 2009 to 2010. The observed prevalence was higher than that observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had clinical diagnosis of sirenomelia. Therefore, real prevalence could be even higher. This study did not show any geographic clusters. Because etiology of sirenomelia has not yet been established, studies of epidemiological features of this defect may contribute to define its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc.

  1. Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

    PubMed

    Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

    2017-11-01

    The variations found in the elemental composition in ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets seized by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with a C4.5 decision tree to help us interpret the clustering results. We found that two clusters best described the data, which may reflect the approximate number of drug sources supplying the cities where the seizures occurred. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.
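    As a rough illustration of the K-means step mentioned in this abstract, the sketch below implements Lloyd's algorithm in plain Python. The tablet values, the two-feature layout, and the choice of starting centroids are all hypothetical; the abstract does not state the software, preprocessing, or initialization the authors used, and the C4.5 interpretation step is not reproduced here.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (Ni, Pb) concentrations for six tablets, arbitrary units.
tablets = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(tablets, centroids=[tablets[0], tablets[3]])
print(labels)  # cluster index assigned to each tablet
```

    With these made-up values the two groups are well separated, so the assignment settles after the first update. Real elemental profiles would normally be scaled first, and the number of clusters would be chosen with a validity index rather than fixed in advance.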

  2. Social Media Use and Depression and Anxiety Symptoms: A Cluster Analysis.

    PubMed

    Shensa, Ariel; Sidani, Jaime E; Dew, Mary Amanda; Escobar-Viera, César G; Primack, Brian A

    2018-03-01

    Individuals use social media with varying quantity, emotional, and behavioral attachment that may have differential associations with mental health outcomes. In this study, we sought to identify distinct patterns of social media use (SMU) and to assess associations between those patterns and depression and anxiety symptoms. In October 2014, a nationally-representative sample of 1730 US adults ages 19 to 32 completed an online survey. Cluster analysis was used to identify patterns of SMU. Depression and anxiety were measured using respective 4-item Patient-Reported Outcome Measurement Information System (PROMIS) scales. Multivariable logistic regression models were used to assess associations between cluster membership and depression and anxiety. Cluster analysis yielded a 5-cluster solution. Participants were characterized as "Wired," "Connected," "Diffuse Dabblers," "Concentrated Dabblers," and "Unplugged." Membership in 2 clusters - "Wired" and "Connected" - increased the odds of elevated depression and anxiety symptoms (AOR = 2.7, 95% CI = 1.5-4.7; AOR = 3.7, 95% CI = 2.1-6.5, respectively, and AOR = 2.0, 95% CI = 1.3-3.2; AOR = 2.0, 95% CI = 1.3-3.1, respectively). SMU pattern characterization of a large population suggests 2 patterns are associated with risk for depression and anxiety. Developing educational interventions that address use patterns rather than single aspects of SMU (eg, quantity) would likely be useful.

  3. Mechanisms of Thrombocytopenia During Septic Shock: A Multiplex Cluster Analysis of Endogenous Sepsis Mediators.

    PubMed

    Bedet, Alexandre; Razazi, Keyvan; Boissier, Florence; Surenaud, Mathieu; Hue, Sophie; Giraudier, Stéphane; Brun-Buisson, Christian; Mekontso Dessap, Armand

    2018-06-01

    Thrombocytopenia is a common feature of sepsis and may involve various mechanisms often related to the inflammatory response. This study aimed at evaluating factors associated with thrombocytopenia during human septic shock. In particular, we used a multiplex analysis to assess the role of endogenous sepsis mediators. Prospective, observational study. Thrombocytopenia was defined as an absolute platelet count <100 G/L or a 50% relative decrease in platelet count during the first week of septic shock. Plasma concentrations of 27 endogenous mediators involved in sepsis and platelet pathophysiology were assessed at day-1 using a multi-analyte Milliplex human cytokine kit. Patients with underlying diseases at risk of thrombocytopenia (hematological malignancies, chemotherapy, cirrhosis, and chronic heart failure) were excluded. Thrombocytopenia occurred in 33 (55%) of 60 patients assessed. Patients with thrombocytopenia were more prone to present with extrapulmonary infections and bacteremia. Disseminated intravascular coagulation was frequent (81%) in these patients. Unbiased hierarchical clustering identified five different clusters of sepsis mediators, including one with markers of platelet activation (e.g., thrombospondin-1) positively associated with platelet count, one with markers of inflammation (e.g., tumor necrosis factor alpha and heat shock protein 70), and endothelial dysfunction (e.g., intercellular adhesion molecule-1 and vascular cell adhesion molecule-1) negatively associated with platelet count, and another involving growth factors of thrombopoiesis (e.g., thrombopoietin), also negatively associated with platelet count. Surrogates of hemodilution (e.g., hypoprotidemia and higher fluid balance) were also associated with thrombocytopenia. Multiple mechanisms seemed involved in thrombocytopenia during septic shock, including endothelial dysfunction/coagulopathy, hemodilution, and altered thrombopoiesis.

  4. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    NASA Astrophysics Data System (ADS)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system

  5. Cluster Analysis of Longidorus Species (Nematoda: Longidoridae), a New Approach in Species Identification

    PubMed Central

    Ye, Weimin; Robbins, R. T.

    2004-01-01

    Hierarchical cluster analysis based on female morphometric character means including body length, distance from vulva opening to anterior end, head width, odontostyle length, esophagus length, body width, tail length, and tail width was used to examine the morphometric relationships and create dendrograms for (i) 62 populations belonging to 9 Longidorus species from Arkansas, (ii) 137 published Longidorus species, and (iii) 137 published Longidorus species plus 86 populations of 16 Longidorus species from Arkansas and various other locations by using JMP 4.02 software (SAS Institute, Cary, NC). Cluster analysis dendrograms visually illustrated the grouping and morphometric relationships of the species and populations. It provided a computerized statistical approach to assist by helping to identify and distinguish species, by indicating morphometric relationships among species, and by assisting with new species diagnosis. The preliminary species identification can be accomplished by running cluster analysis for unknown species together with the data matrix of known published Longidorus species. PMID:19262809

  6. On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

    ERIC Educational Resources Information Center

    Carter, Randy L.; And Others

    1989-01-01

    The partitioning of squared Euclidean (E²) distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
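    One familiar instance of such a partition can be sketched in a few lines of Python. The choice of subspaces here is an assumption made for illustration, since the truncated abstract does not say which subspaces the authors use: the difference vector is split into its projection onto the constant vector (a difference in overall level) and the remainder orthogonal to it (a difference in profile shape). The function name is hypothetical.

```python
def partition_squared_distance(x, y):
    """Split ||x - y||^2 into parts from two mutually orthogonal subspaces."""
    m = len(x)
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / m
    # Squared length of the projection onto (1, ..., 1): difference in level.
    level = m * mean_d ** 2
    # Squared length of the orthogonal remainder: difference in shape.
    shape = sum((di - mean_d) ** 2 for di in d)
    # Full squared Euclidean distance, computed directly for comparison.
    total = sum(di ** 2 for di in d)
    return total, level, shape

total, level, shape = partition_squared_distance([2.0, 4.0, 6.0], [1.0, 1.0, 4.0])
print(total, level, shape)  # total equals level + shape
```

    Because the two components are orthogonal, their squared lengths add up to the full squared distance, which is the Pythagorean identity behind the partition.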

  7. Dwarf Galaxies in the Coma Cluster. II. Photometry and Analysis

    NASA Astrophysics Data System (ADS)

    Secker, J.; Harris, W. E.; Plummer, J. D.

    1997-12-01

    We use the data set derived in our previous paper (Secker & Harris 1997) to study the dwarf galaxy population in the central ≈ 700 arcmin² of the Coma cluster, the majority of which are early-type dwarf elliptical (dE) galaxies. Analysis of the statistically-decontaminated dE galaxy sequence in the color-magnitude diagram reveals that the mean dE color at R = 18.0 mag is (B-R) ≈ 1.4 mag, but that a highly significant trend of color with magnitude exists (Δ(B-R)/ΔR = -0.056 ± 0.002 mag) in the sense that fainter dEs are bluer and thus presumably more metal-poor. The mean color of the faintest dEs in our sample is (B-R) ≈ 1.15 mag, consistent with a color measurement of the diffuse intracluster light in the Coma core. This intracluster light could then have originated from the tidal disruption of faint dEs in the cluster core. The total galaxy luminosity function (LF) is well modeled as the sum of a log-normal distribution for the giant galaxies, and a Schechter function for the dE galaxies with a faint-end slope α = -1.41 ± 0.05. This value of α is consistent with those measured for the Virgo and Fornax clusters. The spatial distribution of the faint dE galaxies (19.0 < R ≤ 22.5 mag) is well fit by a standard King model with a central surface density of Σ_0 = 1.17 dEs arcmin⁻² and a core radius R_c = 22.15 arcmin (≈ 0.46 h⁻¹ Mpc). This core is significantly larger than the R_c = 13.71 arcmin (≈ 0.29 h⁻¹ Mpc) found for the cluster giants and the brighter dEs (R ≤ 19.0 mag), again consistent with the idea that faint dEs in the dense core have been disrupted. Finally, we find that most dEs belong to the general Coma cluster potential rather than as satellites of individual giant galaxies: An analysis of the number counts around 10 cluster giants reveals that they each have on average 4 ± 1 dE companions within a projected radius of 13.9 h⁻¹ kpc. (SECTION: Galaxies)
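    The numbers quoted in this abstract can be checked against each other with a short Python sketch. Two assumptions are made here that the abstract does not confirm: that the color-magnitude trend is linear all the way to the faint limit of R = 22.5 mag, and that the "standard King model" has the common analytic form Σ(R) = Σ_0 / (1 + (R/R_c)²) with no tidal cutoff. The function names are hypothetical.

```python
def mean_color(r_mag, color_at_ref=1.4, ref_mag=18.0, slope=-0.056):
    """Mean dE (B-R) color from a linear color-magnitude trend (assumed linear)."""
    return color_at_ref + slope * (r_mag - ref_mag)

def king_surface_density(radius, sigma0, core_radius):
    """Analytic King profile without a tidal cutoff (assumed functional form)."""
    return sigma0 / (1.0 + (radius / core_radius) ** 2)

# Extrapolate the trend from R = 18.0 to the faint limit R = 22.5 mag.
print(round(mean_color(22.5), 2))

# Surface density of faint dEs at one core radius from the cluster center.
print(king_surface_density(22.15, sigma0=1.17, core_radius=22.15))
```

    Under these assumptions the extrapolated color at R = 22.5 mag rounds to 1.15 mag, matching the value the abstract reports for the faintest dEs, and the surface density at one core radius is half the central value.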

  8. Clustering of temperamental and cognitive risk factors for anxiety in a college sample of late adolescents.

    PubMed

    Viana, Andres G; Gratz, Kim L; Bierman, Karen L

    2013-01-01

    Temperamental vulnerabilities (e.g., behavioral inhibition, anxiety sensitivity) and cognitive biases (e.g., interpretive and judgment biases) may exacerbate feelings of stress and anxiety, particularly among late adolescents during the early years of college. The goal of the present study was to apply person-centered analyses to explore possible heterogeneity in the patterns of these four risk factors in late adolescence, and to examine associations with several anxiety outcomes (i.e., worry, anxiety symptoms, and trait anxiety). Cluster analyses in a college sample of 855 late adolescents revealed a Low-Risk group, along with four reliable clusters with distinct profiles of risk factors and anxiety outcomes (Inhibited, Sensitive, Cognitively-Biased, and Multi-Risk). Of the risk profiles, Multi-Risk youth experienced the highest levels of anxiety outcomes, whereas Inhibited youth experienced the lowest levels of anxiety outcomes. Sensitive and Cognitively-Biased youth experienced comparable levels of anxiety-related outcomes, despite different constellations of risk factors. Implications for interventions and future research are discussed.

  9. Hierarchical Spatio-temporal Visual Analysis of Cluster Evolution in Electrocorticography Data

    DOE PAGES

    Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...

    2016-10-02

    Here, we present ECoG ClusterFlow, a novel interactive visual analysis tool for the exploration of high-resolution Electrocorticography (ECoG) data. Our system detects and visualizes dynamic high-level structures, such as communities, using the time-varying spatial connectivity network derived from the high-resolution ECoG data. ECoG ClusterFlow provides a multi-scale visualization of the spatio-temporal patterns underlying the time-varying communities using two views: 1) an overview summarizing the evolution of clusters over time and 2) a hierarchical glyph-based technique that uses data aggregation and small multiples techniques to visualize the propagation of clusters in their spatial domain. ECoG ClusterFlow makes it possible 1) to compare the spatio-temporal evolution patterns across various time intervals, 2) to compare the temporal information at varying levels of granularity, and 3) to investigate the evolution of spatial patterns without occluding the spatial context information. Lastly, we present case studies done in collaboration with neuroscientists on our team for both simulated and real epileptic seizure data aimed at evaluating the effectiveness of our approach.

  10. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value
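    The UPGMA rule described in this excerpt, averaging the dissimilarities between all objects in two clusters, can be written as a minimal Python sketch. The function name, the use of Euclidean distance as the dissimilarity, and the one-dimensional sample points are assumptions made for illustration; the report itself works with tree-based dissimilarities that are not reproduced here.

```python
import math
from itertools import product

def average_linkage(cluster_a, cluster_b, dissimilarity=math.dist):
    """Mean dissimilarity over every cross-cluster pair of objects."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)

# Two hypothetical clusters of one-dimensional observations.
cluster_a = [(0.0,), (2.0,)]
cluster_b = [(4.0,), (6.0,)]
print(average_linkage(cluster_a, cluster_b))  # mean of the four pairwise distances
```

    In agglomerative clustering this value is computed for every pair of current clusters, and the pair with the smallest average is merged next.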

  11. Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis

    PubMed Central

    Xu, Rui; Zhen, Zonglei; Liu, Jia

    2010-01-01

    Pattern recognition methods have become increasingly popular in fMRI data analysis, which are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters were highly overlapped for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081

  12. The methodology of multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    One of the greatest challenges facing the software engineering community is the ability to produce large and complex computer systems, such as ground support systems for unmanned scientific missions, that are reliable and cost effective. In order to build and maintain these systems, it is important that the knowledge in the system be suitably abstracted, structured, and otherwise clustered in a manner which facilitates its understanding, manipulation, testing, and utilization. Development of complex mission-critical systems will require the ability to abstract overall concepts in the system at various levels of detail and to consider the system from different points of view. The Multi-ViewPoint Clustering Analysis (MVP-CA) methodology has been developed to provide multiple views of large, complicated systems. MVP-CA provides an ability to discover significant structures by providing an automated mechanism to structure both hierarchically (from detail to abstract) and orthogonally (from different perspectives). We propose to integrate MVP-CA into an overall software engineering life cycle to support the development and evolution of complex mission-critical systems.

  13. Model-free data analysis for source separation based on Non-Negative Matrix Factorization and k-means clustering (NMFk)

    NASA Astrophysics Data System (ADS)

    Vesselinov, V. V.; Alexandrov, B.

    2014-12-01

    The identification of the physical sources causing spatial and temporal fluctuations of state variables such as river stage levels and aquifer hydraulic heads is challenging. The fluctuations can be caused by variations in natural and anthropogenic sources such as precipitation events, infiltration, groundwater pumping, barometric pressures, etc. The source identification and separation can be crucial for conceptualization of the hydrological conditions and characterization of system properties. If the original signals that cause the observed state-variable transients can be successfully "unmixed", decoupled physics models may then be applied to analyze the propagation of each signal independently. We propose a new model-free inverse analysis of transient data based on the Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS) coupled with the k-means clustering algorithm, which we call NMFk. NMFk is capable of identifying a set of unique sources from a set of experimentally measured mixed signals, without any information about the sources, their transients, and the physical mechanisms and properties controlling the signal propagation through the system. A classical BSS conundrum is the so-called "cocktail-party" problem where several microphones are recording the sounds in a ballroom (music, conversations, noise, etc.). Each of the microphones is recording a mixture of the sounds. The goal of BSS is to "unmix" and reconstruct the original sounds from the microphone records. Similarly to the "cocktail-party" problem, our model-free analysis only requires information about state-variable transients at a number of observation points, m, where m > r, and r is the number of unknown unique sources causing the observed fluctuations. We apply the analysis on a dataset from the Los Alamos National Laboratory (LANL) site. We identify the sources as barometric pressure and water-supply pumping effects and estimate their impact. We also estimate the

  14. Multi-scale visual analysis of time-varying electrocorticography data via clustering of brain regions

    DOE PAGES

    Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...

    2017-06-06

    There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.

  15. Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.

    PubMed

    Cheng, Min; Zhang, Wen-jian; Xing, Tian-yu; Yan, Xiao-hong; Li, Yu-mao; Li, Hui; Wang, Ning

    2016-08-01

    miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, avian miR-17-92 cluster genomic structure has not been fully determined. The promoter location and sequence of miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate the functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the tested 10 species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is the core promoter region of mammalian miR-17-92 host gene (MIR17HG). Promoter luciferase reporter gene vector of the gap region was constructed and reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times that of the negative control (empty pGL3 basic vector), suggesting that chicken miR-17-92 cluster promoter exists in the gap region. To further gain insight into the promoter structure, two different truncations for the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19

  16. Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

    PubMed

    Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

    2011-05-01

    Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.

  17. Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

    ERIC Educational Resources Information Center

    Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

    2015-01-01

    This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…

  18. Hierarchical cluster analysis of labour market regulations and population health: a taxonomy of low- and middle-income countries

    PubMed Central

    2012-01-01

    Background An important contribution of the social determinants of health perspective has been to inquire about non-medical determinants of population health. Among these, labour market regulations are of vital significance. In this study, we investigate the labour market regulations among low- and middle-income countries (LMICs) and propose a labour market taxonomy to further understand population health in a global context. Methods Using Gross National Product per capita, we classify 113 countries into either low-income (n = 71) or middle-income (n = 42) strata. Principal component analysis of three standardized indicators of labour market inequality and poverty is used to construct 2 factor scores. Factor score reliability is evaluated with Cronbach's alpha. Using these scores, we conduct a hierarchical cluster analysis to produce a labour market taxonomy, conduct zero-order correlations, and create box plots to test their associations with adult mortality, healthy life expectancy, infant mortality, maternal mortality, neonatal mortality, under-5 mortality, and years of life lost to communicable and non-communicable diseases. Labour market and health data are retrieved from the International Labour Organization's Key Indicators of Labour Markets and World Health Organization's Statistical Information System. Results Six labour market clusters emerged: Residual (n = 16), Emerging (n = 16), Informal (n = 10), Post-Communist (n = 18), Less Successful Informal (n = 22), and Insecure (n = 31). Primary findings indicate: (i) labour market poverty and population health is correlated in both LMICs; (ii) association between labour market inequality and health indicators is significant only in low-income countries; (iii) Emerging (e.g., East Asian and Eastern European countries) and Insecure (e.g., sub-Saharan African nations) clusters are the most advantaged and disadvantaged, respectively, with the remaining clusters experiencing levels of population health consistent

  19. Hierarchical cluster analysis of labour market regulations and population health: a taxonomy of low- and middle-income countries.

    PubMed

    Muntaner, Carles; Chung, Haejoo; Benach, Joan; Ng, Edwin

    2012-04-18

    An important contribution of the social determinants of health perspective has been to inquire about non-medical determinants of population health. Among these, labour market regulations are of vital significance. In this study, we investigate the labour market regulations among low- and middle-income countries (LMICs) and propose a labour market taxonomy to further understand population health in a global context. Using Gross National Product per capita, we classify 113 countries into either low-income (n = 71) or middle-income (n = 42) strata. Principal component analysis of three standardized indicators of labour market inequality and poverty is used to construct 2 factor scores. Factor score reliability is evaluated with Cronbach's alpha. Using these scores, we conduct a hierarchical cluster analysis to produce a labour market taxonomy, conduct zero-order correlations, and create box plots to test their associations with adult mortality, healthy life expectancy, infant mortality, maternal mortality, neonatal mortality, under-5 mortality, and years of life lost to communicable and non-communicable diseases. Labour market and health data are retrieved from the International Labour Organization's Key Indicators of Labour Markets and World Health Organization's Statistical Information System. Six labour market clusters emerged: Residual (n = 16), Emerging (n = 16), Informal (n = 10), Post-Communist (n = 18), Less Successful Informal (n = 22), and Insecure (n = 31). Primary findings indicate: (i) labour market poverty and population health is correlated in both LMICs; (ii) association between labour market inequality and health indicators is significant only in low-income countries; (iii) Emerging (e.g., East Asian and Eastern European countries) and Insecure (e.g., sub-Saharan African nations) clusters are the most advantaged and disadvantaged, respectively, with the remaining clusters experiencing levels of population health consistent with their labour market

  20. Failure Mode Identification Through Clustering Analysis

    NASA Technical Reports Server (NTRS)

    Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)

    2002-01-01

    Research has shown that nearly 80% of the costs and problems are created in product development and that cost and quality are essentially designed into products in the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are being used for quality control and for the detection of potential failure modes during the detail design stage or post-product launch. Though all of these methods have their own advantages, they do not give information as to what are the predominant failures that a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, which hypothesizes that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.

  1. Identifying models of HIV care and treatment service delivery in Tanzania, Uganda, and Zambia using cluster analysis and Delphi survey.

    PubMed

    Tsui, Sharon; Denison, Julie A; Kennedy, Caitlin E; Chang, Larry W; Koole, Olivier; Torpey, Kwasi; Van Praag, Eric; Farley, Jason; Ford, Nathan; Stuart, Leine; Wabwire-Mangen, Fred

    2017-12-06

    Organization of HIV care and treatment services, including clinic staffing and services, may shape clinical and financial outcomes, yet there has been little attempt to describe different models of HIV care in sub-Saharan Africa (SSA). Information about the relative benefits and drawbacks of different models could inform the scale-up of antiretroviral therapy (ART) and associated services in resource-limited settings (RLS), especially in light of expanded client populations with country adoption of WHO's test and treat recommendation. We characterized task-shifting/task-sharing practices in 19 diverse ART clinics in Tanzania, Uganda, and Zambia and used cluster analysis to identify unique models of service provision. We ran descriptive statistics to explore how the clusters varied by environmental factors and programmatic characteristics. Finally, we employed the Delphi Method to make systematic use of expert opinions to ensure that the cluster variables were meaningful in the context of actual task-shifting of ART services in SSA. The cluster analysis identified three task-shifting/task-sharing models. The main differences across models were the availability of medical doctors, the scope of clinical responsibility assigned to nurses, and the use of lay health care workers. Patterns of healthcare staffing in HIV service delivery were associated with different environmental factors (e.g., health facility levels, urban vs. rural settings) and programme characteristics (e.g., community ART distribution or integrated tuberculosis treatment on-site). Understanding the relative advantages and disadvantages of different models of care can help national programmes adapt to increased client load, select optimal adherence strategies within decentralized models of care, and identify differentiated models of care for clients to meet the growing needs of long-term ART patients who require more complicated treatment management.

  2. Combining Multiobjective Optimization and Cluster Analysis to Study Vocal Fold Functional Morphology

    PubMed Central

    Palaparthi, Anil; Riede, Tobias

    2017-01-01

    Morphological design and the relationship between form and function have great influence on the functionality of a biological organ. However, the simultaneous investigation of morphological diversity and function is difficult in complex natural systems. We have developed a multiobjective optimization (MOO) approach in association with cluster analysis to study the form-function relation in vocal folds. An evolutionary algorithm (NSGA-II) was used to integrate MOO with an existing finite element model of the laryngeal sound source. Vocal fold morphology parameters served as decision variables and acoustic requirements (fundamental frequency, sound pressure level) as objective functions. A two-layer and a three-layer vocal fold configuration were explored to produce the targeted acoustic requirements. The mutation and crossover parameters of the NSGA-II algorithm were chosen to maximize a hypervolume indicator. The results were expressed using cluster analysis and were validated against a brute force method. Results from the MOO and the brute force approaches were comparable. The MOO approach demonstrated greater resolution in the exploration of the morphological space. In association with cluster analysis, MOO can efficiently explore vocal fold functional morphology. PMID:24771563

  3. Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis

    DTIC Science & Technology

    2016-10-01

    AWARD NUMBER: W81XWH-15-2-0032. TITLE: Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis. Public Release; Distribution Unlimited. ABSTRACT: The subject of the project is FY14 PRMRP Topic Area – Tinnitus. The broad

  4. Cluster analysis to estimate the risk of preeclampsia in the high-risk Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study

    PubMed Central

    Marttinen, Pekka; Gillberg, Jussi; Lokki, A. Inkeri; Majander, Kerttu; Ordén, Maija-Riitta; Taipale, Pekka; Pesonen, Anukatriina; Räikkönen, Katri; Hämäläinen, Esa; Kajantie, Eero; Laivuori, Hannele

    2017-01-01

    Objectives: Preeclampsia is divided into early-onset (delivery before 34 weeks of gestation) and late-onset (delivery at or after 34 weeks) subtypes, which may arise from different etiopathogenic backgrounds. Early-onset disease is associated with placental dysfunction. Late-onset disease develops predominantly due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation, which affect endothelial function. Our aim was to use cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors. Methods: We recruited 903 pregnant women with risk factors for preeclampsia at gestational weeks 12+0–13+6. Each individual outcome diagnosis was independently verified from medical records. We applied a Bayesian clustering algorithm to classify the study participants into clusters based on their particular risk factor combination. For each cluster, we computed the risk ratio of each disease outcome, relative to the risk in the general population. Results: The risk of preeclampsia increased exponentially with respect to the number of risk factors. Our analysis revealed 25 clusters. Preeclampsia in a previous pregnancy (n = 138) increased the risk of preeclampsia 8.1-fold (95% confidence interval (CI) 5.7–11.2) compared to a general population of pregnant women. Having a small for gestational age infant (n = 57) in a previous pregnancy increased the risk of early-onset preeclampsia 17.5-fold (95%CI 2.1–60.5). The cluster combining those two risk factors (n = 21) increased the risk of severe preeclampsia to 23.8-fold (95%CI 5.1–60.6), intermediate onset (delivery between 34+0–36+6 weeks of gestation) to 25.1-fold (95%CI 3.1–79.9) and preterm preeclampsia (delivery before 37+0 weeks of gestation) to 16.4-fold (95%CI 2.0–52.4). Body mass index over 30 kg/m2 (n = 228) as a sole risk factor increased the risk of preeclampsia to 2.1-fold (95%CI 1.1–3

  5. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
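
    The idea of testing a cluster statistic against Monte Carlo replicates can be illustrated with a minimal sketch. It is not the procedure from the report above: the statistic (mean nearest-neighbour distance), the shuffling null model, and the names mean_nn_distance and monte_carlo_p_value are all assumptions chosen for illustration. Shuffling one variable breaks any dependence between the two variables, so an observed statistic that is much tighter than the shuffled replicates suggests an association.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour (smaller = tighter)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def monte_carlo_p_value(xs, ys, n_sim=199, seed=0):
    """One-sided Monte Carlo p-value for 'the data are tighter than under independence'.

    The null distribution is built by shuffling ys, which destroys any
    dependence between the two variables (an assumed null model).
    """
    rng = random.Random(seed)
    observed = mean_nn_distance(list(zip(xs, ys)))
    hits = 0
    for _ in range(n_sim):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        if mean_nn_distance(list(zip(xs, shuffled))) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical data: y depends perfectly on x, so the points lie on a line.
xs = [i / 30 for i in range(30)]
ys = list(xs)
p_dependent = monte_carlo_p_value(xs, ys)
print(p_dependent)
```

    With 199 replicates the smallest attainable p-value is 1/200, so the number of replicates bounds the resolution of the test.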

  6. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    PubMed

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  7. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    NASA Astrophysics Data System (ADS)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
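
    The two external evaluation criteria named in this abstract, purity and the Rand index, can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' code; the function names and the six-isolate example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    members = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        members.setdefault(cluster, []).append(truth)
    majority_total = sum(Counter(m).most_common(1)[0][1] for m in members.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of sample pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical example: six isolates from two genera, one isolate misplaced.
genus = ["A", "A", "A", "B", "B", "B"]
cluster = [0, 0, 1, 1, 1, 1]
print(purity(genus, cluster))      # 5 of 6 samples match their cluster's majority genus
print(rand_index(genus, cluster))  # 10 of 15 pairs are treated consistently
```

    Purity rewards clusters dominated by a single class but can be inflated by using many small clusters, which is one reason to report it alongside a pair-counting measure such as the Rand index.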

  8. Clustering of Dietary Patterns, Lifestyles, and Overweight among Spanish Children and Adolescents in the ANIBES Study

    PubMed Central

    Pérez-Rodrigo, Carmen; Gil, Ángel; González-Gross, Marcela; Ortega, Rosa M.; Serra-Majem, Lluis; Varela-Moreiras, Gregorio; Aranceta-Bartrina, Javier

    2015-01-01

    Weight gain has been associated with behaviors related to diet, sedentary lifestyle, and physical activity. We investigated dietary patterns and possible meaningful clustering of physical activity, sedentary behavior, and sleep time in Spanish children and adolescents and whether the identified clusters could be associated with overweight. Analysis was based on a subsample (n = 415) of the cross-sectional ANIBES study in Spain. We performed exploratory factor analysis and subsequent cluster analysis of dietary patterns, physical activity, sedentary behaviors, and sleep time. Logistic regression analysis was used to explore the association between the cluster solutions and overweight. Factor analysis identified four dietary patterns, one reflecting a profile closer to the traditional Mediterranean diet. Dietary patterns, physical activity behaviors, sedentary behaviors and sleep time on weekdays in Spanish children and adolescents clustered into two different groups: a low physical activity-poorer diet lifestyle pattern, which included a higher proportion of girls, and a high physical activity, low sedentary behavior, longer sleep duration, healthier diet lifestyle pattern. Although the increased risk of being overweight was not significant, the Prevalence Ratios (PRs) for the low physical activity-poorer diet lifestyle pattern were >1 in children and in adolescents. The healthier lifestyle pattern included lower proportions of children and adolescents from low socioeconomic status backgrounds. PMID:26729155

  9. Prediction of line failure fault based on weighted fuzzy dynamic clustering and improved relational analysis

    NASA Astrophysics Data System (ADS)

    Meng, Xiaocheng; Che, Renfei; Gao, Shi; He, Juntao

    2018-04-01

    With the advent of the big data era, power system research has entered a new stage. At present, the main application of big data in power systems is early-warning analysis of power equipment: relevant historical fault data are collected, and system security is improved by predicting early warnings and failure rates for different kinds of equipment under given relational factors. In this paper, a method of line failure rate warning is proposed. First, fuzzy dynamic clustering is carried out on the collected historical information. To account for the imbalance between attributes, the coefficient of variation is used to assign the corresponding weights, and weighted fuzzy clustering is then used to process the data more effectively. Next, by analyzing the basic idea and basic properties of relational analysis model theory, the gray relational model is improved by combining the slope and the Deng model. The incremental composition and the composition of the two sequences are also incorporated into the gray relational model to obtain the gray relational degree between the various samples. The failure rate is predicted according to the principle of weighting. Finally, the concrete process is illustrated with an example, and the validity and superiority of the proposed method are verified.
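
    One common way to turn coefficients of variation into attribute weights is to normalize them so they sum to one. The sketch below assumes that scheme and then uses the weights in a weighted Euclidean distance; the abstract does not give the exact formulas, so cv_weights, weighted_distance and the sample records are hypothetical.

```python
import math
from statistics import mean, pstdev


def cv_weights(rows):
    """Weight each attribute by its coefficient of variation, normalized to sum to 1.

    Assumes every attribute has a non-zero mean and that at least one
    attribute actually varies.
    """
    columns = list(zip(*rows))
    cvs = [pstdev(col) / abs(mean(col)) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two records."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Hypothetical fault records with two attributes of very different relative spread.
records = [[9.0, 1.0], [10.0, 2.0], [11.0, 3.0]]
weights = cv_weights(records)
print(weights)
print(weighted_distance(records[0], records[2], weights))
```

    Under this scheme an attribute that varies strongly relative to its mean receives a larger weight, so it contributes more to the distances on which the clustering is based.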

  10. Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling

    NASA Astrophysics Data System (ADS)

    Rahman, Md. Habibur; Matin, M. A.; Salma, Umma

    2017-12-01

    The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidean and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.
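
    The distance measures and linkage rules mentioned in this abstract are related in a simple way, which the following sketch illustrates. It is not the study's code: the function names and the toy coordinates are assumptions. The Minkowski distance with p = 1 is the Manhattan distance and with p = 2 is the Euclidean distance; single linkage takes the smallest pairwise distance between two clusters, average linkage the mean.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def single_linkage(cluster_a, cluster_b, p=2):
    """Distance between clusters as the smallest pairwise distance."""
    return min(minkowski(a, b, p) for a in cluster_a for b in cluster_b)


def average_linkage(cluster_a, cluster_b, p=2):
    """Distance between clusters as the mean of all pairwise distances."""
    distances = [minkowski(a, b, p) for a in cluster_a for b in cluster_b]
    return sum(distances) / len(distances)


# Hypothetical two-dimensional profiles standing in for station rainfall vectors.
group_one = [(0.0, 0.0), (1.0, 0.0)]
group_two = [(3.0, 4.0), (1.0, 1.0)]

print(minkowski((0.0, 0.0), (3.0, 4.0), 1))   # Manhattan distance
print(minkowski((0.0, 0.0), (3.0, 4.0), 2))   # Euclidean distance
print(single_linkage(group_one, group_two))
print(average_linkage(group_one, group_two))
```

    Because single linkage looks only at the closest pair while average linkage uses every pair, the two rules can merge clusters in a different order, which is one reason dendrograms differ across linkage and distance combinations.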

  11. Effects of additional data on Bayesian clustering.

    PubMed

    Yamazaki, Keisuke

    2017-10-01

    Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

    PubMed

    Murray, Nicholas P; Hunfalvay, Melissa

    2017-02-01

    Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k-means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants.

  13. Time series clustering analysis of health-promoting behavior

    NASA Astrophysics Data System (ADS)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and has been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.

  14. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    PubMed

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    Objectives: To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: Any, not limited to healthcare settings. Participants: Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures: The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results: Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions: Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  15. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review

    PubMed Central

    Morris, Tom; Gray, Laura

    2017-01-01

    Objectives: To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: Any, not limited to healthcare settings. Participants: Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures: The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results: Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions: Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
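
    The two quantities reported in this review, the coefficient of variation of cluster size and the observed size as a percentage of the assumed size, are simple to compute. The sketch below is illustrative only: the cluster sizes are invented, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(cluster_sizes):
    """CV of cluster size: standard deviation divided by the mean.

    Uses the sample standard deviation (n - 1 denominator), an assumption.
    """
    return stdev(cluster_sizes) / mean(cluster_sizes)


def percent_of_assumed(observed_size, assumed_size):
    """Observed cluster size expressed as a percentage of the assumed size."""
    return 100.0 * observed_size / assumed_size


# Hypothetical trial with five clusters of unequal size.
sizes = [20, 35, 50, 80, 15]
cv = coefficient_of_variation(sizes)
print(round(cv, 3))

# Hypothetical comparison against a sample size calculation that assumed 40 per cluster.
print([percent_of_assumed(size, 40) for size in sizes])
```

    A CV of 0 means all clusters are the same size; larger values indicate greater imbalance, which a sample size calculation that assumes equal clusters does not account for.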

  16. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed, and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively unaltered rocks from predominantly altered rocks.

  17. Factors associated with utilization of antenatal care services in Balochistan province of Pakistan: An analysis of the Multiple Indicator Cluster Survey (MICS) 2010.

    PubMed

    Ghaffar, Abdul; Pongponich, Sathirakorn; Ghaffar, Najma; Mehmood, Tahir

    2015-01-01

    The study was conducted to identify factors affecting the utilization of Antenatal Care (ANC) in Balochistan Province, Pakistan. Data on ANC utilization, together with social and economic determinants, were derived from a Multiple Indicator Cluster Survey (MICS) conducted in Balochistan in 2010. The analysis included 2339 women who gave birth in the two years preceding the survey. The researchers established a model to identify influential factors contributing to the utilization of ANC by logistic regression; model selection was by Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Household wealth, education, health condition, age at first marriage, number of children and spouse violence justification were found to be significantly associated with ANC coverage. Literate mothers are 2.45 times more likely to have ANC, and women whose newborns showed symptoms of illness at birth that needed hospitalization are 0.47 times less likely to access ANC. Women with an increase in the number of surviving children are 1.07 times less likely to have ANC, and those who think spousal violence is socially justified are 1.36 times less likely to have ANC. The results draw attention towards evidence-based planning of factors associated with utilization of ANC in the Balochistan province. The study reveals that women with a high wealth index and with education were more likely to receive ANC. Women who were younger at first marriage, had more children, had neonates with symptoms of illness at birth that needed hospitalization, or who justified spousal violence were less likely to receive ANC. Among the components of ANC, urine sampling and receiving tetanus toxoid (TT) in the last pregnancy increased the frequency of visits. ANC from a doctor decreased the number of visits. There is a dire need to reduce disparities in wealth index, education and urban/rural living.

  18. Factors associated with utilization of antenatal care services in Balochistan province of Pakistan: An analysis of the Multiple Indicator Cluster Survey (MICS) 2010

    PubMed Central

    Ghaffar, Abdul; Pongponich, Sathirakorn; Ghaffar, Najma; Mehmood, Tahir

    2015-01-01

    Objective: The study was conducted to identify factors affecting the utilization of Antenatal Care (ANC) in Balochistan Province, Pakistan. Methods: Data on ANC utilization, together with social and economic determinants, were derived from a Multiple Indicator Cluster Survey (MICS) conducted in Balochistan in 2010. The analysis included 2339 women who gave birth in the two years preceding the survey. The researchers established a model to identify influential factors contributing to the utilization of ANC by logistic regression; model selection was by Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Results: Household wealth, education, health condition, age at first marriage, number of children and spouse violence justification were found to be significantly associated with ANC coverage. Literate mothers are 2.45 times more likely to have ANC, and women whose newborns showed symptoms of illness at birth that needed hospitalization are 0.47 times less likely to access ANC. Women with an increase in the number of surviving children are 1.07 times less likely to have ANC, and those who think spousal violence is socially justified are 1.36 times less likely to have ANC. The results draw attention towards evidence-based planning of factors associated with utilization of ANC in the Balochistan province. Conclusion: The study reveals that women with a high wealth index and with education were more likely to receive ANC. Women who were younger at first marriage, had more children, had neonates with symptoms of illness at birth that needed hospitalization, or who justified spousal violence were less likely to receive ANC. Among the components of ANC, urine sampling and receiving tetanus toxoid (TT) in the last pregnancy increased the frequency of visits. ANC from a doctor decreased the number of visits. There is a dire need to reduce disparities in wealth index, education and urban/rural living. PMID:26870113

  19. Competing Effects Between Screen Media Time and Physical Activity in Adolescent Girls: Clustering a Self-Organizing Maps Analysis.

    PubMed

    Valencia-Peris, Alexandra; Devís-Devís, José; García-Massó, Xavier; Lizandra, Jorge; Pérez-Gimeno, Esther; Peiró-Velert, Carmen

    2016-06-01

    Previous research shows contradictory findings on potential competing effects between sedentary screen media usage (SMU) and physical activity (PA). This study examined these effects on adolescent girls via self-organizing maps analysis focusing on 3 target profiles. A sample of 1,516 girls aged 12 to 18 years self-reported daily time engagement in PA (moderate and vigorous intensity) and in screen media activities (TV/video/DVD, computer, and videogames), separately and combined. Topological interrelationships from the 13 emerging maps indicated a moderate competing effect between physically active and sedentary SMU patterns. Higher SES and overweight status were linked to either active or inactive behaviors. Three target clusters were explored in more detail. Cluster 1, named temperate-media actives, showed capabilities of being active while engaging in a moderate level of SMU (TV/video/DVD mainly). In Cluster 2, named prudent-media inactives, and Cluster 3, named compulsive-media inactives, a competing effect between SMU and PA emerged, with sedentary SMU behaviors being responsible for low involvement in active pursuits. SMU and PA emerge as both related and independent behaviors in girls, resulting in a moderate competing effect. Findings support the case for recommending the timing of PA and SMU for recreational purposes considering different profiles, sociodemographic factors and types of SMU.

  20. [Achene morphology cluster analysis of Taraxacum F. H. Wigg. from northeast China and molecule systematics evidence determined by SRAP].

    PubMed

    Li, Hai-juan; Zhao, Xin; Jia, Qing-fei; Li, Tian-lai; Ning, Wei

    2012-08-01

    The morphological and micro-morphological characteristics of the achenes of six species of the genus Taraxacum from northeastern China were examined, together with SRAP cluster analysis, as evidence for their classification. The achenes were observed by microscope and EPMA. Cluster analysis was performed on the basis of the size, shape, cone proportion, color and surface sculpture of the achenes. Differences in achene morphology among Taraxacum species were obvious, particularly in the distribution and size of spinules, achene color and achene size. According to the cluster analysis of achene morphology, T. antungense Kitag. and T. urbanum Kitag. should be combined into a single species. The results of the achene morphology cluster analysis and of the SRAP marker-based molecular systematics clustering were consistent with the group divisions in "the Chinese flora". The achene morphological characteristics of Taraxacum plants are stable and conservative, so inter-species division and kinship analysis can be carried out according to differences in combinations of achene morphological characteristics. The achene morphology cluster analysis and the SRAP marker-based molecular systematics both support the dandelion classification of "the Chinese flora".